NOAA’s National Climatic Data Center (NCDC) is responsible for preserving, monitoring, assessing, and providing public access to weather data.

NCDC provides access to daily data from the U.S. Climate Reference Network / U.S. Regional Climate Reference Network (USCRN/USRCRN) via anonymous FTP at:

Datasets:

Readme:

All daily data are calculated over the station’s 24-hour local standard time (LST) day.

Field#  Name                           Units
1       WBANNO                         XXXXX
2       LST_DATE                       YYYYMMDD
3       CRX_VN                         XXXXXX
4       LONGITUDE                      Decimal_degrees
5       LATITUDE                       Decimal_degrees
6       T_DAILY_MAX                    Celsius
7       T_DAILY_MIN                    Celsius
8       T_DAILY_MEAN                   Celsius
9       T_DAILY_AVG                    Celsius
10      P_DAILY_CALC                   mm
11      SOLARAD_DAILY                  MJ/m^2
12      SUR_TEMP_DAILY_TYPE            X
13      SUR_TEMP_DAILY_MAX             Celsius
14      SUR_TEMP_DAILY_MIN             Celsius
15      SUR_TEMP_DAILY_AVG             Celsius
16      RH_DAILY_MAX                   %
17      RH_DAILY_MIN                   %
18      RH_DAILY_AVG                   %
19      SOIL_MOISTURE_5_DAILY          m^3/m^3
20      SOIL_MOISTURE_10_DAILY         m^3/m^3
21      SOIL_MOISTURE_20_DAILY         m^3/m^3
22      SOIL_MOISTURE_50_DAILY         m^3/m^3
23      SOIL_MOISTURE_100_DAILY        m^3/m^3
24      SOIL_TEMP_5_DAILY              Celsius
25      SOIL_TEMP_10_DAILY             Celsius
26      SOIL_TEMP_20_DAILY             Celsius
27      SOIL_TEMP_50_DAILY             Celsius
28      SOIL_TEMP_100_DAILY            Celsius
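
To make the field layout above a little more concrete, here is a minimal Java sketch that splits one daily record on whitespace and picks out LST_DATE, T_DAILY_MAX and T_DAILY_MIN (fields 2, 6 and 7). The class name DailyRecord and the sample line with its values are made up for this illustration, and the sketch assumes that all 28 fields are present and separated by whitespace. It is not the code from the downloadable project.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration only; not part of the downloadable project.
final class DailyRecord {
    static final int FIELD_COUNT = 28;   // number of fields in the table above

    final LocalDate date;   // field 2, LST_DATE
    final double tMax;      // field 6, T_DAILY_MAX in Celsius
    final double tMin;      // field 7, T_DAILY_MIN in Celsius

    private DailyRecord(LocalDate date, double tMax, double tMin) {
        this.date = date;
        this.tMax = tMax;
        this.tMin = tMin;
    }

    // Assumes whitespace-separated fields with none left empty.
    static DailyRecord parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length < FIELD_COUNT) {
            throw new IllegalArgumentException(
                "expected " + FIELD_COUNT + " fields, got " + f.length);
        }
        // LST_DATE is written as YYYYMMDD, which BASIC_ISO_DATE understands.
        LocalDate date = LocalDate.parse(f[1], DateTimeFormatter.BASIC_ISO_DATE);
        return new DailyRecord(date, Double.parseDouble(f[5]), Double.parseDouble(f[6]));
    }

    public static void main(String[] args) {
        // Made-up sample values, laid out in the field order of the table above.
        String sample = "12345 20150101 2.422 -98.08 30.62 3.6 0.9 2.2 2.6 1.2 1.47 C "
            + "3.7 1.1 2.5 99.9 85.4 97.2 0.369 0.308 0.300 0.290 0.280 "
            + "7.0 8.1 8.5 9.0 9.4";
        DailyRecord r = parse(sample);
        System.out.println(r.date + " max=" + r.tMax + " min=" + r.tMin);
    }
}
```

Splitting on runs of whitespace, rather than reading fixed character offsets, keeps the minus sign attached to negative readings and does not depend on exact column positions.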

Having gone through the wordcount MapReduce guide, you now have a basic idea of how a MapReduce program works. So let us look at a more complex MapReduce program that runs on a weather dataset. Here I am using a 2015 dataset for Austin, Texas. We will analyze the dataset and classify each day as a hot day or a cold day depending on the temperature recorded by NCDC.
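
The hot/cold classification can be sketched in plain Java, without any Hadoop dependencies, as follows. This is a minimal sketch under stated assumptions rather than the project's actual code: the class and method names are hypothetical, the thresholds (hot when the daily maximum is above 35 Celsius, cold when the daily minimum is below 10 Celsius) are assumed for illustration, and -9999 is assumed to mark a missing reading.

```java
import java.util.Optional;

// Hypothetical sketch of the hot/cold rule; names and thresholds are assumptions.
final class DayClassifier {
    static final double MISSING = -9999.0;      // assumed missing-value marker
    static final double HOT_ABOVE_MAX = 35.0;   // assumed threshold, Celsius
    static final double COLD_BELOW_MIN = 10.0;  // assumed threshold, Celsius

    // Returns "Hot Day", "Cold Day", or empty when neither rule applies or the
    // relevant reading is missing. A day matching both rules is reported as hot.
    static Optional<String> classify(double tMax, double tMin) {
        if (tMax != MISSING && tMax > HOT_ABOVE_MAX) {
            return Optional.of("Hot Day");
        }
        if (tMin != MISSING && tMin < COLD_BELOW_MIN) {
            return Optional.of("Cold Day");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(classify(36.2, 21.0));        // maximum above 35
        System.out.println(classify(12.5, 3.1));         // minimum below 10
        System.out.println(classify(MISSING, MISSING));  // no usable readings
    }
}
```

In a MapReduce job, a check like this would typically sit in the mapper, which would emit a record only for days that receive a label.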

NCDC gives us all the weather data we need for this MapReduce project.

The dataset we will be using looks like the snapshot below.

Step 1
Download the complete project using the link below.

The complete code is given below.

Step 2
Import the project into the Eclipse IDE as described in the earlier guide, and replace the jar paths with the jar files in this project's lib directory.

Step 3
Once the project shows no errors, export it as a jar file, just as we did in the wordcount MapReduce guide. Right-click the project and click Export. Select JAR file.

Give the path where you want to save the file.

Select the main class by clicking Browse.

Click on Finish to export.

Step 4
Alternatively, you can download the jar file directly using the link below.


Download the dataset I used using the link below.


Step 5
Before running the MapReduce program, check that your cluster is up and all the Hadoop daemons are running.

Command: jps

Step 6
Copy the weather dataset to HDFS and confirm that it is there.

Command: hdfs dfs -put Downloads/weather_data.txt /

Command: hdfs dfs -ls /

Step 7
Run the jar file.

Command: hadoop jar temperature.jar /weather_data.txt /output_hotandcold


Step 8
Check the output_hotandcold directory in HDFS.

The output is in the part-r-00000 file inside it.

In my dataset, only two days had a maximum temperature above 35.

The project has been executed successfully!

Website Comments

  1. Chirag

    I think this code has a problem with negative values. The date 20150303 should be classified as a cold day, since both the max and min temp are -9999. However, after the map phase the -9999 becomes 9999, so it is classified as a hot day. Has anyone else encountered this as well?

  2. gorav

    Do we first need to create a Hadoop cluster before running the MapReduce program? If so, how do we create one? Please help me out, sir.

  3. biswajit

    // Add this line in the mapper class:
    public static final int MISSING = 9999;

    // Append to the if statements in the mapper class:
    if (temp_Max > 35.0 && temp_Max != MISSING)
    if (temp_Min < 10 && temp_Min != MISSING)

  4. Yang Chen

    I ran the code, but I got the following exceptions:

    17/10/13 19:51:08 INFO mapreduce.Job: map 100% reduce 100%
    17/10/13 19:51:08 INFO mapreduce.Job: Job job_1507908239110_0007 failed with state FAILED due to: Task failed task_1507908239110_0007_m_000000
    Job failed as tasks failed. failedMaps:1 failedReduces:0

    The failed task has the following information:

    Error: java.lang.NumberFormatException: For input string: "4 9"
        at sun.misc.FloatingDecimal.readJavaFormatString(
        at java.lang.Float.parseFloat(
        at MyMaxMin$
        at MyMaxMin$
        at org.apache.hadoop.mapred.MapTask.runNewMapper(
        at org.apache.hadoop.mapred.YarnChild$
        at Method)
        at org.apache.hadoop.mapred.YarnChild.main(

    Could you please tell me how to figure this out?
