Monday, October 17, 2016

Climate datasets for data analysis

For learners and practitioners working on data processing projects (database administration, database programming, and so on), starting with a reliable and interesting dataset can be very useful.

The dataset must be syntactically correct and large enough to provide interesting results, and if possible it should be real data, to maintain the interest of the practitioner.
For the Apache Spark programming exercises, I have chosen to process climate data from some of the cities I have lived in. This data lets me run several interesting analyses.
The data source I recommend is the National Centers for Environmental Information (NCEI) website: http://www.ncdc.noaa.gov/cdo-web/datasets

This dataset is correctly formatted and has interesting metrics. The following notes will show how to get daily climate information for a city over a large date range.

Step 1 - Open the URL http://www.ncdc.noaa.gov/cdo-web/datasets in your browser. You will see the datasets page.




Step 2 - Click the (+) sign next to "Daily Summaries" to expand it.


Step 3 - Click "Search tool".



Step 4 - Select "Daily Summaries", provide the date range, select "cities" in the "Search For" option and enter "Chennai" as the search term (or the name of your chosen city). Click Search.


Step 5 - On the results page, click the "ADD TO CART" button next to the city or other entity that you searched for.



Step 6 - Hover the mouse over the "Cart" icon in the top right. Click "VIEW ALL ITEMS".



Step 7 - On the next page, select the CSV option, verify the date range and click CONTINUE.


Step 8 - Next, select the data columns that you need (precipitation, temperature, etc.). Click CONTINUE.


Step 9 - This is the final step. Verify the requested information and provide your email address. Click the SUBMIT ORDER button. You will receive an email with instructions to download the dataset.
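
Once the CSV file has been downloaded, a quick sanity check before loading it into Spark can be useful. The following is a minimal Python sketch, not part of the original steps. It assumes the export has a header row with columns named DATE, PRCP, TMAX and TMIN, and that dates are written as YYYY-MM-DD; the actual columns depend on what was selected in Step 8, so adjust the names to match your file. The sample rows are made-up placeholders so the sketch runs on its own, and the function names are hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Made-up placeholder rows (not real measurements), used only so the
# sketch is self-contained. Column names are an assumption.
SAMPLE_CSV = '''"STATION","NAME","DATE","PRCP","TMAX","TMIN"
"XX000000001","EXAMPLE CITY, XX","2016-01-01","0.0","29.5","21.0"
"XX000000001","EXAMPLE CITY, XX","2016-01-02","","30.5","22.0"
"XX000000001","EXAMPLE CITY, XX","2016-02-01","1.2","31.0","23.0"
'''


def load_daily_rows(handle):
    """Parse the CSV and return a list of dicts, one per daily record."""
    rows = []
    for record in csv.DictReader(handle):
        # Assumes ISO dates (YYYY-MM-DD); raises ValueError otherwise.
        record["DATE"] = date.fromisoformat(record["DATE"])
        rows.append(record)
    return rows


def monthly_mean(rows, column):
    """Average a numeric column per (year, month), skipping empty cells."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for row in rows:
        raw = (row.get(column) or "").strip()
        if not raw:
            continue  # missing observation for that day
        key = (row["DATE"].year, row["DATE"].month)
        totals[key][0] += float(raw)
        totals[key][1] += 1
    return {key: s / n for key, (s, n) in sorted(totals.items())}


rows = load_daily_rows(io.StringIO(SAMPLE_CSV))
means = monthly_mean(rows, "TMAX")
for (year, month), value in means.items():
    print(f"{year}-{month:02d}: mean TMAX = {value:.2f}")
```

To run this against the downloaded file, replace `io.StringIO(SAMPLE_CSV)` with `open("your_file.csv", newline="")`, where the file name is whatever you saved the download as.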






