Performing Analysis of Meteorological Data

Sandeep S
4 min read · Oct 13, 2020


This article documents my first project for my internship program: Performing Analysis of Meteorological Data.

The main objective of this Data Analytics Internship is to transform raw data into information and then convert that information into knowledge. Since weather data is one of the most readily available kinds of data on the internet, it is a great starting point for understanding fundamental data analytics concepts.

Dataset

The dataset contains hourly weather readings, including apparent temperature and humidity, covering roughly 10 years: from 2006-04-01 00:00:00.000 +0200 to 2016-09-09 23:00:00.000 +0200. It corresponds to Finland, a country in Northern Europe.

Download the weather dataset from this Google Drive link.

Goal

Transform the raw data into information and then convert it into knowledge by:

  • performing data cleaning,
  • performing analysis to test the given null hypothesis (H0), and
  • writing a descriptive blog with relevant visualizations to support the conclusion.

Null Hypothesis (H0)

“Do the apparent temperature and humidity, compared monthly across 10 years of data, indicate an increase due to global warming?”

Testing H0 means finding out whether the average apparent temperature for a given month, say April, increased from 2006 to 2016, and whether the average humidity for the same month and period increased as well. This monthly analysis has to be done for all 12 months over the 10-year period. In other words, you resample the data from hourly to monthly, then compare the same month across the 10 years. Support the analysis with appropriate visualizations using the matplotlib and/or seaborn libraries.
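To make "compare the same month across years" concrete, here is a minimal Python sketch that estimates a per-month trend with a least-squares line fit (`numpy.polyfit`). The fitted slope is an illustration of one way to quantify "increase", not the method used in this post (which compares plots visually); the function name `monthly_trend` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def monthly_trend(monthly: pd.Series, month: int) -> float:
    """Least-squares slope (units per year) of one calendar month's averages across years.

    `monthly` is assumed to hold one average per month, indexed by a DatetimeIndex.
    """
    # Keep only the chosen calendar month (e.g. every April) and drop gaps.
    same_month = monthly[monthly.index.month == month].dropna()
    years = same_month.index.year.to_numpy()
    # Fit value = slope * year + intercept; the slope is the average change per year.
    slope, _intercept = np.polyfit(years, same_month.to_numpy(), deg=1)
    return float(slope)


# Made-up monthly averages: April rises by 0.5 per year, every other month stays flat.
idx = pd.date_range("2006-01-01", "2016-12-01", freq="MS")
values = np.where(idx.month == 4, 10.0 + 0.5 * (idx.year - 2006), 0.0)
toy = pd.Series(values, index=idx)

print(round(monthly_trend(toy, month=4), 3))  # 0.5
```

A clearly positive slope for a month would point to an increase over the period; a slope near zero, as for the flat months in this toy series, would not.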

Implementation

Step 1: Importing the Necessary Libraries & Data
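The import code itself isn't reproduced in the text, so this is a minimal sketch of Step 1, assuming pandas. The file name `weatherHistory.csv` is hypothetical, and apart from 'Formatted Date', the column names 'Apparent Temperature (C)' and 'Humidity' are assumptions about the dataset's layout. A tiny made-up inline sample stands in for the real file so the snippet runs on its own.

```python
import io

import pandas as pd

CSV_PATH = "weatherHistory.csv"  # hypothetical name of the downloaded file


def load_weather(source) -> pd.DataFrame:
    """Read the hourly weather CSV; `source` may be a file path or a file-like object."""
    return pd.read_csv(source)


# Made-up sample rows with assumed column names (not real measurements).
SAMPLE_CSV = """Formatted Date,Apparent Temperature (C),Humidity
2006-04-01 00:00:00.000 +0200,7.39,0.89
2006-04-01 01:00:00.000 +0200,7.23,0.86
2006-04-01 02:00:00.000 +0200,,0.89
"""

df = load_weather(io.StringIO(SAMPLE_CSV))
print(df.shape)  # (3, 3)
print(df.dtypes)
```

With the downloaded file in place, `load_weather(CSV_PATH)` would read the full dataset. Note that `read_csv` already turns empty cells (like the third temperature value above) into NaN.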

Step 2: Data cleaning

2.1 Find all missing values in the dataset and fill them accordingly

Here I replaced all missing values with NaN to avoid complications during the analysis.
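The post doesn't show how the missing values were found, so here is a minimal pandas sketch, assuming gaps appear either as empty cells (which pandas already reads as NaN) or as whitespace-only text, which is converted explicitly. The 'Summary' column and all sample values are made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# Made-up sample: one blank text cell and one empty numeric cell.
SAMPLE_CSV = """Formatted Date,Summary,Apparent Temperature (C),Humidity
2006-04-01 00:00:00.000 +0200,Partly Cloudy,7.39,0.89
2006-04-01 01:00:00.000 +0200, ,7.23,0.86
2006-04-01 02:00:00.000 +0200,Mostly Cloudy,,0.89
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Empty cells are already NaN; also mark whitespace-only strings as missing.
df = df.replace(r"^\s*$", np.nan, regex=True)

# Number of missing values per column.
missing = df.isnull().sum()
print(missing)
```

Leaving the gaps as NaN works well for this analysis because pandas aggregations such as `mean()` skip NaN by default (`skipna=True`), so later monthly averages are computed from the readings that do exist.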

2.2 Change the format of data for better analysis

I converted the ‘Formatted Date’ column from text to a standard datetime type for easier analysis.
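A minimal pandas sketch of this conversion, assuming the timestamps look like the range quoted in the Dataset section (for example `2006-04-01 00:00:00.000 +0200`). If the UTC offset varies within the column (for example with daylight saving time), pandas cannot hold the values in one timezone-aware column, so this sketch passes `utc=True`; that choice is an assumption, not necessarily what the author did. The two rows are made up.

```python
import pandas as pd

# Made-up rows in the quoted timestamp format, with two different UTC offsets.
df = pd.DataFrame({
    "Formatted Date": ["2006-04-01 00:00:00.000 +0200", "2006-10-29 03:00:00.000 +0100"],
    "Humidity": [0.89, 0.93],
})

# Parse the text timestamps; utc=True normalises mixed offsets to a single UTC column.
df["Formatted Date"] = pd.to_datetime(df["Formatted Date"], utc=True)

# A sorted DatetimeIndex is what resample() needs in the next step.
df = df.set_index("Formatted Date").sort_index()
print(df.index)
```

Converting to UTC shifts each timestamp by one or two hours, which has little effect on monthly averages but does move readings near month boundaries.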

Step 3: Resample data from hourly to month wise

The dataset contains hourly values, so we resample it to monthly averages to meet our analysis requirements.
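A minimal sketch of the hourly-to-monthly resampling with pandas, assuming a DatetimeIndex like the one built in Step 2.2 and the assumed column names 'Apparent Temperature (C)' and 'Humidity'. The hourly values are made up and constant within each month so the averages are easy to check.

```python
import numpy as np
import pandas as pd

# Made-up hourly data for April and May 2006, constant within each month.
hours = pd.date_range("2006-04-01", "2006-05-31 23:00", freq=pd.offsets.Hour(), tz="UTC")
df = pd.DataFrame(index=hours)
df["Apparent Temperature (C)"] = np.where(df.index.month == 4, 5.0, 12.0)
df["Humidity"] = np.where(df.index.month == 4, 0.80, 0.70)

# Select the numeric columns first (averaging text columns can raise an error in
# recent pandas), then average every hour that falls in the same calendar month.
# "MS" labels each monthly bin by the first day of the month.
monthly = df[["Apparent Temperature (C)", "Humidity"]].resample("MS").mean()
print(monthly)
```

The post doesn't say which monthly frequency alias was used. "MS" is used here because it works in both older and newer pandas releases, whereas the month-end alias "M" is deprecated in favour of "ME" from pandas 2.2.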

Step 4: Analysis plots of temperature & humidity over the range of years in the dataset

4.1 Variation in apparent temperature & humidity with time (in years)
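The original plot isn't reproduced in the text, so here is a minimal matplotlib sketch of one way to draw it, assuming a monthly DataFrame like the one produced in Step 3 with the assumed columns 'Apparent Temperature (C)' and 'Humidity'. The two stacked panels and the helper name `plot_monthly` are illustrative choices, and the data is made up.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering for scripts; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_monthly(monthly: pd.DataFrame, path=None):
    """Plot monthly apparent temperature (top) and humidity (bottom) against time."""
    fig, (ax_temp, ax_hum) = plt.subplots(2, 1, sharex=True, figsize=(10, 6))
    ax_temp.plot(monthly.index, monthly["Apparent Temperature (C)"], color="tab:red")
    ax_temp.set_ylabel("Apparent temperature (°C)")
    ax_hum.plot(monthly.index, monthly["Humidity"], color="tab:blue")
    ax_hum.set_ylabel("Humidity")
    ax_hum.set_xlabel("Year")
    fig.suptitle("Monthly averages, 2006-2016")
    fig.tight_layout()
    if path is not None:
        fig.savefig(path)  # e.g. path="monthly_averages.png" (hypothetical file name)
    return fig


# Made-up monthly averages with a simple yearly cycle, for illustration only.
idx = pd.date_range("2006-04-01", "2016-09-01", freq="MS")
phase = 2 * np.pi * (idx.month.to_numpy() - 4) / 12
monthly = pd.DataFrame(
    {"Apparent Temperature (C)": 10 + 12 * np.sin(phase), "Humidity": 0.75 - 0.1 * np.sin(phase)},
    index=idx,
)

fig = plot_monthly(monthly)
```

Separate panels keep the two very different scales (degrees versus a 0 to 1 fraction) readable without a second y-axis.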

4.2 Humidity month-wise with respect to time (in years)

4.3 Average apparent temperature month-wise with respect to time (in years)
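Subsections 4.2 and 4.3 both follow each calendar month across the years, so one helper can serve both. This is a minimal pandas/matplotlib sketch, assuming a monthly Series with a DatetimeIndex; the helper names `month_by_year` and `plot_month_wise` are hypothetical and the humidity values are made up.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering for scripts; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def month_by_year(monthly: pd.Series) -> pd.DataFrame:
    """Reshape a monthly series into a table: one row per year, one column per calendar month."""
    frame = monthly.to_frame("value")
    frame["year"] = frame.index.year
    frame["month"] = frame.index.month
    return frame.pivot(index="year", columns="month", values="value")


def plot_month_wise(table: pd.DataFrame, ylabel: str):
    """Draw one line per calendar month so each month can be followed across the years."""
    fig, ax = plt.subplots(figsize=(10, 6))
    for month in table.columns:
        ax.plot(table.index, table[month], marker="o", label=str(month))
    ax.set_xlabel("Year")
    ax.set_ylabel(ylabel)
    ax.legend(title="Month", ncol=6)
    return fig


# Made-up monthly humidity averages that depend only on the calendar month.
idx = pd.date_range("2006-04-01", "2016-09-01", freq="MS")
humidity = pd.Series(0.70 + 0.01 * idx.month.to_numpy(), index=idx)

table = month_by_year(humidity)
fig = plot_month_wise(table, ylabel="Average humidity")
```

For the apparent temperature plot, the same two functions would be called with the monthly apparent temperature series. Months outside the recorded range (for example January 2006) show up as NaN in the table and simply leave a gap in the line.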

Step 5: Monthly analysis for all 12 months over the 10 year period

Plots of each of the 12 months across the 10-year period.
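The Step 5 plots aren't reproduced in the text, so this is a minimal sketch of one possible layout, assuming a monthly DataFrame with the assumed columns 'Apparent Temperature (C)' and 'Humidity'. The 4x3 grid, the twin humidity axis, and the helper name `plot_each_month` are illustrative choices rather than the author's, and the sample data is made up.

```python
import calendar

import matplotlib
matplotlib.use("Agg")  # off-screen rendering for scripts; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_each_month(monthly: pd.DataFrame, temp_col: str, hum_col: str):
    """4x3 grid, one panel per calendar month: apparent temperature on the left axis,
    humidity on a twin right axis, both plotted against the year."""
    fig, axes = plt.subplots(4, 3, figsize=(14, 12), sharex=True)
    for month, ax in zip(range(1, 13), axes.flat):
        same_month = monthly[monthly.index.month == month]
        years = same_month.index.year
        ax.plot(years, same_month[temp_col], color="tab:red", marker="o")
        ax.set_title(calendar.month_name[month])
        ax_hum = ax.twinx()  # second y-axis sharing the same years
        ax_hum.plot(years, same_month[hum_col], color="tab:blue", marker="s")
    fig.tight_layout()
    return fig


# Made-up monthly averages with a simple yearly cycle, for illustration only.
idx = pd.date_range("2006-04-01", "2016-09-01", freq="MS")
phase = 2 * np.pi * (idx.month.to_numpy() - 4) / 12
monthly = pd.DataFrame(
    {"Apparent Temperature (C)": 10 + 12 * np.sin(phase), "Humidity": 0.75 - 0.1 * np.sin(phase)},
    index=idx,
)

fig = plot_each_month(monthly, "Apparent Temperature (C)", "Humidity")
```

Putting each month in its own panel makes a year-over-year rise or fall within a single month easier to spot than in one crowded chart.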

Conclusion

  • No observable change in average humidity.
  • Average apparent temperature rose in 2009, dropped again in 2010, rose slightly in 2011, dropped significantly in 2015, and rose again in 2016.
  • The null hypothesis (H0), that both apparent temperature and humidity increased due to global warming, is not supported by this data, so H0 is rejected.

Thank you for reading!
