Python training UGA 2017

A training to acquire a strong basis in Python and use it efficiently

Pierre Augier (LEGI), Cyrille Bonamy (LEGI), Eric Maldonado (Irstea), Franck Thollard (ISTerre), Christophe Picard (LJK), Loïc Huder (ISTerre)

Practical session 1

File parsing and dictionary usage

Goal

The goal of this session is to end up with a script that computes some simple statistics from Meteo open data files. The file was modified and reduced for this exercise (just one station, with data for a single year: 2016). In the future, you can download other data here: https://public.opendatasoft.com/explore/dataset/donnees-synop-essentielles-omm/export/

Material

The file contains lines of the form:

ID OMM station,Date,Average wind 10 mn,Temperature,Humidity,Rainfall 3 last hours,Station

7761,2016-01-01T01:00:00+01:00,2.0,283.75,94,0.2,AJACCIO
7761,2016-01-01T04:00:00+01:00,2.2,283.95,91,0.2,AJACCIO
7761,2016-01-01T07:00:00+01:00,1.7,284.05,88,0.2,AJACCIO
7761,2016-01-01T10:00:00+01:00,1.6,287.05,75,0.2,AJACCIO
7761,2016-01-01T13:00:00+01:00,3.1,289.55,73,0.0,AJACCIO

This is a classic CSV file with fields separated by ",". The first line is the header.

Information to extract

We want to compute some statistics for this station.

Warning

The temperature measurement is in kelvin (273.15 K $\leftrightarrow$ 0 °C).
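As a small illustration of the warning above, the conversion can be isolated in a helper. The name kelvin_to_celsius is a hypothetical choice for this sketch, not something imposed by the exercise.

```python
def kelvin_to_celsius(temp_k):
    """Convert a temperature from kelvin to degrees Celsius."""
    return temp_k - 273.15


# First temperature of the sample file shown above.
print(round(kelvin_to_celsius(283.75), 2))
```

Converting once, when the results are displayed, avoids mixing units inside the stored data.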

Step 1: load data

Write a script with a function load_data() that

  • opens the file
  • loads the data in one of the following structures (more details below):
    • 1.1 Single dictionary
    • 1.2 Multiple structures
    • 1.3 Class instance (object-oriented)

1.1: Single dictionary (pres07)

Load the data into a single dictionary with this structure:

{'Date': [wind,temperature,humidity,rainfall]}

For example

{
    '2016-01-01T01': [2.0,283.75,94,0.2],
    '2016-01-01T04': [2.2,283.95,91,0.2]
}

In this case, we can consider YYYY-MM-DDTHH as the key for the station dictionary.

Split each line and extract data.

Hint

You can use the method split from the str class.

In [1]:
s = "I am lucky"
l = s.split()
print(l)
['I', 'am', 'lucky']
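For the data file, split can be given the separator "," explicitly. The sketch below is one possible way to build the single dictionary, not a reference solution. It assumes that load_data receives an already opened file-like object (the exercise itself leaves the signature open), that empty fields should be stored as None, and it uses a hypothetical in-memory SAMPLE instead of the real file.

```python
import io

# Hypothetical in-memory sample standing in for the real data file.
SAMPLE = """ID OMM station,Date,Average wind 10 mn,Temperature,Humidity,Rainfall 3 last hours,Station
7761,2016-01-01T01:00:00+01:00,2.0,283.75,94,0.2,AJACCIO
7761,2016-01-01T04:00:00+01:00,2.2,283.95,91,0.2,AJACCIO
"""


def load_data(file_obj):
    """Return {'YYYY-MM-DDTHH': [wind, temperature, humidity, rainfall]}."""
    data = {}
    next(file_obj)  # skip the header line
    for line in file_obj:
        fields = line.strip().split(",")
        if len(fields) < 7:
            continue  # ignore blank or truncated lines
        key = fields[1][:13]  # keep only YYYY-MM-DDTHH
        # Assumption: an empty field is stored as None instead of raising ValueError.
        data[key] = [float(f) if f else None for f in fields[2:6]]
    return data


station = load_data(io.StringIO(SAMPLE))
print(station["2016-01-01T01"])
```

With a real file, the same function could be called inside a `with open(path) as file_obj:` block. Note that this sketch converts every field to float, so the humidity is stored as 94.0 rather than 94.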

1.2: Multiple structures (pres07)

Load the data in multiple dictionaries or lists (one per field).

Example for dictionaries:

You can use the following structure

wind = {'Date1': wind_value1, 'Date2': wind_value2, ...}
temperature = {'Date1': temperature1, 'Date2': temperature2, ...}
...

Example for lists:

You can use the following structure

dates = ['Date1', 'Date2', ...]
wind = [wind_value1, wind_value2, ...]
temperature = [temperature1, temperature2, ...]
...
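The list-based layout can be sketched with the standard csv module, which reads the header and gives access to each field by name. The function name load_columns and the in-memory SAMPLE are hypothetical, and the sketch assumes that the wind and temperature fields are never empty.

```python
import csv
import io

# Hypothetical in-memory sample standing in for the real data file.
SAMPLE = """ID OMM station,Date,Average wind 10 mn,Temperature,Humidity,Rainfall 3 last hours,Station
7761,2016-01-01T01:00:00+01:00,2.0,283.75,94,0.2,AJACCIO
7761,2016-01-01T04:00:00+01:00,2.2,283.95,91,0.2,AJACCIO
"""


def load_columns(file_obj):
    """Return three parallel lists: dates, wind and temperature."""
    dates, wind, temperature = [], [], []
    # DictReader uses the first line as field names.
    for row in csv.DictReader(file_obj):
        dates.append(row["Date"][:13])  # keep only YYYY-MM-DDTHH
        wind.append(float(row["Average wind 10 mn"]))
        temperature.append(float(row["Temperature"]))
    return dates, wind, temperature


dates, wind, temperature = load_columns(io.StringIO(SAMPLE))
print(dates)
print(wind)
```

Parallel lists keep the measurements in file order, which is convenient when consecutive measurements have to be compared.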

1.3: Class instance (pres08)

Load the data in an instance of a class WeatherStation that you will define yourself. load_data() can therefore be a method of this class.

Hint

This is very similar to 1.2. The only difference is that the structures storing the data are attributes of a class.
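A minimal sketch of such a class is given below. Only the name WeatherStation and the method load_data come from the exercise. The attributes, the constructor argument and the choice of passing an iterable of lines are assumptions made for illustration.

```python
class WeatherStation:
    """Hypothetical container for the measurements of one station."""

    def __init__(self, name):
        self.name = name
        self.dates = []
        self.wind = []
        self.temperature = []

    def load_data(self, lines):
        """Fill the attributes from an iterable of CSV lines (header first)."""
        rows = iter(lines)
        next(rows)  # skip the header line
        for line in rows:
            fields = line.strip().split(",")
            if len(fields) < 7:
                continue  # ignore blank or truncated lines
            self.dates.append(fields[1][:13])  # keep only YYYY-MM-DDTHH
            self.wind.append(float(fields[2]))
            self.temperature.append(float(fields[3]))


lines = [
    "ID OMM station,Date,Average wind 10 mn,Temperature,Humidity,Rainfall 3 last hours,Station",
    "7761,2016-01-01T01:00:00+01:00,2.0,283.75,94,0.2,AJACCIO",
    "7761,2016-01-01T04:00:00+01:00,2.2,283.95,91,0.2,AJACCIO",
]
station = WeatherStation("AJACCIO")
station.load_data(lines)
print(station.dates, station.temperature)
```

An open file is also an iterable of lines, so the same method would accept a file object.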

Step 2: Compute max temperature and average temperature for the station

Write 2 functions get_max_temperature() and get_average_temperature() that:

  • return a float
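One possible shape for these two functions, assuming the temperatures are available as a list of floats in kelvin (the parameter temperatures is an assumption, since the exercise does not fix the signatures):

```python
def get_max_temperature(temperatures):
    """Return the highest temperature of the list."""
    return max(temperatures)


def get_average_temperature(temperatures):
    """Return the arithmetic mean of the list."""
    return sum(temperatures) / len(temperatures)


# Temperatures (in kelvin) of the five sample lines shown in "Material".
temps_k = [283.75, 283.95, 284.05, 287.05, 289.55]
print(get_max_temperature(temps_k))
print(round(get_average_temperature(temps_k), 2))
```

Both functions return values in kelvin here. Subtract 273.15 to display them in °C.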

Step 3: Compute sum of the rainfall for one station

Write 1 function get_sum_rainfall() that sums the rainfall.

  • returns a float

Be careful, some measurements have no rainfall data.
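A sketch of how missing values can be skipped, assuming they were stored as None when the file was loaded (both this convention and the parameter rainfall are assumptions of the example):

```python
def get_sum_rainfall(rainfall):
    """Sum the rainfall values, ignoring missing measurements (None)."""
    return float(sum(r for r in rainfall if r is not None))


# Hypothetical values: the second measurement has no rainfall data.
rain = [0.5, None, 0.0, 1.5]
print(get_sum_rainfall(rain))
```

If missing values are kept as empty strings instead, the test becomes `if r != ""` followed by a conversion with float.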

Step 4: Search max period without rainfall

Write 1 function period_without_rainfall() that

  • returns the beginning date, the ending date and the number of days without rainfall

Hint

This is the syntax to return multiple values from a function (with one measurement every 3 hours, dividing a number of measurements by 8 converts it to days):

return date_min, date_max, period_max / 8
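The search itself can be sketched as a single pass that tracks the current run of dry measurements and remembers the longest one. The parameters dates and rainfall (two parallel lists in chronological order) are assumptions, and so is the choice to let a missing value (None) interrupt a dry period.

```python
def period_without_rainfall(dates, rainfall):
    """Return (first date, last date, length in days) of the longest dry run."""
    best_start = best_end = None
    best_len = 0
    run_start = None
    run_len = 0
    for date, rain in zip(dates, rainfall):
        if rain == 0:
            if run_len == 0:
                run_start = date  # a new dry run begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_end, best_len = run_start, date, run_len
        else:
            run_len = 0  # rain (or a missing value) ends the current run
    # 8 measurements per day, since there is one measurement every 3 hours.
    return best_start, best_end, best_len / 8


# Hypothetical short series, with placeholder date labels.
dates = ["d0", "d1", "d2", "d3", "d4", "d5"]
rainfall = [0.2, 0.0, 0.0, 0.0, 0.4, 0.0]
date_min, date_max, n_days = period_without_rainfall(dates, rainfall)
print(date_min, date_max, n_days)
```

The caller unpacks the returned tuple into three names, as in the last lines of the example.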

Step 5: How many hours with humidity rate < 60

Write 1 function get_hours_humidity(rate)

  • takes 1 parameter: the humidity rate
  • returns the number of hours
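A possible sketch: count the measurements below the threshold, then convert the count to a duration. It assumes that each measurement stands for a 3-hour interval and that the humidity values are passed as a list (the extra parameter humidity is not part of the signature given above). Missing values (None) are skipped.

```python
def get_hours_humidity(humidity, rate):
    """Return the number of hours with a humidity strictly below rate."""
    count = sum(1 for h in humidity if h is not None and h < rate)
    # Assumption: one measurement every 3 hours.
    return count * 3


# Hypothetical humidity values, one of them missing.
humidity = [94, 58, None, 45, 60]
print(get_hours_humidity(humidity, 60))
```

Dividing the result by 24 gives the same duration expressed in days.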

Final remark: Pandas

To do such data analysis, one should not use pure Python code without an external library! The library Pandas was written to do this in a few lines:

In [3]:
import pandas

df = pandas.read_csv(
    '../TP/TP1_MeteoData/data/synop-2016.csv', sep=',', header=0)

# print(df.columns)
# print(df.age)
# print(df['Temperature'])
# print(df[(df['Station']=='AJACCIO')])
# print(df[(df['Station'] == 'AJACCIO')]['Rainfall 3 last hours'].sum())

temp = df[(df['Station'] == 'AJACCIO')]['Temperature'].mean()-273.15
print(f'The average of temperature at Ajaccio is {temp:.1f} °C.')
The average of temperature at Ajaccio is 16.3 °C.