Python training UGA 2017

A training to acquire a strong basis in Python and use it efficiently

Pierre Augier (LEGI), Cyrille Bonamy (LEGI), Eric Maldonado (Irstea), Franck Thollard (ISTerre), Christophe Picard (LJK), Loïc Huder (ISTerre)

Practical session 0

File parsing and simple computations

Goal

The goal of this session is to practice what we have seen in the first presentations:

  • write code in scripts,
  • use ipython and execute Python programs with the command python3,
  • use objects of simple types (numbers, str, list, etc.),
  • index and slice,
  • use loops and conditions,
  • try, except,
  • read and write in text files.

We will write scripts that read a file (or a set of files) with a predefined format and compute simple quantities (sum, average, number) from the values in the files.

Material

The data files for this session are in the directory TP/TP0_file_stats/data.

Parsing file0.1.txt

Step 0.0

  • look at the content of file file0.1.txt,
  • compute basic statistics (sum, average, number) on the values.

This can be done through the following steps:

  • open the file,
  • iterate on the lines,
  • for each line, convert its value to a float,
  • update the current statistics.
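
The steps above can be sketched as follows. This is only one possible approach, not a reference solution. It assumes that each line holds exactly one number, and it uses a small in-memory sample (hypothetical values) in place of file0.1.txt so that the snippet runs anywhere.

```python
import io

# Hypothetical sample standing in for the real data file (one value per line).
sample = io.StringIO("0.5\n0.25\n0.75\n")

nb = 0
total = 0.0
# With the real file, use: with open("../data/file0.1.txt") as handle:
with sample as handle:
    for line in handle:
        value = float(line)  # float() ignores the trailing newline
        nb += 1
        total += value

avg = total / nb
print(f"nb = {nb}; sum = {total:.2f}; avg = {avg:.2f}")
```

A file object is iterated line by line, so the loop never loads the whole file in memory.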

Example of output

python3 step0.0.py 
file = "../data/file0.1.txt"
# on Windows:
# file = r"..\data\file0.1.txt"
# r like "raw" = no interpretation of the special characters ("\n", "\t", etc.)
# Such r-strings are also useful when we write LaTeX code in Python.
nb = 78; sum = 42.46; avg = 0.54

Parsing file0.1.txt

Step 0.1

Same as step 0.0 but define a function that does the processing:

In [3]:
def compute_stats(file_name):
    """ computes the statistics of data in file_name
    :param file_name: the name of the file to process
    :type file_name: str
    :return: the statistics
    :rtype: a tuple (number, sum, average)
    """
    pass
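
One way the body of this function could be filled in is sketched below. Two choices here are assumptions that the exercise does not specify: lines that cannot be converted to a float are silently skipped, and a file without any value yields an average of 0.0.

```python
def compute_stats(file_name):
    """Return the tuple (number, sum, average) of the values in file_name."""
    nb = 0
    total = 0.0
    with open(file_name) as handle:
        for line in handle:
            try:
                value = float(line)
            except ValueError:
                # Assumption: skip lines that are not a number
                # (this includes empty lines).
                continue
            nb += 1
            total += value
    # Assumption: avoid a ZeroDivisionError when the file has no value.
    avg = total / nb if nb else 0.0
    return nb, total, avg
```

Returning a tuple lets the caller unpack the result directly, for example nb, total, avg = compute_stats("../data/file0.1.txt").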

Parse files file0.1.txt, file0.2.txt and file0.3.txt

Step 0.2

Same as step 0.1 but process several files and print per-file statistics and overall statistics:

python3 step0.2.py 
file = "../data/file0.1.txt"
nb = 78   ; sum = 42.46  ; avg = 0.54 
file = "../data/file0.2.txt"
nb = 100  ; sum = 53.29  ; avg = 0.53
file = "../data/file0.3.txt"
nb = 25   ; sum = 12.72  ; avg = 0.51 
# total over all files:
nb = 203  ; sum = 108.47 ; avg = 0.53
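
A minimal sketch of the aggregation is given below. The overall average is computed from the accumulated sum and count, not as the mean of the per-file averages, because the files do not all contain the same number of values. The helper names compute_overall_stats and format_stats are hypothetical, and compute_stats is a compact stand-in for the function of step 0.1.

```python
def compute_stats(file_name):
    """Compact stand-in for step 0.1: return (number, sum, average)."""
    with open(file_name) as handle:
        values = [float(line) for line in handle if line.strip()]
    nb = len(values)
    total = sum(values)
    return nb, total, (total / nb if nb else 0.0)


def format_stats(nb, total, avg):
    """Format one result line with left-aligned, fixed-width columns."""
    return f"nb = {nb:<5d}; sum = {total:<7.2f}; avg = {avg:.2f}"


def compute_overall_stats(file_names):
    """Return (per-file stats, overall stats) for the given files."""
    per_file = {}
    nb_all = 0
    sum_all = 0.0
    for name in file_names:
        nb, total, avg = compute_stats(name)
        per_file[name] = (nb, total, avg)
        # Accumulate counts and sums, not averages.
        nb_all += nb
        sum_all += total
    avg_all = sum_all / nb_all if nb_all else 0.0
    return per_file, (nb_all, sum_all, avg_all)
```

A script would then loop over per_file, print format_stats(*stats) for each file, and finish with the overall tuple.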

Parse a file with comments (file_with_comment_col0.txt)

Step 1.0

Now suppose the files contain some comments (i.e. lines starting with a '#').

Adapt the previous script so that these lines are ignored (see the file file_with_comment_col0.txt).

Possible result:

python3 step1.0.py 
file = ../data/file_with_comment_col0.txt
nb = 100; total = 53.29; avg = 0.53
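
Skipping comment lines can be sketched with str.startswith. The generator name iter_values is hypothetical. Stripping the line before the test, which also accepts a '#' preceded by spaces, and ignoring empty lines are assumptions. The function takes any iterable of lines, so an open file object can be passed directly.

```python
def iter_values(lines):
    """Yield the float values found in lines, skipping comment lines."""
    for line in lines:
        stripped = line.strip()
        # Assumption: empty lines are ignored, like comment lines.
        if not stripped or stripped.startswith("#"):
            continue
        yield float(stripped)


# Hypothetical sample lines; with a real file, pass the open file object.
values = list(iter_values(["# header\n", "0.5\n", "#0.9\n", "1.5\n"]))
nb = len(values)
total = sum(values)
print(f"nb = {nb}; total = {total:.2f}; avg = {total / nb:.2f}")
```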

Parse a file with comments (file_with_comment_anywhere.txt)

Step 1.1

Now suppose the file contains comments in the middle of a line (see e.g. file_with_comment_anywhere.txt, which contains comments that mainly prevent the string-to-float conversion).

Adapt script 1.0 to handle this format.

Possible output:

python3 step1.1.py 
file = "../data/file_with_comment_col0.txt"
nb = 100  ; sum = 53.29  ; avg = 0.53 
file = "../data/file_with_comment_anywhere.txt"
nb = 96   ; sum = 51.65  ; avg = 0.54 
# total over all files:
nb = 196  ; sum = 104.93 ; avg = 0.54
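
One way to handle a comment anywhere in the line is to keep only the text before the first '#' and to protect the conversion with try/except. The function name parse_value is hypothetical, and returning None for a line without a usable value is an assumption, not a requirement of the exercise.

```python
def parse_value(line):
    """Return the float found before any '#' in line, or None."""
    # Keep only what precedes the first '#', then drop surrounding blanks.
    text = line.split("#", 1)[0].strip()
    try:
        return float(text)
    except ValueError:
        # Empty text (comment-only line) or text that is not a number.
        return None


print(parse_value("0.5 # measured twice\n"))
print(parse_value("# only a comment\n"))
```

The caller updates the statistics only when the returned value is not None.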

Parsing more complicated files

Step 2.0

Now suppose the files are pre-formatted with lines of the form

p1=0.7742 p2=0.74973 p3=0.77751
p1=0.7493 p2=0.34762 p3=0.44521
p1=0.4261 p3=0.88275 p2=0.74016

  • Write a function that checks that all lines have the same form (i.e. three columns, possibly with further checks, e.g. that p1, p2 and p3 all exist).
  • Print the lines that contain errors.

Possible output:

python3 step2.0.py 
checking ../data/file_mut_cols.txt
checking ../data/file_mut_cols_with_error.txt
line 8 contains only 1 fields, expecting 3
line 15 contains only 2 fields, expecting 3
line 20: keys do not match the required keys: problem with keys {'p2'}
line 23: keys do not match the required keys: problem with keys {'p3', 'p7'}
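
A possible shape for such a check is sketched below. The names check_line and check_lines are hypothetical. Reporting the symmetric difference between the keys found and the required keys (missing plus unexpected keys) is a guess at how the messages above were produced, and numbering the lines from 1 is also an assumption.

```python
REQUIRED_KEYS = {"p1", "p2", "p3"}


def check_line(line, required_keys=REQUIRED_KEYS):
    """Return an error message for line, or None if the line is valid."""
    fields = line.split()
    if len(fields) != len(required_keys):
        return f"contains {len(fields)} fields, expecting {len(required_keys)}"
    keys = {field.split("=", 1)[0] for field in fields}
    # Symmetric difference: keys that are missing or unexpected.
    bad_keys = keys ^ required_keys
    if bad_keys:
        return f"keys do not match the required keys: {sorted(bad_keys)}"
    return None


def check_lines(lines):
    """Return the list of error messages for an iterable of lines."""
    errors = []
    # Assumption: line numbers start at 1.
    for number, line in enumerate(lines, start=1):
        message = check_line(line)
        if message is not None:
            errors.append(f"line {number}: {message}")
    return errors
```

Comparing sets makes the check independent of the order of the columns, so a line such as p1=0.4261 p3=0.88275 p2=0.74016 is accepted.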

Parsing more complicated files and clean them

Step 2.1

We now want to clean the input file(s) by filling in the missing values. To be able to check the cleaning, we want to generate, for each input file, a new file in which the missing values are filled with default values:

In [1]:
default_values = {'p1': 1, 'p2': 2, 'p3': 3}
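
The cleaning could be sketched as follows. The function names clean_line and clean_file are hypothetical. Writing the keys in sorted order and dropping malformed fields or unexpected keys are assumptions, since the exercise does not fix the output format.

```python
default_values = {"p1": 1, "p2": 2, "p3": 3}


def clean_line(line, defaults=default_values):
    """Return line rewritten with every key of defaults present."""
    values = dict(defaults)  # start from a copy of the defaults
    for field in line.split():
        key, sep, value = field.partition("=")
        # Assumption: ignore malformed fields and unexpected keys.
        if sep and value and key in defaults:
            values[key] = value
    return " ".join(f"{key}={values[key]}" for key in sorted(defaults))


def clean_file(in_name, out_name, defaults=default_values):
    """Write to out_name a cleaned copy of in_name."""
    with open(in_name) as source, open(out_name, "w") as target:
        for line in source:
            target.write(clean_line(line, defaults) + "\n")
```

For example, clean_line("p1=0.7 p3=0.9") gives "p1=0.7 p2=2 p3=0.9". Writing to a separate output file keeps the original data untouched, which makes the cleaning easy to check.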