A training to acquire strong basis in Python to use it efficiently
Pierre Augier (LEGI), Cyrille Bonamy (LEGI), Eric Maldonado (Irstea), Franck Thollard (ISTerre), Oliver Henriot (GRICAD), Christophe Picard (LJK), Loïc Huder (ISTerre)
In this part, you will be asked to generate various plots by using electrocardiogram data provided by SciPy. Some pointers will be given for the asked tasks but you are expected to make extensive use of:
The ECG data can be accessed through scipy.misc
:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import scipy.misc
ecg_data = scipy.misc.electrocardiogram()
ecg_data
is a 1D NumPy array containing the values of the ECG signal (in mV) that were recorded at 360 Hz (https://docs.scipy.org/doc/scipy/reference/generated/scipy.misc.electrocardiogram.html).
The plot that we generated is a little barebone so let's improve it a bit.
time
that will give the seconds elapsed since the beginning of the ECG record.time
to plot the ECG signal again but with respect to time now. Again adjust the limits of your plot to have something meaningful to look at.Let's now try to combine data manipulation with the plots.
vlines
).subplots
. BONUS: try to have the Y axis shared in the subplots.We worked exclusively with line plots until now so let's try something else.
hist
matplotlib).bins
args so that we have one bar for every range of 0.2 between -4 and 4.scipy.optimize.curve_fit
to do a Gaussian fit of the obtained distribution. Plot the fit on the histogram with a black line. BONUS: Try to fit by a Lorentzian instead.If you still crave things to do, here are possible bonus exercices:
scipy.signal.savgol_filter
) and plot dots on them.scipy.fftpack
) to find dominant frequencies.nb_samples = len(ecg_data) # Or ecg_data.size as it is 1D
print(f'ecg_data has {nb_samples} samples (min: {np.min(ecg_data)}, max: {np.max(ecg_data)}). Mean: {np.mean(ecg_data)}.')
ecg_data has 108000 samples (min: -3.485, max: 3.65). Mean: -0.16510875
plt.figure()
plt.plot(ecg_data)
plt.xlim(1000, 2000)
# OR
f = plt.figure()
ax = f.add_subplot(111)
ax.plot(ecg_data)
ax.set_xlim(1000, 2000)
(1000, 2000)
The plot that we generated is a little barebone so let's improve it a bit.
time
that will give the seconds elapsed since the beginning of the ECG record.time
to plot the ECG signal again but with respect to time now. Again adjust the limits of your plot to have something meaningful to look at.sampling_frequency = 360.
time = np.arange(nb_samples)/sampling_frequency
f = plt.figure()
ax = f.add_subplot(111)
ax.plot(time, ecg_data, marker='o', markersize=2)
ax.set_xlim(9, 10.5)
ax.set_ylim(-1, 2)
ax.set_xlabel('Time (s)')
ax.set_ylabel('ECG (mV)')
Text(0, 0.5, 'ECG (mV)')
Let's now try to combine data manipulation with the plots.
vlines
).subplots
. (BONUS: try to have the Y axis shared in the subplots)i_max = np.argmax(ecg_data)
t_max = time[i_max]
print(f'Time of maximum is {t_max}')
dt = 2.
f = plt.figure(figsize=(14, 4))
ax_max = f.add_subplot(121)
ax_max.plot(time, ecg_data)
ax_max.set_xlim(t_max - dt, t_max + dt)
ax_max.vlines(t_max, -4, 4, color='red', linestyle='--')
ax_max.set_ylim(-4, 4)
i_min = np.argmin(ecg_data)
t_min = time[i_min]
print(f'Time of minimum is {t_min}')
ax_min = f.add_subplot(122, sharey=ax_max)
ax_min.plot(time, ecg_data)
ax_min.set_xlim(t_min - dt, t_min + dt)
ax_min.vlines(t_min, -4, 4, color='green', linestyle='--')
Time of maximum is 42.516666666666666 Time of minimum is 99.49722222222222
<matplotlib.collections.LineCollection at 0x7fd039155ba8>
We worked exclusively with line plots until now so let's try something else.
hist
matplotlib).bins
args so that we have one bar for every range of 0.2 between -4 and 4.scipy.optimize.curve_fit
to do a Gaussian fit of the obtained distribution. Plot the fit on the histogram with a black line. BONUS: Try to fit by a Lorentzian instead.delta = 0.2
f = plt.figure()
ax = f.add_subplot(111)
hist_data, bins, patch_lists = ax.hist(ecg_data, bins=np.arange(-4, 4, delta))
# Bins contains the edges of bars so we need an operation to get the middle of the bars:
middles = bins[:-1] + (bins[1:] - bins[:-1])/2.
ax.plot(middles, hist_data, linestyle='None', color='yellow', marker='P', label='Histogram data for fits')
import scipy.optimize
x_for_fit = middles
y_for_fit = hist_data
def gaussian(x, a, x0, sigma):
return a * np.exp(-((x- x0)/sigma)**2)
def lorentzian(x, a, x0, gamma):
return a/(np.pi*gamma)*(gamma**2/((x-x0)**2 + gamma**2))
parameters, covariances = scipy.optimize.curve_fit(gaussian, x_for_fit, y_for_fit)
parameters2, covariances2 = scipy.optimize.curve_fit(lorentzian, x_for_fit, y_for_fit)
print(parameters, covariances)
print(parameters2, covariances2)
X = np.arange(-4, 4, 0.05)
ax.plot(X, gaussian(X, *parameters), c='k', label='Gaussian fit (Meh.)')
ax.plot(X, lorentzian(X, *parameters2), c='magenta', label='Lorentzian fit (Better !)')
ax.legend()
[ 2.12147050e+04 -2.42983901e-01 5.00728220e-01] [[ 7.00972367e+05 9.08436030e-08 -1.10302477e+01] [ 9.08436030e-08 2.60351979e-04 -1.27604898e-12] [-1.10302477e+01 -1.27604898e-12 5.20703973e-04]] [ 2.36984903e+04 -2.43458176e-01 3.07113825e-01] [[ 2.26389471e+05 -7.31701495e-02 2.95517475e+00] [-7.31701495e-02 3.75995236e-05 -2.38886877e-06] [ 2.95517475e+00 -2.38886877e-06 7.67656344e-05]]
<matplotlib.legend.Legend at 0x7fd036e69470>