Pierre Augier (LEGI), Raphaël Bacher (Gipsa), Cyrille Bonamy (LEGI), Eric Maldonado (Irstea), Franck Thollard (ISTerre), Loïc Huder (ISTerre)
Given a problem (e.g. finding which element is missing in a list of elements), we want to write code that solves it. The classical workflow is:
design an algorithm that solves the problem;
evaluate the complexity of the algorithm (theoretical point of view).
If your data is large enough, a basic implementation of an algorithm with low complexity will run faster than a finely tuned implementation of an algorithm with high complexity.
Looking for the missing element: we know that all the elements from 0 to N should be present. We can compute the sum of the elements and take the difference between this computed sum and the mathematical sum $N(N+1)/2$. This algorithm accesses each element only once, so it has $O(N)$ complexity, where $N$ is the number of elements.
An algorithm that checks, for each element $e$ from 0 to $N$, whether $e$ belongs to the list has $O(N^2)$ complexity (each membership test on a list is itself $O(N)$) and will be slower than the previous one for sufficiently large values of $N$.
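As an illustration, a minimal sketch of both approaches (the function names are ours, not from any library):

```python
def find_missing_sum(elements, n):
    """O(N): the full list 0..n sums to n*(n+1)//2; the difference
    with the actual sum is exactly the missing element."""
    return n * (n + 1) // 2 - sum(elements)


def find_missing_scan(elements, n):
    """O(N^2): tests each candidate with `in`, which scans the list."""
    for e in range(n + 1):
        if e not in elements:
            return e


print(find_missing_sum([0, 1, 3, 4], 4))   # 2
print(find_missing_scan([0, 1, 3, 4], 4))  # 2
```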
If your data has some specificity, take advantage of it.
There is a module timeit in the standard library.
python3 -m timeit -s "import math; l=[]" "for x in range(100): l.append(math.pow(x,2))"
Problem: the module timeit does not try to guess how many times to execute the statement.
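From a script, the same measurement can be done with the function timeit.timeit; a small sketch where the number of executions is chosen by hand (by default, timeit.timeit uses number=1000000):

```python
import timeit

# Run the statement `number` times and return the total time in seconds.
total = timeit.timeit(
    "for x in range(100): l.append(math.pow(x, 2))",
    setup="import math; l = []",
    number=1000,
)
print(f"{total / 1000:.2e} s per loop")
```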
In IPython, you can use the magic command %timeit, which executes a piece of code and reports timing statistics:
import math
l = []
%timeit for x in range(100): l.append(math.pow(x,2))
%timeit [math.pow(x,2) for x in range(100)]
l = []
%timeit for x in range(100): l.append(x*x)
%timeit [x*x for x in range(100)]
28.6 µs ± 1.19 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
19.6 µs ± 283 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
13 µs ± 462 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
7.45 µs ± 239 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
pyperf is a more powerful tool, but we can also do the same as with the module timeit:
python3 -m pyperf timeit -s "import math; l=[]" "for x in range(100): l.append(math.pow(x,2))"
def build_count_base(t):
    d = {}
    for s in t:
        if s in d:
            d[s] += 1
        else:
            d[s] = 1
    return d

def build_count_set(t):
    d = {k: 0 for k in set(t)}
    for s in t:
        d[s] += 1
    return d

def build_count_count(t):
    d = {k: t.count(k) for k in set(t)}
    return d

def build_count_excpt(t):
    d = {}
    for s in t:
        try:
            d[s] += 1
        except KeyError:
            d[s] = 1
    return d

import collections

def build_count_counter(t):
    return collections.Counter(t)

def build_count_defaultdict(t):
    d = collections.defaultdict(int)
    for k in t:
        d[k] += 1
    return d
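Whatever the variant, the resulting counts must be identical; a quick sanity check (re-defining two of the variants so the snippet is self-contained):

```python
import collections

def build_count_base(t):
    d = {}
    for s in t:
        if s in d:
            d[s] += 1
        else:
            d[s] = 1
    return d

def build_count_counter(t):
    return collections.Counter(t)

sample = "abracadabra"
expected = {"a": 5, "b": 2, "r": 2, "c": 1, "d": 1}
# Counter is a dict subclass, so == compares the items directly.
assert build_count_base(sample) == build_count_counter(sample) == expected
```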
s = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam tristique at velit in varius. Cras ut ultricies orci. Fusce vel consequat ante, vitae luctus tortor. Sed condimentum faucibus enim, sit amet pulvinar ligula feugiat ac. Sed interdum id risus id rhoncus. Nullam nisi justo, ultrices eu est nec, hendrerit maximus lorem. Nam urna eros, accumsan nec magna eu, elementum semper diam. Nulla tempus, nibh id elementum dapibus, ex diam lacinia est, sit amet suscipit nulla nibh eu sapien. Aliquam orci enim, malesuada in facilisis vitae, pharetra sit amet mi. Pellentesque mi tortor, sagittis quis odio quis, fermentum faucibus ex. Aenean sagittis nisl orci. Maecenas tristique velit sed leo facilisis porttitor. "
s = s*10000
len(s)
print(f"len(s) = {len(s)}, nbkeys {len(set(s))}: base, set, count, except, counter, defaultdict")
%timeit build_count_base(s)
%timeit build_count_set(s)
%timeit build_count_count(s)
%timeit build_count_excpt(s)
%timeit build_count_counter(s)
%timeit build_count_defaultdict(s)
len(s) = 7160000, nbkeys 33: base, set, count, except, counter, defaultdict
1.08 s ± 82.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.09 s ± 42.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
338 ms ± 12.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.01 s ± 50.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
458 ms ± 12.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
940 ms ± 62.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
print("with split")
s2 = s.split()
print(f"len(s2) = {len(s2)}, nbkeys {len(set(s2))}: base, set, count, except, counter, defaultdict")
%timeit build_count_base(s2)
%timeit build_count_set(s2)
%timeit build_count_count(s2)
%timeit build_count_excpt(s2)
%timeit build_count_counter(s2)
%timeit build_count_defaultdict(s2)
with split
len(s2) = 1100000, nbkeys 90: base, set, count, except, counter, defaultdict
320 ms ± 64.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
201 ms ± 5.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
2.03 s ± 55 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
172 ms ± 5.95 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
101 ms ± 2.71 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
904 ms ± 56.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The best performing algorithm depends on the features of the input.
In our case: how many different symbols do we have in our vocabulary?
- 4 for DNA,
- 26 for letters,
- 60 million for natural texts.
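This dependence on the vocabulary size can be measured directly; a rough benchmark sketch (the alphabets, sizes and helper name are our choices) comparing the count-based dict comprehension with collections.Counter:

```python
import random
import string
import timeit
from collections import Counter


def bench(alphabet, n=100_000):
    """Time the count-based dict comprehension vs collections.Counter
    on n symbols drawn uniformly from the given alphabet."""
    data = random.choices(alphabet, k=n)
    t_count = timeit.timeit(
        lambda: {k: data.count(k) for k in set(data)}, number=1)
    t_counter = timeit.timeit(lambda: Counter(data), number=1)
    return t_count, t_counter


# Few keys: the C-level list.count loop is competitive.
print("DNA (4 keys):        ", bench("ATGC"))
# Many keys: the O(nbkeys * N) cost of repeated count() calls bites.
print("printable (100 keys):", bench(string.printable))
```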
Evaluate the execution time of your script as a whole
Using the Unix command time:
time python3 myscript.py
Using the Unix program perf:
perf stat python3 myscript.py
Issues: these tools time the whole run (including interpreter startup) and tell you nothing about where the time is spent inside the script. For manual timing of a specific piece of code, use time.perf_counter:
import math
from time import perf_counter

t_start = perf_counter()
[math.pow(x, 2) for x in range(100)]
print(f"elapsed time: {perf_counter() - t_start:.2e} s")
elapsed time: 3.23e-04 s
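To avoid repeating this boilerplate, perf_counter can be wrapped in a small context manager (a helper of our own, not a standard tool):

```python
import math
from contextlib import contextmanager
from time import perf_counter


@contextmanager
def timer(label="elapsed time"):
    """Print the wall-clock time spent inside the with-block."""
    t_start = perf_counter()
    yield
    print(f"{label}: {perf_counter() - t_start:.2e} s")


with timer("squares"):
    [math.pow(x, 2) for x in range(100)]
```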
cProfile (https://docs.python.org/3.7/library/profile.html): deterministic profiling of Python programs.
2 steps: (1) run the profiler and (2) analyze the results.
Run the profiler
With an already written script: python3 -m cProfile myscript.py
Much better: write a dedicated script using the module cProfile. See pyfiles/dtw_cort_dist/V0_numpy_loops/prof.py
Warning: profiling is much slower than a normal run, so do not profile with a long-running configuration.
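The repository's prof.py is not reproduced here, but a minimal sketch of such a dedicated profiling script could look like this (`work` is a placeholder for the code you actually want to profile):

```python
import cProfile


def work():  # placeholder: put the code you want to profile here
    return sum(x * x for x in range(200_000))


profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()
# Dump to a file that pstats, SnakeViz or pyprof2calltree can read.
profiler.dump_stats("prof.pstats")
```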
Analyze the results
The standard tool is pstats
(https://docs.python.org/3.7/library/profile.html#module-pstats)
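For instance, one way to inspect a profile with pstats (here the profile is built in-memory, but pstats.Stats also accepts the name of a dump file):

```python
import cProfile
import pstats


def slow():
    # A deliberately slow call so the profile has something to show.
    return sorted(range(100_000), key=lambda x: -x)


profiler = cProfile.Profile()
profiler.runcall(slow)

# Sort by cumulative time and print the 5 most expensive entries.
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(5)
```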
Or visualize the results with gprof2dot, SnakeViz, pyprof2calltree and kcachegrind.
Example: pyprof2calltree -i prof.pstats -k
See http://pramodkumbhar.com/2019/01/python-deterministic-vs-statistical-profilers/ on statistical profilers.
Advantage of statistical profiling compared to deterministic profiling: very small overhead.
More on profiling on a stackoverflow discussion:
https://stackoverflow.com/questions/582336/how-can-you-profile-a-python-script