Profiling
Detailed runtime measurements
The tools from the previous section can help us decide whether we need to optimize in the first place (is the total run time fast enough?) and can guide us when we want to replace code with a better-performing alternative (such as a more specialized numpy function instead of a general built-in function). They do not tell us what to optimize, though.
As a first example, let’s have a look at the classical Fibonacci sequence, where each number is the sum of the two preceding numbers. This can be written down directly as a recursive function (note that for simplicity we leave out all error checking, e.g. for negative numbers):
def fibonacci(n):
    if n < 2:
        return n  # fibonacci(0) == 0, fibonacci(1) == 1
    else:
        return fibonacci(n - 2) + fibonacci(n - 1)
This seems to work fine, but the runtime increases dramatically with n:
%time fibonacci(10)
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 52.7 µs
55
%time fibonacci(20)
CPU times: user 12 ms, sys: 0 ns, total: 12 ms
Wall time: 9.63 ms
6765
%time fibonacci(30)
CPU times: user 696 ms, sys: 0 ns, total: 696 ms
Wall time: 695 ms
832040
%time fibonacci(35)
CPU times: user 7.49 s, sys: 24 ms, total: 7.52 s
Wall time: 7.52 s
9227465
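Such one-off measurements can also be reproduced outside of ipython with the standard library’s timeit module (a minimal sketch, using the recursive fibonacci function defined above):

```python
import timeit


def fibonacci(n):
    if n < 2:
        return n  # fibonacci(0) == 0, fibonacci(1) == 1
    else:
        return fibonacci(n - 2) + fibonacci(n - 1)


# Rough analogue of %time: measure a single call (number=1);
# for more stable numbers, increase `number` and divide.
elapsed = timeit.timeit(lambda: fibonacci(20), number=1)
print(f"fibonacci(20) took {elapsed:.6f} s")
```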
To get an idea of what is going on, we can use %prun, which runs a command with Python’s built-in profiler:
%prun fibonacci(35)
29860706 function calls (4 primitive calls) in 10.621 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
29860703/1 10.621 0.000 10.621 10.621 <ipython-input-6-b471b8bf6ddb>:1(fibonacci)
1 0.000 0.000 10.621 10.621 {built-in method builtins.exec}
1 0.000 0.000 10.621 10.621 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
The output gives us three pieces of information for every function called during the execution of the command: the number of times the function was called (ncalls), the total time spent in the function itself (tottime) and the time spent in the function including all time spent in functions called by it (cumtime). It is not surprising that all of the time is spent in the fibonacci function (after all, that’s the only function we have), but the function got called 29860703 times! There is no way we are going to get decent performance from this function without changing the approach fundamentally.
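The same table can be produced outside of ipython with the standard library’s cProfile and pstats modules (a minimal sketch; the smaller argument 20 keeps the run short):

```python
import cProfile
import io
import pstats


def fibonacci(n):
    if n < 2:
        return n  # fibonacci(0) == 0, fibonacci(1) == 1
    else:
        return fibonacci(n - 2) + fibonacci(n - 1)


profiler = cProfile.Profile()
profiler.enable()
fibonacci(20)
profiler.disable()

# Print the same ncalls/tottime/cumtime table that %prun shows,
# sorted by total time
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("tottime").print_stats()
print(stream.getvalue())
```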
A better Fibonacci sequence
Do you know of a better way to write the Fibonacci function? Can you imagine specific ways of using that function where you would prefer yet another approach?
Optimizing a calculation by using a fundamentally different approach is called “algorithmic optimization” and it is potentially the most powerful way to increase the performance of a program. Whenever the runtime of a program appears to be slow, the first check should be whether there is a function that takes a lot of time and is called more often than expected (e.g. we calculate a measure on 1000 values and expect a function to be called about 1000 times as well, but it is called 1000*1000 times instead).
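As an illustration of such an algorithmic optimization (a sketch, assuming the convention fibonacci(0) == 0 from the code above), the sequence can be built up iteratively, replacing the exponential number of recursive calls with just n additions:

```python
def fibonacci_iterative(n):
    # Keep only the last two values of the sequence and build it
    # bottom-up: n additions instead of ~30 million recursive calls
    # for n == 35.
    a, b = 0, 1  # fibonacci(0), fibonacci(1)
    for _ in range(n):
        a, b = b, a + b
    return a


print(fibonacci_iterative(35))  # → 9227465, near-instant
```

If the recursive formulation is preferred for readability, decorating it with functools.lru_cache gives a comparable speed-up by caching previously computed values.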
In bigger projects, the output of a profiler run can easily become overwhelming. It can therefore be helpful to use a graphical tool that presents this information in a more accessible and interactive way. One such tool is snakeviz, which can also be used directly from ipython or a jupyter notebook.
Let’s have another look at the equalize function we used earlier:
def equalize(image, n_bins=256):
    bins = numpy.linspace(0, 1, n_bins, endpoint=True)
    hist, bins = numpy.histogram(image.flatten(), bins=bins, density=True)
    cdf = hist.cumsum() / n_bins
    # Invert the CDF by using numpy's interp function
    equalized = numpy.interp(image.flatten(), bins[:-1], cdf)
    # All this was performed on flattened versions of the image, reshape the
    # equalized image back to the original shape
    return equalized.reshape(image.shape)
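As a quick sanity check (on a hypothetical synthetic image, not the example file loaded below), the function should preserve the image shape and map all values into the range [0, 1]:

```python
import numpy


def equalize(image, n_bins=256):
    bins = numpy.linspace(0, 1, n_bins, endpoint=True)
    hist, bins = numpy.histogram(image.flatten(), bins=bins, density=True)
    cdf = hist.cumsum() / n_bins
    # Invert the CDF by using numpy's interp function
    equalized = numpy.interp(image.flatten(), bins[:-1], cdf)
    return equalized.reshape(image.shape)


rng = numpy.random.default_rng(0)
# Synthetic low-contrast image: all values squeezed into [0.4, 0.6]
image = rng.uniform(0.4, 0.6, size=(64, 64))
result = equalize(image)
print(result.shape)  # same shape as the input: (64, 64)
print(0.0 <= result.min() and result.max() <= 1.0)  # True
```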
We’ll again load our example image:
image = plt.imread('Unequalized_Hawkes_Bay_NZ.png')[:, :, 0]
With snakeviz installed, we can load the snakeviz magic:
%load_ext snakeviz
We can then use %snakeviz in the same way we previously used %prun, and it will show us a graphical representation of where the time was spent:
%snakeviz equalize(image)
This representation is useful for a global overview of the runtime of the various functions in a project. It might uncover functions that do not seem to be worth optimizing because of their short runtime but are in fact called from various places so that their total runtime is significant.
%prun and %snakeviz give information about which functions take up the most time, but they are less useful for spotting optimization potential within a single function. Another type of profiling is line-based profiling, which measures the runtime of every line in one or more functions. This functionality is provided by the line_profiler package, which also provides a magic extension for ipython and the jupyter notebook:
%load_ext line_profiler
The magic command is %lprun and can be used in a similar way to %prun and %snakeviz, but in contrast to those other profiling commands it additionally needs the functions of interest to be specified:
%lprun -f equalize equalize(image)
Timer unit: 1e-06 s
Total time: 0.013282 s
File: <ipython-input-96-437a650e3f63>
Function: equalize at line 1
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1 def equalize(image, n_bins=256):
2 1 79 79.0 0.6 bins = numpy.linspace(0, 1, n_bins, endpoint=True)
3 1 8160 8160.0 61.4 hist, bins = numpy.histogram(image.flatten(), bins=bins, density=True)
4 1 18 18.0 0.1 cdf = hist.cumsum() / n_bins
5 # Invert the CDF by using numpy's interp function
6 1 5006 5006.0 37.7 equalized = numpy.interp(image.flatten(), bins[:-1], cdf)
7
8 # All this was performed on flattened versions of the image, reshape the
9 # equalized image back to the original shape
10 1 19 19.0 0.1 return equalized.reshape(image.shape)
Note that you can specify several -f function_name arguments to get the information for more than one function with a single %lprun call. For each line, the output specifies how often the line was executed (“Hits”), how long the execution of that line took in total (“Time”) and per execution (“Per Hit”), and finally how much of the total time of the function was spent on the respective line (“% Time”).