# Debugging

## Step-by-step debugging

## Learning Objectives

- Understand how to run a script under debugger control
- Understand how to use the debugger to stop execution at certain points
- Understand how to use the debugger to step through a program

The following assumes that we have the files `histograms.py`

and `do_equalization.py`

in our current directory, together with the example image `Unequalized_Hawkes_Bay_NZ.png`

. The file `histograms.py`

defines two functions a function `plot_histogram`

and a function `equalize`

. The function `plot_histogram`

plots the histogram of data values in an array and the cumulative distribution of this histogram on top of it. The equalize function is meant to perform the algorithm of histogram equalization on a grayscale image:

```
def equalize(image, n_bins=256):
"""
Perform histogram equalization on the given grayscale ``image`` (2D array
of intensity values between 0 and 1), using ``n_bins`` bins for the
histogram.
Returns the equalized image.
"""
bins = numpy.linspace(0, 1, n_bins, endpoint=True)
bins, hist = numpy.histogram(image.flatten(), bins=bins, density=True)
cdf = hist.cumsum() / n_bins
# Invert the CDF by using numpy's interp function
equalized = numpy.interp(image.flatten(), bins, cdf)
# All this was performed on flattened versions of the image, reshape the
# equalized image back to the original shape
return equalized.reshape(image.shape)
```

The main idea behind the algorithm is that we want to transform the image using a mapping that preserves the relative values of the intensities but use all the values in the intensity range for an (approximately) equal amount of pixels. It turns out that inverting the cumulative distribution function of the intensity values gives us such a mapping.

With that in mind, the algorithm has the following steps:

- Calculate the histogram of the grayscale values (between 0 and 1)
- Calculate the cumulative distribution of these values
- Interpret the cumulative distribution as a function and invert it, then use that inverted function to get the new intensity values.

As a result, the equalized image should have a flat histogram of intensity values and equivalently a linear cumulative distribution of intensity values from 0 to 1.

Let’s try to run the `do_equalization.py`

script which plots an example image and its histogram, equalizes it using the above function and plots the result:

`% run do_equalization.py`

```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
[..]/do_equalization.py in <module>()
16
17 # Equalize the image histogram
---> 18 equalized = histograms.equalize(image)
19 plt.subplot(2, 2, 3)
20 plt.imshow(equalized, cmap='gray')
[..]/histograms.py in equalize(image, n_bins)
16 cdf = hist.cumsum() / n_bins
17 # Invert the CDF by using numpy's interp function
---> 18 equalized = numpy.interp(image.flatten(), bins, cdf)
19
20 # All this was performed on flattened versions of the image, reshape the
[..]/function_base.py in interp(x, xp, fp, left, right, period)
1269 return compiled_interp([x], xp, fp, left, right).item()
1270 else:
-> 1271 return compiled_interp(x, xp, fp, left, right)
1272 else:
1273 if period == 0:
ValueError: fp and xp are not of the same length.
```

OK, apparently that did not work. The arguments to `numpy.interp`

do not seem to have the same length. Let’s use the debugger to investigate this:

`% debug`

```
ipdb> xp.shape
(99,)
ipdb> fp.shape
(100,)
ipdb> u
> [...]/histograms.py(18)equalize()
16 cdf = hist.cumsum() / n_bins
17 # Invert the CDF by using numpy's interp function
---> 18 equalized = numpy.interp(image.flatten(), bins, cdf)
19
20 # All this was performed on flattened versions of the image, reshape the
ipdb> bins.shape
(99,)
ipdb> cdf.shape
(100,)
ipdb> q
```

So it seems that `bins`

and `cdf`

that we provide to `numpy.interp`

differ in their length by one. This is a typical thing to happen when dealing with histograms, since we sometimes use bins (N values) and sometimes bin edges (N+1 values). Let’s fix this, by removing the last value of the `cdf`

variable:

```
def equalize(image, n_bins=256):
bins = numpy.linspace(0, 1, n_bins, endpoint=True)
bins, hist = numpy.histogram(image.flatten(), bins=bins, density=True)
cdf = hist.cumsum() / n_bins
# Invert the CDF by using numpy's interp function
equalized = numpy.interp(image.flatten(), bins, cdf[:-1])
# All this was performed on flattened versions of the image, reshape the
# equalized image back to the original shape
return equalized.reshape(image.shape)
```

Let’s run the script again:

`% run do_equalization.py`

Um, well, the script now runs without an error message but the result does not look any good. We don’t have an error that we can debug in a post-mortem way so instead we start the complete script under debugger control:

`% run -d do_equalization.py`

```
Breakpoint 1 at [...]/examples/do_equalization.py:1
NOTE: Enter 'c' at the ipdb> prompt to continue execution.
> [...]/do_equalization.py(1)<module>()
1---> 1 import matplotlib.pyplot as plt
2
3 import histograms
4
5 if __name__ == '__main__':
ipdb>
```

As when using `%debug`

, we now have a debugger prompt but now we are at the start of the script. The most important commands are the following:

`n`

(`next`

): Execute the current line and step over it (i.e. stay on the same level)`s`

(`step`

): Execute the current line by stepping*into*functions called on the line`c`

(`continue`

): continue execution until the end of a script, an uncaught exception or a breakpoint (see below)`b`

(`breakpoint`

): create a new breakpoint at a given line number (and optionally in a given file)

Just pressing return at the prompt will execute the previous command again (useful for `n`

, `s`

, `c`

).

We are interested in the code happening in the `equalize`

function, so we’ll set a breakpoint at its first line and continue our execution until then:

```
ipdb> b histograms.py:20
Breakpoint 2 at [...]/histograms.py:20
ipdb> c
> [...]/histograms.py(20)equalize()
18 Returns the equalized image.
19 """
2--> 20 bins = numpy.linspace(0, 1, n_bins, endpoint=True)
21 bins, hist = numpy.histogram(image.flatten(), bins=bins, density=True)
22 cdf = hist.cumsum() / n_bins
ipdb> n
> [...]/histograms.py(21)equalize()
19 """
2 20 bins = numpy.linspace(0, 1, n_bins, endpoint=True)
---> 21 bins, hist = numpy.histogram(image.flatten(), bins=bins, density=True)
22 cdf = hist.cumsum() / n_bins
23 # Invert the CDF by using numpy's interp function
ipdb> bins[:10]
array([ 0. , 0.01010101, 0.02020202, 0.03030303, 0.04040404,
0.05050505, 0.06060606, 0.07070707, 0.08080808, 0.09090909])
ipdb> n
> [...]/histograms.py(22)equalize()
2 20 bins = numpy.linspace(0, 1, n_bins)
21 bins, hist = numpy.histogram(image.flatten(), bins=bins, density=True)
---> 22 cdf = hist.cumsum() / n_bins
23 # Invert the CDF by using numpy's interp function
24 equalized = numpy.interp(image.flatten(), bins, cdf[:-1])
ipdb> hist[:10]
array([ 0. , 0.01010101, 0.02020202, 0.03030303, 0.04040404,
0.05050505, 0.06060606, 0.07070707, 0.08080808, 0.09090909])
ipdb> bins[:10]
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
```

We have found the issue: we swapped the arguments for the return value of `numpy.histogram`

, storing the bin edges in the `hist`

variable and the histogram in `bin`

. Looking back at our previous fix for the inconsistent length of `bin`

and `cdf`

we could have realized back then that `bins`

should have 1 element more than `cdf`

and not the other way round… Fixing those issues leads to the following code:

```
def equalize(image, n_bins=256):
bins = numpy.linspace(0, 1, n_bins, endpoint=True)
hist, bins = numpy.histogram(image.flatten(), bins=bins, density=True)
cdf = hist.cumsum() / n_bins
# Invert the CDF by using numpy's interp function
equalized = numpy.interp(image.flatten(), bins[:-1], cdf)
# All this was performed on flattened versions of the image, reshape the
# equalized image back to the original shape
return equalized.reshape(image.shape)
```

Now our script runs and gives the expected result:

`% run do_equalization.py`

## Debug step by step

Call the `check_data`

function from the previous section with `numpy.nan`

as its `target`

argument. The result is not as expected, use the debugger to go through the code step by step and find the problem.