Extending Python in C: Modifying NumPy arrays in place
Iterating over massive amounts of data can be extremely slow in Python.
Rewriting the performance-critical code in C can speed it up by more than 1000x,
and it is easy to do.
Suppose we have an image transformation algorithm that takes an image as input
and produces an image as output.
The code maps each pixel value to the unit interval by dividing it by 255,
multiplies it by the values of all neighboring pixels, normalizes the result, and maps it
back to the RGB range by multiplying by 255.
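As a rough illustration of these steps, the per-pixel loop might look like the sketch below. It is not the original pixel.py. The 3x3 neighborhood, the untouched border pixels, the Euclidean norm, and the guard against a zero norm are all assumptions made for the example.

```python
import numpy as np


def transform(image):
    """Naive per-pixel transform of an (H, W, 3) uint8 RGB image.

    Sketch only: neighborhood size and border handling are assumptions.
    """
    height, width, _ = image.shape
    scaled = image.astype(np.float64) / 255.0  # map values to the unit interval
    out = image.copy()                         # border pixels are left unchanged
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Per-channel product of the pixel and its 8 neighbors.
            product = scaled[y - 1:y + 2, x - 1:x + 2].prod(axis=(0, 1))
            norm = np.linalg.norm(product)
            if norm > 0:                       # avoid dividing by zero
                product = product / norm
            out[y, x] = (product * 255).astype(np.uint8)  # back to 0..255
    return out
```

The Python-level double loop with a NumPy call for every pixel is exactly the pattern that becomes expensive on large images.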
The input image is a picture of a watermelon pizza.
The following image is the one produced by our algorithm.
A quick benchmark reveals that transforming one image with this algorithm takes over 9 seconds.
Way too slow. Let’s do some profiling.
> python -m cProfile -s time -o pixel.prof pixel.py
> python -c 'import pstats; p = pstats.Stats("pixel.prof"); p.sort_stats("time").print_stats(4)'
Thu Sep 10 10:41:33 2020    pixel.prof
         9214391 function calls (8705887 primitive calls) in 10.604 seconds
   Ordered by: internal time
   List reduced from 1489 to 4 due to restriction <4>
   ncalls           tottime  percall  cumtime  percall  filename:lineno(function)
   1                  5.586    5.586   10.329   10.329  pixel.py:5(transform)
   505284             1.359    0.000    2.737    0.000  /usr/lib/python3/dist-packages/numpy/linalg/linalg.py:2325(norm)
   1515856/1010572    1.177    0.000    3.460    0.000  {built-in method numpy.core._multiarray_umath.implement_array_function}
   505286             0.371    0.000    0.371    0.000  {built-in method numpy.empty}
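The profiling output reveals that most of the time is indeed spent in the transform function itself: about 5.6 of the 10.6 seconds are its internal time, and much of the remainder goes to NumPy calls made from it, such as the 505284 calls to numpy.linalg.norm.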
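The same kind of report can also be produced from inside a script with the standard-library cProfile and pstats modules. The helper below is a minimal sketch. The name profile_call and its top parameter are made up for this example and are not part of pixel.py.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=4):
    """Run func(*args) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Same ordering and row limit as the command-line invocation above.
    stats.sort_stats("time").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    # Stand-in workload; replace with the function you want to profile.
    _, report = profile_call(sorted, list(range(100000, 0, -1)))
    print(report)
```

Returning the report as a string instead of printing it directly makes it easy to log the result or compare runs before and after an optimization.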
If you prefer graphical output, the profile can be visualized with gprof2dot.