Speeding Up Python With Cython
Cython is a superset of Python that lets you significantly improve the speed of your code. You can add optional type declarations for even greater benefits. Cython translates your code to optimized C/C++ that gets compiled to a Python extension module.
In this tutorial you'll learn how to install Cython, get an immediate performance boost for your Python code for free, and then really take advantage of Cython by adding types and profiling your code. Finally, I'll point you to more advanced topics, like integration with C/C++ code and NumPy, that you can explore further for even greater gains.
Counting Pythagorean Triples
Pythagoras was a Greek mathematician and philosopher. He is famous for the Pythagorean theorem, which states that in a right-angled triangle, the sum of the squares of the legs is equal to the square of the hypotenuse. Pythagorean triples are any three positive integers a, b and c such that a² + b² = c². Here is a program that counts all the Pythagorean triples whose members are not greater than the provided limit.
import time


def count(limit):
    result = 0
    for a in range(1, limit + 1):
        for b in range(a + 1, limit + 1):
            for c in range(b + 1, limit + 1):
                # c only grows from here, so no larger c can match either
                if c * c > a * a + b * b:
                    break
                if c * c == (a * a + b * b):
                    result += 1
    return result


if __name__ == '__main__':
    start = time.time()
    result = count(1000)
    duration = time.time() - start
    print(result, duration)

Output:

881 13.883624076843262
Apparently there are 881 triples, and it took the program a little less than 14 seconds to find them. That's not too long, but long enough to be annoying. If we want to find more triples up to a higher limit, we need a way to make it go faster.
It turns out that there are substantially better algorithms, but today we're focusing on making Python faster with Cython, not on the best algorithm for finding Pythagorean triples.
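The timing above uses time.time(), which follows the system clock. For benchmarks, time.perf_counter() is usually the better choice because it is a monotonic clock with the highest available resolution. Here is a minimal sketch of the same measurement using it; benchmark() is a hypothetical helper name, and count() is the unchanged brute-force function from above.

```python
import time


def count(limit):
    """Count Pythagorean triples a < b < c with all members <= limit."""
    result = 0
    for a in range(1, limit + 1):
        for b in range(a + 1, limit + 1):
            for c in range(b + 1, limit + 1):
                if c * c > a * a + b * b:
                    break  # c only grows from here, so no later c can match
                if c * c == a * a + b * b:
                    result += 1
    return result


def benchmark(limit):
    """Hypothetical helper: return (result, seconds) for one call to count(limit)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = count(limit)
    return result, time.perf_counter() - start
```

For example, benchmark(1000) should return 881 together with the elapsed time in seconds, which will vary from machine to machine.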
Easy Boosting With pyximport
The easiest way to use Cython is through the special pyximport module. It compiles your Cython code on the fly when you import it, so you can enjoy the benefits of native optimization without much trouble.
You need to put the code to cythonize in its own module, write one line of setup in your main program, and then import the module as usual. Let's see what it looks like. I moved the function to its own file called pythagorean_triples.pyx. The .pyx extension is important: it's how pyximport recognizes a Cython module. The line that activates Cython is import pyximport; pyximport.install(). The main program then imports the module with the count() function and invokes it in main().
import time
import pyximport; pyximport.install()
import pythagorean_triples


def main():
    start = time.time()
    result = pythagorean_triples.count(1000)
    duration = time.time() - start
    print(result, duration)


if __name__ == '__main__':
    main()

Output:

881 9.432806253433228
The pure Python version took about 47% longer (13.9 vs. 9.4 seconds). We got this boost by adding a single line. Not bad at all.
Build Your Own Extension Module
While pyximport is really convenient during development, it is best suited to simple, self-contained modules. Often when optimizing code you want to reference native C libraries or other Python extension modules.
To support those, and also to avoid compiling at import time, you can build your own Cython extension module. You need to add a small setup.py file and remember to rebuild it whenever you modify the Cython code. Here is the setup.py file:
# setuptools replaces distutils, which was removed in Python 3.12
from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize("pythagorean_triples.pyx")
)
Then you need to build it:
$ python setup.py build_ext --inplace
Compiling pythagorean_triples.pyx because it changed.
[1/1] Cythonizing pythagorean_triples.pyx
running build_ext
building 'pythagorean_triples' extension
creating build
creating build/temp.macosx-10.7-x86_64-3.6
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/gigi.sayfan/miniconda3/envs/py3/include -arch x86_64 -I/Users/gigi.sayfan/miniconda3/envs/py3/include -arch x86_64 -I/Users/gigi.sayfan/miniconda3/envs/py3/include/python3.6m -c pythagorean_triples.c -o build/temp.macosx-10.7-x86_64-3.6/pythagorean_triples.o
gcc -bundle -undefined dynamic_lookup -L/Users/gigi.sayfan/miniconda3/envs/py3/lib -L/Users/gigi.sayfan/miniconda3/envs/py3/lib -arch x86_64 build/temp.macosx-10.7-x86_64-3.6/pythagorean_triples.o -L/Users/gigi.sayfan/miniconda3/envs/py3/lib -o pythagorean_triples.cpython-36m-darwin.so
As you can see from the output, Cython generated a C file called pythagorean_triples.c and compiled it into a platform-specific .so file (here pythagorean_triples.cpython-36m-darwin.so). That file is the extension module, which Python can now import like any other native extension module.
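If you are ever unsure whether Python picked up the compiled module or something else, you can check the file it was loaded from against the extension suffixes the interpreter accepts. This is a minimal sketch; is_extension_module() and describe_module() are hypothetical helper names built on importlib.machinery.EXTENSION_SUFFIXES.

```python
import importlib
import importlib.machinery


def is_extension_module(module):
    """Return True if the module was loaded from a compiled extension file."""
    path = getattr(module, '__file__', None)
    if path is None:
        return False  # built-in modules such as sys have no __file__
    # EXTENSION_SUFFIXES lists the suffixes this interpreter can load,
    # e.g. platform-tagged '.so' names on Linux and macOS, '.pyd' on Windows.
    return path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))


def describe_module(name):
    """Import a module by name and return (file_path, is_extension)."""
    module = importlib.import_module(name)
    return getattr(module, '__file__', None), is_extension_module(module)
```

After a successful build, describe_module('pythagorean_triples') should report the path of the .so file and True.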
If you're curious, take a peek at the generated C code. It is very long (2,789 lines), hard to read, and full of extra code needed to work with the Python C API. Now let's drop pyximport and run our program again:
import time
import pythagorean_triples


def main():
    start = time.time()
    result = pythagorean_triples.count(1000)
    duration = time.time() - start
    print(result, duration)


if __name__ == '__main__':
    main()

Output:

881 9.507064819335938
The result is pretty much the same as with pyximport. Note, however, that I'm measuring only the runtime of the cythonized code, not the time pyximport spends compiling it on the fly. In big programs, that compile time can be significant.
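To see the cost that timing count() alone leaves out, you can time the import itself. A minimal sketch, where timed_import() is a hypothetical helper: with pyximport active, the first import of a .pyx module includes Cython's compile step. A second import in the same process just returns the cached module from sys.modules, so measure a fresh run.

```python
import importlib
import time


def timed_import(name):
    """Hypothetical helper: import a module by name, return (module, seconds).

    With pyximport installed, importing a .pyx module may trigger a
    compile, so the elapsed time includes that build step.
    """
    start = time.perf_counter()
    module = importlib.import_module(name)
    return module, time.perf_counter() - start
```

For example, calling timed_import('pythagorean_triples') after pyximport.install() would report import time including any compilation pyximport performs.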
Adding Types to Your Code
Let's take it to the next level. Cython is more than Python: it adds optional static typing. Here I just declare all the variables as C integers with cdef, and the performance skyrockets:
# pythagorean_triples.pyx
def count(limit):
    cdef int result = 0
    cdef int a = 0
    cdef int b = 0
    cdef int c = 0
    for a in range(1, limit + 1):
        for b in range(a + 1, limit + 1):
            for c in range(b + 1, limit + 1):
                if c * c > a * a + b * b:
                    break
                if c * c == (a * a + b * b):
                    result += 1
    return result

----------

# main.py
import time
import pyximport; pyximport.install()
import pythagorean_triples


def main():
    start = time.time()
    result = pythagorean_triples.count(1000)
    duration = time.time() - start
    print(result, duration)


if __name__ == '__main__':
    main()

Output:

881 0.056414127349853516
Yes, that's correct. By declaring a few integer variables, we got the program to run in less than 57 milliseconds, compared to more than 13 seconds with pure Python. That's almost a 250X improvement.
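The speedup figures are simple ratios of the measured durations. Here is a minimal sketch of the arithmetic, using the timings from the runs above (they were measured on the author's machine, so your numbers will differ); speedup() is a hypothetical helper name.

```python
def speedup(baseline_seconds, optimized_seconds):
    """How many times faster the optimized run was than the baseline."""
    if optimized_seconds <= 0:
        raise ValueError('optimized_seconds must be positive')
    return baseline_seconds / optimized_seconds


PURE_PYTHON = 13.883624076843262         # pure Python count(1000)
PYXIMPORT_UNTYPED = 9.432806253433228    # same code compiled via pyximport
TYPED_CYTHON = 0.056414127349853516      # cdef int version

print(round(speedup(PURE_PYTHON, PYXIMPORT_UNTYPED), 2))  # 1.47
print(round(speedup(PURE_PYTHON, TYPED_CYTHON)))          # 246
```

A ratio of 1.47 means the pure Python run took about 47% longer than the untyped Cython build, while the typed version is roughly 246 times faster.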
Profiling Your Code
I used Python's time module, which measures wall time and is pretty good most of the time. If you want more precise timing of small code fragments, consider using the timeit module. Here is how to measure the performance of the code using timeit:
>>> import timeit
>>> timeit.timeit('count(1000)', setup='from pythagorean_triples import count', number=1)
0.05357028398429975

# Running 10 times
>>> timeit.timeit('count(1000)', setup='from pythagorean_triples import count', number=10)
0.5446877249924
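The timeit() function takes a statement to execute, setup code that runs once and is not measured, and the number of times to execute the statement. It returns the total time for all executions, so the 10 runs above took about 54 milliseconds each.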
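A single timeit() total can be skewed by whatever else the machine is doing. The timeit documentation suggests repeating the measurement with timeit.repeat() and looking at the minimum, since higher values are usually caused by interference from other processes. A minimal sketch, where best_time() is a hypothetical helper and count() is the pure-Python version from the start of the tutorial; pass pythagorean_triples.count instead to time the compiled module.

```python
import timeit


def count(limit):
    """Count Pythagorean triples a < b < c with all members <= limit."""
    result = 0
    for a in range(1, limit + 1):
        for b in range(a + 1, limit + 1):
            for c in range(b + 1, limit + 1):
                if c * c > a * a + b * b:
                    break
                if c * c == a * a + b * b:
                    result += 1
    return result


def best_time(func, *args, repeat=5, number=1):
    """Best per-call time in seconds over `repeat` rounds of `number` calls.

    timeit.repeat() returns the total time of each round, so the minimum
    is divided by `number` to get a per-call figure.
    """
    totals = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(totals) / number
```

Passing a callable instead of a statement string avoids the setup import, because the function object is already in hand.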
I've only scratched the surface here. You can do a lot more with Cython. Here are a few topics that can further improve the performance of your code or allow Cython to integrate with other environments:
- calling C code
- interacting with the Python C API and the GIL
- using C++ in Python
- porting Cython code to PyPy
- using parallelism
- Cython and NumPy
- sharing declarations between Cython modules
Cython can deliver two orders of magnitude of performance improvement on CPU-bound code like the nested loops above, for very little effort. If you develop non-trivial software in Python, Cython is a no-brainer. It has very little overhead, and you can introduce it gradually to your codebase.
Source: Tuts Plus