4 & 5 October 2012
Cape Town, South Africa

Stepping Through CPython

Larry Hastings

Professional software developer for 20 years. Pythonista since the late '90s. Core developer for 5 years. Release manager for Python 3.4.

Talk Outline

Ever wondered how CPython actually works internally? This talk will show you. We start with a simple Python program, then slowly step through CPython, showing in exhaustive detail what happens when it runs that program. Along the way we'll examine the design and implementation of various major CPython subsystems and see how they fit together. The audience should be conversant in C and Python.

The goal of the talk is to familiarize the audience with CPython's internal structure well enough that a programmer versed in C and Python, but who has never worked on an interpreter, could comfortably dive in and start hacking on CPython.

The program examined will be simple but deliberately designed to exercise most of CPython's runtime behavior. This will include loading modules implemented in C and in Python, loading bytecode cached on disk, and a cross-section of bytecodes. (For example, I only need to examine one of the BINARY_* math opcodes; I don't need to walk through every single one.)
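As a rough illustration of what "a cross-section of bytecodes" means, the sketch below disassembles a tiny function with the standard `dis` module. The function `add` is a made-up example, not the program used in the talk, and `dis.get_instructions` was added in Python 3.4, so this assumes an interpreter newer than the 3.3 the talk targets. The exact opcode names vary between CPython versions, so the code only checks for the `BINARY_` prefix.

```python
import dis


def add(a, b):
    # Hypothetical example function: one arithmetic operation, one return.
    return a + b


# Collect the opcode names CPython compiled the function body into.
ops = [ins.opname for ins in dis.get_instructions(add)]

# The addition compiles to a single BINARY_* instruction; its exact name
# depends on the CPython version, so match on the prefix only.
binary_ops = [name for name in ops if name.startswith("BINARY_")]

print(ops)
print(binary_ops)
```

Running `dis.dis(add)` instead prints the same instructions as a human-readable listing.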

Areas I expect to examine:

  • built-in modules, including ones that are automatically loaded before your program starts
  • bytecode, including
    • the various implementations of the inner loop (switch statement, labels-as-values)
    • the peephole optimizer
    • on-disk format
    • marshal
    • the magic version number
    • mention lnotab but probably skip the gory details
  • the stack machine
    • unwinding the stack after an exception (and producing tracebacks)
    • contrast CPython's approach with Stackless
  • all the possible fields of PyObject, and an overview of the fields in PyTypeObject
  • built-in types
    • the implementations of a few key internal types
    • list, dict, tuple, str, bytes, int, bool, None
    • though not to the level of detail that Hettinger or Rhodes did in past talks
    • interned values
  • the GIL and reference counting
    • weakrefs
    • garbage collection
    • Py_TRASHCAN
  • CPython's small-block and arena allocators
  • the parser, though I don't want to spend a lot of time on it (runtime is where the fun is ;)
  • internal utility functions like PyArg_Parse
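Several of the areas listed above can be poked at from Python itself before reading any C. The following is a minimal sketch, not material from the talk: the class `Node`, the filename `"<sketch>"`, and the variable names are invented for illustration, and `importlib.util.MAGIC_NUMBER` assumes Python 3.4 or later. The reference-count and weakref behaviour shown is specific to CPython.

```python
import gc
import importlib.util
import marshal
import sys
import weakref

# Built-in modules are compiled into the interpreter binary itself.
builtin_names = sys.builtin_module_names

# Bytecode: compile a snippet, serialize the code object with marshal
# (the format used for the body of a cached .pyc file), and load it back.
code = compile("result = 6 * 7", "<sketch>", "exec")
restored = marshal.loads(marshal.dumps(code))
namespace = {}
exec(restored, namespace)

# The magic number at the start of a .pyc file identifies the bytecode version.
magic = importlib.util.MAGIC_NUMBER

# Interning: equal strings passed through sys.intern share one object.
first = sys.intern("stepping through cpython")
second = sys.intern("".join(["stepping ", "through cpython"]))


class Node:
    """Hypothetical class used only to observe object lifetimes."""


# Reference counting: binding another name adds one reference.
obj = Node()
before = sys.getrefcount(obj)
alias = obj
after = sys.getrefcount(obj)

# Weakrefs do not keep an object alive; in CPython the object is freed
# as soon as its last strong reference goes away.
ref = weakref.ref(obj)
del alias
del obj

# Garbage collection: a reference cycle is not freed by reference counting
# alone, so the cycle collector has to find it.
cyclic = Node()
cyclic.me = cyclic
cyclic_ref = weakref.ref(cyclic)
del cyclic
gc.collect()

print(namespace["result"], magic, after - before, ref(), cyclic_ref())
```

The C-level topics in the list (PyObject fields, Py_TRASHCAN, the allocators, PyArg_Parse) have no direct Python-level equivalent and are left to the talk itself.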

I'll be giving the talk based on CPython 3.3.
