The exec Statement and A Python Mystery

A few weeks ago I examined Python code objects using the dis module. In that post, I showed several examples of executing code objects created at runtime using the exec statement; here we’ll explore compile()’s complement, exec, how to invoke it, and some of the quirks of using it.

Note that, like my post on code objects, the information here is only relevant to Python 2.x, specifically, Python 2.7. Unlike that post, at least one technique presented here will not work in Python 3.x — in 3.x, exec is a function, and modifications to locals will not propagate to the calling scope. (Thanks to comex for pointing this out on Hacker News.)

A Refresher: compile()

Before we dive into exec, recall that compile() converts a string of Python source code into a code object, using the same machinery as the usual Python compilation process. It takes three required arguments: the source code to compile, the filename of the source code (for which it is customary to use "<string>" for source code not originating in a file), and the "mode" of the compilation (for which "exec" is the most useful mode, at least for these sorts of experiments).
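
As a rough illustration of the mode argument, the sketch below compiles the same arithmetic in "exec" mode (a block of statements) and in "eval" mode (a single expression). The variable names are made up for this example, and it sticks to calls that exist in both Python 2.7 and 3.x.

```python
# 'exec' mode accepts a block of statements.
stmt_code = compile("y = 6 * 7", "<string>", "exec")

# 'eval' mode accepts a single expression; eval() returns its value.
expr_code = compile("6 * 7", "<string>", "eval")

kind = type(stmt_code).__name__   # name of the code object's type
origin = stmt_code.co_filename    # the filename argument passed to compile()
value = eval(expr_code)           # evaluates the compiled expression
```

Nothing is executed by compile() itself; stmt_code only describes the assignment until something runs it.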

Make It Go

By itself, a code object is not terribly useful. Sure, it encapsulates all the information needed to execute some Python code, but it is just a noun, with no way to run itself.

>>> code_object = compile("""
... print "Hello, world"
... """, '<string>', 'exec')
>>> code_object()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'code' object is not callable

So a code object cannot simply be called. Of course we can use the dis module to inspect the bytecode, and the code object’s attributes to find out what variables it uses, the source file from which it was created, etc., but what we really want is to run it. Enter the exec statement:

>>> exec code_object
Hello, world

That’s more like it!
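
For readers who want something that also runs on Python 3, here is a minimal sketch using the parenthesized form exec(code, globals), which Python 2.7 accepts as equivalent to "exec code in globals". The names greeting, name and namespace are assumptions made for the example.

```python
# Compile once; nothing runs yet.
code_object = compile("greeting = 'Hello, ' + name", "<string>", "exec")

# Names the code loads or stores, recorded at compile time.
names_used = set(code_object.co_names)

# Run the code object against an explicit namespace dictionary.
namespace = {"name": "world"}
exec(code_object, namespace)
result = namespace["greeting"]
```

Passing a dictionary keeps the effects of the code visible in one place instead of scattering them into the caller's scope.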

Playing With exec

Imagine that we have a function which accepts a code object and an argument x:

>>> def exec_code_and_return_x(code_object, x):
...     exec code_object
...     return x

You might expect this function to return the value of x without modification — after all, no code in the function actually manipulates x in any way. However (depending on the code object), you can get some very unexpected outcomes:

>>> my_code_object = compile("""
... x = x + 1
... """, '<string>', 'exec')
>>> exec_code_and_return_x(my_code_object, 1)
2

But you won’t always get the unexpected results you expect:

>>> my_code_object = compile("""
... del x
... """, '<string>', 'exec')
>>> exec_code_and_return_x(my_code_object, 1)
1

So what’s going on here?

On exec and Scoping

Without any additional instructions, exec uses the current global and local namespaces to execute your code. This means that (as we saw above) a code object being exec’d can modify the variables in scope at the point where it is exec’d (as well, obviously, as any globals the code object happens to manipulate).

You can also customize (to a certain extent) the scope given to the code object by using the in “arguments” to exec:

>>> code_globals = {}
>>> code_locals = {}
>>> code_object = compile("""
... x = 1
... """, '<string>', 'exec')
>>> exec code_object in code_globals, code_locals
>>> print code_locals
{'x': 1}
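
A small sketch of how the two dictionaries divide the work, again using the parenthesized exec(code, globals, locals) form so it runs on 2.7 and 3.x. The names base and total are invented for the example.

```python
code_object = compile("total = base + 1", "<string>", "exec")

code_globals = {"base": 10}   # name lookups fall back to this dictionary
code_locals = {}              # plain assignments are stored here

exec(code_object, code_globals, code_locals)

assigned = dict(code_locals)
leaked = "total" in code_globals
```

Reads of base are satisfied from the globals dictionary, while the assignment to total lands only in the locals dictionary.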

Note that builtins, as their name suggests, are always available, even when execing with empty scopes:

>>> code_globals = {}
>>> code_locals = {}
>>> code_object = compile("""
... print unicode("Hello, world")
... """, '<string>', 'exec')
>>> exec code_object in code_globals, code_locals
Hello, world
>>> code_globals.keys()
['__builtins__']
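
The same effect can be sketched in a form that runs on both 2.7 and 3.x: after exec finishes, the globals dictionary that started out empty holds a `__builtins__` entry alongside whatever the code assigned. The name length is an assumption made for the example.

```python
code_globals = {}

# len() resolves even though the dictionary starts out empty.
exec(compile("length = len('abc')", "<string>", "exec"), code_globals)

injected_keys = sorted(code_globals)
length = code_globals["length"]
```

Because only a globals dictionary is passed here, it also serves as the local namespace, so the assignment shows up next to `__builtins__`.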

Solving Our Python Mystery

Returning to our mystery example from before:

>>> my_code_object = compile("""
... del x
... """, '<string>', 'exec')
>>> exec_code_and_return_x(my_code_object, 1)
1

Why can’t the code object delete the local variable x? Specifically, why is the above code different from this:

>>> def delete_local_then_return_it():
...     x = 1
...     del x
...     return x
>>> delete_local_then_return_it()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in delete_local_then_return_it
UnboundLocalError: local variable 'x' referenced before assignment

What we’re seeing here is the result of a Python performance optimization: rather than using a dictionary (or some other mapping object) for function locals, Python stores them in a fixed-size array, and bytecode instructions reference positions within this array. As a consequence of this optimization, dynamic code (such as that returned by compile(), or invocations of exec directly on a string) cannot modify the size of the locals array, since it is fixed at compile time.
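
The fixed layout can be glimpsed from a function's code object. This is only a sketch: add_one is a made-up function, while co_varnames and co_nlocals are standard code-object attributes in CPython 2.7 and 3.x.

```python
def add_one(x):
    y = x + 1
    return y

code = add_one.__code__

# Local variable names, in slot order, decided when the function was compiled.
local_names = code.co_varnames

# Number of slots reserved for locals.
local_count = code.co_nlocals

# In CPython 2.7, dis.dis(add_one) shows LOAD_FAST / STORE_FAST
# instructions whose argument is an index into these slots.
```

Neither value changes while the function runs, which is why code compiled later has no slot of its own to add or remove.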

Moreover, the implementation of exec, that is, of the EXEC_STMT opcode, arranges for a new Python stack frame to be pushed, which means that it has its own array of locals (those used by the code object being exec’d):

>>> import inspect
>>> def get_stack_depth():
...     frame = inspect.currentframe()
...     # begin depth at -1 to account for the extra
...     # frame created when calling get_stack_depth
...     depth = -1
...     while frame:
...             depth += 1
...             frame = frame.f_back
...     return depth
>>> def show_stack_depths():
...     print 'stack depth in function', get_stack_depth()
...     exec compile("""
... print "stack depth in exec", get_stack_depth()
... """, '<string>', 'exec')
>>> show_stack_depths()
stack depth in function 2
stack depth in exec 3

Code objects can, of course, modify the local variables on their own stack frame:

>>> def show_code_object_locals():
...     code_globals = {}
...     code_locals = {'x': 1}
...     exec compile("""del x""", '<string>', 'exec') in code_globals, code_locals
...     return code_locals
>>> show_code_object_locals()
{}

Warnings and Gotchas

Using exec to run code dynamically at runtime is fun, but it is not without its costs.

First, I would be remiss if I did not strongly urge you never to use exec; Python provides no way (and, in fact, there is no general way) to ensure that the code you’re attempting to execute will do no harm to your system. Think of this like Python’s version of cross-site scripting attacks. If you are ever executing untrusted code submitted by end-users of your system, you’re doing it wrong.

Secondly, executing code in this fashion is significantly slower than simply calling regular Python code, as the latter code path is heavily optimized in the interpreter, whereas calling dynamic code is not.

Finally, if you do need to use exec, you may find that it is more convenient to pass a string or file object as the first argument — however, beware that this will incur the cost of compiling the source code each time the exec statement is encountered. If you are repeatedly execing the same piece of code, you will see significant performance improvement by first compile()ing and saving the resulting code object for re-use.
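
A minimal sketch of the two approaches, written with the parenthesized exec form so that it runs on 2.7 and 3.x. The counter variable is invented for the example, and no timings are claimed here.

```python
source = "counter = counter + 1"
namespace = {"counter": 0}

# Passing the string: the source is compiled again on every pass.
for _ in range(3):
    exec(source, namespace)

# Compiling once up front and reusing the code object.
code_object = compile(source, "<string>", "exec")
for _ in range(3):
    exec(code_object, namespace)

final = namespace["counter"]
```

Both loops have the same effect on namespace; the second simply skips the repeated compilation step.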


Your thoughts:

you write:

Unlike that post, at least one technique presented here will not work in Python 3.x — in 3.x, exec is a function, and modifications to locals will not propagate to the calling scope.

exec() takes 3 params. if you pass __globals__ and __locals__ as the second and third params the changes will propagate.

— James Thiele, 2012-05-01 05:19 pm

There are no (or at least, I am not aware of any) __globals__ or __locals__ global names in Python 3. Perhaps you’re thinking of the __globals__ attribute of function objects?

If you pass globals() and locals(), then modifications will be made to the dictionary resulting from the call to locals(), but since it is a copy of the function’s locals into a dictionary, and not a reference to the actual locals (which is an array, even in Python 3), modifications there will not affect the calling scope, same as in Python 2.

— Dan Crosta, 2012-05-01 11:59 pm

I have an app where users are allowed to provide evaluated input in the form of a python literal. I found ast.literal_eval() as a way to simultaneously syntax-check and evaluate the input, and guarantee that the input had no executable function injected.

— dcortesi, 2012-05-03 02:52 am

Thank you for pointing out that interesting aspect of exec!

Why do changes on the locals in the exec'ed block reflect on the locals of the calling scope if the exec'ed block gets its own stack frame?

Kind regards, Alfe

— Alfe, 2012-05-03 08:56 am

I don’t think that the problem is as simple as the code that exec runs executing in a new stack frame, because if it were just that, the code couldn’t alter the function’s x at all (and it clearly does). What’s really happening is that exec is doing an incomplete job of copying the changes from its scope back into the function; it notices updated variables but not deleted variables.

(While exec can’t modify the size of the local variable array, nothing can; del x for function locals isn’t implemented by shrinking the array but by writing a special value to the relevant array slot (specifically a C NULL). Under some very carefully limited circumstances you can actually delete a function’s local variables from outside it, but this is deep magic.)

By the way, thank you for writing this entry; it has set me off on a fascinating exploration into some of the obscure depths of Python 2 bytecodes and CPython implementation.

— Chris Siebenmann, 2012-05-03 11:25 am

@Chris Siebenmann, have you found where in CPython source you think it’s incompletely copying its changes back? I suspect you might be right about this, in retrospect, and I’d love to see where the bug is if you’ve found it. If not, I may dig in myself and write a follow-up.

@Alfe, see Chris Siebenmann’s comment just below yours, I think it answers your question better than I could.

— Dan Crosta, 2012-05-03 11:56 am

The code sequence to look at starts at the end of exec_statement() in Python/ceval.c, where it calls PyFrame_LocalsToFast() (which is in Objects/frameobject.c). In theory its second argument should be set to enable clearing of deleted variables, but in practice I believe that the frame object locals dictionary needs special magic to signal the deletion (see dict_to_map() in frameobject.c for the low level details) and that magic’s not present in the dictionary that exec_statement() uses.

(The only place where the special magic is available is in tracing and profiling hooks, which are the only code that calls PyFrame_LocalsToFast() with a clear argument of 1. But even then there are various issues that I ran into that make it peculiar and hard.)

Diving into the depths of the CPython interpreter is one of my hobbies (or perhaps obsessions). I take it further than I probably should, cf the Python section of my blog.

— Chris Siebenmann, 2012-05-05 03:20 pm
