In this post, I present a layered approach to minimizing the inherent security risk when running untrusted Python code.

Step 1: Securing Python

Restrict the actions the code can perform. The method used differs depending on whether you use CPython or PyPy.

CPython

Use RestrictedPython to define a restricted subset of Python.
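To make the idea of a "restricted subset" concrete, here is a minimal, standard-library-only sketch that rejects source code containing disallowed syntax before it is ever executed. It does not use RestrictedPython's API, and its policy (the names FORBIDDEN_NODES, FORBIDDEN_NAMES and check_source) is hypothetical and far less thorough than what RestrictedPython enforces.

```python
import ast

# Hypothetical policy, for illustration only.
FORBIDDEN_NODES = (ast.Import, ast.ImportFrom, ast.Global, ast.Nonlocal)
FORBIDDEN_NAMES = {"eval", "exec", "open", "compile", "__import__"}

def check_source(source):
    """Return a list of policy violations found in source (empty if none)."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, FORBIDDEN_NODES):
            violations.append(
                f"line {node.lineno}: {type(node).__name__} is not allowed")
        elif isinstance(node, ast.Name) and node.id in FORBIDDEN_NAMES:
            violations.append(
                f"line {node.lineno}: name {node.id!r} is not allowed")
        elif isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            # Blocks tricks such as ().__class__.__bases__
            violations.append(
                f"line {node.lineno}: attribute {node.attr!r} is not allowed")
    return violations
```

A caller would run the code only if check_source returns an empty list. A static check like this is easy to bypass on its own, which is why it is only one of several layers.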

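Use audit hooks (available since Python 3.8) to detect certain actions at runtime and abort them. Audit hooks were designed for monitoring rather than sandboxing, so treat them as one layer and not as a complete barrier.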

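import sys

def audit(event, args):
    # The 'compile' event fires whenever source code is compiled,
    # including implicitly by eval() and exec() on a string.
    if event == 'compile':
        sys.exit('nice try!')

# Hooks cannot be removed once they have been added.
sys.addaudithook(audit)

# Compiles the string '5', which triggers the hook and exits.
eval('5')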

PyPy

Use sandboxing. It allows you to run arbitrary Python code in a special environment that serializes all input/output so you can check it and decide which commands are allowed before actually running them.

Of the options in this step, this is the most secure way of restricting Python.

Step 2: Securing the host OS

To protect your host OS, you have two options.

  • Virtualization such as KVM or VirtualBox (more secure)
  • Containerization such as LXD or Docker (much lighter)

If you use Docker, you may need to add AppArmor or SELinux policies for extra security. LXD comes with AppArmor policies by default.

Make sure you run the code as a user with as few privileges as possible.

Rebuild the virtual machine or container for each user so that no state carries over from one user to the next.

Whichever solution you use, don’t forget to limit resource usage (RAM, CPU, storage, network). Use cgroups if your chosen virtualization/containerization solution does not support these kinds of limits.
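Inside the virtual machine or container, per-process limits can add another layer. The sketch below uses POSIX rlimits from Python's resource module, which is a different mechanism from cgroups and only covers CPU time and memory here. It assumes Linux; the limit values and the run_limited helper are hypothetical.

```python
import resource
import subprocess
import sys

CPU_SECONDS = 2                 # hypothetical limit on CPU time
MEMORY_BYTES = 512 * 1024 ** 2  # hypothetical limit on address space

def _lower_limit(kind, value):
    # Never ask for more than the current hard limit allows.
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, value))

def _apply_limits():
    # Runs in the child process just before the interpreter starts.
    _lower_limit(resource.RLIMIT_CPU, CPU_SECONDS)
    _lower_limit(resource.RLIMIT_AS, MEMORY_BYTES)

def run_limited(code):
    """Run code in a separate, resource-limited Python process."""
    return subprocess.run(
        [sys.executable, "-I", "-c", code],
        preexec_fn=_apply_limits,
        capture_output=True,
        text=True,
    )
```

For example, run_limited("x = bytearray(2 * 1024 ** 3)") should finish with a non-zero return code, because the allocation exceeds the memory limit. Note that preexec_fn is not safe to use in a multi-threaded parent process.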

Step 3: Timeouts

Use timeouts on all layers to ensure that neither the code nor the container runs longer than it is supposed to.
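At the code layer, a timeout can be as simple as running the untrusted code in a child process and killing it when time is up. A minimal sketch using subprocess.run from the standard library; the run_with_timeout helper is hypothetical, and the container or virtual machine still needs its own, separate timeout.

```python
import subprocess
import sys

def run_with_timeout(code, seconds):
    """Run code in a child interpreter; return (finished, stdout)."""
    try:
        result = subprocess.run(
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=seconds,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return False, ""
    return True, result.stdout
```

Calling run_with_timeout("while True: pass", 1) returns (False, "") after about one second instead of hanging forever.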

Step 4: Communication

If distinct pieces of untrusted code need to communicate with each other or with your host app, use a language-agnostic API such as REST or gRPC.
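Whatever the transport, messages coming from untrusted code are themselves untrusted input and should be validated at the boundary. The following sketch shows a tiny REST-style endpoint built on the standard library. The /results path, the "value" field, the MAX_BODY cap and all function names are hypothetical choices for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

MAX_BODY = 4096  # hypothetical cap on message size, in bytes

def parse_result(raw):
    """Return (status, body) for one JSON message sent by untrusted code."""
    if len(raw) > MAX_BODY:
        return 413, {"error": "payload too large"}
    try:
        value = json.loads(raw)["value"]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected a JSON object with a 'value' field"}
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return 400, {"error": "'value' must be a number"}
    return 200, {"accepted": value}

class ResultHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        try:
            length = int(self.headers.get("Content-Length", "0"))
        except ValueError:
            length = -1
        if self.path != "/results" or length < 0:
            status, body = 400, {"error": "bad request"}
        elif length > MAX_BODY:
            status, body = 413, {"error": "payload too large"}
        else:
            status, body = parse_result(self.rfile.read(length))
        data = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

def make_server(port=8000):
    # Bind to localhost only; the port number is an arbitrary example.
    return HTTPServer(("127.0.0.1", port), ResultHandler)
```

The host app would call make_server().serve_forever(), and the untrusted code would POST JSON such as {"value": 3} to /results. Because the contract is plain HTTP and JSON, the untrusted side does not have to be written in Python.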

Conclusion

These measures help you run untrusted code more securely.