Yuzhe's Blog

yuzhes

Sandboxing Student Code in Serverless: A Threat Model

Today my MSc project officially kicked off. The premise sounds simple: run student code safely inside AWS Lambda. The constraints make it interesting.

The Problem

Lambda Feedback is a platform where students submit code and get it evaluated in real time. The backend uses serverless functions — AWS Lambda spins up a container, runs the code, returns the result.

For performance, Lambda reuses containers. A function that handled Student A’s submission five minutes ago might handle Student B’s next. Same filesystem, same process memory, same /tmp.

That’s a problem.

[Lambda Instance]
├── /tmp          ← writable, persistent across invocations
├── env vars      ← might contain secrets
├── process memory ← Python module globals survive warm starts
└── network       ← outbound open by default

Student A can write a file to /tmp. Student B can read it. In the worst case, Student A can exfiltrate the evaluator’s logic or poison the grading environment.
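To make the warm-start risk concrete, here is a minimal sketch of a handler that leaks state between two invocations in the same process. The handler function, the _submissions global, and the leftover.txt file name are all hypothetical, and a scratch directory stands in for /tmp so the sketch is safe to run locally.

```python
import os
import tempfile

# Module-level state lives as long as the warm container does.
_submissions = {}

# Stand-in for /tmp (assumption: a scratch directory, so the demo is safe to run locally).
SCRATCH_DIR = tempfile.mkdtemp(prefix="warm-start-demo-")


def handler(event, context=None):
    """Hypothetical evaluator entry point that fails to clean up after itself."""
    leftover_path = os.path.join(SCRATCH_DIR, "leftover.txt")

    # What this invocation can observe before doing any work of its own.
    leftover = None
    if os.path.exists(leftover_path):
        with open(leftover_path) as f:
            leftover = f.read()
    visible = {"globals": dict(_submissions), "file": leftover}

    # State this invocation leaves behind for whoever comes next.
    _submissions[event["student"]] = event["code"]
    with open(leftover_path, "w") as f:
        f.write(f"{event['student']}: {event['code']}")

    return visible


# Two invocations in the same process, as on a warm container.
first = handler({"student": "A", "code": "print('my answer')"})
second = handler({"student": "B", "code": "pass"})
```

In this sketch, second carries Student A's code twice: once from the module global and once from the leftover file, even though Student B's invocation never asked for it.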

What We Can’t Do

Standard OS-level isolation is off the table:

Lambda already applies a seccomp-bpf filter of its own. We can layer on top of it, but we can’t go beneath it.

The Defense Matrix

Here’s what’s available and what each tool covers:

Attack | seccomp | rlimit | env cleanup | /tmp clear
Fork bomb
Memory bomb
Disk bomb
/tmp snooping
Env var leak ⚠️
/proc reading ⚠️
Reverse shell
Network exfil
setuid

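The gaps: /proc reading and environment variable leakage. seccomp can’t block getenv(), because that’s a memory read, not a syscall. And filtering /proc by path is not something seccomp can do reliably: a BPF filter sees only the raw syscall arguments and cannot dereference the pointer to the path string.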

90% coverage is achievable. The remaining 10% needs creativity.

Clever Workarounds

1. LD_PRELOAD Interception

No kernel access needed. Compile a shim that wraps open():

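// Intercept file opens at the libc level (build as a shared object: gcc -D_GNU_SOURCE -shared -fPIC)
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include <string.h>
int open(const char *path, int flags, ...) {
    int (*real_open)(const char *, int, ...) = dlsym(RTLD_NEXT, "open");  // the real libc open()
    if (strstr(path, "/proc") || strstr(path, "/var/task")) { errno = EACCES; return -1; }
    va_list ap; va_start(ap, flags);
    mode_t mode = (flags & O_CREAT) ? va_arg(ap, mode_t) : 0; va_end(ap);  // mode is only passed with O_CREAT
    return real_open(path, flags, mode);
}

LD_PRELOAD=/lib/shimmy_sandbox.so python3 student_submission.py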

Student code calls open("/proc/self/environ") → gets denied. No kernel changes. Works anywhere LD_PRELOAD isn’t stripped.

Downside: a determined student can bypass it by calling syscall() directly, by using an entry point the shim does not wrap (openat(), for example), or by running a statically linked binary. It’s defense-in-depth, not a hard boundary.
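The bypass is easy to sketch from Python itself. The example below assumes Linux on x86-64 or aarch64 (syscall numbers are architecture-specific), and raw_open is a hypothetical helper name. It asks the kernel for a file descriptor through libc's generic syscall() function, so a shim that only wraps open() is never consulted.

```python
import ctypes
import os
import platform

# openat(2) syscall numbers; assumption: Linux on one of these two architectures.
SYS_OPENAT = {"x86_64": 257, "aarch64": 56}
AT_FDCWD = -100  # resolve relative paths against the current directory


def raw_open(path: str) -> int:
    """Open a file read-only via syscall(), skipping libc's open() wrapper."""
    libc = ctypes.CDLL(None, use_errno=True)
    fd = libc.syscall(
        ctypes.c_long(SYS_OPENAT[platform.machine()]),
        ctypes.c_long(AT_FDCWD),
        ctypes.c_char_p(path.encode()),
        ctypes.c_long(os.O_RDONLY),
    )
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    return fd


fd = raw_open("/proc/self/status")
first_bytes = os.read(fd, 5)
os.close(fd)
```

Nothing in an open()-only shim runs on this path, which is why the shim counts as one layer rather than a boundary.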

2. Environment Sanitization

The simplest fix for env var leaks:

import subprocess

clean_env = {
    "PATH": "/usr/bin:/usr/local/bin",
    "HOME": "/tmp/student",
    "LANG": "en_US.UTF-8",
    # Everything else is stripped: no AWS_*, no secrets
}
subprocess.run(["python3", "submission.py"], env=clean_env)

Zero overhead. Should be the baseline for any approach.
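A quick way to check the effect is to run the same probe twice, once with the inherited environment and once with the allowlist. This is only a sketch: run_probe and the PROBE snippet are hypothetical, and the secret is a placeholder value set just for the demonstration.

```python
import os
import subprocess
import sys

# Child-side probe: report whether the (dummy) secret is visible.
PROBE = "import os; print(os.environ.get('AWS_SECRET_ACCESS_KEY', 'not visible'))"


def run_probe(env=None):
    """Run the probe in a child process; env=None means inherit everything."""
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Placeholder value, not a real credential.
os.environ["AWS_SECRET_ACCESS_KEY"] = "dummy-secret"

clean_env = {
    "PATH": "/usr/bin:/usr/local/bin",
    "HOME": "/tmp/student",
    "LANG": "en_US.UTF-8",
}

inherited = run_probe()               # child sees the parent's full environment
sanitized = run_probe(env=clean_env)  # child sees only the allowlisted variables
```

Sanitizing the child's environment does not scrub the parent's: on Linux, a process running as the same user may still be able to read the parent's /proc/<pid>/environ, which is the /proc gap from the matrix.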

3. WebAssembly (The Nuclear Option)

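Run student code inside a WASM runtime. Pyodide is CPython compiled to WASM; Wasmer/Wasmtime provide the host. (One caveat: Pyodide is built with Emscripten and normally expects a JavaScript host, so running under Wasmtime would likely mean using a WASI build of CPython instead.)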

student code → Pyodide → WASM linear memory → Wasmtime

                                    No syscalls. No filesystem.
                                    Everything goes through host imports.

This solves everything — /proc, env vars, network, all of it. The WASM instance has no concept of the host filesystem.

The cost: Pyodide adds ~30MB and seconds of startup. For a platform that values fast feedback, that’s real. But it’s the only option that closes all the gaps.

For now: fork + seccomp + rlimit + env sanitization.

Lambda invocation
  └── fork() new process
        ├── Apply seccomp-bpf filter (deny dangerous syscalls)
        ├── Apply rlimit (CPU, memory, open files)
        ├── Clean env (strip AWS_*, keep only PATH/HOME/LANG)
        ├── Clear /tmp
        └── exec student code

This covers ~90% of the threat surface with low complexity, no root, and reasonable performance overhead.
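A minimal Python sketch of that pipeline, assuming a Linux host. The names run_submission, _cap, and _restrict_child are hypothetical, the limit values are placeholders rather than tuned numbers, and a fresh scratch directory per run stands in for clearing /tmp. The seccomp step is left as a comment because the standard library has no seccomp binding.

```python
import resource
import shutil
import subprocess
import sys
import tempfile


def _cap(limit, value):
    """Lower one rlimit, never asking for more than the current hard limit."""
    _, hard = resource.getrlimit(limit)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(limit, (value, value))


def _restrict_child():
    """Runs in the forked child just before exec."""
    _cap(resource.RLIMIT_CPU, 5)                   # seconds of CPU time
    _cap(resource.RLIMIT_AS, 512 * 1024 * 1024)    # bytes of address space
    _cap(resource.RLIMIT_NOFILE, 64)               # open file descriptors
    _cap(resource.RLIMIT_FSIZE, 10 * 1024 * 1024)  # bytes per written file
    # A seccomp-bpf filter would be installed here (not shown).


def run_submission(source, timeout=10.0):
    """Run student source in a resource-limited child with a clean environment."""
    scratch = tempfile.mkdtemp(prefix="student-")
    clean_env = {
        "PATH": "/usr/bin:/usr/local/bin",
        "HOME": scratch,
        "LANG": "en_US.UTF-8",
    }
    try:
        return subprocess.run(
            [sys.executable, "-I", "-c", source],
            env=clean_env,
            cwd=scratch,
            preexec_fn=_restrict_child,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    finally:
        # Remove the scratch directory so nothing survives into the next run.
        shutil.rmtree(scratch, ignore_errors=True)
```

For example, run_submission("data = bytearray(2 * 1024 ** 3)") should come back with a non-zero exit code, because the 2 GiB allocation exceeds the address-space cap.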

WASM goes on the roadmap as the long-term path for languages where the toolchain supports it. Python is the priority — Pyodide is production-ready enough.

What’s Next

The interesting constraint here — userspace-only, no OS changes — forces creative solutions. That’s what makes it a research project rather than a configuration problem.