MareArts Computer Vision Study.: 2026

7/29/2026

The list that remembers: mutable default arguments in Python

Python · Language

def append_to(item, target=[]) looks like it starts from an empty list every time. It does not. One list is created when the def statement runs, and every call for the rest of the program's life shares it.

This is the most famous gotcha in Python, and it is famous for a reason: the code is short, the intent is obvious, and it is wrong. There is no error, no warning at runtime, and the symptom depends on what earlier callers happened to do — which makes it genuinely hard to track down when you meet it in a real codebase rather than in a blog post.

It is also a two-minute fix once you know the rule. Every output below is copied from a real run on CPython 3.10.

The bug — three calls, one list
Why it happens — the default lives on the function object
The shape you actually hit — two carts, one basket
The fix — the None sentinel
Variations and near-misses — or [], sentinel objects, dataclasses
What is safe, what is not
Letting a linter catch it — B006 vs B008
Cheat sheet

Part 1 · The bug

Three calls, one list

append.py

def append_to(item, target=[]):
    target.append(item)
    return target

print(append_to(1))
print(append_to(2))
print(append_to(3))

$ python append.py

[1]
[1, 2]        ← expected [2]
[1, 2, 3]     ← expected [3]

The first call behaves. The second reveals that target was not empty when the call started. The list is accumulating across calls, because it is the same list each time.

Part 2 · Why it happens

The default lives on the function object

A default argument expression is evaluated once, when the def statement executes — not on each call. The resulting object is stored on the function and handed to every call that omits the argument. So [] is not an instruction to make a fresh list; it is one list, made once.

def append_to(item, target=[]):
                           ~~
                           evaluated once, at def time —
                           the resulting list is stored on the function

   call 1 ──┐
   call 2 ──┼──▶ that one list ──▶ [1] ─▶ [1, 2] ─▶ [1, 2, 3]
   call 3 ──┘    (never replaced, never reset)

You do not have to take this on faith. The list is reachable through __defaults__:

show_default.py

def append_to(item, target=[]):
    target.append(item)
    return target

print(append_to.__defaults__)
append_to(1)
print(append_to.__defaults__)
append_to(2)
print(append_to.__defaults__)

$ python show_default.py

([],)
([1],)        ← the list attached to the function has changed
([1, 2],)

That is the whole story. The default is a piece of state hanging off the function, and target.append(item) mutates it. Nothing resets it between calls, because nothing was ever going to.

Note what is not the problem

Rebinding is fine. def f(n=0): n += 1 cannot leak, because n += 1 on an integer creates a new object and points the local name at it — the stored default is untouched. The bug needs mutation of the default object, which is why only mutable types are affected.

Part 3 · The shape you actually hit

Two carts, one basket

Nobody writes append_to in production. What people do write is a constructor with an empty collection as its default — and then the shared object is shared between instances:

cart.py

class Cart:
    def __init__(self, items=[]):
        self.items = items

a = Cart()
b = Cart()
a.items.append("apple")

print(b.items)
print(a.items is b.items)
print(Cart.__init__.__defaults__)

$ python cart.py

['apple']              ← a different cart, with someone else's apple in it
True                   ← it was the same list all along
(['apple'],)           ← and it is living on Cart.__init__

Two independent objects, one shared basket. Worse, the default is attached to the class, so the contamination outlives both instances and reaches every Cart() created later in the process.

Why it is hard to debug

The reproduction condition is "what did some earlier caller put in there", which is not visible from the failing line, is not in the traceback, and often depends on test ordering. A unit test that constructs one Cart passes. The suite fails only when another test ran first — and then passes again when you run it alone to investigate.

Passing the argument explicitly sidesteps it, which is part of why the bug hides so well: the code path that everyone tests is usually the one that supplies a value.

explicit argument, no sharing

c = Cart(items=[])       # a genuinely new list
c.items.append("pear")
print(c.items, a.items)  # ['pear'] ['apple']

Part 4 · The fix

The `None` sentinel

Make the default an immutable marker, and build the real object inside the body. The construction then happens per call, which is what you wanted in the first place.

broken

def append_to(item, target=[]):
    target.append(item)
    return target


class Cart:
    def __init__(self, items=[]):
        self.items = items

correct

def append_to(item, target=None):
    if target is None:
        target = []
    target.append(item)
    return target

class Cart:
    def __init__(self, items=None):
        self.items = [] if items is None else items

the fixed version

append_to(1) → [1]
append_to(2) → [2]
append_to(3) → [3]
append_to.__defaults__ → (None,)      ← nothing to accumulate into

None is a singleton and immutable, so storing it on the function is harmless. The important part is the is None check: it is what turns "one object, made once" into "a new object per call".

Type it honestly

If the codebase is annotated, the parameter now genuinely accepts None, and the annotation should say so:

annotated

def append_to(item: int, target: list[int] | None = None) -> list[int]:
    if target is None:
        target = []
    target.append(item)
    return target

On Python 3.9 and earlier, write Optional[List[int]] from typing, or add from __future__ import annotations to use the | syntax anyway.

Part 5 · Variations and near-misses

Three things worth knowing

`or []` is shorter and subtly different

target = target or [] is a common abbreviation. It fixes the sharing, but it also replaces any falsy argument the caller passed — including an empty list they expected you to fill:

or_trap.py

def with_or(item, target=None):
    target = target or []          # empty list from the caller gets discarded
    target.append(item)
    return target

def with_is_none(item, target=None):
    target = [] if target is None else target
    target.append(item)
    return target

mine = []
with_or("x", mine)
print("or      →", mine)

mine = []
with_is_none("x", mine)
print("is None →", mine)

$ python or_trap.py

or      → []            ← the caller's list was never touched
is None → ['x']

The or version quietly appended to a throwaway list and the caller's stayed empty. Same class of confusion for 0, "" and False. Prefer is None: it tests for exactly the thing you meant.

When `None` is itself a valid value

Sometimes None is meaningful data and you still need to distinguish "not supplied". Use a private sentinel object:

sentinel object

_MISSING = object()

def fetch(key, default=_MISSING):
    if default is _MISSING:
        raise KeyError(key)     # no default supplied
    return default              # default supplied — possibly None

print(fetch("k", None))         # None
fetch("k")                      # KeyError: 'k'

This is how dict.pop and friends behave, and why they can tell those two cases apart.

Dataclasses refuse to let you do it

@dataclass knows about this bug and raises at class-creation time rather than letting you ship it:

bad_dc.py

from dataclasses import dataclass

@dataclass
class Bad:
    items: list = []

$ python bad_dc.py

ValueError: mutable default <class 'list'> for field items is not allowed: use default_factory

Note that this fires at import, before any instance exists — the decorator inspects the defaults while building the class. The sanctioned form is a factory, called once per instance:

good_dc.py

from dataclasses import dataclass, field

@dataclass
class Good:
    items: list = field(default_factory=list)

g1, g2 = Good(), Good()
g1.items.append("apple")
print(g1.items, g2.items, g1.items is g2.items)

$ python good_dc.py

['apple'] [] False

default_factory is the "call it per invocation" idea made explicit — the same thing the None check does by hand.

When the sharing is deliberate

A mutable default occasionally is the point — a memo dict that must persist across calls. It works, but it hides a piece of global state in a signature where no reader expects it. Use a module-level variable or functools.cache instead; both announce themselves.

Part 6 · What is safe, what is not

A quick classification

The rule follows directly from the mechanism: the stored object is only a hazard if something can mutate it.

Default	Safe?	Because
`None`, `0`, `""`, `True`	yes	Immutable; cannot be changed in place
`()`, `frozenset()`	yes	Immutable containers
A module-level constant	usually	Safe if genuinely immutable — a shared `CONFIG` dict is not
`[]`, `{}`, `set()`	no	One object, mutated in place by every caller
A custom class instance	no	One instance shared across calls
`datetime.now()`	no	Not mutation — it freezes at import time

That last row is a different bug wearing the same clothes, and it is worth seeing on its own:

frozen at import, not "now"

import time
from datetime import datetime

def stamped(when=datetime.now()):
    return when

t1 = stamped()
time.sleep(0.05)
t2 = stamped()
print(t1 == t2)          # True — both are the moment the module was imported

A long-running service will hand out the same timestamp for days. The test suite will not notice, because it imports and calls within the same instant.

Part 7 · Letting a linter catch it

B006 and B008

You should never be finding this by reading. flake8-bugbear — bundled into Ruff as the B rules — catches both variants:

Rule	Catches	Example
`B006` mutable-argument-default	A mutable literal or constructor as a default — the bug in this post	`def f(x=[])`
`B008` function-call-in-default-argument	Any function call in a default, evaluated once at import	`def f(t=datetime.now())`

pyproject.toml

[tool.ruff.lint]
extend-select = ["B"]      # all of flake8-bugbear, B006 and B008 included

$ ruff check --select B006,B008 --output-format concise .

app.py:4:28: B006 Do not use mutable data structures for argument defaults
app.py:9:21: B006 Do not use mutable data structures for argument defaults
app.py:13:18: B008 Do not perform function call `datetime.now` in argument defaults;
              instead, perform the call within the function, or read the default from
              a module-level singleton variable
app.py:18:30: B006 Do not use mutable data structures for argument defaults
Found 4 errors.

The two rules get conflated

target=[] is B006, not B008. The advice "perform the call within the function" belongs to B008's message — apt for datetime.now(), but a list literal is not a call. The remedy happens to be the same shape in both cases: move the work inside the body.

B008 has real exceptions

Some frameworks give a call in the default an actual meaning — FastAPI's Depends() is the standard example. Ruff's extend-immutable-calls exists for these; list the specific callable rather than disabling the rule.

Wrapping up

Cheat sheet

Instead of	Write
`def f(x=[])`	`def f(x=None)` then `if x is None: x = []`
`def f(x={})`	`def f(x=None)` then `x = {} if x is None else x`
`def f(t=datetime.now())`	`def f(t=None)` then `t = t or datetime.now()`
`x = x or []`	`x = [] if x is None else x` — unless discarding falsy input is intended
`items: list = []` in a dataclass	`items: list = field(default_factory=list)`
A memo dict in the signature	A module-level variable, or `functools.cache`

The short list

Defaults are evaluated once, at def time. The object is stored on the function and reused forever; [] means one list, not a fresh one per call.
You can watch it happen in f.__defaults__ — the list grows in place.
The damage scales in constructors. A default items=[] in __init__ is shared by every instance the process ever creates.
Default to None and build inside the body. Test with is None, not or, so a caller's empty list is still theirs.
Enable B in Ruff. B006 and B008 find every instance in seconds, and neither bug ever announces itself at runtime.

7/28/2026

The loop variable that changed: late binding and default arguments in Python

Python · Language

You define a function inside a loop, collect the functions, call them later — and every single one returns the last value. The usual fix is a strange-looking line where the same name appears twice. Here is what is actually going on, and why the fix works.

This bug has a particular texture. The code reads correctly, each function looks like it captured its own value, and nothing raises. You only find out when the results are all identical — and if the loop happens to have one iteration during development, you never find out at all.

The explanation is one sentence long: a closure captures the variable, not the value. Everything else in this post is unpacking what that means and what to do about it.

Every snippet was run on CPython 3.10 before publishing, and the output shown is copied from those runs.

The bug — three functions, one value
Why it happens — variables, cells, and Python's missing block scope
Anatomy of the fix — reading def f(x: T = x) as three parts
Defaults are evaluated once — the rule that makes the fix work
The same rule, as a trap — mutable default arguments
Other ways to capture — partial, factories, and which to prefer
Where it bites — threads, callbacks, handlers, tasks
What the type hint does — and when it is evaluated
Scoping rules, briefly — LEGB, nonlocal, comprehensions
Letting a linter catch it — B023 and B008
Cheat sheet

Part 1 · The bug

Three functions, one value

Build a list of functions in a loop. Each one returns the loop variable.

late.py

fns = []
for i in range(3):
    def show():
        return i
    fns.append(show)

print([f() for f in fns])

$ python late.py

[2, 2, 2]

Not [0, 1, 2]. All three functions return 2, the value i had when the loop finished. The functions were created at three different times, but they were never looking at three different values — they were all looking at the same i.

Why it survives review

Nothing about the code is unusual, and the failure is silent. It also depends on when you call: if you call show() inside the loop it returns the right thing, because i still holds that value at that moment. Deferring the call is what exposes it.

Part 2 · Why it happens

Variables, cells, and the missing block scope

A closure is a function plus a reference to the enclosing scope's variables. Not copies of their values — references to the variables themselves. Python stores each captured variable in a cell, and every closure over that variable shares the one cell.

You can look at the cell directly:

the cell is shared and mutable

def make():
    n = 0
    def inc():
        nonlocal n
        n += 1
        return n
    return inc

c = make()
print(c(), c(), c())              # 1 2 3
print(c.__closure__[0].cell_contents)   # 3

The three calls returned different numbers because they all read and wrote the same cell. That is the whole point of a closure — and it is exactly what goes wrong in the loop, because the loop variable is also just a variable in the enclosing scope.

There is a second ingredient. In most C-like languages, for (int i = ...) creates a new i scoped to the loop body. Python has no block scope: if, for and while do not create scopes. Only modules, functions and classes do. So there is exactly one i, it belongs to the enclosing function, and it outlives the loop:

the loop variable leaks

for i in range(3):
    pass
print(i)          # 2 — still in scope, still holding the last value

Put the two facts together and the behaviour is inevitable:

The loop rebinds one variable three times.
All three closures reference that one variable.
Calling them afterwards reads it once the loop has finished, so all three see the final value.

"Late binding" names the timing

The name lookup happens when the function runs, not when it is defined. That is the general rule for every free variable in Python, and it is usually what you want — it is why you can define a function that calls a helper declared later in the file.

Part 3 · Anatomy of the fix

Reading `def f(x: T = x)` as three parts

The fix is to bind the value into the function's own signature. In a typed codebase it ends up looking like this, which is where the confusion starts:

the idiom

def invoke(kernel: Kernel = kernel) -> None:
    ...

Three independent things are packed into kernel: Kernel = kernel:

def invoke(kernel: Kernel = kernel) -> None:
           ~~~~~~~~~~~~~~ ~~~~~~~~
           │     │        │
           │     │        └─ default value: the OUTER variable,
           │     │           evaluated now, at def time
           │     └─ type hint: the Kernel class
           └─ parameter name: a NEW local, created per call

The name appears twice because two different things happen to share a spelling. They live in different scopes and are resolved at different times:

	Left `kernel`	Right `kernel`
What it is	A parameter being declared	An expression being evaluated
Which scope	Local to `invoke`	The enclosing scope — the loop variable
When	On every call to `invoke`	Once, when the `def` statement runs
Effect	Shadows the outer name inside the body	Snapshots the current value

That last row is the mechanism. Because the right-hand side is evaluated while the loop is on that iteration, the value is captured then and stored on the function object. And because the left-hand side introduces a local with the same name, the body reads the captured parameter instead of the outer variable — the closure is gone.

fixed.py

fns = []
for i in range(3):
    def show(i=i):     # left i = new parameter; right i = this iteration's value
        return i
    fns.append(show)

print([f() for f in fns])

$ python fixed.py

[0, 1, 2]

You can see the snapshot on the function object. Rebinding the outer name afterwards changes nothing:

the value is stored, not looked up

x = 10
def f(a=x):
    return a

x = 999
print(f.__defaults__, f(), x)

output

(10,) 10 999

It changes the public signature

show(i=i) is an honest fix but not a free one: show now accepts an argument, and a caller can override the captured value. For an internal callback that is harmless. For a function you hand to someone else, prefer a factory (below) so the capture is not part of the API.

Part 4 · Defaults are evaluated once

The rule underneath

Default argument expressions are evaluated once, when the def statement executes — not on each call. This single rule explains both the fix above and the trap below.

Prove it with a default that announces itself:

once.py

def stamp(value=print("  (default evaluated)")):
    return value

print("  function defined, not called yet")
stamp()
stamp()

$ python once.py

  (default evaluated)
  function defined, not called yet

The message appears before "function defined", and appears only once despite two calls. The default was computed while Python was building the function object, and the result was stored on it.

So def show(i=i) is not a clever trick, it is the ordinary rule applied deliberately: you are asking for an expression to be evaluated now and remembered.

Part 5 · The same rule, as a trap

Mutable default arguments

Evaluated once means the default object is created once and reused by every call. If it is mutable, every call shares it — and the sharing persists for the lifetime of the function.

mutable.py

def append_bad(item, bucket=[]):
    bucket.append(item)
    return bucket

print(append_bad("a"))
print(append_bad("b"))
print(append_bad.__defaults__)

$ python mutable.py

['a']
['a', 'b']
(['a', 'b'],)

The second call did not start from an empty list. The [] in the signature is one list object, created once, now permanently attached to the function — you can see it grow inside __defaults__.

The fix is the None sentinel. Take the default as None and build the real thing inside:

avoid

def f(bucket=[]):
    bucket.append(1)
    return bucket

def g(cache={}):
    ...

def h(when=datetime.now()):
    ...       # frozen at import time!

prefer

def f(bucket=None):
    if bucket is None:
        bucket = []
    bucket.append(1)
    return bucket

def g(cache=None):
    cache = {} if cache is None else cache

def h(when=None):
    when = when or datetime.now()

The subtlest version of this bug

def h(when=datetime.now()) looks like "default to now". It means "default to the moment this module was imported". A long-running process will hand out the same timestamp for days, and the test suite will pass because it imports and calls within the same second.

When sharing is the point

A mutable default is occasionally deliberate — a memo dictionary that must persist across calls. It works, but it is invisible to the reader. Prefer an explicit module-level variable or functools.cache, both of which say what they mean.

Part 6 · Other ways to capture

partial, factories, and which to prefer

A factory function

Give the value a scope of its own by passing it as a parameter to an outer function. The inner closure then captures that parameter, and each call to the factory creates a fresh one.

factory

def make_show(value):
    def show():
        return value       # captures the parameter, not the loop variable
    return show

fns = [make_show(i) for i in range(3)]
print([f() for f in fns])          # [0, 1, 2]

This is the version to reach for in library code: the signature of show stays clean, and the capture is explicit rather than a signature side effect.

`functools.partial`

partial

from functools import partial

def show(value):
    return value

fns = [partial(show, i) for i in range(3)]
print([f() for f in fns])          # [0, 1, 2]

partial binds arguments immediately and stores them on the resulting object, so there is no closure to go stale. It is the cleanest option when the function already exists and takes the value as a parameter.

Comparison

Technique	Good	Cost	Use when
`def f(x=x)`	One token; no restructuring	Adds a parameter callers can override	Local callbacks, quick lambdas
Factory function	Signature stays clean; intent explicit	An extra function	Library code, anything exported
`partial`	No new scope; composes well	Function must accept the value	The callable already exists
Bind to an instance attribute	Natural when state is already an object	A class	Several values travel together

Part 7 · Where it bites

Anything that defers the call

The pattern is always the same: a callable is created in a loop and invoked later. Threads make it vivid because the fix and the bug differ by six characters.

threads.py

import threading

results = []
threads = [
    threading.Thread(target=lambda: results.append(name))
    for name in ("a", "b", "c")
]
for t in threads: t.start()
for t in threads: t.join()
print("buggy:", sorted(results))

results = []
threads = [
    threading.Thread(target=lambda name=name: results.append(name))
    for name in ("a", "b", "c")
]
for t in threads: t.start()
for t in threads: t.join()
print("fixed:", sorted(results))

$ python threads.py

buggy: ['c', 'c', 'c']
fixed: ['a', 'b', 'c']

The same shape shows up in:

Event handlers. Buttons built in a loop that all act on the last item.
Async tasks. asyncio.create_task over a coroutine closing on a loop variable.
Handler dictionaries. {name: lambda: dispatch(name) for name in names} — the dict comprehension has its own scope, but the lambdas still close over its variable.
Retry and timing wrappers. A benchmark harness that builds one thunk per case and runs them afterwards ends up measuring the last case repeatedly.
Deferred logging. log.debug(lambda: f"{item}") style lazy messages.

The measurement version is the nastiest

When the deferred callables are benchmarks, nothing crashes and every number looks plausible — you simply publish the same case's timing under several names. There is no error to notice, only a conclusion that is wrong.

Part 8 · What the type hint does

The middle part, and when it runs

In kernel: Kernel = kernel, the annotation is the only part with no effect on behaviour. Python does not check it, coerce with it, or consult it when binding arguments. It is metadata for readers and for tools such as mypy or pyright.

But it is an expression, and by default it is evaluated at definition time:

annotations are evaluated eagerly

def side_effect():
    print("  (annotation evaluated)")
    return int

def g(a: side_effect() = 1):
    return a

print(g.__annotations__)

output

  (annotation evaluated)
{'a': <class 'int'>}

Add from __future__ import annotations and they are kept as strings instead, never evaluated unless something asks for them:

output with the future import

{'a': 'side_effect()'}          # note: no "(annotation evaluated)" line
(1,)                            # __defaults__ — still evaluated eagerly

Two things worth carrying away. First, deferring annotations lets you reference a class before it is defined, which is why the future import is common in typed code. Second, it changes nothing about defaults — those are always evaluated immediately, which is exactly what the capture idiom relies on.

Hints do not narrow the capture

Writing kernel: Kernel = kernel rather than kernel=kernel documents the parameter and gives a type checker something to verify. The value that gets captured, and when, is identical either way.

Part 9 · Scoping rules, briefly

LEGB, nonlocal, and comprehensions

Python resolves a name by searching four scopes in order:

Scope	Is
Local	Names assigned in the current function
Enclosing	Locals of any lexically enclosing function — this is where closures read from
Global	Module level
Built-in	`len`, `print`, and friends

Assigning to a name makes it local for the whole function, which is why reading it before the assignment raises rather than falling through to an outer scope. To assign to an outer name instead, declare the intent:

nonlocal vs global

count = 0

def outer():
    total = 0
    def bump():
        nonlocal total     # assign to outer()'s local
        global count       # assign to the module-level name
        total += 1
        count += 1
    bump(); bump()
    return total

print(outer(), count)      # 2 2

Comprehensions are the one construct that looks like a loop but is not: each has its own scope, so its variable does not leak. That is a small blessing and a common source of surprise:

comprehension scope

squares = [j * j for j in range(3)]
print(squares)             # [0, 1, 4]
print(j)                   # NameError: name 'j' is not defined

But it does not save you

The comprehension's variable is scoped, yet a lambda created inside still closes over it and still reads it late. [lambda: j for j in range(3)] gives three functions that all return 2. Scoping the variable is not the same as snapshotting it.

Part 10 · Letting a linter catch it

B023 and B008

You should not have to spot this by eye. flake8-bugbear and Ruff both ship the checks; enable them once and the class of bug stops reaching review.

Rule	Catches
`B023` function-uses-loop-variable	A function defined in a loop that references the loop variable — the bug in part 1
`B006` mutable-argument-default	`def f(x=[])` and friends — the trap in part 5
`B008` function-call-in-default-argument	`def f(when=datetime.now())` — a call evaluated once at import

pyproject.toml

[tool.ruff.lint]
extend-select = ["B"]      # all of flake8-bugbear, including B006/B008/B023

what it looks like

$ ruff check --select B023,B006,B008 --output-format concise .
b008.py:3:12: B008 Do not perform function call `datetime.now` in argument defaults;
              instead, perform the call within the function, or read the default from
              a module-level singleton variable
late.py:4:16: B023 Function definition does not bind loop variable `i`
mutable.py:1:29: B006 Do not use mutable data structures for argument defaults
Found 3 errors.

B008 has legitimate exceptions

Some frameworks make a call in the default meaningful — FastAPI's Depends() is the usual example. Ruff has extend-immutable-calls for exactly this; add the specific callable rather than switching the rule off.

Wrapping up

Cheat sheet

Expression	Evaluated	Consequence
Function body	On every call	Free variables are read late
Default argument	Once, at `def`	Snapshots values; shares mutables
Annotation	Once, at `def` — or never, with the future import	No runtime effect either way
Decorator	Once, at `def`	Wrapping happens at definition time

The short list

A closure captures the variable, not the value. If a function outlives the loop that made it, it will see the loop's final state.
def f(x=x) works because defaults are evaluated once, at definition. Left is a new local, right is the outer variable read right now. Same word, different things.
That same rule is the mutable-default trap. One [], created once, shared by every call. Use None and build inside.
Turn on bugbear (B) in Ruff. B023, B006 and B008 cover all three failure modes above, and none of them announce themselves at runtime.

pytest, from your first test to your own plugin

Python · Testing

Most pytest tutorials stop right after assert 1 + 1 == 2, which is roughly where the interesting part begins. This one keeps going — through fixtures, parametrization and the configuration that makes a suite pleasant to live with, and out the other side into hooks and plugins.

There is a moment in most Python projects where the test suite stops helping. It takes four minutes to run, three tests fail intermittently, and nobody remembers what conftest.py is for. The tool is rarely the problem — pytest has an answer for each of those, and the answers are small.

So this is written as a path rather than a reference. Part 1 gets you productive with nothing but functions and assert. Parts 2 and 3 cover fixtures and parametrization, which is where pytest stops being "unittest with less typing" and starts being a different tool. Part 4 is the boring, high-leverage material: layout, configuration, mocking, CI. Part 5 opens the hood.

Every runnable snippet here was executed against pytest 9.1.1 before publishing, and the terminal output is copied from real runs. Anything version-sensitive is flagged inline.

Basics — why pytest, then the mechanics you use every day
Fixtures — the feature that makes pytest a different tool
Parametrizing — one function, many independently reported cases
Practical — the unglamorous, high-leverage material
Advanced — under the hood, and how to extend it
Reference — what to avoid, and a one-screen summary

Part 1 · Basics

Why pytest

Python ships with unittest, a port of Java's JUnit. It works, but it makes you inherit from a base class, remember a different assert method for every comparison, and wrap setup logic in setUp methods that run for every test whether you need them or not.

pytest replaces all of that with plain functions and the plain assert keyword.

unittest

import unittest

class TestMath(unittest.TestCase):
    def setUp(self):
        self.values = [1, 2, 3]

    def test_sum(self):
        self.assertEqual(sum(self.values), 6)

    def test_membership(self):
        self.assertIn(2, self.values)
        self.assertNotIn(9, self.values)

pytest

import pytest

@pytest.fixture
def values():
    return [1, 2, 3]

def test_sum(values):
    assert sum(values) == 6

def test_membership(values):
    assert 2 in values
    assert 9 not in values

The pytest version has no class, no inheritance, and no assertion vocabulary to memorise. It is also strictly more capable: values is a fixture, and unlike setUp it only runs for tests that actually ask for it.

pytest also runs unittest test cases as-is, so adopting it never requires a big-bang migration.

Your first test

install

python -m pip install pytest

Create two files side by side.

slugify.py

import re

def slugify(title: str) -> str:
    """Turn a human title into a URL-safe slug."""
    cleaned = re.sub(r"[^\w\s-]", "", title.lower())
    return re.sub(r"[\s_]+", "-", cleaned).strip("-")

test_slugify.py

from slugify import slugify

def test_lowercases_and_joins_words():
    assert slugify("Hello World") == "hello-world"

def test_strips_punctuation():
    assert slugify("What's new?!") == "whats-new"

def test_collapses_whitespace():
    assert slugify("too    many   spaces") == "too-many-spaces"

$ pytest -q

...                                                            [100%]
3 passed in 0.01s

Three dots, three passing tests. There is no registration step, no test suite object, no if __name__ == "__main__". pytest found the file, found the functions, and ran them.

Run it from the right place

pytest inserts the test file's directory (technically its rootdir for that file) into sys.path, which is why from slugify import slugify works here. For real projects use the layout in Project layout instead of relying on this.

How tests are found

Discovery is convention-driven. pytest walks the directories you give it (or the current one) and collects:

Level	Default rule	Config key
Files	`test_.py` or `_test.py`	`python_files`
Classes	`Test*`, and it must not define `__init__`	`python_classes`
Functions	`test*`, at module level or inside a collected class	`python_functions`

Ask pytest what it would run without running anything. This is the single most useful command when a test "isn't running":

dry run

pytest --collect-only -q

Common trap

A test class with an __init__ method is silently skipped — pytest cannot instantiate it, so it warns and moves on. If a whole class mysteriously vanishes, that is almost always why.

The assert statement

pytest rewrites the bytecode of your test modules at import time so that a failing assert reports the value of every subexpression. You get JUnit-grade diagnostics from the plain keyword.

test_report.py

def build_report():
    return {"rows": 41, "status": "ok", "tags": ["a", "b"]}

def test_report_shape():
    report = build_report()
    assert report == {"rows": 42, "status": "ok", "tags": ["a", "c"]}

$ pytest -q

    def test_report_shape():
        report = build_report()
>       assert report == {"rows": 42, "status": "ok", "tags": ["a", "c"]}
E       AssertionError: assert {'rows': 41, ...': ['a', 'b']} == {'rows': 42, ...': ['a', 'c']}
E
E         Omitting 1 identical items, use -vv to show
E         Differing items:
E         {'rows': 41} != {'rows': 42}
E         {'tags': ['a', 'b']} != {'tags': ['a', 'c']}
E         Use -v to get more diff

test_report.py:6: AssertionError
1 failed in 0.02s

pytest knows how to diff dicts, lists, sets, strings and dataclasses. Add -vv when the diff is truncated — it disables the shortening.

Add a message only when it adds information

assert x == y, "x should equal y" is noise; the rewriting already showed you that. A good message supplies context the values cannot: assert resp.ok, f"upstream said {resp.text}".

Exceptions and warnings

Use pytest.raises as a context manager. The test fails if the block does not raise.

test_errors.py

import pytest

def withdraw(balance, amount):
    if amount <= 0:
        raise ValueError(f"amount must be positive, got {amount}")
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance - amount

def test_rejects_negative_amount():
    with pytest.raises(ValueError):
        withdraw(100, -5)

def test_error_message_names_the_amount():
    # match= is a regex applied with re.search against str(exception)
    with pytest.raises(ValueError, match=r"must be positive, got -5"):
        withdraw(100, -5)

def test_inspect_the_exception_object():
    with pytest.raises(ValueError) as excinfo:
        withdraw(100, 500)
    assert "insufficient" in str(excinfo.value)
    assert excinfo.type is ValueError

Assert after the block, not inside it

Code placed after the raising line inside a with pytest.raises(...) block never executes. Anything you want to check about the exception goes after the block using excinfo.value.

Warnings have a mirror-image API:

warnings

import warnings, pytest

def legacy_api():
    warnings.warn("use new_api() instead", DeprecationWarning, stacklevel=2)
    return 42

def test_warns():
    with pytest.warns(DeprecationWarning, match="new_api"):
        assert legacy_api() == 42

Comparing floats

0.1 + 0.2 == 0.3 is False in binary floating point. pytest.approx wraps a value with a tolerance and works on scalars, sequences, dicts and numpy arrays.

approx

from pytest import approx

def test_scalar():
    assert 0.1 + 0.2 == approx(0.3)

def test_explicit_tolerance():
    assert 4.7 == approx(4.7005, abs=1e-3)     # absolute
    assert 1_000_100 == approx(1_000_000, rel=1e-3)  # relative

def test_containers():
    assert [0.1 + 0.2, 1 / 3] == approx([0.3, 0.33333333])
    assert {"loss": 0.1 + 0.2} == approx({"loss": 0.3})

The default is a relative tolerance of 1e-6 with a small absolute floor, which is the right default for most numeric code. Say what you mean when precision actually matters.

Running and selecting tests

Command	What it does
`pytest`	Everything under the current directory
`pytest tests/unit`	One directory
`pytest tests/test_api.py`	One file
`pytest tests/test_api.py::test_login`	One test — this is a node id
`pytest "tests/test_api.py::test_login[admin]"`	One parametrized case
`pytest -k "login and not slow"`	Substring/boolean match on names
`pytest -m integration`	Match a marker
`pytest -x`	Stop at the first failure
`pytest --maxfail=3`	Stop after three failures
`pytest -q` / `-v` / `-vv`	Quieter / one line per test / no truncated diffs
`pytest -rA`	Summary lines for every outcome, including passes
`pytest -s`	Don't capture stdout (see your `print` calls live)
`pytest --lf`	Only the tests that failed last run
`pytest --ff`	Everything, but last run's failures first
`pytest --sw`	Stepwise: stop at a failure, resume there next time

Node ids are copy-pasteable. When a test fails, the id in the output is exactly the argument you need to re-run just that case.

Part 2 · Fixtures

Fixture basics

A fixture is a function that produces something a test needs. Tests request fixtures by naming them as parameters, and pytest supplies them. That is the entire mental model — it is dependency injection with a decorator.

request by name

import pytest

@pytest.fixture
def inventory():
    return {"apple": 3, "pear": 0}

def test_has_apples(inventory):        # ← the parameter name is the fixture name
    assert inventory["apple"] == 3

def test_pears_are_out_of_stock(inventory):
    assert inventory["pear"] == 0

Each test gets a fresh inventory by default, so one test mutating the dict cannot affect another. That isolation is the reason fixtures beat module-level constants.

Fixtures can request other fixtures, forming a graph that pytest resolves for you:

composition

@pytest.fixture
def config():
    return {"currency": "EUR", "vat": 0.21}

@pytest.fixture
def cart(config):                       # fixtures compose
    return {"config": config, "items": []}

def test_cart_uses_config(cart):
    assert cart["config"]["vat"] == 0.21

Setup and teardown

Replace return with yield and everything after the yield becomes teardown. It runs even if the test fails, because pytest wraps it in a finalizer.

yield fixture

import sqlite3
import pytest

@pytest.fixture
def db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    yield conn                          # ← the test runs here
    conn.close()                        # ← always runs afterwards

def test_insert_and_read_back(db):
    db.execute("INSERT INTO users (name) VALUES (?)", ("ada",))
    assert db.execute("SELECT name FROM users").fetchone() == ("ada",)

Setup failures are errors, not failures

If the code before yield raises, pytest reports the test as an error rather than a failure, and the test body never runs. That distinction in the summary line tells you instantly whether your production code or your scaffolding broke.

Scopes

Expensive setup should not repeat per test. scope controls how long one instance is cached and shared.

Scope	Created once per	Typical use
`function` default	Test	Anything mutable
`class`	Test class	Shared object for a group of methods
`module`	File	A parsed file, a compiled artifact
`package`	Directory	Rare; a shared subsystem
`session`	Whole run	Docker container, server process, big download

session scope

@pytest.fixture(scope="session")
def http_server():
    proc = start_server()               # slow: do it once for the whole run
    yield proc
    proc.terminate()

@pytest.fixture
def client(http_server):                # cheap per-test wrapper over the shared server
    return Client(http_server.url)

The scope rule you will hit eventually

A fixture may only depend on fixtures of equal or wider scope. A session fixture requesting a function fixture is an error, because the narrow one would be destroyed while the wide one still holds it.

And never let a widely scoped fixture hand out something mutable. One test appending to a session-scoped list poisons every later test, producing failures that depend on run order.

conftest.py

Fixtures defined in conftest.py are visible to every test in that directory and below, with no import needed. Nested conftest.py files stack, and the closest definition wins.

tree

tests/
├── conftest.py            # fixtures for everything below
├── unit/
│   └── test_parser.py
└── integration/
    ├── conftest.py        # extra fixtures, only for integration tests
    └── test_api.py

tests/conftest.py

import pytest

@pytest.fixture(scope="session")
def sample_data():
    return {"users": ["ada", "grace"], "version": 3}

@pytest.fixture
def frozen_clock(monkeypatch):
    """Pin time.time() so timestamp assertions are deterministic."""
    import time
    monkeypatch.setattr(time, "time", lambda: 1_700_000_000.0)
    return 1_700_000_000.0

conftest.py is more than a fixture bucket

It is also where hooks, custom CLI options and plugin registration live. pytest imports it automatically — it must never be imported by hand, and it needs no __init__.py.

Where does a fixture come from? Ask:

list available fixtures

pytest --fixtures            # every fixture visible here, with docstrings
pytest --fixtures-per-test tests/test_api.py::test_login

Built-in fixtures

These ship with pytest. Reach for them before writing your own.

`tmp_path` — a real, empty directory per test

tmp_path

def test_writes_a_csv(tmp_path):
    target = tmp_path / "out.csv"       # tmp_path is a pathlib.Path
    target.write_text("id,name\n1,ada\n")

    assert target.exists()
    assert target.read_text().splitlines()[0] == "id,name"

pytest keeps the last few runs' directories on disk for post-mortem inspection and cleans up older ones. Use tmp_path_factory for a directory shared across a session.

`monkeypatch` — scoped, auto-undone patching

monkeypatch

def test_reads_token_from_environment(monkeypatch):
    monkeypatch.setenv("API_TOKEN", "secret")
    monkeypatch.delenv("HTTP_PROXY", raising=False)
    assert load_config().token == "secret"

def test_patch_an_attribute(monkeypatch):
    monkeypatch.setattr("myapp.net.fetch", lambda url: {"ok": True})
    assert myapp.summarize("http://x") == "ok"

def test_run_from_another_directory(monkeypatch, tmp_path):
    monkeypatch.chdir(tmp_path)         # undone automatically
    assert Path.cwd() == tmp_path

Every change is reverted when the test ends, including failures. That is the whole point: hand-rolled os.environ[...] = ... leaks into subsequent tests.

`capsys` and `caplog` — captured output

capsys / caplog

import logging

def test_prints_a_banner(capsys):
    print("READY")
    captured = capsys.readouterr()      # consumes the buffer
    assert captured.out == "READY\n"
    assert captured.err == ""

def test_logs_a_warning(caplog):
    with caplog.at_level(logging.WARNING):
        logging.getLogger("app").warning("disk almost full")
    assert "disk almost full" in caplog.text
    assert caplog.records[0].levelno == logging.WARNING

Use capfd instead of capsys when the output comes from a subprocess or a C extension.

`request` — introspection

request

@pytest.fixture
def workspace(request, tmp_path):
    """Name the directory after the test that asked for it."""
    d = tmp_path / request.node.name
    d.mkdir()
    return d

Factory fixtures

When a test needs several of a thing, or needs to choose parameters, return a function instead of a value. Track what you create so teardown stays correct.

factory-as-fixture

import pytest

@pytest.fixture
def make_user(db):
    created = []

    def _make(name, admin=False):
        cur = db.execute(
            "INSERT INTO users (name, admin) VALUES (?, ?)", (name, admin)
        )
        created.append(cur.lastrowid)
        return {"id": cur.lastrowid, "name": name, "admin": admin}

    yield _make

    for user_id in created:             # clean up exactly what this test made
        db.execute("DELETE FROM users WHERE id = ?", (user_id,))

def test_admins_can_see_everyone(make_user):
    admin = make_user("ada", admin=True)
    make_user("grace")
    make_user("alan")
    assert len(visible_users(admin)) == 3

Part 3 · Parametrizing

parametrize

One test function, many cases. Each case is a separate test: it gets its own node id, its own pass/fail, and one failing case does not hide the others.

basic

import pytest

@pytest.mark.parametrize(
    ("title", "expected"),
    [
        ("Hello World", "hello-world"),
        ("What's new?!", "whats-new"),
        ("  padded  ", "padded"),
        ("", ""),
    ],
)
def test_slugify(title, expected):
    assert slugify(title) == expected

$ pytest -v

test_slug.py::test_slugify[Hello World-hello-world] PASSED
test_slug.py::test_slugify[What's new?!-whats-new]   PASSED
test_slug.py::test_slugify[  padded  -padded]        PASSED
test_slug.py::test_slugify[-]                        PASSED

Stacking decorators produces the cartesian product:

3 × 2 = 6 tests

@pytest.mark.parametrize("codec", ["gzip", "zstd", "none"])
@pytest.mark.parametrize("size", [0, 1_000_000])
def test_roundtrip(codec, size):
    payload = b"x" * size
    assert decompress(compress(payload, codec), codec) == payload

Combinatorics grow fast

Stacking is convenient and dangerous. Three stacked decorators of five values each is 125 tests. Prefer an explicit list of meaningful tuples once the product stops being all-interesting.

Readable IDs

Auto-generated ids come from the values, which is unreadable for objects and awkward for long strings. Name your cases — the id is what you will read in CI output and paste on the command line.

ids=

@pytest.mark.parametrize(
    ("payload", "expected"),
    [
        ({"v": 1}, True),
        ({"v": 1, "extra": None}, True),
        ({}, False),
    ],
    ids=["minimal", "with-nulls", "empty"],
)
def test_validate(payload, expected):
    assert validate(payload) is expected

A dict keeps the name and the data in one place, which stops them drifting apart:

dict-driven cases

CASES = {
    "empty-input":     ("",        []),
    "single-token":    ("a",       ["a"]),
    "trailing-comma":  ("a,b,",    ["a", "b"]),
    "quoted-comma":    ('"a,b",c', ["a,b", "c"]),
}

@pytest.mark.parametrize(
    ("raw", "expected"), list(CASES.values()), ids=list(CASES.keys())
)
def test_parse_csv_line(raw, expected):
    assert parse_csv_line(raw) == expected

Per-case marks

pytest.param attaches a mark or an id to a single case, so one known-broken input does not force you to skip the whole function.

pytest.param

@pytest.mark.parametrize(
    ("raw", "expected"),
    [
        ("2024-01-01", date(2024, 1, 1)),
        pytest.param("01/01/2024", date(2024, 1, 1), id="us-format"),
        pytest.param(
            "2024-W01-1", date(2024, 1, 1),
            marks=pytest.mark.xfail(reason="ISO week dates not supported yet"),
        ),
        pytest.param(
            "yesterday", date(2024, 1, 1),
            marks=pytest.mark.skipif(sys.platform == "win32", reason="needs GNU date"),
        ),
    ],
)
def test_parse_date(raw, expected):
    assert parse_date(raw) == expected

Parametrized fixtures

Put params on the fixture and every test using it runs once per value. This is how you sweep a whole suite across backends without touching a single test body.

fixture params

@pytest.fixture(params=["sqlite", "postgres"], ids=["sqlite", "pg"])
def store(request):
    backend = request.param             # ← the current value
    s = open_store(backend)
    yield s
    s.close()

# both of these now run twice, once per backend
def test_put_then_get(store):
    store.put("k", "v")
    assert store.get("k") == "v"

def test_missing_key_returns_none(store):
    assert store.get("nope") is None

Indirect parametrization

indirect=True routes the parameter through a fixture rather than into the test. Use it when the value needs setup before the test can see it.

indirect

@pytest.fixture
def user(request):
    role = request.param                # comes from parametrize, not from the test
    return create_user(role=role)

@pytest.mark.parametrize("user", ["admin", "viewer"], indirect=True)
def test_dashboard_access(user):
    assert user.can_open_dashboard() is (user.role == "admin")

Mix direct and indirect by naming which arguments are indirect:

partial indirect

@pytest.mark.parametrize(
    ("user", "expected"),
    [("admin", True), ("viewer", False)],
    indirect=["user"],                  # only `user` goes through the fixture
)
def test_dashboard_access(user, expected):
    assert user.can_open_dashboard() is expected

Part 4 · Practical

Project layout

The src layout is the one that avoids import surprises: your package is not importable from the repository root, so tests are forced to import the installed package — the same thing your users get.

recommended

myproject/
├── pyproject.toml
├── src/
│   └── myproject/
│       ├── __init__.py
│       └── core.py
└── tests/
    ├── conftest.py
    ├── unit/
    │   └── test_core.py
    └── integration/
        └── test_end_to_end.py

install once, then test

python -m pip install -e ".[dev]"
pytest

Do not put __init__.py in tests/

Without it, test modules are imported as top-level modules — which means two files named test_utils.py in different directories will collide. Either give them distinct names, or add __init__.py files and accept package semantics. The distinct-names route is simpler.

Configuration

One config block removes a lot of repeated typing and makes local runs match CI. pyproject.toml is the modern home; pytest.ini and setup.cfg still work.

pyproject.toml

[tool.pytest.ini_options]
minversion = "8.0"
testpaths = ["tests"]
addopts = [
    "-ra",                 # summary for all non-passing outcomes
    "--strict-markers",    # unknown @pytest.mark.* becomes an error
    "--strict-config",     # typos in this very section become an error
    "--import-mode=importlib",
]
markers = [
    "slow: takes more than a second",
    "integration: needs external services",
]
filterwarnings = [
    "error",                                   # warnings fail the suite ...
    "ignore::DeprecationWarning:third_party.*" # ... except from code you don't own
]

Option	Why you want it
`--strict-markers`	A typo like `@pytest.mark.slwo` silently does nothing otherwise
`filterwarnings = ["error"]`	Catches deprecations while they are cheap to fix
`testpaths`	Bare `pytest` stops walking `.venv`, `build`, docs
`--import-mode=importlib`	Modern import semantics; avoids `sys.path` surgery

Markers, skip and xfail

Markers are labels. You attach them to tests and select on them later.

marking

import pytest

@pytest.mark.slow
def test_full_reindex():
    ...

@pytest.mark.integration
class TestPaymentGateway:          # applies to every method in the class
    def test_charge(self): ...
    def test_refund(self): ...

pytestmark = pytest.mark.integration   # module-level: applies to the whole file

selecting

pytest -m slow                  # only slow
pytest -m "not slow"            # everything else
pytest -m "integration and not slow"

skip vs xfail

Construct	Meaning	When
`@pytest.mark.skip(reason=…)`	Never run it	Temporarily irrelevant
`@pytest.mark.skipif(cond, reason=…)`	Run only if the condition is false	Platform / version / optional dependency
`pytest.skip(reason)`	Skip from inside the test	Decision needs runtime information
`pytest.importorskip("numpy")`	Skip if the import fails	Optional dependency
`@pytest.mark.xfail(reason=…)`	Run it; a failure is expected	Known bug, with a test that proves it

the useful forms

import sys, pytest

@pytest.mark.skipif(sys.version_info < (3, 12), reason="uses PEP 695 syntax")
def test_new_generics(): ...

def test_needs_a_gpu():
    if not gpu_present():
        pytest.skip("no GPU on this runner")
    ...

pandas = pytest.importorskip("pandas", minversion="2.0")

@pytest.mark.xfail(strict=True, reason="bug #482: rounds half-down")
def test_rounds_half_up():
    assert round_half_up(0.5) == 1

Prefer xfail(strict=True) over skip for known bugs

A strict xfail that starts passing becomes a failure. That is the feature: the day someone fixes the bug, the suite tells you to delete the marker. A skipped test just rots.

Mocking

Two tools, two jobs. monkeypatch replaces an attribute for the duration of a test. unittest.mock additionally records how the replacement was used.

monkeypatch: substitute behaviour

def test_retries_on_timeout(monkeypatch):
    calls = []

    def flaky_get(url):
        calls.append(url)
        if len(calls) < 3:
            raise TimeoutError
        return {"status": "ok"}

    monkeypatch.setattr("myapp.client.get", flaky_get)
    assert fetch_with_retry("http://x")["status"] == "ok"
    assert len(calls) == 3

mock: assert on the interaction

from unittest.mock import patch, MagicMock

def test_sends_exactly_one_email():
    with patch("myapp.mailer.send") as send:
        send.return_value = MagicMock(status=202)
        register_user("ada@example.com")

    send.assert_called_once_with(
        to="ada@example.com", template="welcome"
    )

Patch where it is looked up

If myapp/service.py does from myapp.mailer import send, then patching "myapp.mailer.send" has no effect — service already holds its own reference. Patch "myapp.service.send", the name in the module using it.

Mocking is a smell in proportion to its depth

Mocking a network boundary is good. Mocking five internal collaborators means the test now asserts your implementation's shape rather than its behaviour, and it will break on every refactor while catching no bugs. Consider a fake object or a real in-memory implementation instead.

Output and logging

pytest captures stdout, stderr and logging, then prints them only for failing tests. That is why your print statements seem to vanish on success.

Flag	Effect
`-s`	Disable capture entirely — output streams live
`--capture=no`	Same as `-s`
`--log-cli-level=DEBUG`	Stream log records to the terminal as they happen
`--show-capture=no`	Hide captured output even on failures

Coverage

pytest-cov

python -m pip install pytest-cov

pytest --cov=myproject --cov-report=term-missing
pytest --cov=myproject --cov-report=html      # then open htmlcov/index.html
pytest --cov=myproject --cov-fail-under=85    # gate in CI

Coverage measures execution, not verification

A test that calls a function and asserts nothing still counts as full coverage of it. Treat the number as a way to find untested code, never as evidence that tested code is correct. Branch coverage (--cov-branch) is a strictly better signal than line coverage.

Speed

find and fix slowness

pytest --durations=10          # the 10 slowest tests, with setup/teardown split out

python -m pip install pytest-xdist
pytest -n auto                 # one worker per CPU
pytest -n 4 --dist loadfile    # keep each file on a single worker

Parallelism exposes hidden coupling: tests that only pass in a particular order will start failing. That is a bug being surfaced, not a bug being introduced.

Debugging a failure

the loop

pytest -x -q                   # 1. stop at the first failure
pytest --lf -x                 # 2. iterate on just that one
pytest --lf -x --pdb           # 3. drop into a debugger at the failure point
pytest --lf -x -vv --tb=long   # 4. full diffs and full traceback
pytest -q                      # 5. confirm the whole suite is green again

Flag	Traceback style
`--tb=long`	Every frame, fully expanded default
`--tb=short`	One line per frame — best for CI logs
`--tb=line`	One line per failure
`--tb=no`	Names only

breakpoint() works inside tests as long as capture is off (-s), and --pdb opens a debugger at the moment of failure with the whole frame still alive — usually faster than guessing where to put the breakpoint.

CI recipe

GitHub Actions

name: tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.10", "3.11", "3.12", "3.13"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: pip
      - run: python -m pip install -e ".[dev]"
      - run: pytest -ra --tb=short --durations=10 --cov --cov-report=xml -n auto

fail-fast: false matters: without it, one failing Python version cancels the others and hides whether the problem is version-specific.

Part 5 · Advanced

Hooks

pytest is a plugin system with a test runner attached. Every phase — collection, setup, running, reporting — publishes hooks, and conftest.py is a plugin that is loaded automatically. Implement a hook by defining a function with the right name.

conftest.py — auto-mark by location

def pytest_collection_modifyitems(config, items):
    """Everything under tests/integration/ gets @pytest.mark.integration."""
    import pytest
    for item in items:
        if "/integration/" in str(item.path):
            item.add_marker(pytest.mark.integration)

conftest.py — expose the outcome to fixtures

import pytest

@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    # stash "rep_setup" / "rep_call" / "rep_teardown" on the test item
    setattr(item, f"rep_{report.when}", report)

@pytest.fixture
def browser(request):
    driver = start_browser()
    yield driver
    if getattr(request.node, "rep_call", None) and request.node.rep_call.failed:
        driver.save_screenshot(f"/tmp/{request.node.name}.png")
    driver.quit()

That pattern — capture an artifact only when the test failed — is the standard answer for browser screenshots, server logs and database dumps.

Hook	Fires
`pytest_addoption`	Once at startup, to register CLI flags
`pytest_configure`	After config is read; register markers here
`pytest_collection_modifyitems`	After collection — reorder, filter, mark
`pytest_generate_tests`	Per test function, to generate parameters
`pytest_runtest_setup`	Before each test
`pytest_runtest_makereport`	After each phase, with the result
`pytest_sessionfinish`	Once, at the very end

Custom CLI options

conftest.py

import pytest

def pytest_addoption(parser):
    parser.addoption(
        "--runslow", action="store_true", default=False,
        help="run tests marked slow",
    )
    parser.addoption(
        "--env", action="store", default="staging",
        choices=("staging", "prod"), help="target environment",
    )

def pytest_configure(config):
    config.addinivalue_line("markers", "slow: takes more than a second")

def pytest_collection_modifyitems(config, items):
    if config.getoption("--runslow"):
        return
    skip = pytest.mark.skip(reason="needs --runslow")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip)

@pytest.fixture(scope="session")
def env(request):
    return request.config.getoption("--env")

usage

pytest                      # slow tests are skipped
pytest --runslow            # everything
pytest --env prod

pytest_generate_tests

When the set of cases is only known at runtime — read from a directory, a manifest, a database — generate them in this hook. Each generated case is still a first-class test.

golden files as test cases

# conftest.py
from pathlib import Path

CASES = Path(__file__).parent / "cases"

def pytest_generate_tests(metafunc):
    if "case_file" in metafunc.fixturenames:
        files = sorted(CASES.glob("*.input"))
        metafunc.parametrize(
            "case_file", files, ids=[f.stem for f in files]
        )

# test_render.py
def test_render_matches_golden(case_file):
    expected = case_file.with_suffix(".expected").read_text()
    assert render(case_file.read_text()) == expected

Dropping a new pair of files into cases/ adds a test. No code change, and the id is the file name so failures point straight at the data.

Writing a plugin

A plugin is a module of hooks and fixtures. Once it lives in a package with the pytest11 entry point, installing the package is enough — no imports, no conftest.py.

pytest_timer/plugin.py

import time
import pytest

@pytest.fixture
def timer():
    """Yield an object whose .elapsed is the time spent inside the block."""
    class _Timer:
        start = time.perf_counter()
        @property
        def elapsed(self):
            return time.perf_counter() - self.start
    return _Timer()

def pytest_addoption(parser):
    parser.addoption("--warn-slower-than", type=float, default=None)

@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_call(item):
    started = time.perf_counter()
    yield
    limit = item.config.getoption("--warn-slower-than")
    took = time.perf_counter() - started
    if limit is not None and took > limit:
        item.warn(pytest.PytestWarning(f"{item.name} took {took:.2f}s"))

pyproject.toml

[project.entry-points.pytest11]
timer = "pytest_timer.plugin"

Test your plugin with pytest's own pytester fixture, which runs a nested pytest session in a temporary directory:

pytester

pytest_plugins = ["pytester"]

def test_warns_about_slow_tests(pytester):
    pytester.makepyfile("""
        import time
        def test_slow():
            time.sleep(0.05)
    """)
    result = pytester.runpytest("--warn-slower-than=0.01")
    result.assert_outcomes(passed=1, warnings=1)

Custom assertion helpers

Shared assertion functions clutter tracebacks with frames nobody cares about. Set __tracebackhide__ and the failure points at the caller instead.

__tracebackhide__

import pytest

def assert_valid_invoice(invoice):
    __tracebackhide__ = True            # hide this frame in the report
    if invoice.total < 0:
        pytest.fail(f"negative total: {invoice.total}")
    if invoice.lines and invoice.total != sum(l.amount for l in invoice.lines):
        pytest.fail(
            f"total {invoice.total} != sum of lines "
            f"{sum(l.amount for l in invoice.lines)}"
        )

def test_generated_invoice(order):
    assert_valid_invoice(build_invoice(order))   # ← failure is reported here

To keep pytest's value-introspection inside a helper module, register it for rewriting before it is imported:

conftest.py

import pytest
pytest.register_assert_rewrite("mypackage.testing.helpers")

Why plain `assert` works

Python's assert throws away everything but the boolean. pytest installs an import hook that rewrites the AST of test modules before they are compiled, replacing each assert with code that stores intermediate values and builds an explanation on failure.

conceptually

# what you write
assert compute(x) == expected

# roughly what gets compiled
@py_left  = compute(x)
@py_right = expected
if not (@py_left == @py_right):
    raise AssertionError(explain("==", @py_left, @py_right))

Three consequences worth knowing:

Rewriting applies to test modules, conftest.py, and modules you explicitly register. A plain library module keeps bare AssertionErrors with no detail.
Running Python with -O strips assert statements entirely, so never run a suite optimised.
Stale __pycache__ from a non-pytest run can shadow rewritten bytecode; pytest --cache-clear or deleting the caches fixes the "my assertions lost their detail" mystery.

Async tests

pytest cannot await a coroutine on its own — an async def test without a plugin is skipped with a warning, which looks like a pass in a hurry. Install one of the two plugins.

pytest-asyncio

# pyproject.toml
# [tool.pytest.ini_options]
# asyncio_mode = "auto"        # no per-test marker needed

import asyncio, pytest

@pytest.fixture
async def client():
    c = await open_client()
    yield c
    await c.aclose()

async def test_fetches_concurrently(client):
    a, b = await asyncio.gather(client.get("/a"), client.get("/b"))
    assert a.status == b.status == 200

anyio is the alternative when you need the same tests to run on both asyncio and trio.

Property-based testing

Instead of listing examples, describe the shape of valid input and assert a property that must hold for all of them. Hypothesis generates cases, and on failure shrinks them to the smallest reproducer.

hypothesis

from hypothesis import given, strategies as st

@given(st.text())
def test_slugify_is_idempotent(s):
    once = slugify(s)
    assert slugify(once) == once

@given(st.lists(st.integers()))
def test_sort_is_a_permutation(xs):
    out = my_sort(xs)
    assert len(out) == len(xs)
    assert sorted(out) == sorted(xs)
    assert all(a <= b for a, b in zip(out, out[1:]))

It composes with pytest normally — @given functions are collected like any other test. Good properties are round-trips (decode(encode(x)) == x), invariants (length, ordering) and equivalence to a slow reference implementation.

Don't mix @given with function-scoped fixtures

Hypothesis calls the test body many times, but a function-scoped fixture is created once for the whole thing. Shared mutable state leaks between generated examples. Build what you need inside the test, or use a factory fixture.

Flaky tests and isolation

A flaky test is worse than no test: it trains people to re-run CI instead of reading it. The usual causes are few, and each has a direct fix.

Cause	Fix
Shared mutable state across tests	Narrow the fixture scope; return copies
Order dependence	`pytest -p no:randomly` to confirm, then fix the coupling
Real clock / real timezone	Freeze time (`monkeypatch`, `freezegun`, `time-machine`)
Unseeded randomness	Seed it, and log the seed
`sleep()`-based waiting	Poll for the condition with a timeout
Dict/set iteration order assumptions	Compare sets, or sort before comparing
Leftover files or env vars	`tmp_path` and `monkeypatch`, never manual mutation

prove isolation

python -m pip install pytest-randomly
pytest                       # shuffles order every run, prints the seed
pytest -p no:randomly        # reproduce the deterministic order
pytest -p randomly --randomly-seed=12345   # reproduce a specific shuffle

Retry plugins are anaesthetic, not treatment

pytest-rerunfailures can keep a pipeline moving, but a test that passes on the second attempt is telling you something real about your system. Retry only at genuinely non-deterministic boundaries (a live network), and never on unit tests.

Reference

Anti-patterns

Logic in the test

A test with branching needs its own tests. Split it into parametrized cases so each path is visible in the report.

avoid

def test_prices():
    for currency in ("EUR", "USD"):
        if currency == "EUR":
            assert fmt(1, currency) == "1,00 €"
        else:
            assert fmt(1, currency) == "$1.00"

prefer

@pytest.mark.parametrize(
    ("currency", "expected"),
    [("EUR", "1,00 €"), ("USD", "$1.00")],
)
def test_prices(currency, expected):
    assert fmt(1, currency) == expected

Asserting the implementation

mock.assert_called_once_with(...) on an internal helper locks in today's call graph. Rename the helper and the test fails while the behaviour is unchanged. Assert on the observable result instead.

One test, many concerns

When a 40-line test fails you learn "something in registration is broken". Several focused tests tell you which part, and the names double as documentation.

Depending on execution order

A test that only passes after another one ran is not a test, it is half of one. Each test must set up everything it needs.

Sleeping

avoid

start_worker()
time.sleep(2)          # slow AND flaky
assert job_done()

prefer

def wait_until(pred, timeout=5, interval=0.05):
    __tracebackhide__ = True
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if pred():
            return
        time.sleep(interval)
    pytest.fail(f"condition not met in {timeout}s")

start_worker()
wait_until(job_done)

Cheat sheet

Command line

`pytest path::test_name`	Run one test
`-k "expr"` / `-m "expr"`	Select by name / by marker
`-x` / `--maxfail=N`	Stop early
`--lf` / `--ff` / `--sw`	Last-failed / failures-first / stepwise
`-q` / `-v` / `-vv` / `-ra`	Verbosity and outcome summary
`-s`	Show output live
`--pdb`	Debugger at the failure
`--tb=short`	Compact tracebacks
`--collect-only`	List tests without running
`--fixtures`	List available fixtures
`--durations=10`	Slowest tests
`-n auto`	Parallel (xdist)
`-W error`	Turn warnings into failures

API

`pytest.fixture(scope=, params=, autouse=, ids=)`	Define a fixture
`pytest.mark.parametrize(argnames, argvalues, ids=, indirect=)`	Generate cases
`pytest.param(*values, id=, marks=)`	One case with metadata
`pytest.raises(Exc, match=)`	Expect an exception
`pytest.warns(W, match=)`	Expect a warning
`pytest.approx(value, rel=, abs=)`	Float comparison
`pytest.skip(reason)` / `pytest.fail(msg)`	Skip / fail imperatively
`pytest.importorskip(name, minversion=)`	Optional dependency
`pytest.register_assert_rewrite(module)`	Rewrite asserts in a helper module

Built-in fixtures

`tmp_path`, `tmp_path_factory`	Temporary directories
`monkeypatch`	Patch attributes, env vars, cwd — auto-undone
`capsys`, `capfd`	Captured stdout/stderr (Python level / fd level)
`caplog`	Captured log records
`request`	Test metadata, `request.param`, config access
`pytester`	Run a nested pytest — for testing plugins
`recwarn`	Record all warnings raised

Wrapping up

What actually moves the needle

Reading a tool's full feature list rarely changes how anyone works. If you adopt three things from this post, make them these.

The short list

Turn on --strict-markers and filterwarnings = ["error"]. Two lines of config that convert a class of silent failures into loud ones, today, in an existing project.
Replace your ad-hoc setup with fixtures, and keep them function-scoped until proven slow. Almost every order-dependent flake traces back to state shared more widely than it needed to be.
Reach for parametrize the moment a test grows a loop or an if. Each case gets its own name and its own verdict, which is the difference between "billing is broken" and "billing is broken for zero-quantity line items".

The advanced material matters less often, but it is worth knowing it exists. The day you find yourself copying the same conftest.py into a fourth repository, that is the signal to turn it into a plugin — and by then the entry point is a two-line change.

Pages

7/29/2026

The list that remembers: mutable default arguments in Python

Contents

Part 1 · The bug

Three calls, one list

Part 2 · Why it happens

The default lives on the function object

Part 3 · The shape you actually hit

Two carts, one basket

Part 4 · The fix

The None sentinel

Type it honestly

Part 5 · Variations and near-misses

Three things worth knowing

or [] is shorter and subtly different

When None is itself a valid value

Dataclasses refuse to let you do it

Part 6 · What is safe, what is not

A quick classification

Part 7 · Letting a linter catch it

B006 and B008

Wrapping up

Cheat sheet

The short list

Further reading

7/28/2026

The loop variable that changed: late binding and default arguments in Python

Contents

Part 1 · The bug

Three functions, one value

Part 2 · Why it happens

Variables, cells, and the missing block scope

Part 3 · Anatomy of the fix

Reading def f(x: T = x) as three parts

Part 4 · Defaults are evaluated once

The rule underneath

Part 5 · The same rule, as a trap

Mutable default arguments

Part 6 · Other ways to capture

partial, factories, and which to prefer

A factory function

functools.partial

Comparison

Part 7 · Where it bites

Anything that defers the call

Part 8 · What the type hint does

The middle part, and when it runs

Part 9 · Scoping rules, briefly

LEGB, nonlocal, and comprehensions

Part 10 · Letting a linter catch it

B023 and B008

Wrapping up

Cheat sheet

The short list

Further reading

pytest, from your first test to your own plugin

Contents

Part 1 · Basics

Why pytest

Your first test

How tests are found

The assert statement

Exceptions and warnings

Comparing floats

Running and selecting tests

Part 2 · Fixtures

Fixture basics

Setup and teardown

Scopes

conftest.py

Built-in fixtures

tmp_path — a real, empty directory per test

monkeypatch — scoped, auto-undone patching

capsys and caplog — captured output

request — introspection

Factory fixtures

Part 3 · Parametrizing

parametrize

Readable IDs

The `None` sentinel

`or []` is shorter and subtly different

When `None` is itself a valid value

Reading `def f(x: T = x)` as three parts

`functools.partial`

`tmp_path` — a real, empty directory per test

`monkeypatch` — scoped, auto-undone patching

`capsys` and `caplog` — captured output

`request` — introspection

Why plain `assert` works