# Testing Strategy and Benchmark Policy
PHIDS testing is designed to protect both correctness and performance invariants. Because the project is a deterministic simulation system with benchmark-sensitive hot paths, a good testing strategy is not only about maximizing coverage; it is about selecting the right test surface for the kind of change being made.
## Testing Philosophy
A useful way to read the PHIDS test suite is as a layered defense model:
- schema and invariant tests protect configuration boundaries and hard architectural rules,
- system and loop tests protect runtime behavior across phases,
- UI/API tests protect operator surfaces and draft/live semantics,
- example-scenario tests protect curated experimental fixtures,
- benchmark tests protect performance-sensitive contracts.
This layered structure reflects the architecture of the project itself.
## Current Pytest Configuration
The active pytest settings live in `pyproject.toml`. Current repository-wide defaults include:

- coverage over `src/phids` with `--cov-fail-under=80`,
- benchmark GC control,
- benchmark sorting by mean runtime.
This means that a plain repository-level run:

```shell
uv run pytest
```

is both a functional and coverage-enforcing quality gate.
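As a rough illustration, the defaults described above could be expressed in `pyproject.toml` along these lines. This is a hedged sketch, not a copy of the real file: the exact keys and flag values in the repository may differ, though `--cov`, `--cov-fail-under`, `--benchmark-disable-gc`, and `--benchmark-sort` are standard pytest-cov and pytest-benchmark options.

```toml
# Illustrative sketch only; the repository's actual configuration may differ.
[tool.pytest.ini_options]
addopts = [
  "--cov=src/phids",          # coverage over the package
  "--cov-fail-under=80",      # repository-wide coverage gate
  "--benchmark-disable-gc",   # benchmark GC control
  "--benchmark-sort=mean",    # benchmark sorting by mean runtime
]
```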
## Full Quality-Gate Test Run
For merge-ready confidence, the project expects contributors to be able to run:
```shell
uv run pytest
```
This is the command mirrored by CI and should be treated as the canonical whole-suite check.
## Focused Functional Runs
For local iteration, PHIDS also supports focused functional runs on selected files.
Because the project-level pytest configuration injects coverage arguments, contributors may prefer to clear `addopts` when they want a quick subsystem-only run without the global coverage gate.
Typical pattern:

```shell
uv run pytest -o addopts='' tests/<target_file>.py -q
```
This is useful for rapid feedback while editing one subsystem, as long as the contributor still runs broader gates before considering the work complete.
## Suite Topology by Surface
### API route presence and top-level contracts

Representative file: `tests/test_api_routes.py`
Use this when you need to confirm that expected route surfaces still exist.
### HTMX/UI rendering and draft/live interaction

Representative files:

- `tests/test_ui_routes.py`
- `tests/test_ui_state.py`
- `tests/test_api_builder_and_helpers.py`
These protect:
- partial rendering,
- diagnostics tabs,
- draft mutation behavior,
- scenario import/export through the UI,
- load-draft semantics,
- helper functions that support the control center.
### Engine systems and phase behavior

Representative files:

- `tests/test_systems_behavior.py`
- `tests/test_termination_and_loop.py`
These protect:
- lifecycle behavior,
- interaction behavior,
- signaling behavior,
- termination semantics,
- loop integration,
- replay/telemetry integration.
### Core invariants and structural rules

Representative files:

- `tests/test_schemas_and_invariants.py`
- `tests/test_biotope_diffusion.py`
- `tests/test_ecs_world.py`
- `tests/test_flow_field.py`
These protect:
- Rule-of-16 bounds,
- schema validation,
- buffering behavior,
- spatial-hash properties,
- flow-field semantics,
- subnormal-threshold behavior.
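The invariants above lend themselves to small property-style tests. The sketch below shows the shape of a linearity/toxin-summation check; `combine_toxin_fields` and the test name are hypothetical stand-ins, not the project's real API, and only illustrate the pattern of asserting additivity over input fields.

```python
# Hypothetical sketch: a linearity-style invariant test. The combiner
# below is an illustrative stand-in for any flow-field operation that
# is expected to be additive over its toxin sources.
def combine_toxin_fields(fields):
    """Sum per-cell contributions from each toxin field."""
    return [sum(cells) for cells in zip(*fields)]


def test_toxin_summation_is_linear():
    a = [1.0, 2.0, 3.0]
    b = [0.5, 0.0, -1.0]
    combined = combine_toxin_fields([a, b])
    # Combining two fields must equal the element-wise sum of the fields.
    expected = [x + y for x, y in zip(a, b)]
    assert combined == expected
```

The same pattern generalizes to the other invariants listed above: state the architectural rule as an equality or bound, then assert it over small hand-constructed inputs.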
### Curated scenario fixtures

Representative file: `tests/test_example_scenarios.py`
This protects the curated example pack as both a validation and runtime-compatibility surface.
## Recommended Focused Runs by Change Type
### UI and builder changes

When editing:

- `src/phids/api/main.py`
- `src/phids/api/ui_state.py`
- `src/phids/api/templates/`

start with:

```shell
uv run pytest -o addopts='' tests/test_ui_routes.py tests/test_ui_state.py tests/test_api_builder_and_helpers.py -q
```
### Engine system changes

When editing:

- `src/phids/engine/systems/`
- `src/phids/engine/loop.py`

start with:

```shell
uv run pytest -o addopts='' tests/test_systems_behavior.py tests/test_termination_and_loop.py -q
```
### Schema and scenario-language changes

When editing:

- `src/phids/api/schemas.py`
- `src/phids/io/scenario.py`
- scenario import/export behavior

start with:

```shell
uv run pytest -o addopts='' tests/test_schemas_and_invariants.py tests/test_example_scenarios.py tests/test_ui_state.py -q
```
### Replay, telemetry, and export changes

When editing:

- `src/phids/telemetry/`
- `src/phids/io/replay.py`

start with:

```shell
uv run pytest -o addopts='' tests/test_termination_and_loop.py tests/test_replay_roundtrip.py tests/test_additional_coverage.py -q
```
## Benchmark Policy
PHIDS has a small but important benchmark layer. These tests are not decorative; they express performance-sensitive expectations for core runtime surfaces.
### Current dedicated benchmark files

- `tests/test_flow_field_benchmark.py`
- `tests/test_spatial_hash_benchmark.py`
### What they currently cover
- flow-field generation cost on a representative grid,
- hot-cell spatial-hash query behavior.
## When Benchmarks Are Mandatory
Benchmark tests should be run whenever a change touches:
- `src/phids/engine/core/flow_field.py`,
- `src/phids/engine/core/ecs.py`,
- spatial-hash behavior,
- any code that materially changes the work performed by those subsystems,
- diffusion-adjacent or locality-sensitive behavior that could indirectly degrade those paths.
Even if the change is functionally correct, a regression in these hotspots can violate important project expectations.
## Benchmark Commands

Use:

```shell
uv run pytest -o addopts='' tests/test_flow_field_benchmark.py tests/test_spatial_hash_benchmark.py -q
```

For CI-parity whole-suite benchmarking, the repository-level `uv run pytest` remains the canonical full run.
## Current Benchmark Scope Limits

The presence of a benchmark layer does not mean that every performance-sensitive subsystem has a dedicated benchmark file.
In particular:
- flow field has a dedicated benchmark,
- spatial hash has a dedicated benchmark,
- diffusion is benchmark-sensitive in policy, but does not currently have a dedicated benchmark test file of its own.
Contributors should describe this state accurately rather than assuming benchmark coverage that does not yet exist.
## Coverage Strategy
Coverage is a project-level quality gate, but PHIDS has already demonstrated that the right response to missing coverage is usually to add meaningful tests rather than weaken the threshold.
A good contribution should therefore:
- add focused tests for new behavior,
- prefer tests close to the owning subsystem,
- avoid bypassing the full-suite coverage gate as a final validation step.
## CI Parity
The current CI workflow distributes validation across three focused jobs:

- `quality` for the repository's currently green Ruff lint/format checks (`uv run ruff check .` and `uv run ruff format --check .`),
- `tests-py312` for the full suite (`uv run pytest`),
- `docs` for the strict MkDocs build (`uv run mkdocs build --strict`).
The workflow is now intentionally triggered only for pull requests targeting `main` and for manual dispatch runs, which keeps expensive validation off intermediate branch pushes and off `develop`.
Any serious contribution should be validated against this path before merge, even if local iteration uses smaller commands first. For the workflow structure and local `act` rehearsal commands, see `github-actions-and-local-ci.md`.
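The job layout described above could be sketched roughly as follows. This is an assumption-laden illustration based only on this page, not a copy of the real `ci.yml`: job step details such as checkout and uv installation are simplified, and the actual workflow may differ.

```yaml
# Hedged sketch of the described CI layout; the real workflow may differ.
name: CI
on:
  pull_request:
    branches: [main]   # PRs targeting main only
  workflow_dispatch:    # manual runs

jobs:
  quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # uv installation step omitted for brevity
      - run: uv run ruff check .
      - run: uv run ruff format --check .
  tests-py312:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: uv run pytest
  docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: uv run mkdocs build --strict
```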
## Testing as Architectural Verification
In PHIDS, tests are especially valuable because many of the most important properties are not merely function outputs; they are architectural invariants.
Examples include:
- draft state being distinct from live runtime state,
- environmental buffering behavior,
- root-network growth cadence,
- activation-condition tree semantics,
- flow-field linearity and toxin summation,
- replay framing and termination semantics.
These are the kinds of behaviors contributors should protect first when making changes.
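The draft/live separation is a good example of an invariant that is cheap to pin down in a test. The sketch below uses a hypothetical `StateStore` with separate draft and live dicts; it is illustrative only and does not reflect PHIDS's real state classes, but the assertion pattern carries over.

```python
# Hypothetical sketch of a draft-vs-live invariant test; StateStore is
# an illustrative stand-in, not a real PHIDS class.
class StateStore:
    def __init__(self):
        self.live = {"speed": 1.0}
        self.draft = dict(self.live)  # draft starts as a copy of live

    def edit_draft(self, key, value):
        self.draft[key] = value

    def promote(self):
        # Only an explicit promote may move draft state into live state.
        self.live = dict(self.draft)


def test_draft_edits_do_not_leak_into_live():
    store = StateStore()
    store.edit_draft("speed", 2.0)
    assert store.live["speed"] == 1.0  # live untouched until promote
    store.promote()
    assert store.live["speed"] == 2.0  # promote applies the draft
```

Tests like this document the invariant itself, so a refactor that accidentally aliases draft and live state fails immediately.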
## Practical Change-to-Test Matrix
| Change surface | First focused tests | Broader follow-up |
|---|---|---|
| UI partials / builder routes | `tests/test_ui_routes.py`, `tests/test_ui_state.py`, `tests/test_api_builder_and_helpers.py` | `uv run pytest` |
| Engine systems | `tests/test_systems_behavior.py`, `tests/test_termination_and_loop.py` | `uv run pytest` |
| Flow field | `tests/test_flow_field.py`, `tests/test_flow_field_benchmark.py` | `uv run pytest` |
| Spatial hash / ECS locality | `tests/test_ecs_world.py`, `tests/test_spatial_hash_benchmark.py` | `uv run pytest` |
| Scenario schema / import-export | `tests/test_schemas_and_invariants.py`, `tests/test_example_scenarios.py`, `tests/test_ui_state.py` | `uv run pytest` |
| Replay / telemetry | `tests/test_replay_roundtrip.py`, `tests/test_termination_and_loop.py`, `tests/test_additional_coverage.py` | `uv run pytest` |
## Verified Current-State Evidence

- `pyproject.toml`
- `.github/workflows/ci.yml`
- `README.md`
- `tests/test_api_routes.py`
- `tests/test_ui_routes.py`
- `tests/test_ui_state.py`
- `tests/test_api_builder_and_helpers.py`
- `tests/test_systems_behavior.py`
- `tests/test_termination_and_loop.py`
- `tests/test_flow_field.py`
- `tests/test_biotope_diffusion.py`
- `tests/test_ecs_world.py`
- `tests/test_example_scenarios.py`
- `tests/test_flow_field_benchmark.py`
- `tests/test_spatial_hash_benchmark.py`
## Where to Read Next

- For contributor workflow around these checks: `contribution-workflow-and-quality-gates.md`
- For quality-gate traceability: `../reference/requirements-traceability.md`
- For benchmark-sensitive runtime subsystems: `../engine/flow-field.md` and `../engine/ecs-and-spatial-hash.md`