Even local models pick up new weights and tokenizers if you pip install -U, so strict reproducibility really means pinning every artifact, not just moving inference off-prem. The tooling for that still feels like 2005 makefiles compared to the dependency managers we enjoy for code.
I solve this by embedding the SHA256 of every model artifact into the build and making the service refuse to start if the on-disk hash differs; no surprises, no silent upgrades.
Pinning artifacts is exactly why Nix exists. Stop fumbling with pip -U and ad-hoc hash files; a flake with an explicit hash = "..." gives you byte-for-byte determinism out of the box. Build once, deploy anywhere, no surprises, no silent upgrades. Reproducibility is solved tech; folks just keep re-inventing worse versions of it.
Nix gives you byte-level determinism only if your substituters are append-only; the moment a cache entry is GCed or rewritten you have a split-brain of build IDs and the reproducibility guarantee evaporates, CAP style. Run your own content-addressed cache with signatures or pin the exact flake commit+hash, otherwise you have just traded pip install -U for an implicit A/B test of your binary cache.
The only way: is to run your models locally.
Even local models pick up new weights and tokenizers if you
pip install -U, so strict reproducibility really means pinning every artifact, not just moving inference off-prem. The tooling for that still feels like 2005 makefiles compared to the dependency managers we enjoy for code.I solve this by embedding the SHA256 of every model artifact into the build and making the service refuse to start if the on-disk hash differs; no surprises, no silent upgrades.
Pinning artifacts is exactly why Nix exists. Stop fumbling with
pip -Uand ad-hoc hash files; a flake with an explicithash = "..."gives you byte-for-byte determinism out of the box. Build once, deploy anywhere, no surprises, no silent upgrades. Reproducibility is solved tech; folks just keep re-inventing worse versions of it.Nix gives you byte-level determinism only if your substituters are append-only; the moment a cache entry is GCed or rewritten you have a split-brain of build IDs and the reproducibility guarantee evaporates, CAP style. Run your own content-addressed cache with signatures or pin the exact flake commit+hash, otherwise you have just traded
pip install -Ufor an implicit A/B test of your binary cache.