Do Not A/B Test My Workflow (backnotprop.com) ai
by harrison 29 days ago | 5 comments
  1. ~

    The only way is to run your models locally.

    1. ~

      Even a local setup picks up new weights and tokenizers if you pip install -U, so strict reproducibility really means pinning every artifact, not just moving inference off-prem. The tooling for that still feels like 2005-era Makefiles compared to the dependency managers we enjoy for code.

      1. ~

        I solve this by embedding the SHA256 of every model artifact into the build and making the service refuse to start if the on-disk hash differs; no surprises, no silent upgrades.

      2. ~

        Pinning artifacts is exactly why Nix exists. Stop fumbling with pip install -U and ad-hoc hash files; a flake with an explicit hash = "..." gives you byte-for-byte determinism out of the box. Build once, deploy anywhere, no surprises, no silent upgrades. Reproducibility is solved tech; folks just keep re-inventing worse versions of it.

        1. ~

          Nix gives you byte-level determinism only if your substituters are append-only; the moment a cache entry is garbage-collected or rewritten you have a split-brain of build IDs and the reproducibility guarantee evaporates, CAP style. Run your own content-addressed cache with signatures or pin the exact flake commit+hash; otherwise you have just traded pip install -U for an implicit A/B test of your binary cache.
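
For anyone wanting to try the "refuse to start if the on-disk hash differs" approach from upthread, here is a minimal Python sketch. It is not the commenter's actual code: the PINNED_ARTIFACTS manifest, the function names, and the exit code are all assumptions made for illustration, and the manifest is left empty because the real digests would be generated at build time.

```python
import hashlib
import sys
from pathlib import Path
from typing import List, Mapping

# Hypothetical manifest: relative artifact path -> expected SHA-256 hex digest.
# Assumed to be generated at build time and embedded in the service.
PINNED_ARTIFACTS: Mapping[str, str] = {}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(pinned: Mapping[str, str], root: Path) -> List[str]:
    """Return one message per artifact that is missing or has the wrong hash."""
    problems = []
    for rel_path, expected in pinned.items():
        path = root / rel_path
        if not path.is_file():
            problems.append(f"{rel_path}: missing")
            continue
        actual = sha256_of(path)
        if actual != expected.lower():
            problems.append(f"{rel_path}: expected {expected}, got {actual}")
    return problems


def verify_or_exit(pinned: Mapping[str, str], root: Path) -> None:
    """Exit with a non-zero status if any pinned artifact differs on disk."""
    problems = find_mismatches(pinned, root)
    if problems:
        for problem in problems:
            print(f"artifact check failed: {problem}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    # Run the check before loading any model; the root directory is an assumption.
    verify_or_exit(PINNED_ARTIFACTS, Path("."))
```

Calling verify_or_exit before any model is loaded means a silently swapped weight or tokenizer file stops the service at startup. This only covers files on disk; it says nothing about how they got there, which is the binary-cache concern raised above.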