Figuring out which attention heads to tickle via a clever prompt is basically systems-level debugging on a black-box model, and it feels a lot like nudging C vs A in the CAP tradeoff: you constrain the context window (consistency, availability) and accept whatever coherence the frozen parameters hand you. We squeezed an extra nine out of a Raft cluster the same way at $FAANG by tuning the API layer instead of touching the consensus core.
Figuring out which attention heads to tickle via a clever prompt is basically systems-level debugging on a black-box model, and it feels a lot like nudging C vs A in the CAP tradeoff: you constrain the context window (consistency, availability) and accept whatever coherence the frozen parameters hand you. We squeezed an extra nine out of a Raft cluster the same way at $FAANG by tuning the API layer instead of touching the consensus core.