Yikes! FWIW, I think best practice here is to hardcode all feature flags to off ...

btilly · on June 14, 2023

No, the best practice is that on each test run, every feature flag used implicitly or explicitly needs to be captured AND it must be possible to re-run the test with the same set of feature flags.

That way when you get a failure, you can reproduce it. And then one of the easy things to do is test which features may have contributed to it.

yojo · on June 15, 2023

I strongly disagree. If you have non-deterministic tests, you are going to have builds breaking for unrelated changes, seriously hampering developer productivity as teams chase down failures unrelated to their change.

Nothing kills confidence in testing more than test flakes. It’s a huge drain on velocity and morale, and encourages devs not to trust test output.

If you want to have some sort of chaos monkey process that runs your test suite flipping feature flags at random and notifying teams of failures (along with some sort of resourcing to investigate) I could get behind that. But that should be something outside of the main suite that gates code deployment.
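A rough Python sketch of that out-of-band job. `chaos_flag_runs`, `demo_suite`, and the flag names are hypothetical, and the suite is abstracted as a callable that takes a flag assignment and returns whether it passed. Seeding the RNG means re-running with the same seed regenerates exactly the same combinations. The function only collects failures for reporting; it doesn't gate anything.

```python
import random
from typing import Callable, Dict, List, Tuple


def chaos_flag_runs(flag_names: List[str],
                    run_suite: Callable[[Dict[str, bool]], bool],
                    runs: int, seed: int) -> List[Tuple[int, Dict[str, bool]]]:
    """Run the suite under random flag combinations; return (run index, flags) for failures."""
    rng = random.Random(seed)  # fixed seed, so every sampled combination is reproducible
    failures = []
    for i in range(runs):
        flags = {name: rng.random() < 0.5 for name in flag_names}
        if not run_suite(flags):
            failures.append((i, flags))
    return failures


def demo_suite(flags: Dict[str, bool]) -> bool:
    # Hypothetical latent bug: fails only when "a" and "b" are both on.
    return not (flags["a"] and flags["b"])


failures = chaos_flag_runs(["a", "b", "c"], demo_suite, runs=20, seed=1234)
for run_index, flags in failures:
    # In practice this would notify the owning team instead of printing.
    print(f"run {run_index} failed with flags {flags}")
```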

If a test passes when run by a dev pre-commit, it should pass in CI.

nosefrog · on June 14, 2023

But then you won't catch the bug before it hits production :)

dmoy · on June 14, 2023

Also, you end up with some strange long-term test behavior. Because people often leave feature flags in place long after full release (sometimes for years), a default-off policy in tests ends up only testing a configuration in which every flagged feature added since the last feature flag cleanup, which may have been N years ago, is disabled.

Yes, it's kind of a fractal of bad practices that have to align for this problem to occur, but that's the nature of tech debt.

yojo · on June 15, 2023

I agree that this is a real and separate problem, but I believe the solution lies outside of the test suite.

One way I have seen this handled is to cap every feature flag rollout at 95%. That way, turning a feature all the way on requires removing the flag from your codebase. It’s draconian, but honestly anything less than that leads to the situation you describe.
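A minimal sketch of how such a cap could be enforced as a CI check, assuming rollouts live in config as a mapping of flag name to percentage. `rollout_violations`, `check_rollouts`, and the config shape are hypothetical.

```python
from typing import Dict, List

# The cap described above: a flag that should be at 100% gets deleted instead.
MAX_ROLLOUT_PERCENT = 95.0


def rollout_violations(rollouts: Dict[str, float]) -> List[str]:
    """Names of flags configured above the cap, sorted for stable output."""
    return sorted(name for name, pct in rollouts.items() if pct > MAX_ROLLOUT_PERCENT)


def check_rollouts(rollouts: Dict[str, float]) -> None:
    """Fail the build if any flag exceeds the cap."""
    bad = rollout_violations(rollouts)
    if bad:
        raise ValueError(
            f"rollout above {MAX_ROLLOUT_PERCENT}% for {', '.join(bad)}; "
            "remove the flag from the code instead of turning it fully on"
        )


# Hypothetical flag config.
config = {"new_checkout": 95, "fast_search": 100, "dark_mode": 20}
print(rollout_violations(config))  # ['fast_search']
```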

dmoy · on June 17, 2023

I like that idea a lot. We've been informally doing it on my current team, which is made easier since we can sort of cleanly do atomic code+flag updates in a single commit.

linuxdude314 · on June 14, 2023

You are both misunderstanding the post.

He’s not saying to alter any of the feature flags used for the test, but simply to record which were used during the test.

Simply logging doesn’t introduce any of the issues you are describing.

jsnell · on June 15, 2023

Huh? This is what yojo@ wrote:

> I think best practice here is to hardcode all feature flags to off in the integration test suite

That's pretty clearly about forcing the flags to be off, i.e. altering them, and not about logging their values.

yojo · on June 15, 2023

Agreed, I am advocating for deterministic behavior for all feature flags in the test suite.

If you’re testing a new feature, you should have explicit tests for the enabled state (along with existing tests for the disabled state).
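A minimal Python sketch of that approach: a test double where every flag is off unless the test turns it on, plus one explicit test per state. `FixedFlags`, `price_cents`, and the `new_pricing` flag are hypothetical. The test functions use plain asserts in the pytest style, but they also run as ordinary functions.

```python
from typing import Iterable


class FixedFlags:
    """Test double: every flag is off unless the test explicitly enables it."""

    def __init__(self, enabled: Iterable[str] = ()):
        self._enabled = frozenset(enabled)

    def is_enabled(self, name: str) -> bool:
        return name in self._enabled


# Hypothetical code under test, gated by a hypothetical "new_pricing" flag.
def price_cents(base_cents: int, flags: FixedFlags) -> int:
    if flags.is_enabled("new_pricing"):
        return base_cents * 9 // 10  # new behavior: 10% off, rounded down
    return base_cents


def test_price_with_flag_off():
    # Existing test: default state, no flags enabled.
    assert price_cents(1000, FixedFlags()) == 1000


def test_price_with_flag_on():
    # New explicit test for the enabled state.
    assert price_cents(1000, FixedFlags({"new_pricing"})) == 900
```

Because nothing here reads a live flag service, a test that passes pre-commit sees exactly the same flag values in CI.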

If you have bugs propagating up the stack from flags changing in low-level dependencies, the change to the dependency is probably not properly tested.

Alternatively, if the feature flag gates a change to the interface of the dependency, you should have explicit integration tests covering the systems on both sides of the change.