
Yikes!

FWIW, I think best practice here is to hardcode all feature flags to off in the integration test suite, unless explicitly overridden in a test. Otherwise you risk exactly these sorts of heisenbugs.

At a BigCo that’s probably going to require coordinating with an internal tools team, but worth getting it on their backlog. All tests should be as deterministic as possible, and this goes double for integration tests that can flake for reasons outside of the code.
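A minimal sketch of the default-off idea, assuming a test-only flag client. `DefaultOffFlags`, `is_enabled`, `override`, and the flag name `new_checkout` are hypothetical and not any real library's API:

```python
from contextlib import contextmanager


class DefaultOffFlags:
    """Hypothetical flag client for tests: every flag is off unless overridden."""

    def __init__(self):
        self._overrides = {}

    def is_enabled(self, name):
        # Unset flags are always off, so the result never depends on
        # whatever a live flag service would return at that moment.
        return self._overrides.get(name, False)

    @contextmanager
    def override(self, **flags):
        # Explicit override, scoped to the with-block of a single test.
        previous = dict(self._overrides)
        self._overrides.update(flags)
        try:
            yield self
        finally:
            self._overrides = previous


flags = DefaultOffFlags()
with flags.override(new_checkout=True):
    inside = flags.is_enabled("new_checkout")   # overridden on
outside = flags.is_enabled("new_checkout")      # back to off
```

The override is restored on exit, so one test turning a flag on cannot leak into the next.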



No, the best practice is that on each test run, every feature flag used implicitly or explicitly needs to be captured AND it must be possible to re-run the test with the same set of feature flags.

That way when you get a failure, you can reproduce it. And then one of the easy things to do is test which features may have contributed to it.
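One way this capture-and-replay approach could look, as a sketch only. `RecordingFlags`, its `lookup` callable, and the JSON snapshot format are assumptions made for illustration:

```python
import json


class RecordingFlags:
    """Hypothetical wrapper that records every flag value a test run observed."""

    def __init__(self, lookup, replay=None):
        self._lookup = lookup        # assumed backend: callable(flag name) -> bool
        self._replay = replay or {}  # values captured from an earlier run
        self.seen = {}               # every flag evaluated during this run

    def is_enabled(self, name):
        if name in self.seen:
            # Keep the first value observed so a flag cannot change mid-run.
            value = self.seen[name]
        elif name in self._replay:
            value = self._replay[name]
        else:
            value = bool(self._lookup(name))
        self.seen[name] = value
        return value

    def snapshot(self):
        # Serialized flag set to attach to the test report.
        return json.dumps(self.seen, sort_keys=True)

    @classmethod
    def from_snapshot(cls, text, lookup=lambda name: False):
        # Re-run with the exact flag values of a previous (failing) run.
        return cls(lookup, replay=json.loads(text))
```

On a failure, the snapshot is stored with the test output, and `from_snapshot` rebuilds the same flag set. From there, flipping one recorded flag at a time is a cheap way to see which features contributed.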


I strongly disagree. If you have non-deterministic tests, you are going to have builds breaking for unrelated changes, seriously hampering developer productivity as teams chase down failures unrelated to their change.

Nothing kills confidence in testing more than test flakes. It’s a huge drain on velocity and morale, and encourages devs not to trust test output.

If you want to have some sort of chaos monkey process that runs your test suite flipping feature flags at random and notifying teams of failures (along with some sort of resourcing to investigate) I could get behind that. But that should be something outside of the main suite that gates code deployment.

If a test passes when run by a dev pre-commit, it should pass in CI.
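A rough sketch of that out-of-band chaos run, assuming the suite can be invoked as a function that takes a flag mapping. `random_flag_assignment`, `chaos_run`, and `run_suite` are hypothetical names. Seeding the generator means a failing combination can be reproduced from the reported seed:

```python
import random


def random_flag_assignment(flag_names, seed):
    """Pick a random on/off value per flag; the same seed gives the same result."""
    rng = random.Random(seed)
    # Sort first so the assignment does not depend on input order.
    return {name: rng.random() < 0.5 for name in sorted(flag_names)}


def chaos_run(flag_names, run_suite, seed):
    """Run the suite once under a random flag set and return a report."""
    flags = random_flag_assignment(flag_names, seed)
    passed = run_suite(flags)  # assumed: callable(dict) -> bool
    # The seed and flags go into the report so the owning team can replay it.
    return {"seed": seed, "flags": flags, "passed": passed}
```

This would run on a schedule and notify owners, separate from the suite that gates deployment.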


But then you won't catch the bug before it hits production :)


Also, you end up with some strange long-term test behavior. Because people often leave feature flags in place long after full release (years, sometimes), default-off-in-tests means the suite only tests behavior with everything added since the last feature flag cleanup disabled, and that cleanup may have been N years ago.

Yes, it's kind of a fractal of bad practices that have to align for this problem to occur, but that's the nature of tech debt.


I agree that this is a real and separate problem, but I believe the solution lies outside of the test suite.

One way I have seen this handled is to enforce restricting rollouts of a feature flag to 95% at most. That way turning a feature all the way on requires removing the flag from your codebase. It’s draconian, but honestly anything less than that leads to the situation you describe.
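The cap could be enforced wherever rollout changes are validated. A sketch, assuming percentage-based rollouts; `validate_rollout` and `MAX_ROLLOUT_PERCENT` are hypothetical names:

```python
MAX_ROLLOUT_PERCENT = 95  # the cap described above


def validate_rollout(flag_name, percent):
    """Reject any rollout above the cap; return the percentage if accepted."""
    if not 0 <= percent <= 100:
        raise ValueError(f"{flag_name}: rollout must be between 0 and 100")
    if percent > MAX_ROLLOUT_PERCENT:
        # Going to 100% means deleting the flag from the codebase instead.
        raise ValueError(
            f"{flag_name}: rollout capped at {MAX_ROLLOUT_PERCENT}%; "
            "remove the flag from the code to fully release"
        )
    return percent
```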


I like that idea a lot. We've been informally doing it on my current team, made easier since we can sort of cleanly do atomic code+flag updates in a single commit.


You are both misunderstanding the post.

He’s not saying to alter any of the feature flags used for the test, but simply to record which were used during the test.

Simply logging doesn’t introduce any of the issues you are describing.


Huh? This is what yojo@ wrote:

> I think best practice here is to hardcode all feature flags to off in the integration test suite

That's pretty clearly about forcing the flags to be off, i.e. altering them, and not about logging their values.


Agreed, I am advocating for deterministic behavior for all feature flags in the test suite.

If you’re testing a new feature, you should have explicit tests for the enabled state (along with existing tests for the disabled state).

If you have bugs propagating up the stack from flags changing in low-level dependencies, the change to the dependency is probably not properly tested.

Alternatively, if the feature flag gates a change to the interface of the dependency, you should have explicit integration tests covering the systems on both sides of the change.
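As an illustration of explicit tests for both states, here is a small sketch. The `greeting` function and the `new_greeting` flag are made up for the example, and a plain dict stands in for the flag client:

```python
import unittest


def greeting(user, flags):
    # Hypothetical flag-gated code path; a missing flag counts as off.
    if flags.get("new_greeting", False):
        return f"Welcome back, {user}!"
    return f"Hello, {user}."


class GreetingFlagTests(unittest.TestCase):
    def test_flag_disabled(self):
        # Existing behavior, pinned with the flag off.
        self.assertEqual(greeting("ada", {}), "Hello, ada.")

    def test_flag_enabled(self):
        # New behavior, pinned with the flag explicitly on.
        self.assertEqual(
            greeting("ada", {"new_greeting": True}), "Welcome back, ada!"
        )
```

Each test states its flag value explicitly, so neither depends on the flag service's current state.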



