Stop On Red! - Why ignoring fails leads to flakiness
Feb 20, 2024
Sappo
"Flakiness" is the worst thing in automation testing. It's much worse than having low coverage or slow test running times. Having no tests at all is preferable to having flaky ones. At least this way, you know you're relying entirely on manual testing without any false confidence.
Before we look at what you should do to reduce test flakiness, we should explore some scenarios to illustrate why flakiness is so damaging.
False Positives
False positives are when there are bugs in the software you're testing, but the related tests pass. These are (hopefully) the rarest type of flakiness but can be very damaging.
False positives can come about in numerous ways; almost all are human errors. But before you think, "I'm better than that," please remember very few people deliberately cause car crashes. Mistakes happen, especially when rushing or distracted.
Despite their rarity, I've seen false positives caused by wildly different areas of a test pack. In one case, someone had set up a CI/CD pipeline to return the exit code from the test pack once it had finished OR ZERO! After a simple change, execution fell into the "OR" for the first time and the pipeline started reporting passes, even though the pack didn't even start. Another case was just a low-quality test: it logged into a user's account but didn't do any checks after that point. The server returned a custom error page, but the pack was still "green".
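To make that first failure mode concrete, here's a minimal TypeScript sketch of a CI wrapper with the same "OR ZERO" shape; the command and script are hypothetical, not the actual pipeline from the story.

```typescript
// Hypothetical CI wrapper (Node + TypeScript) with the same shape of bug.
// If the test command fails to launch at all, spawnSync reports status as null,
// and the "?? 0" fallback turns that into a passing exit code.
import { spawnSync } from "node:child_process";

const result = spawnSync("npx", ["playwright", "test"], { stdio: "inherit" });

// BUG: when the pack never starts, result.status is null, so the pipeline exits 0 ("green").
process.exit(result.status ?? 0);

// Safer: treat a missing exit code as a failure in its own right.
// process.exit(result.status === null ? 1 : result.status);
```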
False Negatives
After the potential consequences of the above, you might be tempted to think these are preferable, but they are just as bad! False negatives are when your tests fail in an area that a human would consider a pass. When people say "Flakiness" or "Flaky Test," this is the type they're typically referring to.
False negatives can happen for so many different reasons that I'll just quick-fire a bunch:
Requests taking longer in different environments
Tests sharing data
Tests overlapping
Tests not being circular (not returning the system to the state they started in)
Unstable environment
Unstable remote runners
Bad element selectors
Minor UI changes
And, of course, test script mistakes
All of the above, and many more, can cause your test pack to fail when you or someone else manually testing the same thing would consider it a pass. Two of them are sketched in code below.
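As an illustration of "tests sharing data" and "tests not being circular", here's a small sketch using Playwright as an assumed framework; the page, labels, and account are invented, and a configured baseURL plus an already-authenticated context are assumed.

```typescript
// A sketch of "tests sharing data" and "tests not being circular", assuming Playwright,
// a configured baseURL, and an already-authenticated context. Names are invented.
import { test, expect } from "@playwright/test";

test("rename the display name", async ({ page }) => {
  await page.goto("/profile");
  await page.getByLabel("Display name").fill("New Name");
  await page.getByRole("button", { name: "Save" }).click();
  // Not circular: the shared account is left renamed for every test that follows.
});

test("profile shows the default name", async ({ page }) => {
  await page.goto("/profile");
  // Shared data: this passes or fails depending on whether the test above ran first,
  // and on whatever state a previous run left behind.
  await expect(page.getByLabel("Display name")).toHaveValue("Demo User");
});

// More robust: create a unique account per test (or reset state in a beforeEach hook)
// so every test owns its own data and leaves nothing behind.
```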
If left unchecked, this will destroy all confidence in test automation within your team, department, and company. No joke, this can cost you your job. I don't mean directly, like "This is a false negative, YOU'RE FIRED!", but engineering teams and companies only have so much budget. When nobody values the output from the test automation pack, they will stop valuing the people who work on it and will look at alternatives like offshore test teams and Magic-based testing solutions.
Why does ignoring fails lead to flakiness?
There is no such thing as "Flakiness". It's just the term we use to mean unstable for an unknown reason.
If your test pack fails and you don't stop and give it the required investigation time, it becomes "Flaky".
If your test pack fails and, after investigation, you realize it's because your remote runners had an outage, it's not a flaky test pack. It's a runner outage. If you discover it's because you needed to swap a fixed pause for a wait-for-element, it was just a test that needed improving; see the sketch below.
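Here's what that pause-to-wait swap can look like, sketched with Playwright as an assumed framework; the page, button, and message are illustrative.

```typescript
// The pause-to-wait swap, sketched with Playwright; the page, button, and message are illustrative.
import { test, expect } from "@playwright/test";

test("order confirmation appears", async ({ page }) => {
  await page.goto("/checkout"); // assumes a baseURL is configured
  await page.getByRole("button", { name: "Place order" }).click();

  // Flaky: a fixed pause passes on a fast environment and fails on a slow one.
  // await page.waitForTimeout(2000);

  // Stable: wait for the element itself, with an explicit upper bound.
  await expect(page.getByText("Order confirmed")).toBeVisible({ timeout: 15_000 });
});
```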
What should you do when your pack fails?
Stop, Investigate & Eradicate!
Before anything else, talk as an entire team about the importance of having confidence in the test automation pack. It's there to be a speed boost, but everyone must value it and believe the results.
This conversation should include your scrum master or delivery leads, as you must make them aware that investigating every fail will take priority and that adequate time needs to be set aside for it. You also need development representatives, as not all "Red Runs" are false negatives!
Your pack should be green. If it turns red, Stop, Investigate, and Eradicate!
What should happen when a red run correctly finds a bug?
If there's value in the steps preceding the buggy area, shorten the test.
If you already have that coverage elsewhere, temporarily exclude or remove the test until the bug is fixed; one way to do this is sketched below.
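One way to temporarily exclude such a test, assuming a Playwright pack, is an annotation like test.fixme, which skips the test while keeping its status visible in reports; the test and bug below are invented for illustration.

```typescript
// Temporarily excluding a test that is correctly red because of a known, ticketed bug.
// Sketched with Playwright's test.fixme annotation; the test and bug are invented.
import { test, expect } from "@playwright/test";

// Skipped until the bug is fixed; the "fixme" status stays visible in reports.
// TODO: re-enable once the checkout discount bug is fixed (reference your bug ticket here).
test.fixme("discount is applied at checkout", async ({ page }) => {
  await page.goto("/checkout");
  await expect(page.getByText("10% discount applied")).toBeVisible();
});
```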
Your pack should be green. If it turns red, Stop, Investigate, and Eradicate!
What should you do if you notice a false result?
Make everyone aware of the false positive/negative and make it the highest possible priority.
If you don't Stop, Investigate, and Eradicate every single instance, you will never reach the point where the entire engineering team values the test pack results and, by extension, all the effort you have put in.
What can you do to reduce flakiness?
Human error will always be the most significant factor, particularly when people are rushing or have limited time while bouncing between meetings.
Integrations are also a significant source of complexity. When you have a bespoke test pack and need to connect up remote runners, there's always room for configuration and third-party issues.
Codeless test automation addresses these prominent causes of "Flakiness", as there's little room for human error and runners are integrated out of the box.
But the number one way to reduce flakiness is to Stop, Investigate, and Eradicate! If you continue to ignore these issues, they will only grow.
"Flakiness" is the worst thing in automation testing. It's much worse than having low coverage or slow test running times. Having no tests at all is preferable to having flaky ones. At least this way, you know you're relying entirely on manual testing without any false confidence.
Before we look at what you should do to reduce test flakiness, we should explore some scenarios to illustrate why flakiness is so damaging.
False Positives
False positives are when there are bugs in the software you're testing, but the related tests pass. These are (hopefully) the rarest type of flakiness but can be very damaging.
False positives can come about in numerous ways; almost all are human errors. But before you think, "I'm better than that," please remember very few people deliberately cause car crashes. Mistakes happen, especially when rushing or distracted.
Despite its rarity, I've seen this caused by widely different areas of a test pack. With one, someone had set up a CI/CD pipeline to return the response from the test pack once it had finished OR ZERO! After a simple change, this fell into the "OR" for the first time and started returning passes, even though the pack didn't even start. Another was just a low-quality test. The test logged into a user's account but didn't do any checks after that point. The server returned a custom error page, but the pack was still "green".
False Negatives
After the potential consequences of the above, you might be tempted to think these are preferable, but they are just as bad! False positives are when your tests fail in an area that would be considered a pass by a human. When people say "Flakiness" or "Flaky Test," this is the type they typically are referring to.
False negatives can happen for so many different reasons that I'll just quick-fire a bunch:
Requests taking longer in different environments
Tests sharing data
Tests overlapping
Tests not being circular
Unstable environment
Unstable remote runners
Bad element selectors
Minor UI changes
And, of course, test script mistakes
All of the above and many more can cause your test pack to fail when you or someone else manually testing the same thing would consider it a pass.
If left unchecked, this will destroy all confidence in test automation within your team, department, and company. No joke, this can cost you your job. I don't mean directly like, "This is a false negative, YOU'RE FIRED!" but engineering teams and companies only have so much budget. When nobody values the output from the test automation pack, they will stop valuing the people who work on it and look at alternatives like off-shore test teams and Magic-based testing solutions.
Why ignoring fails leads to flakiness?
There is no such thing as "Flakiness". It's just the term we use to mean unstable for an unknown reason.
If your test pack fails and you don't stop and give it the required investigation time, it becomes "Flaky".
If your test pack fails and, after investigation, you realize it's because your remote runners had an outage, it's not a flaky test pack. It's a runner outage. If you discover it's because you needed to swap a pause for a wait-for element, it was just the need for a test improvement.
What should you do when your pack fails?
Stop, Investigate & Eradicate!
Before anything else, talk as an entire team about the importance of having confidence in the test automation pack. It's there to be a speed boost, but everyone must value it and believe the results.
This conversation should include scrum people, as you must make them aware that investigating all fails will take priority; you need to allow adequate time for this. You also need development representatives, as not all "Red Runs" are false negatives!
Your pack should be green. If it turns red, Stop, Investigate, and Eradicate!
What should happen when you correctly find a bug?
If there's value in the steps preceding the buggy area, shorten the test.
If you already have that coverage elsewhere, temporarily exclude or remove the test until the bug is fixed.
Your pack should be green. If it turns red, Stop, Investigate, and Eradicate!
What should you do if you notice a false result?
Make everyone aware of the false positive/negative and make it the highest possible priority.
If you don't Stop, Investigate, and Eradicate every single instance, you will never reach the point where the entire engineering team values the test pack results and, by extension, all the effort you have put in.
What can you do to reduce flakiness?
Human-error will always be the most significant factor, partially when people need to rush or have limited time while bouncing between meetings.
Integrations are also a significant complexity. When you have a bespoke test pack and need to connect up remote runners, there's always room for configuration and 3rd party issues.
Codeless test automation resolves these prominent reasons for "Flakiness" as there's little room for human error, and runners are integrated out of the box.
But the number one way to reduce flakiness is to Stop, Investigate, and Eradicate! If you continue to ignore these issues, they will only grow.
"Flakiness" is the worst thing in automation testing. It's much worse than having low coverage or slow test running times. Having no tests at all is preferable to having flaky ones. At least this way, you know you're relying entirely on manual testing without any false confidence.
Before we look at what you should do to reduce test flakiness, we should explore some scenarios to illustrate why flakiness is so damaging.
False Positives
False positives are when there are bugs in the software you're testing, but the related tests pass. These are (hopefully) the rarest type of flakiness but can be very damaging.
False positives can come about in numerous ways; almost all are human errors. But before you think, "I'm better than that," please remember very few people deliberately cause car crashes. Mistakes happen, especially when rushing or distracted.
Despite its rarity, I've seen this caused by widely different areas of a test pack. With one, someone had set up a CI/CD pipeline to return the response from the test pack once it had finished OR ZERO! After a simple change, this fell into the "OR" for the first time and started returning passes, even though the pack didn't even start. Another was just a low-quality test. The test logged into a user's account but didn't do any checks after that point. The server returned a custom error page, but the pack was still "green".
False Negatives
After the potential consequences of the above, you might be tempted to think these are preferable, but they are just as bad! False positives are when your tests fail in an area that would be considered a pass by a human. When people say "Flakiness" or "Flaky Test," this is the type they typically are referring to.
False negatives can happen for so many different reasons that I'll just quick-fire a bunch:
Requests taking longer in different environments
Tests sharing data
Tests overlapping
Tests not being circular
Unstable environment
Unstable remote runners
Bad element selectors
Minor UI changes
And, of course, test script mistakes
All of the above and many more can cause your test pack to fail when you or someone else manually testing the same thing would consider it a pass.
If left unchecked, this will destroy all confidence in test automation within your team, department, and company. No joke, this can cost you your job. I don't mean directly like, "This is a false negative, YOU'RE FIRED!" but engineering teams and companies only have so much budget. When nobody values the output from the test automation pack, they will stop valuing the people who work on it and look at alternatives like off-shore test teams and Magic-based testing solutions.
Why ignoring fails leads to flakiness?
There is no such thing as "Flakiness". It's just the term we use to mean unstable for an unknown reason.
If your test pack fails and you don't stop and give it the required investigation time, it becomes "Flaky".
If your test pack fails and, after investigation, you realize it's because your remote runners had an outage, it's not a flaky test pack. It's a runner outage. If you discover it's because you needed to swap a pause for a wait-for element, it was just the need for a test improvement.
What should you do when your pack fails?
Stop, Investigate & Eradicate!
Before anything else, talk as an entire team about the importance of having confidence in the test automation pack. It's there to be a speed boost, but everyone must value it and believe the results.
This conversation should include scrum people, as you must make them aware that investigating all fails will take priority; you need to allow adequate time for this. You also need development representatives, as not all "Red Runs" are false negatives!
Your pack should be green. If it turns red, Stop, Investigate, and Eradicate!
What should happen when you correctly find a bug?
If there's value in the steps preceding the buggy area, shorten the test.
If you already have that coverage elsewhere, temporarily exclude or remove the test until the bug is fixed.
Your pack should be green. If it turns red, Stop, Investigate, and Eradicate!
What should you do if you notice a false result?
Make everyone aware of the false positive/negative and make it the highest possible priority.
If you don't Stop, Investigate, and Eradicate every single instance, you will never reach the point where the entire engineering team values the test pack results and, by extension, all the effort you have put in.
What can you do to reduce flakiness?
Human-error will always be the most significant factor, partially when people need to rush or have limited time while bouncing between meetings.
Integrations are also a significant complexity. When you have a bespoke test pack and need to connect up remote runners, there's always room for configuration and 3rd party issues.
Codeless test automation resolves these prominent reasons for "Flakiness" as there's little room for human error, and runners are integrated out of the box.
But the number one way to reduce flakiness is to Stop, Investigate, and Eradicate! If you continue to ignore these issues, they will only grow.