Flakiness has caused a large portion of Flutter tree redness, so the workflow below will be enforced to reduce flaky tests. The framework post-submit DeviceLab tests will be the initial focus, and the logic will be extended to other host-only tests in the future.
From the Flutter tree dashboard, a flake is identified as a box with an exclamation icon. There are two types of flake that result in the same flaky box; they can be told apart from the stdout of the test step, which shows data about failed and succeeded runs at the end (example). See Understanding a LUCI build failure for how to locate the test step and stdout.
DeviceLab tests are located under `/dev/devicelab/bin/tasks`. If you plan to add a new DeviceLab test, please follow the instructions for adding a new test and set `bringup` to `true` for the new target in `.ci.yaml`, as sketched below.
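For illustration, a minimal sketch of a new DeviceLab target in `.ci.yaml` (the target name is hypothetical and the recipe is assumed; copy the real fields from a similar existing target):

```yaml
targets:
  - name: Linux_android my_new_devicelab_test  # hypothetical target name
    recipe: devicelab/devicelab_drone          # assumed recipe; copy from a similar existing target
    bringup: true                              # new targets start in bringup and do not block the tree
```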
On a weekly basis, an automation script will scan through test execution statistics over the past 15 days and identify the top flaky targets:

* If a target's flaky ratio exceeds the 2% threshold, the script files a tracking issue for it.
* The script marks the target as flaky by adding `bringup: true` to its entry in `.ci.yaml`, with a comment `# TODO(username): github issue url` above the `bringup: true` line (see the sketch after this list).
* If an issue is closed, there is a grace period of 15 days before the automation script re-files the issue if the same flakiness persists.
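As a sketch, a target the script has marked flaky would look roughly like this in `.ci.yaml` (the recipe is assumed; the TODO keeps the placeholder form from above):

```yaml
  - name: Linux_android flutter_gallery__transition_perf_with_semantics
    recipe: devicelab/devicelab_drone  # assumed recipe; unchanged from the original entry
    # TODO(username): github issue url
    bringup: true                      # marked flaky; the task no longer blocks the tree
```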
Figuring out how and why a set of tests is failing can be tricky. Here are a few tips to help kickstart the process.
The auto-generated ticket will provide links to the relevant data, including the recent test runs on the flutter dashboard and the list of flaky builds. All of these pieces of information are helpful for further narrowing down the issue.
Things not directly related to the tests being run can be difficult to determine from the stdout logs. Things like timeouts are easier to diagnose from the execution details than from `test_stdout`. From the Luci build page, expand the failing step and click its execution details link.

Some other common infra issues include lost devices, e.g. `adb: device 'ZY223CXXGL' not found`.
Sometimes, the reported error is not immediately obvious as to which test has failed. In these cases, digging into the `test_stdout` for clues can be helpful. From the Luci build page, expand the step for the failing task, e.g. `run flutter_view_ios__start_up`, and click its `test_stdout` link.

Flakes, by their nature, are inconsistent. Determining a pattern can be very helpful in figuring out what causes the issue, and may help in identifying a fix or workaround. The most useful tool for this is the flutter dashboard link provided by the ticket under “Recent test runs”, which filters to just the relevant flaky test set, e.g. Linux_android flutter_gallery__transition_perf_with_semantics.
Flakes are reported on the set of tests run, not specific test failures. This can mean that the issue raising us above our 2% threshold is actually more than one issue. This is pretty rare, but does happen on occasion. There's no need to waste a lot of time verifying that every failure is the same, but taking some time to verify some of the failures can save time in the long run. The list of “Flaky builds:” on the automated ticket is very useful for this, and you can often find additional flaky builds in the flutter dashboard.
Flakes can stay under the 2% threshold for a long time and slowly build up from various small increases in flakiness. Other times they have a specific cause that started in an obvious location. The latter are much easier to resolve, so it's a good idea to check for a root cause: in the Flutter dashboard, scroll back through the test's history to see whether the flakiness starts at a particular commit.
Sometimes it can be easier to identify patterns in what's different between test runs than to identify a reason for the failure just from the failing runs. After you've identified what's failing, if you can't figure out what's causing the test to fail, keep an eye out for common patterns in what distinguishes the failing runs from the passing ones.
The TL will help triage, reassign, and attempt to fix the flakiness.
If the test was marked flaky in CI and then fixed, it can be re-enabled after being validated for 50 consecutive runs without flakiness (the task shows no exclamation point in the flutter build dashboard and has not failed due to the same flaky failure). This is done by updating the test entry in `.ci.yaml` to remove the `bringup: true` line for that target, as sketched below.
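Continuing the sketch above, re-enabling the target is just deleting the marker lines (recipe still assumed):

```yaml
  - name: Linux_android flutter_gallery__transition_perf_with_semantics
    recipe: devicelab/devicelab_drone  # assumed recipe
    # The `# TODO(username): github issue url` comment and the `bringup: true`
    # line have been removed, so the task blocks the tree again.
```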
If the flakiness is not fixable, the test will be removed from the flutter build dashboard or deleted from CI completely, depending on the specific case.