Triage flaky tests from uploaded JUnit results

For every JUnit XML test report posted to the gateway, this workflow diffs the results against recent history to spot tests that flip outcomes, then quarantines them in Postgres.

Category: Engineering
Engine: sim
Difficulty: advanced
Trigger: webhook
Steps: 5
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: JUnit XML report posted to webhook (HTTP webhook)
  • Action: Write per-test outcomes to Postgres history (PostgreSQL)
  • Logic: Flag tests flipping pass/fail across runs
  • Action: Mark flippers quarantined in Postgres (PostgreSQL)
  • Output: Post triage digest to Slack (Slack)
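
The two Postgres actions in the list above could look roughly like the following. This is a sketch only: the table name `test_history`, its columns, and both helper functions are hypothetical, and the standard-library `sqlite3` module stands in for Postgres so the example runs without a server. With a Postgres driver the placeholder style would differ (for example `%s` instead of `?`).

```python
import sqlite3

# Hypothetical schema: one row per test per CI run.
SCHEMA = """
CREATE TABLE test_history (
    test_id           TEXT    NOT NULL,
    run_id            TEXT    NOT NULL,
    outcome           TEXT    NOT NULL,   -- 'pass' or 'fail'
    quarantined       BOOLEAN NOT NULL,
    quarantine_reason TEXT,
    PRIMARY KEY (test_id, run_id)
)
"""

def record_outcomes(conn, run_id, outcomes):
    """Append the current run's outcomes ({test_id: 'pass'|'fail'}) to history."""
    conn.executemany(
        "INSERT INTO test_history (test_id, run_id, outcome, quarantined) "
        "VALUES (?, ?, ?, ?)",
        [(test_id, run_id, outcome, False) for test_id, outcome in outcomes.items()],
    )

def quarantine(conn, test_id, reason):
    """Set the quarantined flag and reason on every history row for one test."""
    conn.execute(
        "UPDATE test_history SET quarantined = ?, quarantine_reason = ? "
        "WHERE test_id = ?",
        (True, reason, test_id),
    )

conn = sqlite3.connect(":memory:")  # stand-in for a Postgres connection
conn.execute(SCHEMA)
record_outcomes(conn, "run-1", {"test_login": "pass", "test_search": "pass"})
record_outcomes(conn, "run-2", {"test_login": "fail", "test_search": "pass"})
quarantine(conn, "test_login", "flipped pass/fail across recent runs")

# Downstream gating could read the flag back like this.
quarantined_rows = conn.execute(
    "SELECT DISTINCT test_id, quarantine_reason FROM test_history "
    "WHERE quarantined = ?",
    (True,),
).fetchall()
```

Keeping the flag on the history rows mirrors the description here; a separate per-test table would be an equally reasonable layout.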

What it does

It ingests a JUnit XML report whenever CI uploads one, compares each test's outcome to its recent history stored in Postgres, and identifies tests that have flipped between pass and fail without a code change to the test. Those flippers are marked quarantined in the history table and surfaced to the team in a single Slack triage digest.
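
To make the ingestion step concrete, here is a minimal sketch of reducing a JUnit XML report to per-test outcomes with Python's standard `xml.etree.ElementTree`. It assumes the common JUnit layout (`<testcase>` elements carrying `classname` and `name`, with `<failure>`, `<error>`, or `<skipped>` children). The function name `parse_junit`, the choice to count errors as failures, and the choice to drop skipped tests are all assumptions, not something this workflow specifies.

```python
import xml.etree.ElementTree as ET

def parse_junit(xml_text):
    """Return {test_id: 'pass' | 'fail'} for every non-skipped <testcase>."""
    root = ET.fromstring(xml_text)
    outcomes = {}
    # iter() finds <testcase> at any depth, so both a <testsuites> root and
    # a bare <testsuite> root are handled.
    for case in root.iter("testcase"):
        if case.find("skipped") is not None:
            continue  # assumption: skipped tests carry no pass/fail signal
        test_id = f"{case.get('classname', '')}.{case.get('name', '')}"
        failed = (
            case.find("failure") is not None
            or case.find("error") is not None
        )
        outcomes[test_id] = "fail" if failed else "pass"
    return outcomes

# Illustrative report; the suite and test names are made up.
SAMPLE = """<testsuites>
  <testsuite name="auth" tests="3">
    <testcase classname="auth.LoginTest" name="test_ok"/>
    <testcase classname="auth.LoginTest" name="test_timeout">
      <failure message="timed out"/>
    </testcase>
    <testcase classname="auth.LoginTest" name="test_sso">
      <skipped/>
    </testcase>
  </testsuite>
</testsuites>"""

outcomes = parse_junit(SAMPLE)
```

Here `outcomes` maps `auth.LoginTest.test_ok` to `pass` and `auth.LoginTest.test_timeout` to `fail`; the skipped test is left out.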

When to use it

Use it when your CI emits standard JUnit XML and you want a pipeline-agnostic flake detector that works regardless of which runner or language produced the results.

How it works

  1. An HTTP webhook receives the JUnit XML report payload after a CI run.
  2. Results are parsed and each test's current outcome is written to a Postgres history table.
  3. A logic step flags tests whose last N outcomes mix pass and fail (outcome-flipping) as flaky.
  4. Flagged tests get a `quarantined` flag and reason set in Postgres for downstream gating.
  5. A Slack message posts the triage summary (newly quarantined tests, flip counts, and owners) to the team channel.
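
Steps 3 and 5 above can be sketched as pure functions. The names `count_flips`, `find_flaky`, and `format_digest`, the default window of 10, the `owners` mapping, and the digest wording are hypothetical. The flip count is taken here to be the number of adjacent outcome changes, which is one plausible reading of "flip counts".

```python
def count_flips(outcomes):
    """Number of adjacent pass/fail changes in a chronological outcome list."""
    return sum(1 for prev, cur in zip(outcomes, outcomes[1:]) if prev != cur)

def find_flaky(history, window=10):
    """Return {test_id: flip_count} for tests whose last `window` outcomes
    contain both 'pass' and 'fail'."""
    flaky = {}
    for test_id, outcomes in history.items():
        recent = outcomes[-window:]
        if "pass" in recent and "fail" in recent:
            flaky[test_id] = count_flips(recent)
    return flaky

def format_digest(flaky, owners):
    """Build the plain-text triage summary, most flips first."""
    if not flaky:
        return "No newly quarantined tests."
    lines = [f"Quarantined {len(flaky)} flaky test(s):"]
    for test_id, flips in sorted(flaky.items(), key=lambda kv: (-kv[1], kv[0])):
        owner = owners.get(test_id, "unowned")
        lines.append(f"- {test_id}: {flips} flip(s), owner: {owner}")
    return "\n".join(lines)

# Illustrative history, oldest outcome first; test names are made up.
history = {
    "test_login":  ["pass", "fail", "pass", "pass", "fail"],
    "test_search": ["pass", "pass", "pass", "pass", "pass"],
    "test_export": ["fail", "fail", "fail", "fail", "fail"],
}
flaky = find_flaky(history, window=5)
digest = format_digest(flaky, {"test_login": "@auth-team"})
```

Only `test_login` is flagged: `test_export` fails consistently, so it is broken rather than flaky. A test that was fixed once (a single fail-to-pass change) also mixes outcomes, so a minimum flip count might be worth adding when tuning the logic step.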

Set it up

What you configure once, before turning it on.

  1. Connect HTTP webhook: Trigger any URL on agent actions.
  2. Connect Postgres: Any Postgres URL (query, write, migrate).
  3. Connect Slack: Channels, DMs, threads, mentions.
  4. Set each agent's model: We leave models unset so you pick the tier: fast + cheap, or top-quality.
  5. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.
