AI QA Workflow Runner

Aggregate AI QA stage metrics into one deterministic release decision: Ship, Review, or Block.
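A minimal sketch of how stage results could collapse into one decision. It assumes each stage reports a status (pass, review, or fail) and a 0-100 score. The page does not publish its weighting or thresholds, so `Stage`, `decide`, the plain mean, and the `min_score` cutoff of 70 are all hypothetical. A plain mean of the five stage scores on this page gives 88.15 rather than the reported 89, which suggests the tool weights or rounds stages differently.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Stage:
    name: str
    status: str   # assumed vocabulary: "pass", "review", or "fail"
    score: float  # 0-100


def decide(stages, min_score=70.0):
    """Return (overall_score, decision) for a list of stages.

    Hypothetical rule: any failing stage -> Block; any review stage or an
    overall score below min_score -> Review; otherwise Ship.
    """
    overall = round(mean(s.score for s in stages), 2)  # unweighted mean (assumption)
    fails = sum(s.status == "fail" for s in stages)
    reviews = sum(s.status == "review" for s in stages)
    if fails:
        decision = "Block"
    elif reviews or overall < min_score:
        decision = "Review"
    else:
        decision = "Ship"
    return overall, decision


stages = [
    Stage("Prompt Lint", "pass", 84),
    Stage("Policy Firewall", "pass", 90),
    Stage("Jailbreak Replay", "pass", 98.75),
    Stage("Output Contract", "pass", 88),
    Stage("Eval Delta", "pass", 80),
]
print(decide(stages))  # (88.15, 'Ship')
```

Because the rule only reads the inputs, the same stage list always yields the same decision.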

Overall Score: 89
Decision: Ship
Fail Stages: 0
Review Stages: 0


Stage Results
Prompt Lint Stage: pass, 84/100

Prompt quality score 84/100.

Policy Firewall Stage: pass, 90/100

0 high + 1 medium policy findings.

Jailbreak Replay Stage: pass, 98.75/100

7 pass, 1 warning, 0 fail replay outcomes.
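The 98.75 figure is consistent with giving each pass full credit, each warning 0.9 credit, and each fail none: (7 + 0.9) / 8 × 100 = 98.75. The tool does not document this weighting, so the `warning_credit` value below is inferred from that one data point and `replay_score` is a hypothetical helper.

```python
def replay_score(passes: int, warnings: int, fails: int, warning_credit: float = 0.9) -> float:
    """Score replay outcomes on a 0-100 scale.

    Assumption: pass = 1.0 credit, warning = warning_credit (0.9 inferred
    from the 98.75/100 shown for 7 pass, 1 warning), fail = 0.
    """
    total = passes + warnings + fails
    if total == 0:
        raise ValueError("no replay outcomes to score")
    return round(100 * (passes + warning_credit * warnings) / total, 2)


print(replay_score(7, 1, 0))  # 98.75
```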

Output Contract Stage: pass, 88/100

Output contract fit 88/100.

Eval Delta Stage: pass, 80/100

Score delta 3, pass-rate delta 4pp.
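The page does not say how the Eval Delta inputs are computed. One common reading of "score delta" and "pass-rate delta in pp" is the change in mean eval score, and the change in pass rate measured in percentage points, between a baseline run and a candidate run. A minimal sketch under that assumption; `eval_deltas`, the pass threshold of 70, and the per-case scores are all made up for illustration.

```python
from statistics import mean


def eval_deltas(baseline, candidate, pass_threshold=70):
    """Return (score_delta, pass_rate_delta_pp) between two eval runs.

    score_delta: change in mean per-case score.
    pass_rate_delta_pp: change in the share of cases scoring at or above
    pass_threshold (a hypothetical default), in percentage points.
    """
    def pass_rate(scores):
        return sum(s >= pass_threshold for s in scores) / len(scores)

    score_delta = round(mean(candidate) - mean(baseline), 2)
    pp_delta = round(100 * (pass_rate(candidate) - pass_rate(baseline)), 2)
    return score_delta, pp_delta


baseline = [72, 65, 88, 90, 70]   # made-up per-case scores
candidate = [75, 71, 90, 92, 72]
sd, pp = eval_deltas(baseline, candidate)
print(f"Score delta {sd:g}, pass-rate delta {pp:g}pp")  # Score delta 3, pass-rate delta 20pp
```

Percentage points are an absolute difference between two rates: going from 80% to 100% of cases passing is +20pp, not +25%.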

Recommended actions

  • No blocking actions. The QA pipeline looks ready.

Workflow JSON report
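The report's actual schema is not shown on this page. The sketch below serializes the same results into JSON with hypothetical field names (`overallScore`, `failStages`, and so on); `build_report` is illustrative only. Sorting keys keeps the output byte-for-byte identical for identical inputs, which fits a deterministic tool.

```python
import json


def build_report(stages, overall_score, decision):
    """Serialize stage results into a JSON report string (hypothetical schema)."""
    report = {
        "overallScore": overall_score,
        "decision": decision,
        "failStages": sum(s["status"] == "fail" for s in stages),
        "reviewStages": sum(s["status"] == "review" for s in stages),
        "stages": stages,
    }
    # sort_keys gives a stable key order, so equal inputs produce equal text
    return json.dumps(report, indent=2, sort_keys=True)


stages = [
    {"name": "Prompt Lint Stage", "status": "pass", "score": 84},
    {"name": "Policy Firewall Stage", "status": "pass", "score": 90},
    {"name": "Jailbreak Replay Stage", "status": "pass", "score": 98.75},
    {"name": "Output Contract Stage", "status": "pass", "score": 88},
    {"name": "Eval Delta Stage", "status": "pass", "score": 80},
]
print(build_report(stages, overall_score=89, decision="Ship"))
```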

About This Tool

AI QA Workflow Runner combines key QA stage metrics into one release decision so teams can standardize go/no-go checks before pushing prompt or model changes.

Frequently Asked Questions

Does this call external services?

No. Scoring is deterministic and uses only the metrics you provide.

How should I gather inputs for stages?

Use the outputs of Prompt Linter, Policy Firewall, Replay Lab, the Output Contract checks, and Eval Comparator.
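A small pre-flight check can catch missing or out-of-range stage inputs before scoring. The pairing of each source tool with a stage below is inferred from the names on this page and is an assumption, as is the 0-100 range check; `STAGE_SOURCES` and `check_inputs` are hypothetical.

```python
# Inferred pairing of stage -> source tool (names from this page; pairing assumed).
STAGE_SOURCES = {
    "Prompt Lint Stage": "Prompt Linter",
    "Policy Firewall Stage": "Policy Firewall",
    "Jailbreak Replay Stage": "Replay Lab",
    "Output Contract Stage": "Output Contract checks",
    "Eval Delta Stage": "Eval Comparator",
}


def check_inputs(scores: dict) -> list:
    """Return the source tools whose stage score is missing or outside 0-100."""
    problems = []
    for stage, source in STAGE_SOURCES.items():
        value = scores.get(stage)
        if value is None or not 0 <= value <= 100:
            problems.append(source)
    return problems


print(check_inputs({"Prompt Lint Stage": 84, "Policy Firewall Stage": 120}))
```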

Is workflow data uploaded?

No. All data stays in your browser session.