LLM Response Grader
Grade model responses against custom weighted rubric rules and detect banned-term violations.
About This Tool
LLM Response Grader applies weighted rubric checks to evaluate response quality consistently. It is useful for prompt iteration, regression checks, and human-in-the-loop QA.
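A weighted rubric check of this kind can be sketched in a few lines. The rule shape, the per-rule regex matching, and the flat 10-point banned-term penalty below are assumptions for illustration, not the tool's actual implementation:

```typescript
// Hypothetical rule shape: a name, a weight, and a pattern to test the response against.
type Rule = { name: string; weight: number; pattern: RegExp };

function grade(response: string, rules: Rule[], banned: string[]): number {
  const totalWeight = rules.reduce((sum, r) => sum + r.weight, 0);
  if (totalWeight === 0) return 0; // no rules yet: nothing to grade

  // Base score: weighted fraction of satisfied rules, scaled to 0-100.
  const earned = rules
    .filter((r) => r.pattern.test(response))
    .reduce((sum, r) => sum + r.weight, 0);
  let score = (earned / totalWeight) * 100;

  // Assumed penalty model: subtract a flat 10 points per banned-term hit.
  for (const term of banned) {
    if (response.toLowerCase().includes(term.toLowerCase())) score -= 10;
  }
  return Math.max(0, Math.min(100, score));
}
```

For example, grading a response against two rules where only the 2-weight rule matches yields (2 / 3) × 100 ≈ 66.7 before any penalties.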
Frequently Asked Questions
Can rules use regex?
Yes. Write the pattern as /pattern/i in the rule's third column.
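One way to support the /pattern/i syntax is to parse the slashes and flags into a RegExp, falling back to a literal substring match otherwise. This parser is a sketch of how such a rule column could be interpreted, not the tool's confirmed behavior:

```typescript
// Hypothetical parser: "/pattern/flags" becomes a RegExp;
// anything else is treated as a literal substring match (assumption).
function parseRulePattern(input: string): RegExp {
  const m = input.match(/^\/(.+)\/([a-z]*)$/);
  if (m) return new RegExp(m[1], m[2]);
  // Escape regex metacharacters so literal text matches itself.
  return new RegExp(input.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
}
```

With this scheme, `/hed(ge|ging)/i` matches "Hedging language" case-insensitively, while a plain entry like `C++` matches only the literal characters.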
Is the score objective?
It is objective relative to your rubric definition; it is not a measure of universal quality.
Is response data uploaded?
No. Grading runs locally in your browser.
Related Tools
Compare With Similar Tools
Decision pages that show at a glance when to use each tool.
LLM Response Grader vs Answer Consistency Checker
Rubric scoring quality vs multi-answer consistency analysis.
AI Reliability Scorecard vs LLM Response Grader
Release-readiness composite score vs rubric-focused response grading.
Prompt Test Case Generator vs LLM Response Grader
Deterministic prompt-eval dataset generation vs weighted response quality scoring.