CLEAR is an open-source toolkit for LLM error analysis using an LLM-as-a-Judge approach.
CLEAR provides systematic error analysis for:
- Single LLM Responses – Analyze quality issues in model outputs for tasks like Q&A, summarization, and generation
- Agentic Workflows – Evaluate complex workflows with multiple components, tool usage, and multi-step task trajectories
It combines automated LLM-as-a-judge evaluation with interactive dashboards to help you:
- Identify recurring error patterns across your dataset
- Quantify issue frequencies and severity
- Drill down into specific failure cases
- Prioritize improvements based on data-driven insights
CLEAR operates in two phases:
- Analysis – Generates per-instance textual feedback, identifies system-level error categories, and quantifies their frequencies.
- Interactive Dashboard – Explore aggregate visualizations, apply dynamic filters, and drill down into individual failure examples.
CLEAR supports two distinct analysis modes, each with its own pipeline, dashboard, and documentation:
Evaluate standard LLM outputs – generation quality, correctness, and recurring error patterns. Provide a CSV with prompts and responses, and CLEAR will score each instance, generate textual critiques, and surface system-level issues.

| Input | CSV with model inputs and responses |
|---|---|
| Output | Per-record scores, evaluation text, aggregated issue categories |
| Dashboard | Streamlit-based interactive explorer |
Evaluate multi-agent system trajectories – step-by-step agent interactions and full trajectory analysis. Supports traces from LangGraph, CrewAI, and other frameworks via MLflow or Langfuse.

| Input | Raw JSON traces or preprocessed trajectory CSVs |
|---|---|
| Output | Per-step CLEAR analysis, trajectory-level scores, rubric evaluations |
| Dashboard | NiceGUI-based workflow visualization with path and temporal analysis |
Agentic Workflows Guide → | Agentic Dashboard Guide →
| Feature | Description |
|---|---|
| LLM-as-a-Judge | Automated evaluation for any text generation task |
| Agentic Workflows | Evaluate agent trajectories, step by step and as a whole |
| Multiple Backends | LangChain, LiteLLM (100+ providers), or direct HTTP endpoints |
| External Judges | Plug in custom evaluation functions |
| Interactive Dashboards | Standard and agentic-specific visualizations |
| Flexible Configuration | YAML config files, CLI flags, or Python API |
Requires Python 3.10+
Install from PyPI:

```shell
pip install clear-eval
```

Or install from source:

```shell
git clone https://github.com/IBM/CLEAR.git
cd CLEAR
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e .
```

CLEAR requires a supported LLM provider. Set the appropriate environment variables for your provider (e.g., `OPENAI_API_KEY` for OpenAI). See the Providers and Credentials Guide for all supported providers and backends.
With no data path specified, CLEAR runs on a built-in GSM8K sample dataset using default settings:
```shell
run-clear-eval-analysis --provider openai --eval-model-name gpt-4o
```

Results are saved to `results/gsm8k/sample_output/`.
```shell
run-clear-eval-analysis \
  --provider openai \
  --eval-model-name gpt-4o \
  --data-path path/to/your_data.csv \
  --output-dir results/my_run/ \
  --run-name my_run
```

Your CSV should contain at minimum `id`, `model_input`, and `response` columns. See the LLM Analysis Guide for the full input format specification.
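For reference, here is a minimal sketch of building a valid input file with the standard library. The column names follow the specification above; the two example rows are illustrative placeholders, not real data:

```python
import csv

# Minimal input CSV with the three required columns: id, model_input, response.
# Extra columns may be present; see the LLM Analysis Guide for the full format.
rows = [
    {"id": "1", "model_input": "What is 2 + 2?", "response": "4"},
    {"id": "2", "model_input": "Name the capital of France.", "response": "Paris"},
]
with open("my_data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "model_input", "response"])
    writer.writeheader()
    writer.writerows(rows)
```

The resulting `my_data.csv` can then be passed via `--data-path`.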
```shell
run-clear-eval-dashboard
```

Upload the generated ZIP file from the results directory to explore issues, scores, and individual examples.
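If you are unsure which file to upload, a small helper like the following can list the ZIP archives under a results directory. This is a hypothetical convenience sketch, not part of CLEAR, and it assumes only that the run produces `.zip` files somewhere under the output directory:

```python
from pathlib import Path

def find_result_archives(results_dir):
    """Return all ZIP archives found under results_dir, sorted by path.

    The exact archive name depends on your run configuration, so this
    simply globs for *.zip rather than assuming a fixed filename.
    """
    return sorted(Path(results_dir).glob("**/*.zip"))
```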
```shell
# Full pipeline
run-clear-eval-analysis --provider openai --eval-model-name gpt-4o --config_path path/to/config.yaml

# Evaluation only (using existing responses)
run-clear-eval-evaluation --provider openai --eval-model-name gpt-4o --config_path path/to/config.yaml
```

The same pipeline can also be run from Python:

```python
from clear_eval.analysis_runner import run_clear_eval_analysis

run_clear_eval_analysis(
    run_name="my_run",
    provider="openai",
    data_path="my_data.csv",
    eval_model_name="gpt-4o",
    output_dir="results/",
)
```

For agentic workflows:

```shell
run-clear-agentic-eval \
  --data-dir data/my_traces \
  --results-dir results \
  --from-raw-traces true \
  --eval-model-name gpt-4o \
  --provider openai

# Launch agentic dashboard
run-clear-agentic-dashboard
```

See the Agentic Workflows Guide for full details.
| Guide | Description |
|---|---|
| LLM Analysis Guide | Full pipeline reference – input formats, CLI arguments, configuration, and external judges |
| Agentic Workflows Guide | Multi-agent evaluation – trace preprocessing, step-by-step and trajectory analysis, configuration reference |
| Agentic Dashboard Guide | Dashboard features – workflow view, node analysis, trajectory explorer, path and temporal analysis |
| Providers and Credentials | Inference backends (LangChain, LiteLLM, Endpoint), provider setup, and configuration examples |
| Provider | Backend | Credentials |
|---|---|---|
| OpenAI | LangChain, LiteLLM, Endpoint | OPENAI_API_KEY |
| WatsonX | LangChain, LiteLLM, Endpoint | WATSONX_APIKEY, WATSONX_URL, WATSONX_PROJECT_ID |
| Anthropic | LiteLLM | ANTHROPIC_API_KEY |
| AWS Bedrock | LiteLLM | AWS credentials |
| Google Vertex AI | LiteLLM | GCP credentials |
| 100+ more | LiteLLM | Provider-specific |
See the Providers and Credentials Guide for backend configuration details and examples.
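To fail fast when credentials are missing, a pre-flight check like the one below can help. This is a hypothetical sketch, not part of CLEAR; the variable names are taken directly from the table above:

```python
import os

# Required environment variables per provider, per the table above.
REQUIRED_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "watsonx": ["WATSONX_APIKEY", "WATSONX_URL", "WATSONX_PROJECT_ID"],
    "anthropic": ["ANTHROPIC_API_KEY"],
}

def missing_credentials(provider):
    """Return the required variables that are unset or empty for a provider."""
    return [v for v in REQUIRED_VARS.get(provider, []) if not os.environ.get(v)]
```

Running this before launching an analysis gives an immediate, readable error instead of a mid-run authentication failure.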
```
CLEAR/
├── README.md                        # This file
├── src/clear_eval/
│   ├── pipeline/                    # LLM analysis pipeline
│   ├── dashboard/                   # LLM dashboard (Streamlit)
│   ├── agentic/
│   │   ├── README.md                # Agentic Workflows Guide
│   │   ├── pipeline/                # Agentic pipeline
│   │   ├── dashboard/
│   │   │   └── README_DASHBOARD.md  # Agentic Dashboard Guide
│   │   └── ...
│   └── sample_data/                 # Sample datasets
├── docs/
│   ├── ANALYSIS_README.md           # LLM Analysis Guide
│   └── PROVIDERS.md                 # Providers and Credentials Guide
├── examples/                        # Configuration examples
└── tests/                           # Test suite
```
Apache 2.0 – see LICENSE for details.