
# CLEAR: Comprehensive LLM Error Analysis and Reporting

CLEAR is an open-source toolkit for LLM error analysis using an LLM-as-a-Judge approach.



## 🎯 What is CLEAR?

CLEAR provides systematic error analysis for:

- **Single LLM Responses** — Analyze quality issues in model outputs for tasks like Q&A, summarization, and generation
- **Agentic Workflows** — Evaluate complex workflows with multiple components, tool usage, and multi-step task trajectories

It combines automated LLM-as-a-judge evaluation with interactive dashboards to help you:

- Identify recurring error patterns across your dataset
- Quantify issue frequencies and severity
- Drill down into specific failure cases
- Prioritize improvements based on data-driven insights

βš™οΈ How It Works

CLEAR operates in two phases:

1. **Analysis** — Generates per-instance textual feedback, identifies system-level error categories, and quantifies their frequencies.
2. **Interactive Dashboard** — Lets you explore aggregate visualizations, apply dynamic filters, and drill down into individual failure examples.

## 🔀 Two Analysis Modes

CLEAR supports two distinct analysis modes, each with its own pipeline, dashboard, and documentation:

πŸ“ LLM Analysis

Evaluate standard LLM outputs — generation quality, correctness, and recurring error patterns. Provide a CSV with prompts and responses, and CLEAR will score each instance, generate textual critiques, and surface system-level issues.

- **Input:** CSV with model inputs and responses
- **Output:** Per-record scores, evaluation text, aggregated issue categories
- **Dashboard:** Streamlit-based interactive explorer

📖 Full LLM Analysis Guide →

### 🤖 Agentic Analysis

Evaluate multi-agent system trajectories — step-by-step agent interactions and full trajectory analysis. Supports traces from LangGraph, CrewAI, and other frameworks via MLflow or Langfuse.

- **Input:** Raw JSON traces or preprocessed trajectory CSVs
- **Output:** Per-step CLEAR analysis, trajectory-level scores, rubric evaluations
- **Dashboard:** NiceGUI-based workflow visualization with path and temporal analysis

📖 Agentic Workflows Guide → | Agentic Dashboard Guide →


## ✨ Key Features

| Feature | Description |
|---|---|
| 🧑‍⚖️ LLM-as-a-Judge | Automated evaluation for any text generation task |
| 🤖 Agentic Workflows | Evaluate agent trajectories, step by step and as a whole |
| 🔌 Multiple Backends | LangChain, LiteLLM (100+ providers), or direct HTTP endpoints |
| 🧩 External Judges | Plug in custom evaluation functions |
| 📊 Interactive Dashboards | Standard and agentic-specific visualizations |
| 🛠️ Flexible Configuration | YAML config files, CLI flags, or Python API |

## 📦 Installation

Requires Python 3.10+

### Option 1: pip (recommended)

```bash
pip install clear-eval
```

### Option 2: From source (for development)

```bash
git clone https://github.com/IBM/CLEAR.git
cd CLEAR
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e .
```

## 🚀 Quick Start

### 1. Set your provider credentials

CLEAR requires a supported LLM provider. Set the appropriate environment variables for your provider (e.g., `OPENAI_API_KEY` for OpenAI). See the Providers and Credentials Guide for all supported providers and backends.
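Before launching a run, it can help to confirm that the relevant variables are actually set. The sketch below is not part of CLEAR; it simply checks the variable names listed in the Supported Providers table (OpenAI and WatsonX shown):

```python
import os

# Credential variable names per provider, as listed in the
# Supported Providers table below (extend for other providers).
REQUIRED_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "watsonx": ["WATSONX_APIKEY", "WATSONX_URL", "WATSONX_PROJECT_ID"],
}

def missing_credentials(provider):
    """Return the credential variables that are not set for a provider."""
    return [v for v in REQUIRED_VARS.get(provider, []) if not os.environ.get(v)]
```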

### 2. Run on sample data

With no data path specified, CLEAR runs on a built-in GSM8K sample dataset using default settings:

```bash
run-clear-eval-analysis --provider openai --eval-model-name gpt-4o
```

Results are saved to `results/gsm8k/sample_output/`.

### 3. Run on your own data

```bash
run-clear-eval-analysis \
    --provider openai \
    --eval-model-name gpt-4o \
    --data-path path/to/your_data.csv \
    --output-dir results/my_run/ \
    --run-name my_run
```

Your CSV should contain at minimum `id`, `model_input`, and `response` columns. See the LLM Analysis Guide for the full input format specification.
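As a reference point, a minimal input file with just the required columns can be written with the standard library; the rows here are purely illustrative:

```python
import csv

# Minimal CLEAR input file: one row per model response to analyze.
rows = [
    {"id": "1", "model_input": "What is 2 + 2?", "response": "4"},
    {"id": "2", "model_input": "Summarize: The cat sat on the mat.",
     "response": "A cat sat on a mat."},
]

with open("my_data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "model_input", "response"])
    writer.writeheader()
    writer.writerows(rows)
```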

### 4. View results

```bash
run-clear-eval-dashboard
```

Upload the generated ZIP file from the results directory to explore issues, scores, and individual examples.
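The exact archive name depends on the run, so (as an illustration, not part of CLEAR) a small helper can pick out the newest ZIP under the results directory:

```python
from pathlib import Path

def newest_result_zip(results_dir):
    """Return the most recently modified ZIP under results_dir, or None."""
    zips = sorted(
        Path(results_dir).glob("**/*.zip"),
        key=lambda p: p.stat().st_mtime,
    )
    return zips[-1] if zips else None
```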


πŸ” Usage Overview

πŸ“ LLM Analysis (CLI)

```bash
# Full pipeline
run-clear-eval-analysis --provider openai --eval-model-name gpt-4o --config_path path/to/config.yaml

# Evaluation only (using existing responses)
run-clear-eval-evaluation --provider openai --eval-model-name gpt-4o --config_path path/to/config.yaml
```

πŸ“ LLM Analysis (Python)

```python
from clear_eval.analysis_runner import run_clear_eval_analysis

run_clear_eval_analysis(
    run_name="my_run",
    provider="openai",
    data_path="my_data.csv",
    eval_model_name="gpt-4o",
    output_dir="results/",
)
```

### 🤖 Agentic Analysis

```bash
run-clear-agentic-eval \
    --data-dir data/my_traces \
    --results-dir results \
    --from-raw-traces true \
    --eval-model-name gpt-4o \
    --provider openai

# Launch agentic dashboard
run-clear-agentic-dashboard
```

See the Agentic Workflows Guide for full details.


## 📚 Documentation

| Guide | Description |
|---|---|
| 📝 LLM Analysis Guide | Full pipeline reference — input formats, CLI arguments, configuration, and external judges |
| 🤖 Agentic Workflows Guide | Multi-agent evaluation — trace preprocessing, step-by-step and trajectory analysis, configuration reference |
| 📊 Agentic Dashboard Guide | Dashboard features — workflow view, node analysis, trajectory explorer, path and temporal analysis |
| 🔑 Providers and Credentials | Inference backends (LangChain, LiteLLM, Endpoint), provider setup, and configuration examples |

## 🔑 Supported Providers

| Provider | Backend | Credentials |
|---|---|---|
| OpenAI | LangChain, LiteLLM, Endpoint | `OPENAI_API_KEY` |
| WatsonX | LangChain, LiteLLM, Endpoint | `WATSONX_APIKEY`, `WATSONX_URL`, `WATSONX_PROJECT_ID` |
| Anthropic | LiteLLM | `ANTHROPIC_API_KEY` |
| AWS Bedrock | LiteLLM | AWS credentials |
| Google Vertex AI | LiteLLM | GCP credentials |
| 100+ more | LiteLLM | Provider-specific |

See the Providers and Credentials Guide for backend configuration details and examples.


πŸ—‚οΈ Project Structure

```
CLEAR/
├── README.md                              # This file
├── src/clear_eval/
│   ├── pipeline/                          # LLM analysis pipeline
│   ├── dashboard/                         # LLM dashboard (Streamlit)
│   ├── agentic/
│   │   ├── README.md                      # Agentic Workflows Guide
│   │   ├── pipeline/                      # Agentic pipeline
│   │   └── dashboard/
│   │       ├── README_DASHBOARD.md        # Agentic Dashboard Guide
│   │       └── ...
│   └── sample_data/                       # Sample datasets
├── docs/
│   ├── ANALYSIS_README.md                 # LLM Analysis Guide
│   └── PROVIDERS.md                       # Providers and Credentials Guide
├── examples/                              # Configuration examples
└── tests/                                 # Test suite
```

## 📄 License

Apache 2.0 — see LICENSE for details.
