Projects
Open source tools for working with LLMs and agent systems. Built to support practical development and testing workflows.
Analysis
token-budget
Count tokens and estimate costs across LLM providers. Supports OpenAI, Anthropic, and open source tokenizers.
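The core idea can be sketched in a few lines. Note this is a rough illustration, not token-budget's actual API: the characters-per-token heuristic and the price figure below are assumptions (real providers expose exact tokenizers and published rates).

```python
# Hypothetical sketch of token counting and cost estimation.
# The ~4-chars-per-token heuristic and the price are illustrative
# assumptions, not real provider tokenizers or rates.

def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def estimate_cost(text: str, price_per_million: float) -> float:
    """Estimated input cost in dollars at a given per-million-token price."""
    return estimate_tokens(text) * price_per_million / 1_000_000

prompt = "Summarize the following document in three bullet points."
tokens = estimate_tokens(prompt)
cost = estimate_cost(prompt, price_per_million=3.0)
```

In practice a tool like this would call each provider's own tokenizer for exact counts; the heuristic is only a fallback.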
cost-tracker
Track LLM API costs across providers with budget alerts. SQLite storage, daily/weekly reports, spending limits.
context-window-viz
Visualize context window usage in conversations. See token distribution by role, estimate costs, export reports.
embedding-inspector
Explore and debug embedding spaces. Similarity search, clustering, outlier detection, and visualization.
agent-transcript
Parse and analyze agent execution traces. Extract tool calls, measure timing, identify patterns across runs.
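As a sketch of what trace analysis looks like, the snippet below aggregates tool-call counts and durations. The event format (dicts with `type`/`name`/`start`/`end` fields) is an assumption for illustration, not agent-transcript's actual schema.

```python
# Illustrative trace analysis: count tool calls and total their
# durations. The event dict layout here is a hypothetical schema.
from collections import Counter

def tool_call_stats(events: list[dict]) -> dict[str, dict]:
    """Aggregate call count and total duration (seconds) per tool."""
    counts, totals = Counter(), Counter()
    for event in events:
        if event.get("type") == "tool_call":
            counts[event["name"]] += 1
            totals[event["name"]] += event["end"] - event["start"]
    return {name: {"calls": counts[name], "total_s": totals[name]}
            for name in counts}

trace = [
    {"type": "tool_call", "name": "search", "start": 0.0, "end": 1.2},
    {"type": "message", "role": "assistant"},
    {"type": "tool_call", "name": "search", "start": 2.0, "end": 2.5},
]
stats = tool_call_stats(trace)
```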
model-compare
Side-by-side comparison of LLM model outputs. Run prompts across models, diff responses, measure quality metrics.
Development
prompt-diff
Diff prompt templates with semantic awareness. Track changes across versions, identify meaningful modifications.
prompt-linter
Static analysis for LLM prompts. Detect vague instructions, injection vulnerabilities, and consistency issues.
prompt-registry
Version-controlled prompt templates with variables. Jinja2 templating, YAML storage, environment-specific configs.
schema-to-tool
Convert JSON Schema to OpenAI/Anthropic tool definitions. Validate schemas, batch convert, multiple output formats.
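The conversion itself is mostly an envelope around the schema. This sketch shows the OpenAI function-calling shape only; the real tool also validates schemas and emits Anthropic's format, which this omits.

```python
# Minimal sketch: wrap a JSON Schema in the OpenAI function-calling
# tool envelope. Validation and other output formats are omitted.

def schema_to_openai_tool(name: str, description: str, schema: dict) -> dict:
    """Produce an OpenAI-style tool definition from a JSON Schema."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": schema,
        },
    }

weather_schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}
tool = schema_to_openai_tool("get_weather", "Look up current weather", weather_schema)
```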
Testing
llm-mock
Mock LLM API server for testing. Deterministic responses, latency simulation, error injection for robust tests.
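The determinism trick can be shown in miniature. The real project runs an HTTP server mimicking provider APIs; the canned responses and hashing scheme below are assumptions used only to illustrate the idea.

```python
# Toy sketch of deterministic mocking: the same prompt always maps to
# the same canned response, so tests are repeatable. Responses and the
# hashing scheme are illustrative assumptions.
import hashlib

CANNED = [
    "Sure, here is a summary.",
    "I don't have enough information.",
    "The answer is 42.",
]

def mock_complete(prompt: str) -> str:
    """Deterministically map a prompt to one of the canned responses."""
    digest = hashlib.sha256(prompt.encode()).hexdigest()
    return CANNED[int(digest, 16) % len(CANNED)]
```

Because the mapping is a pure function of the prompt, test assertions never flake on response content.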
response-validator
Validate LLM outputs against JSON schemas and constraints. JSONPath rules, auto-repair malformed JSON, batch validation.
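A hedged sketch of the "repair, then parse" step: the two heuristics below (stripping markdown code fences, dropping trailing commas) are common fixes for LLM output and only an approximation of what response-validator does.

```python
# Repair common LLM JSON artifacts before parsing. These heuristics
# (fence stripping, trailing-comma removal) are an illustrative
# subset, not the tool's full repair pipeline.
import json
import re

def repair_json(text: str):
    """Parse `text` as JSON after repairing common LLM artifacts."""
    cleaned = text.strip()
    # Strip markdown code fences like ```json ... ```
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", cleaned)
    # Remove trailing commas before } or ]
    cleaned = re.sub(r",\s*([}\]])", r"\1", cleaned)
    return json.loads(cleaned)

raw = '```json\n{"items": [1, 2, 3,],}\n```'
data = repair_json(raw)
```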
conversation-fixture
Multi-turn conversation test fixture manager. Record, replay, and validate conversation flows with assertions.
guardrail-tester
Defensive security testing for LLM safety guardrails. Test prompt injection, jailbreaks, and content policy enforcement.
All 14 projects are open source under the MIT license.
View all on GitHub →