Projects
Open source tools for working with LLMs and agent systems. Built to support practical development and testing workflows.
Analysis
token-budget
Count tokens and estimate costs across LLM providers. Supports OpenAI, Anthropic, and open source tokenizers.
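The core idea can be sketched in a few lines. Note this is a rough illustration, not token-budget's actual API: the characters-per-token heuristic and the price figure below are assumptions (real providers expose exact tokenizers and published rates).

```python
# Hypothetical sketch of token counting and cost estimation.
# The ~4-chars-per-token heuristic and the price are illustrative
# assumptions, not real provider tokenizers or rates.

def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def estimate_cost(text: str, price_per_million: float) -> float:
    """Estimated input cost in dollars at a given per-million-token price."""
    return estimate_tokens(text) * price_per_million / 1_000_000

prompt = "Summarize the following document in three bullet points."
tokens = estimate_tokens(prompt)
cost = estimate_cost(prompt, price_per_million=3.0)
```

In practice a tool like this would call each provider's own tokenizer for exact counts; the heuristic is only a fallback.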
cost-tracker
Track LLM API costs across providers with budget alerts. SQLite storage, daily/weekly reports, spending limits.
context-window-viz
Visualize context window usage in conversations. See token distribution by role, estimate costs, export reports.
embedding-inspector
Explore and debug embedding spaces. Similarity search, clustering, outlier detection, and visualization.
agent-transcript
Parse and analyze agent execution traces. Extract tool calls, measure timing, identify patterns across runs.
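As a sketch of what trace analysis looks like, the snippet below aggregates tool-call counts and durations. The event format (dicts with `type`/`name`/`start`/`end` fields) is an assumption for illustration, not agent-transcript's actual schema.

```python
# Illustrative trace analysis: count tool calls and total their
# durations. The event dict layout here is a hypothetical schema.
from collections import Counter

def tool_call_stats(events: list[dict]) -> dict[str, dict]:
    """Aggregate call count and total duration (seconds) per tool."""
    counts, totals = Counter(), Counter()
    for event in events:
        if event.get("type") == "tool_call":
            counts[event["name"]] += 1
            totals[event["name"]] += event["end"] - event["start"]
    return {name: {"calls": counts[name], "total_s": totals[name]}
            for name in counts}

trace = [
    {"type": "tool_call", "name": "search", "start": 0.0, "end": 1.2},
    {"type": "message", "role": "assistant"},
    {"type": "tool_call", "name": "search", "start": 2.0, "end": 2.5},
]
stats = tool_call_stats(trace)
```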
model-compare
Side-by-side comparison of LLM model outputs. Run prompts across models, diff responses, measure quality metrics.
Development
prompt-diff
Diff prompt templates with semantic awareness. Track changes across versions, identify meaningful modifications.
prompt-linter
Static analysis for LLM prompts. Detect vague instructions, injection vulnerabilities, and consistency issues.
prompt-registry
Version-controlled prompt templates with variables. Jinja2 templating, YAML storage, environment-specific configs.
schema-to-tool
Convert JSON Schema to OpenAI/Anthropic tool definitions. Validate schemas, batch convert, multiple output formats.
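The conversion itself is mostly an envelope around the schema. This sketch shows the OpenAI function-calling shape only; the real tool also validates schemas and emits Anthropic's format, which this omits.

```python
# Minimal sketch: wrap a JSON Schema in the OpenAI function-calling
# tool envelope. Validation and other output formats are omitted.

def schema_to_openai_tool(name: str, description: str, schema: dict) -> dict:
    """Produce an OpenAI-style tool definition from a JSON Schema."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": schema,
        },
    }

weather_schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}
tool = schema_to_openai_tool("get_weather", "Look up current weather", weather_schema)
```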
Testing
llm-mock
Mock LLM API server for testing. Deterministic responses, latency simulation, error injection for robust tests.
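The determinism trick can be shown in miniature. The real project runs an HTTP server mimicking provider APIs; the canned responses and hashing scheme below are assumptions used only to illustrate the idea.

```python
# Toy sketch of deterministic mocking: the same prompt always maps to
# the same canned response, so tests are repeatable. Responses and the
# hashing scheme are illustrative assumptions.
import hashlib

CANNED = [
    "Sure, here is a summary.",
    "I don't have enough information.",
    "The answer is 42.",
]

def mock_complete(prompt: str) -> str:
    """Deterministically map a prompt to one of the canned responses."""
    digest = hashlib.sha256(prompt.encode()).hexdigest()
    return CANNED[int(digest, 16) % len(CANNED)]
```

Because the mapping is a pure function of the prompt, test assertions never flake on response content.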
response-validator
Validate LLM outputs against JSON schemas and constraints. JSONPath rules, auto-repair malformed JSON, batch validation.
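A hedged sketch of the "repair, then parse" step: the two heuristics below (stripping markdown code fences, dropping trailing commas) are common fixes for LLM output and only an approximation of what response-validator does.

```python
# Repair common LLM JSON artifacts before parsing. These heuristics
# (fence stripping, trailing-comma removal) are an illustrative
# subset, not the tool's full repair pipeline.
import json
import re

def repair_json(text: str):
    """Parse `text` as JSON after repairing common LLM artifacts."""
    cleaned = text.strip()
    # Strip markdown code fences like ```json ... ```
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", cleaned)
    # Remove trailing commas before } or ]
    cleaned = re.sub(r",\s*([}\]])", r"\1", cleaned)
    return json.loads(cleaned)

raw = '```json\n{"items": [1, 2, 3,],}\n```'
data = repair_json(raw)
```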
conversation-fixture
Multi-turn conversation test fixture manager. Record, replay, and validate conversation flows with assertions.
guardrail-tester
Defensive security testing for LLM safety guardrails. Test prompt injection, jailbreaks, and content policy enforcement.
All 14 projects are open source under the MIT license.
View all on GitHub →