Skill · JavaScript · v1.0.0
Agent Audit
Audit your AI agent setup for performance, cost, and ROI. Analyzes skill usage patterns, identifies redundant or underperforming skills, and suggests optimizations.
sharbelayy
Updated Feb 18, 2026
Scan your entire OpenClaw setup and get actionable cost/performance recommendations.
What This Skill Does
- Scans config — reads OpenClaw config to map models to agents/tasks
- Analyzes cron history — checks every cron job's model, token usage, runtime, success rate
- Classifies tasks — determines complexity level of each task
- Calculates costs — per agent, per cron, per task type using provider pricing
- Recommends changes — with confidence levels and risk warnings
- Generates report — markdown report with specific savings estimates
Running the Audit
python3 {baseDir}/scripts/audit.py
Options:
python3 {baseDir}/scripts/audit.py --format markdown # Full report (default)
python3 {baseDir}/scripts/audit.py --format summary # Quick summary only
python3 {baseDir}/scripts/audit.py --dry-run # Show what would be analyzed
python3 {baseDir}/scripts/audit.py --output /path/to/report.md # Save to file
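The flags above map naturally onto a small argument parser. The following is only a sketch of how such a command line could be declared, not the actual contents of audit.py; the function name build_parser and the help strings are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the real audit.py)."""
    parser = argparse.ArgumentParser(prog="audit.py")
    parser.add_argument(
        "--format",
        choices=["markdown", "summary"],
        default="markdown",  # the docs state markdown is the default
        help="Full markdown report or quick summary",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Show what would be analyzed",
    )
    parser.add_argument(
        "--output",
        metavar="PATH",
        default=None,
        help="Save the report to this file",
    )
    return parser


# Parse an explicit argument list so the example runs without a real CLI call.
args = build_parser().parse_args(["--format", "summary", "--dry-run"])
print(args.format, args.dry_run, args.output)  # summary True None
```

Restricting --format with choices means an unsupported value fails at parse time rather than partway through a scan.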
How It Works
Phase 1: Discovery
- Read OpenClaw config (~/.openclaw/openclaw.json or similar)
- List all cron jobs and their configurations
- List all agents and their default models
- Detect provider (Anthropic, OpenAI, Google, xAI) from model names
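Provider detection from a model name can be approximated with substring matching. The sketch below is one possible approach under stated assumptions: the marker table and the function name detect_provider are hypothetical, and the real audit may use different rules.

```python
# Assumed substring markers per provider; illustrative, not the skill's real table.
PROVIDER_MARKERS = {
    "anthropic": ("claude", "opus", "sonnet", "haiku"),
    "openai": ("gpt",),
    "google": ("gemini",),
    "xai": ("grok",),
}


def detect_provider(model: str) -> str:
    """Guess the provider from a model string such as 'anthropic/claude-opus-4'."""
    name = model.lower()
    # An explicit 'provider/model' prefix wins when it names a known provider.
    prefix, sep, _ = name.partition("/")
    if sep and prefix in PROVIDER_MARKERS:
        return prefix
    # Otherwise fall back to the first provider whose marker appears in the name.
    for provider, markers in PROVIDER_MARKERS.items():
        if any(marker in name for marker in markers):
            return provider
    return "unknown"


print(detect_provider("anthropic/claude-opus-4"))  # anthropic
print(detect_provider("gpt-4o-mini"))              # openai
```

Returning "unknown" instead of guessing keeps unrecognized models out of the cost estimates until someone maps them by hand.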
Phase 2: History Analysis
- Pull cron job run history (last 7 days by default)
- Calculate per-job: avg tokens, avg runtime, success rate, model used
- Pull session history where available
- Calculate total token spend by model tier
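The per-job statistics listed above amount to a group-and-average pass over the run history. Here is a minimal sketch, assuming each run is available as a simple record; the Run fields and the function names are hypothetical, since the real history format is not documented here, and the 7-day window filter is left out for brevity.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    # Hypothetical record shape for one cron run.
    job: str
    model: str
    tokens: int
    runtime_s: float
    ok: bool


def summarize(runs: list[Run]) -> dict[str, dict]:
    """Per-job averages; assumes runs are listed in chronological order."""
    by_job: dict[str, list[Run]] = defaultdict(list)
    for run in runs:
        by_job[run.job].append(run)
    return {
        job: {
            "model": job_runs[-1].model,  # model used by the latest run
            "avg_tokens": mean(r.tokens for r in job_runs),
            "avg_runtime_s": mean(r.runtime_s for r in job_runs),
            "success_rate": sum(r.ok for r in job_runs) / len(job_runs),
        }
        for job, job_runs in by_job.items()
    }


def tokens_by_model(runs: list[Run]) -> Counter:
    """Total tokens per model, a stand-in for 'spend by model tier'."""
    totals: Counter = Counter()
    for run in runs:
        totals[run.model] += run.tokens
    return totals
```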
Phase 3: Task Classification
Classify each task into complexity tiers:
| Tier | Examples | Recommended Models |
|---|---|---|
| Simple | Health checks, status reports, reminders, notifications | Cheapest tier (Haiku, GPT-4o-mini, Flash, Grok-mini) |
| Medium | Content drafts, research, summarization, data analysis | Mid tier (Sonnet, GPT-4o, Pro, Grok) |
| Complex | Coding, architecture, security review, nuanced writing | Top tier (Opus, GPT-4.5, Ultra, Grok-2) |
Classification signals:
- Simple: Short output (<500 tokens), low thinking requirement, repetitive pattern, status/health tasks
- Medium: Medium output, some reasoning needed, creative but templated, research tasks
- Complex: Long output, multi-step reasoning, code generation, security-critical, tasks that previously failed on weaker models
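The signals above can be read as a small rule cascade. The sketch below is one possible encoding, not the skill's actual heuristics (those live in references/task-classification.md). The 500-token cutoff comes from the list above; LONG_OUTPUT_TOKENS and the parameter names are assumptions.

```python
SHORT_OUTPUT_TOKENS = 500    # "short output" cutoff taken from the signals list
LONG_OUTPUT_TOKENS = 2000    # assumed cutoff for "long output"; not from the source


def classify(
    avg_output_tokens: float,
    *,
    repetitive_or_status: bool = False,
    generates_code: bool = False,
    security_critical: bool = False,
    failed_on_weaker_model: bool = False,
) -> str:
    """Return 'simple', 'medium', or 'complex' for one task."""
    # Any single complex signal is enough to keep the task in the top tier.
    if generates_code or security_critical or failed_on_weaker_model:
        return "complex"
    if avg_output_tokens >= LONG_OUTPUT_TOKENS:
        return "complex"
    # Simple requires both short output and a repetitive/status pattern.
    if avg_output_tokens < SHORT_OUTPUT_TOKENS and repetitive_or_status:
        return "simple"
    return "medium"


print(classify(300, repetitive_or_status=True))  # simple
print(classify(300, generates_code=True))        # complex
```

Checking the complex signals first means a short but security-critical task is never labeled simple on output length alone.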
Phase 4: Recommendations
For each task where the model tier doesn't match complexity:
⚠️ RECOMMENDATION: Downgrade "Knox Bot Health Check" from opus to haiku
Current: anthropic/claude-opus-4 ($15/M input, $75/M output)
Suggested: anthropic/claude-haiku ($0.25/M input, $1.25/M output)
Reason: Simple status check averaging 300 output tokens
Estimated savings: $X.XX/month
Risk: LOW — task is simple pattern matching
Confidence: HIGH
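The savings figure in such a recommendation is plain arithmetic on token counts and per-million prices. The sketch below uses the two prices from the example; the workload (1,000 input tokens per run, 720 runs per month) is invented for illustration, so the result is not the estimate for any real job.

```python
# Prices in USD per million tokens (input, output), from the example recommendation.
PRICES = {
    "anthropic/claude-opus-4": (15.00, 75.00),
    "anthropic/claude-haiku": (0.25, 1.25),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int,
                 runs_per_month: int) -> float:
    """Estimated monthly cost in USD for a job with fixed average token counts."""
    in_price, out_price = PRICES[model]
    per_run_scaled = input_tokens * in_price + output_tokens * out_price
    return runs_per_month * per_run_scaled / 1_000_000


# Hypothetical workload: 1,000 input and 300 output tokens per run, hourly for 30 days.
current = monthly_cost("anthropic/claude-opus-4", 1000, 300, 720)
suggested = monthly_cost("anthropic/claude-haiku", 1000, 300, 720)
print(f"${current - suggested:.2f}/month")  # $26.55/month
```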
Safety Rules — NEVER Recommend Downgrading:
- Coding/development tasks
- Security reviews or audits
- Tasks that have previously failed on weaker models
- Tasks where the user explicitly chose a higher model
- Complex multi-step reasoning tasks
- Anything the user flagged as critical
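These rules work well as a hard gate that runs before any downgrade is suggested. Below is a minimal sketch, assuming each task carries a few flags; the Task fields, the category labels, and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical task record; field names are illustrative only.
    name: str
    category: str                 # e.g. "coding", "security", "status"
    complexity: str               # "simple", "medium", or "complex"
    failed_on_weaker_model: bool = False
    model_pinned_by_user: bool = False
    flagged_critical: bool = False


PROTECTED_CATEGORIES = {"coding", "development", "security"}


def downgrade_blockers(task: Task) -> list[str]:
    """Return the safety rules a task trips; an empty list means none apply."""
    reasons = []
    if task.category in PROTECTED_CATEGORIES:
        reasons.append(f"protected category: {task.category}")
    if task.failed_on_weaker_model:
        reasons.append("previously failed on a weaker model")
    if task.model_pinned_by_user:
        reasons.append("user explicitly chose a higher model")
    if task.complexity == "complex":
        reasons.append("complex multi-step reasoning")
    if task.flagged_critical:
        reasons.append("flagged as critical")
    return reasons


print(downgrade_blockers(Task("Health check", "status", "simple")))  # []
```

Returning the reasons rather than a bare boolean lets the report explain why a task was left on its current model.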
Phase 5: Report Generation
Output a clean markdown report with:
- Overview — total agents, crons, monthly spend estimate
- Per-agent breakdown — model, usage, cost
- Per-cron breakdown — model, frequency, avg tokens, cost
- Recommendations — sorted by savings potential
- Total potential savings — monthly estimate
- One-liner config changes — exact model strings to swap
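The recommendations section of such a report might be assembled as follows. This is a sketch under assumptions: the dictionary keys, the function name render_recommendations, and the sample figures are all hypothetical and do not reflect the real report layout.

```python
def render_recommendations(recs: list[dict]) -> str:
    """Render recommendations as a markdown table, largest savings first."""
    ordered = sorted(recs, key=lambda r: r["monthly_savings"], reverse=True)
    lines = [
        "| Task | Current | Suggested | Savings/month |",
        "|---|---|---|---|",
    ]
    for rec in ordered:
        lines.append(
            f"| {rec['task']} | {rec['current']} | {rec['suggested']} "
            f"| ${rec['monthly_savings']:.2f} |"
        )
    total = sum(rec["monthly_savings"] for rec in ordered)
    lines.append("")
    lines.append(f"Total potential savings: ${total:.2f}/month")
    return "\n".join(lines)


# Made-up sample data, for illustration only.
sample = [
    {"task": "Reminder", "current": "sonnet", "suggested": "haiku", "monthly_savings": 2.5},
    {"task": "Health check", "current": "opus", "suggested": "haiku", "monthly_savings": 10.0},
]
print(render_recommendations(sample))
```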
Model Pricing Reference
See references/model-pricing.md for current pricing across all providers. Update this file when prices change.
Task Classification Details
See references/task-classification.md for detailed heuristics on how tasks are classified into complexity tiers.
Important Notes
- This skill is read-only — it never changes your config automatically
- All recommendations include a risk level and a confidence level
- When unsure about a task's complexity, it defaults to keeping the current model
- The audit should be re-run periodically (monthly) as usage patterns change
- Token counts are estimates based on cron history — actual costs depend on your provider's billing