Skill · JavaScript · v1.0.0

Agent Audit

Audit your AI agent setup for performance, cost, and ROI. Analyzes skill usage patterns, identifies redundant or underperforming skills, and suggests optimizations.

83 downloads

sharbelayy

Updated Feb 18, 2026

Scan your entire OpenClaw setup and get actionable cost/performance recommendations.

What This Skill Does

Scans config — reads OpenClaw config to map models to agents/tasks
Analyzes cron history — checks every cron job's model, token usage, runtime, success rate
Classifies tasks — determines complexity level of each task
Calculates costs — per agent, per cron, per task type using provider pricing
Recommends changes — with confidence levels and risk warnings
Generates report — markdown report with specific savings estimates

Running the Audit

python3 {baseDir}/scripts/audit.py

Options:

python3 {baseDir}/scripts/audit.py --format markdown    # Full report (default)
python3 {baseDir}/scripts/audit.py --format summary     # Quick summary only
python3 {baseDir}/scripts/audit.py --dry-run             # Show what would be analyzed
python3 {baseDir}/scripts/audit.py --output /path/to/report.md  # Save to file

How It Works

Phase 1: Discovery

Read OpenClaw config (~/.openclaw/openclaw.json or similar)
List all cron jobs and their configurations
List all agents and their default models
Detect provider (Anthropic, OpenAI, Google, xAI) from model names
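Provider detection from a model name can be sketched roughly as below. This is a minimal illustration, not the skill's actual code: the `PROVIDER_HINTS` table and the `detect_provider` function are hypothetical names, and the keyword lists are assumptions to adjust to the model strings in your own config.

```python
# Hypothetical keyword table (an assumption, not taken from the skill's source).
PROVIDER_HINTS = {
    "anthropic": ("claude", "opus", "sonnet", "haiku"),
    "openai": ("gpt", "o1", "o3"),
    "google": ("gemini",),
    "xai": ("grok",),
}


def detect_provider(model):
    """Guess the provider from a model string such as 'anthropic/claude-opus-4'."""
    name = model.strip().lower()
    # An explicit 'provider/model' prefix wins over keyword guessing.
    if "/" in name:
        prefix, _, rest = name.partition("/")
        if prefix in PROVIDER_HINTS:
            return prefix
        name = rest
    for provider, hints in PROVIDER_HINTS.items():
        # Match a hint at the start of the name or after a hyphen.
        if any(name.startswith(h) or ("-" + h) in name for h in hints):
            return provider
    return "unknown"
```

Returning `"unknown"` instead of guessing keeps unrecognized models out of the cost calculation until a pricing entry exists for them.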

Phase 2: History Analysis

Pull cron job run history (last 7 days by default)
Calculate per-job: avg tokens, avg runtime, success rate, model used
Pull session history where available
Calculate total token spend by model tier
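The per-job statistics could be computed along these lines. The `CronRun` record and its field names are assumptions made for illustration; the real run history format in OpenClaw may differ.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CronRun:
    # Hypothetical shape of one history entry (assumed, not the real schema).
    job: str
    model: str
    tokens: int
    runtime_s: float
    ok: bool


def summarize(runs):
    """Group runs by job and compute avg tokens, avg runtime, and success rate."""
    by_job = {}
    for run in runs:
        by_job.setdefault(run.job, []).append(run)

    summary = {}
    for job, job_runs in by_job.items():
        summary[job] = {
            "model": job_runs[-1].model,  # model used on the most recent run
            "avg_tokens": mean(r.tokens for r in job_runs),
            "avg_runtime_s": mean(r.runtime_s for r in job_runs),
            "success_rate": sum(r.ok for r in job_runs) / len(job_runs),
        }
    return summary
```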

Phase 3: Task Classification

Classify each task into complexity tiers:

Tier	Examples	Recommended Models
Simple	Health checks, status reports, reminders, notifications	Cheapest tier (Haiku, GPT-4o-mini, Flash, Grok-mini)
Medium	Content drafts, research, summarization, data analysis	Mid tier (Sonnet, GPT-4o, Pro, Grok)
Complex	Coding, architecture, security review, nuanced writing	Top tier (Opus, GPT-4.5, Ultra, Grok-2)

Classification signals:

Simple: Short output (<500 tokens), little reasoning required, repetitive pattern, status/health tasks
Medium: Moderate output length, some reasoning needed, creative but templated, research tasks
Complex: Long output, multi-step reasoning, code generation, security-critical, tasks that previously failed on weaker models
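A simplified version of these signals might look like the sketch below. Only the 500-token threshold comes from the list above; the keyword tuples, the `classify_task` name, and the fallback to "medium" are assumptions, and the detailed heuristics live in references/task-classification.md.

```python
# Hypothetical keyword lists (assumptions for illustration only).
COMPLEX_KEYWORDS = ("code", "coding", "architecture", "security")
SIMPLE_KEYWORDS = ("health", "status", "reminder", "notification")


def classify_task(name, avg_output_tokens, failed_on_weaker_model=False):
    """Return 'simple', 'medium', or 'complex' for a task."""
    lowered = name.lower()
    # Complex signals take priority so risky tasks are never under-classified.
    if failed_on_weaker_model or any(k in lowered for k in COMPLEX_KEYWORDS):
        return "complex"
    # Simple requires both short output and a status-style task name.
    if avg_output_tokens < 500 and any(k in lowered for k in SIMPLE_KEYWORDS):
        return "simple"
    return "medium"
```

Checking the complex signals first means a task that matches both lists is treated conservatively.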

Phase 4: Recommendations

For each task where the model tier doesn't match complexity:

⚠️ RECOMMENDATION: Downgrade "Knox Bot Health Check" from opus to haiku
   Current: anthropic/claude-opus-4 ($15/M input, $75/M output)
   Suggested: anthropic/claude-haiku ($0.25/M input, $1.25/M output)
   Reason: Simple status check averaging 300 output tokens
   Estimated savings: $X.XX/month
   Risk: LOW — task is simple pattern matching
   Confidence: HIGH
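The savings figure in a recommendation like this one is plain arithmetic over token counts and per-million prices. The sketch below uses the prices quoted above; the input token count and the run frequency are hypothetical values chosen for illustration, and `monthly_cost` is not a function from the skill.

```python
def monthly_cost(input_tokens, output_tokens, runs_per_month,
                 input_price_per_m, output_price_per_m):
    """Estimated monthly cost in dollars for one recurring task."""
    per_run = (input_tokens * input_price_per_m
               + output_tokens * output_price_per_m) / 1_000_000
    return per_run * runs_per_month


# Assumed workload: 1,000 input tokens, 300 output tokens, hourly (720 runs/month).
current = monthly_cost(1000, 300, 720, 15.00, 75.00)   # opus pricing from above
suggested = monthly_cost(1000, 300, 720, 0.25, 1.25)   # haiku pricing from above
savings = current - suggested
print(f"Estimated savings: ${savings:.2f}/month")
```

Under these assumed numbers the estimate comes to about $26.55 per month; the real figure depends on the task's actual input size and schedule.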

Safety Rules — NEVER Recommend Downgrading:

Coding/development tasks
Security reviews or audits
Tasks that have previously failed on weaker models
Tasks where the user explicitly chose a higher model
Complex multi-step reasoning tasks
Anything the user flagged as critical
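These rules amount to a guard that runs before any downgrade is suggested. The sketch below is one way to express it; the task dictionary keys and the category labels are hypothetical, not the skill's actual data model.

```python
# Hypothetical category labels covering the protected task types listed above.
PROTECTED_CATEGORIES = {"coding", "security", "multi-step-reasoning"}


def may_recommend_downgrade(task):
    """Return False if any safety rule forbids suggesting a cheaper model."""
    if task.get("category") in PROTECTED_CATEGORIES:
        return False
    if task.get("failed_on_weaker_model"):
        return False
    if task.get("user_pinned_model"):  # user explicitly chose a higher model
        return False
    if task.get("critical"):  # user flagged the task as critical
        return False
    return True
```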

Phase 5: Report Generation

Output a clean markdown report with:

Overview — total agents, crons, monthly spend estimate
Per-agent breakdown — model, usage, cost
Per-cron breakdown — model, frequency, avg tokens, cost
Recommendations — sorted by savings potential
Total potential savings — monthly estimate
One-liner config changes — exact model strings to swap
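The recommendations and total-savings parts of the report could be assembled roughly as follows. The record keys and the output layout are assumptions for illustration; the actual report produced by audit.py may be formatted differently.

```python
def render_recommendations(recs):
    """Render recommendations sorted by savings, followed by the monthly total."""
    ordered = sorted(recs, key=lambda r: r["monthly_savings"], reverse=True)
    lines = [
        f"- {r['task']}: {r['current']} -> {r['suggested']} "
        f"(${r['monthly_savings']:.2f}/month)"
        for r in ordered
    ]
    total = sum(r["monthly_savings"] for r in ordered)
    lines.append(f"Total potential savings: ${total:.2f}/month")
    return "\n".join(lines)
```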

Model Pricing Reference

See references/model-pricing.md for current pricing across all providers. Update this file when prices change.

Task Classification Details

See references/task-classification.md for detailed heuristics on how tasks are classified into complexity tiers.

Important Notes

This skill is read-only — it never changes your config automatically
All recommendations include risk and confidence levels
When unsure about a task's complexity, it defaults to keeping the current model
Re-run the audit periodically (for example, monthly), since usage patterns change
Token counts are estimates based on cron history — actual costs depend on your provider's billing

Free
