I got tired of writing throwaway scripts for LLM benchmarks, so I built a harness that computes the Pareto front automatically
Summary
This article describes a new open-source LLM benchmarking harness built to replace one-off scripts with a repeatable workflow. It supports multiple backends, including vLLM, llama.cpp, ONNX Runtime, and transformers, and measures latency, throughput, VRAM usage, quality, and energy efficiency. The tool adds YAML-based configuration, matrix sweeps, Pareto-front analysis, recommendations, CSV/JSON export, and CI regression checks. It also includes a web UI with live updates and can be installed via Docker or pip for local and GPU-enabled setups.
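To make the Pareto-front idea concrete, here is a minimal sketch in Python. It is not the harness's own code: the `Run` class, the `dominates` and `pareto_front` functions, and the numbers are all hypothetical, and it assumes only two metrics, latency (lower is better) and quality (higher is better). A run is on the front if no other run is at least as good on both metrics and strictly better on one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One benchmark result. All fields are hypothetical, for illustration only."""
    name: str          # label for a backend/config combination
    latency_ms: float  # lower is better
    quality: float     # higher is better


def dominates(a: Run, b: Run) -> bool:
    """True if a is no worse than b on both metrics and strictly better on one."""
    no_worse = a.latency_ms <= b.latency_ms and a.quality >= b.quality
    strictly_better = a.latency_ms < b.latency_ms or a.quality > b.quality
    return no_worse and strictly_better


def pareto_front(runs: list[Run]) -> list[Run]:
    """Keep every run that no other run dominates (simple O(n^2) pairwise check)."""
    return [r for r in runs if not any(dominates(o, r) for o in runs)]


# Made-up numbers, not measurements from the tool.
runs = [
    Run("a", 120.0, 0.71),
    Run("b", 80.0, 0.68),
    Run("c", 150.0, 0.70),  # slower and lower quality than "a"
    Run("d", 80.0, 0.60),   # same latency as "b" but lower quality
]

front = pareto_front(runs)
print([r.name for r in front])
```

In this toy data, "a" and "b" remain because each wins on one metric, while "c" and "d" drop out. Extending the comparison to more metrics, such as VRAM or energy, would mean adding those fields to the dominance check with the appropriate direction for each.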