
APIEval-20
An open benchmark for AI agents that test APIs
116
Product Hunt Upvotes
May 8, 2026
Launch Date
May 9, 2026
First Tracked
About
AI Summary
APIEval-20 is a black-box benchmark designed for evaluating API testing agents by providing them with a JSON schema and sample payload to generate test suites, which are then assessed against live reference APIs for bug detection, API coverage, and efficiency. The scoring system is objective, focusing on whether bugs are identified, and covers various tasks including authentication, error handling, and multi-step flows.
APIEval-20 is a black-box benchmark for API testing agents. Each agent gets only a JSON schema and one sample payload, then generates a test suite. We run those tests against live reference APIs with planted bugs and score bug detection, API coverage, and efficiency. Unlike LLM-as-judge evals, scoring is fully objective: a bug is either caught or it isn’t. Tasks span auth, errors, pagination, schemas, and multi-step flows. Open on Hugging Face.
Tags