APIEval-20

An open benchmark for AI agents that test APIs

🔬 Research & Analysis

Unknown

Product Hunt

Product Hunt Upvotes: 116

Launch Date: May 8, 2026

First Tracked: May 9, 2026

About

AI Summary

APIEval-20 is a black-box benchmark for evaluating API testing agents. Each agent receives a JSON schema and a sample payload and generates a test suite, which is then run against live reference APIs and scored on bug detection, API coverage, and efficiency. Scoring is objective, based on whether each bug is caught, and tasks cover authentication, error handling, and multi-step flows.

APIEval-20 is a black-box benchmark for API testing agents. Each agent gets only a JSON schema and one sample payload, then generates a test suite. We run those tests against live reference APIs with planted bugs and score bug detection, API coverage, and efficiency. Unlike LLM-as-judge evals, scoring is fully objective: a bug is either caught or it isn’t. Tasks span auth, errors, pagination, schemas, and multi-step flows. Open on Hugging Face.
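To make the three scores concrete, here is a minimal Python sketch of how a suite's results might be tallied. It is not the benchmark's actual scoring code: the `CaseResult` record, the `score_suite` function, the bug IDs, the endpoint names, and the three formulas (fraction of planted bugs caught, fraction of endpoints exercised, bugs caught per test) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseResult:
    """Outcome of one generated test case (hypothetical record shape)."""
    endpoint: str              # endpoint the test called, e.g. "GET /items"
    bugs_triggered: frozenset  # IDs of planted bugs this test exposed


def score_suite(results, planted_bugs, endpoints):
    """Return illustrative detection, coverage, and efficiency scores.

    All three formulas are assumptions, not the benchmark's definitions.
    """
    caught = set()
    touched = set()
    for result in results:
        # Count a bug once, no matter how many tests expose it.
        caught |= result.bugs_triggered & planted_bugs
        if result.endpoint in endpoints:
            touched.add(result.endpoint)
    return {
        "bug_detection": len(caught) / len(planted_bugs) if planted_bugs else 0.0,
        "coverage": len(touched) / len(endpoints) if endpoints else 0.0,
        "efficiency": len(caught) / len(results) if results else 0.0,
    }


# Hypothetical task: four planted bugs across three endpoints.
planted = {"B1", "B2", "B3", "B4"}
api_endpoints = {"GET /items", "POST /items", "POST /login"}
suite = [
    CaseResult("GET /items", frozenset({"B1"})),
    CaseResult("GET /items", frozenset({"B1"})),   # duplicate catch, counted once
    CaseResult("POST /login", frozenset({"B2"})),
    CaseResult("POST /login", frozenset()),        # passing test, no bug exposed
]
scores = score_suite(suite, planted, api_endpoints)
```

In this made-up run the suite catches two of four planted bugs and touches two of three endpoints using four tests. Because each bug is recorded as caught or not caught, the result does not depend on a judge model's opinion, which is the property the description emphasizes.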