EVMbench: An Open Benchmark for Smart Contract Security Agents

02.18.2026|Alpin Yukseloglu

Open source crypto contracts routinely hold $100B+ in assets. As LLMs rapidly improve at finding exploits, it is important that we have visibility into, and influence over, the risks they could create for crypto.

Together with OpenAI, we built EVMbench to measure exactly that. EVMbench is an open evaluation framework that tests AI agents across detecting, patching, and exploiting vulnerabilities.

The benchmark uses real vulnerabilities from open code audits as well as custom tasks from unreleased contracts, containerized per-task so agents operate in realistic environments. We include an “answer key” for each task to verify the benchmark itself is solvable.
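To make the role of the answer key concrete, here is a minimal sketch in Python. It is not EVMbench's actual code or schema: the names `Mode`, `Task`, `check_solvable`, and `success_rate` are hypothetical, and the grader is a stand-in for whatever per-task check runs inside a container. The idea it illustrates is that each task's reference solution is run through the same grader an agent would face, so an unsolvable task is caught before any model is scored.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Mode(Enum):
    """The three capabilities named in the post."""
    DETECT = "detect"
    PATCH = "patch"
    EXPLOIT = "exploit"


@dataclass(frozen=True)
class Task:
    # All fields are hypothetical; the real task format may differ.
    task_id: str
    mode: Mode
    # Stand-in grader: returns True when a submission solves the task.
    grade: Callable[[str], bool]
    # The "answer key": a reference submission expected to pass the grader.
    reference_solution: str


def check_solvable(tasks: List[Task]) -> List[str]:
    """Return ids of tasks whose own answer key fails the grader."""
    return [t.task_id for t in tasks if not t.grade(t.reference_solution)]


def success_rate(tasks: List[Task], agent: Callable[[Task], str]) -> float:
    """Fraction of tasks for which the agent's submission passes the grader."""
    if not tasks:
        return 0.0
    solved = sum(1 for t in tasks if t.grade(agent(t)))
    return solved / len(tasks)
```

In this sketch, a benchmark run would first require `check_solvable` to return an empty list, then report `success_rate` separately for each mode.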

We’ve also extended the benchmark harness into an auditing agent that can be found at https://paradigm.xyz/evmbench.

When we started working on this project, top models could exploit less than 20% of the critical, fund-draining Code4rena bugs. Today, GPT-5.3-Codex exploits over 70%. The rate of improvement is incredible.

It’s now clear to us that a growing portion of audits in the future will be done by agents. Hopefully this benchmark, harness, and agent serve both as a preview and an accelerant towards that future.

(Also, thank you to OtterSec for significant support with implementing the frontend!)

For more details, read OpenAI's research summary and our joint academic paper.

Written by

Alpin Yukseloglu

Partner, Investing & Research

Biography

Alpin Yukseloglu is a Partner at Paradigm. Previously, he was a protocol engineer and product lead at Osmosis. Alpin holds degrees in Electrical Engineering, Computer Science, and Business Administration from UC Berkeley.

Disclaimer: This post is for general information purposes only. It does not constitute investment advice or a recommendation or solicitation to buy or sell any investment and should not be used in the evaluation of the merits of making any investment decision. It should not be relied upon for accounting, legal or tax advice or investment recommendations. This post reflects the current opinions of the authors and is not made on behalf of Paradigm or its affiliates and does not necessarily reflect the opinions of Paradigm, its affiliates or individuals associated with Paradigm. The opinions reflected herein are subject to change without being updated.
