Daily Digest

6 articles published

Articles

Marc Brooker

Pass@k is Mostly Bunk

Exponentially better results? I'll take three! Measuring the success of AI agents isn’t easy. It’s very sensitive to what success means, it can require a lot of samples, its hi