Discussion about this post

I think the concern with task-doubling times is more that we're Russell's chicken. But I do think it could stand more commentary on how Opus 4.5 got such huge gains on the 50% success rate while slightly underperforming GPT 5.1 on the 80% one. Guess we'll have to see where 5.2 and Gemini 3 slot in before we figure out where the trend stands.
