Hacker News — vinext + Cloudflare Workers

10 points by handcrafted 2 days ago | 5 comments

little_cad 2 days ago [-]

I only see closed-source models on your leaderboard so far: https://cadbench.ai/leaderboard

It would be interesting to see how open-source models perform on CAD tasks.

mjzh 2 days ago [-]

Interesting. Per https://cadbench.ai/leaderboard, GPT-5.5 is the best, not Opus 4.7. Why is Opus 4.7 run with mini-swe-agent rather than Claude Code?

handcrafted 2 days ago [-]

GPT-5.5 and Opus 4.7 are comparable when both use the same harness, mini-swe-agent. GPT-5.5 shows a significant performance delta only when integrated with the Codex module. We hypothesize that Opus 4.7 performs better on mini-swe-agent than on the more complex Claude Code harness because of mini-swe-agent's tight feedback loop (edit-run-check), which is well suited to the CAD generation task.

bigskydog 1 day ago [-]

There is also a benchmark called BenchCAD that came out recently, which shows similar results: Opus 4.7 seems to be the best. https://benchcad.github.io/BenchCAD_webpage/

handcrafted 20 hours ago [-]

We have a section "How we compare to CadBench and BenchCAD" in our bench report comparing our Parametric CAD Bench with CADBench and BenchCAD: https://cadbench.ai/news/parametric-cad-bench

gnucleus_peggy 2 days ago [-]

[dead]
