Video Coding Benchmarks

CodeClash Benchmarks LLMs through Multi-Round Coding Competitions

Researchers from Stanford, Princeton, and Cornell have developed a new benchmark to more accurately evaluate the coding abilities of large language models (LLMs). Called CodeClash, the new benchmark ...

Geeky Gadgets

Anthropic Claude Opus 4.5 Tops Coding Benchmarks While Slashing Token Use

What if the future of coding wasn’t human, but instead powered by an AI so advanced it could outpace even the most skilled developers? Enter Claude Opus 4.5, a model that doesn’t just assist with ...

VentureBeat

Microsoft’s GRIN-MoE AI model takes on coding and math, beating competitors in key benchmarks

Microsoft has unveiled a groundbreaking artificial intelligence model, ...

From MSN

OpenAI unveils GPT-5.5 with major coding benchmark gains

OpenAI has released GPT-5.5, describing it as its most capable AI model to date, with notable improvements in software engineering and command-line task benchmarks. The launch escalates competition ...

3 days ago

How ChatGPT 5.5 Automates Repetitive Coding Tasks to Save You Time

OpenAI's newly released ChatGPT 5.5 outperforms Claude Opus 4.7 in coding benchmarks and allows developers to build complex 3D ...

VentureBeat

Anthropic’s Claude Opus 4.5 is here: Cheaper AI, infinite chats, and coding skills that ...

Anthropic released its most capable artificial intelligence model yet on Monday, slashing prices by roughly two-thirds while claiming state-of-the-art performance on software engineering tasks — a ...

eWeek

Gemini Beats Claude, GPT in Google’s First Android AI Coding Benchmark

AI thrives on data but feeding it the right data is harder than it seems. As enterprises scale their AI initiatives, they face the challenge of managing diverse data pipelines, ensuring proximity to ...

From MSN

OpenCode closes gap with Claude Code in coding benchmark

Open-source AI coding tool OpenCode has come close to matching Claude Code in recent benchmark tests, offering flexibility, local model support, and cost control advantages. While Claude Code remains ...

Neowin

OpenAI debuts GPT-5.3-Codex: 25% faster and setting new coding benchmark records

OpenAI has launched GPT-5.3-Codex, a high-performance agentic model that sets new records across major coding benchmarks. Today, OpenAI announced GPT-5.3-Codex, its most capable agentic coding model ...

TechCrunch

Google launches Gemini 3 with new coding app and record benchmark scores

On Tuesday, Google released Gemini 3, its latest and most advanced foundation model, which is now immediately available through the Gemini app and AI search interface. Coming just seven months after ...
