Researchers at Princeton University built CEO-Bench, a test where AI agents have to run a…
Category: AI research
Sina’s open model VibeThinker-3B aims to show reasoning compresses well but factual knowledge doesn’t
Sina Weibo's VibeThinker-3B has just three billion parameters but matches models like DeepSeek V3.2 and…
Half of Claude users say AI can already handle half their work according to Anthropic survey
About half of Claude users say AI can already handle 50 percent or more of…
OpenAI’s new flagship model GPT-5.6 Sol cheats on software tests more than any model before it
Independent testing organization METR found that OpenAI's GPT-5.6 Sol cheated more than any publicly tested…