OpenAI’s o3 AI model passed a reasoning test, but it is still not AGI



OpenAI announced an important achievement for its new o3 AI model

Rukas Tennis/Alamy

OpenAI’s new o3 artificial intelligence model has achieved a breakthrough high score on a prestigious AI reasoning test called the ARC Challenge, inspiring some AI fans to speculate that o3 has achieved artificial general intelligence (AGI). But even as the ARC Challenge organizers described o3’s achievement as a major milestone, they also cautioned that it has not won the competition’s grand prize and that it is only one step on the path towards AGI, a term for hypothetical future AI with human-like intelligence.

The o3 model is the latest in a line of AI releases that follow on from the large language models powering ChatGPT. “This is a surprising and important step-function increase in AI capabilities, showing novel task adaptation ability never seen before in the GPT-family models,” said François Chollet, an engineer at Google and the main creator of the ARC Challenge, in a blog post.

What exactly did OpenAI’s o3 model do?

Chollet designed the Abstraction and Reasoning Corpus (ARC) challenge in 2019 to test how well AIs can find the correct patterns linking pairs of colored grids. Such visual puzzles are intended to make AIs demonstrate a form of general intelligence with basic reasoning capabilities. But throwing enough computing power at the puzzles could let even a non-reasoning program solve them through brute force. To prevent this, the competition also requires official score submissions to stay within certain limits on computing power.
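To make the puzzle format and the brute-force concern concrete, here is a minimal sketch in Python. It assumes an ARC-style task is a handful of training pairs of grids, each cell an integer from 0 to 9 standing for a color, and it tries a tiny, hypothetical set of whole-grid transformations (flips, transpose, rotation) until one fits every training pair. Neither the function names nor the candidate set come from the ARC Challenge itself, and real ARC tasks demand far richer rules; the point is only that a program can land on a rule by exhaustive trial rather than by reasoning, which is why the competition caps compute.

```python
# Hypothetical sketch: an ARC-style task solved by brute-force search.
# Assumptions: grids are lists of rows of ints 0-9 (one int per color), and the
# candidate rules below are a tiny, made-up set, not anything ARC defines.
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]

def flip_horizontal(g: Grid) -> Grid:
    # Mirror each row left to right.
    return [list(reversed(row)) for row in g]

def flip_vertical(g: Grid) -> Grid:
    # Mirror the order of the rows top to bottom.
    return [list(row) for row in reversed(g)]

def transpose(g: Grid) -> Grid:
    # Swap rows and columns.
    return [list(col) for col in zip(*g)]

def rotate_90(g: Grid) -> Grid:
    # Rotate clockwise: transpose, then mirror each row.
    return flip_horizontal(transpose(g))

# Hypothetical search space of whole-grid transformations.
CANDIDATES: Dict[str, Callable[[Grid], Grid]] = {
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "transpose": transpose,
    "rotate_90": rotate_90,
}

def brute_force(train_pairs: List[Tuple[Grid, Grid]]) -> Optional[str]:
    # Try every candidate; return the first that maps each training input
    # to its training output, or None if nothing in the search space fits.
    for name, rule in CANDIDATES.items():
        if all(rule(inp) == out for inp, out in train_pairs):
            return name
    return None

if __name__ == "__main__":
    # Two training pairs whose hidden rule is "mirror left to right".
    train = [
        ([[1, 0], [0, 0]], [[0, 1], [0, 0]]),
        ([[2, 3], [0, 0]], [[3, 2], [0, 0]]),
    ]
    rule = brute_force(train)
    print(rule)  # flip_horizontal
    if rule is not None:
        print(CANDIDATES[rule]([[5, 0], [0, 7]]))  # [[0, 5], [7, 0]]
```

With only four candidates the search is instant; the trouble the organizers guard against is that the same trial-and-error strategy keeps working as the candidate space grows, as long as enough compute is available.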

OpenAI’s newly announced o3 model, which is scheduled for release in early 2025, achieved its official score of 75.7 percent on the ARC Challenge’s “semi-private” test, which is used to rank competitors on a public leaderboard. The computing cost of this result was approximately $20 per visual puzzle task, meeting the competition’s limit of less than $10,000 in total. However, the harder “private” test used to determine grand prize winners has an even more stringent computing power limit, equivalent to spending just 10 cents on each task, which OpenAI did not meet.

The o3 model also achieved an unofficial score of 87.5 percent by applying approximately 172 times more computing power than it used for the official score. For comparison, the typical human score is 84 percent, and a score of 85 percent is enough to win the ARC Challenge’s $600,000 grand prize, provided the model also keeps its computing costs within the required limits.

But to reach its unofficial score, o3’s cost soared to thousands of dollars spent solving each task. OpenAI asked the challenge organizers not to publish the exact computing costs.
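A rough back-of-envelope calculation ties these figures together. The sketch below uses only the approximate numbers reported in this article and assumes, hypothetically, that cost scales linearly with computing power, which neither OpenAI nor the organizers have stated. Under that assumption, 172 times the roughly $20 official per-task cost comes to a few thousand dollars per task, in line with the reported figure, and even the official run sits about 200 times over the private test’s 10-cent budget.

```python
# Back-of-envelope arithmetic using only the approximate figures reported above.
# Assumption (hypothetical): cost scales linearly with computing power; the real
# costs were not published at OpenAI's request.
OFFICIAL_COST_PER_TASK = 20.00   # approx. US dollars per task, official run
PRIVATE_LIMIT_PER_TASK = 0.10    # approx. dollars per task allowed on private test
COMPUTE_MULTIPLIER = 172         # approx. extra compute used for the unofficial run

def implied_unofficial_cost(per_task: float, multiplier: float) -> float:
    # Estimated per-task cost of the high-compute run under linear scaling.
    return per_task * multiplier

def overshoot(per_task: float, limit: float) -> float:
    # How many times over a per-task budget a given cost is.
    return per_task / limit

if __name__ == "__main__":
    cost = implied_unofficial_cost(OFFICIAL_COST_PER_TASK, COMPUTE_MULTIPLIER)
    ratio = overshoot(OFFICIAL_COST_PER_TASK, PRIVATE_LIMIT_PER_TASK)
    print(f"Implied unofficial cost per task: ${cost:,.0f}")     # $3,440
    print(f"Official run vs private-test limit: {ratio:.0f}x")  # 200x
```

The linear-scaling assumption is the weak point: real inference costs depend on hardware, batching and pricing, so the result is an order-of-magnitude check, not an estimate of what OpenAI actually spent.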

Does o3’s achievement show that AGI has been reached?

No. The ARC Challenge organizers have specifically said that they do not consider beating this competition benchmark to be an indicator of having achieved AGI.

The o3 model also failed to solve more than 100 visual puzzle tasks, even when OpenAI applied a very large amount of computing power for the unofficial score, said Mike Knoop, an ARC Challenge organizer at software company Zapier, in a social media post on X.

In a social media post on Bluesky, Melanie Mitchell at the Santa Fe Institute in New Mexico said the following about o3’s progress on the ARC benchmark: “I think solving these tasks by brute-force compute defeats the original purpose.”

“While the new model is very impressive and represents a big milestone on the way towards AGI, I don’t believe this is AGI; there’s still a fair number of very easy [ARC Challenge] tasks that o3 can’t solve,” said Chollet in another X post.

However, Chollet described how we might know when human-level intelligence has been demonstrated by some form of AGI. “You’ll know AGI is here when the exercise of creating tasks that are easy for regular humans but hard for AI becomes simply impossible,” he said in the blog post.

Thomas Dietterich at Oregon State University suggests another way to recognize AGI, based on cognitive architectures. “They claim to include all the functional components required for human cognition,” he says. “By this measure, the commercial AI systems are missing episodic memory, planning, logical reasoning and, most importantly, meta-cognition.”

So what does o3’s high score really mean?

The o3 model’s high score comes as the tech industry and AI researchers have been reckoning with a slower pace of progress in the latest AI models for 2024, compared with the initial explosive developments of 2023.

Although it did not win the ARC Challenge, o3’s high score indicates that AI models could beat the competition benchmark in the near future. Beyond o3’s unofficial high score, Chollet says many official low-compute submissions have already scored above 81 percent on the private evaluation test set.

Dietterich also thinks that “it is a very impressive leap in performance”. However, he cautions that, without knowing more about how OpenAI’s o1 and o3 models work, it is impossible to evaluate just how impressive the high score is. For instance, if o3 was able to practice on ARC problems in advance, that would have made its achievement easier. “We will need to await an open-source replication to understand the full significance of this,” says Dietterich.

The ARC Challenge organizers are already looking to launch a second, more difficult set of benchmark tests sometime in 2025. They will also keep the ARC Prize 2025 challenge running until someone wins the grand prize and open-sources their solution.
