AI 2024: OpenAI Thinks ChatGPT Thinks

OpenAI says its newest model, o1, is “reasoning” and even “thinking,” but many disagree. Besides the well-known sceptic Gary Marcus, Hugging Face CEO Clem Delangue was clearly unimpressed with the “thinking” claim this time around.

“Once again, an AI system is not ‘thinking,’ it’s ‘processing,’ ‘running predictions,’ just like computers or Google do,” Delangue remarked, referring to what he sees as OpenAI’s misrepresentation of its newest model’s capabilities. “Making you believe that technology systems are human is just cheap marketing gimmick to make you believe it’s smarter than it is,” he continued.

But isn’t that simply how thought works? Phillip Rhodes retorted: “Once again, human minds aren’t ‘thinking,’ they are just executing a complex series of bio-chemical / bio-electrical computing operations at massive scale.”

How is o1 Thinking?

CEO Sam Altman heralded o1 as “the beginning of a new paradigm: AI that can do general-purpose complex reasoning.” Unlike previous OpenAI models, which start producing text as soon as they receive a prompt, the new model takes time to think before answering.

To do this, the model generates a long internal chain of thought before it responds to the user. For the same reason, the team has cautioned against using it for generic queries, arguing that its reasoning abilities are better suited to hard, PhD-level problems where precisely accurate answers matter.

Beyond coding and maths, this reasoning capability is the main highlight of the release. Altman has long touted ‘reasoning’ and ‘thinking’ as the next frontier in his talks, and with o1 that promise finally seems to be landing.

According to OpenAI’s “Learning to Reason with LLMs” blog post, OpenAI’s reinforcement learning algorithm teaches the model to think more productively by refining its chain of thought through a data-efficient training process.

o1’s performance improves steadily as more training (train-time compute) and more thinking time (test-time compute) are added. This differs from traditional LLM pretraining, which mainly gains capability by scaling up model size rather than by strengthening reasoning in a smaller model.

Through reinforcement learning, o1 learns to break complex problems into simpler steps, recognise and correct its mistakes, and try a different approach when the current one is not working. This strengthens its handling of complicated prompts that require more than next-word prediction: it can backtrack and “think” through the task.
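
Backtracking itself is a familiar idea from classical search. The Python sketch below is only an analogy, not how o1 works internally (OpenAI has not published that): a depth-first search for numbers that sum to a target, which abandons a partial solution as soon as it overshoots and then tries the alternative choice. The function `find_subset` and the example numbers are hypothetical.

```python
from typing import Optional


def find_subset(nums: list[int], target: int) -> Optional[list[int]]:
    """Depth-first search with backtracking; assumes all nums are non-negative."""

    def search(i: int, remaining: int, chosen: list[int]) -> Optional[list[int]]:
        if remaining == 0:
            return list(chosen)  # goal reached: return a copy of the current path
        if i == len(nums) or remaining < 0:
            return None  # dead end: abandon this line of reasoning
        # Approach 1: include nums[i] and continue.
        chosen.append(nums[i])
        result = search(i + 1, remaining - nums[i], chosen)
        if result is not None:
            return result
        chosen.pop()  # backtrack: undo the step that led to a dead end
        # Approach 2: skip nums[i] instead.
        return search(i + 1, remaining, chosen)

    return search(0, target, [])


print(find_subset([8, 6, 7, 5, 3], 16))
```

The search first commits to 8 and 6, discovers that no remaining numbers complete the sum, undoes the choice of 6, and eventually settles on a different combination.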

A key drawback, however, is that the model’s reasoning remains hidden from users, even though they are billed for it in the form of “reasoning tokens.”
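
OpenAI’s API documentation states that reasoning tokens, although not returned to the user, are billed as output tokens. The minimal Python sketch below shows why that matters; the token counts and the per-million-token price are assumptions chosen for illustration, not quoted figures.

```python
def billed_output_tokens(visible_tokens: int, reasoning_tokens: int) -> int:
    # Hidden reasoning tokens are charged at the output-token rate,
    # so they are added to the tokens of the visible answer.
    return visible_tokens + reasoning_tokens


def output_cost_usd(visible_tokens: int, reasoning_tokens: int,
                    usd_per_million_output_tokens: float) -> float:
    tokens = billed_output_tokens(visible_tokens, reasoning_tokens)
    return tokens * usd_per_million_output_tokens / 1_000_000


# Hypothetical numbers: a 500-token answer preceded by 4,500 hidden
# reasoning tokens, at an assumed price of $60 per million output tokens.
print(output_cost_usd(500, 4_500, 60.0))
print(output_cost_usd(500, 0, 60.0))  # cost of the visible answer alone
```

In this example the user pays for ten times as many output tokens as they can actually read.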

o1 is explicitly instructed to not disclose the “hidden chain-of-thought” which is done using “reasoning tokens”. and to not let users trick it or “ask for step by step”.

It didn’t disclose it to me but it’s readable in the “thought” summary.

But yeah o1 seems to literally be… pic.twitter.com/mWeKz37X7G

— Lewis N Watson (@LewisNWatson) September 13, 2024

OpenAI has given two main reasons for hiding the reasoning steps. The first is safety and policy compliance: the model needs the freedom to reason without its sensitive intermediate steps being exposed. The second is competitive advantage: hiding the chain prevents competitors from training their own models on o1’s reasoning. Keeping the process hidden also lets OpenAI monitor the model’s thought patterns without interfering with its internal reasoning.

Not for Everyone and Focus on Inference

As NVIDIA’s Jim Fan explained, o1 (code-named “Strawberry”) marks a significant shift towards inference-time scaling in production: improving reasoning through search at inference time, rather than through learning alone.

In Fan’s view, reasoning doesn’t require huge models. Many parameters in current models go into memorising facts in order to score well on trivia-like benchmarks; reasoning could instead be handled by a smaller “reasoning core” that calls external tools such as a browser or a code verifier.
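
A minimal Python sketch of the “reasoning core plus external tool” idea, under loud assumptions: the hypothetical `propose_candidates` generator stands in for a small model proposing programs, and the hypothetical `verify` function plays the role of a code verifier that runs each candidate against test cases. Nothing here reflects o1’s actual tooling.

```python
from typing import Iterable, Optional

Tests = list[tuple[list[int], int]]


def propose_candidates() -> Iterable[str]:
    # Hypothetical stand-in for a small reasoning model: it yields candidate
    # programs for "return the largest element". A real system would sample them.
    yield "def solve(xs):\n    return xs[0]"          # plausible but wrong
    yield "def solve(xs):\n    return sorted(xs)[-1]"  # correct
    yield "def solve(xs):\n    return max(xs)"         # also correct


def verify(source: str, tests: Tests) -> bool:
    # External tool: execute the candidate and check it against test cases.
    # exec() runs arbitrary code; a real verifier would use a sandbox.
    namespace: dict = {}
    try:
        exec(source, namespace)
        solve = namespace["solve"]
        return all(solve(inp) == expected for inp, expected in tests)
    except Exception:
        return False  # crashes and syntax errors count as failed checks


def reason_with_tools(tests: Tests) -> Optional[str]:
    # The core proposes, the tool checks; keep the first verified candidate.
    for candidate in propose_candidates():
        if verify(candidate, tests):
            return candidate
    return None


tests = [([3, 1, 2], 3), ([5, 9, 7], 9)]
print(reason_with_tools(tests))
```

The first candidate happens to pass the first test but fails the second, so the verifier rejects it; the split of work means the core only has to propose plausible programs, while correctness is checked externally.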

This approach reduces the need for massive pre-training compute, shifting a significant share of compute to inference rather than pre- or post-training. At inference time, the LLM simulates many possible strategies, much as AlphaGo uses Monte Carlo Tree Search (MCTS) to explore moves, and with more search it converges on better solutions.
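
MCTS is too involved for a short example, so the Python sketch below swaps in a simpler form of inference-time search: best-of-N sampling with a perfect verifier. It assumes a hypothetical model whose sampled answers are each independently correct with probability `p_correct`; drawing more samples (spending more inference compute) raises the chance that at least one is correct, following 1 - (1 - p)^N. It illustrates the scaling idea only and is not o1’s actual search procedure.

```python
import random


def sample_answer(rng: random.Random, p_correct: float) -> bool:
    """Hypothetical model call: True if this sampled answer is correct."""
    return rng.random() < p_correct


def best_of_n(rng: random.Random, n: int, p_correct: float) -> bool:
    # Draw up to n candidates; a perfect verifier accepts the first correct one,
    # and any() stops sampling as soon as that happens.
    return any(sample_answer(rng, p_correct) for _ in range(n))


def success_rate(n: int, p_correct: float, trials: int = 5000, seed: int = 0) -> float:
    # Monte Carlo estimate of how often search with n samples succeeds.
    rng = random.Random(seed)
    wins = sum(best_of_n(rng, n, p_correct) for _ in range(trials))
    return wins / trials


def expected_rate(n: int, p_correct: float) -> float:
    # Closed form: probability that at least one of n samples is correct.
    return 1 - (1 - p_correct) ** n


for n in (1, 4, 16):
    print(n, round(success_rate(n, 0.2), 3), round(expected_rate(n, 0.2), 3))
```

With `p_correct = 0.2`, the closed-form success rate rises from 0.2 with one sample to about 0.59 with four and about 0.97 with sixteen. Stopping at the first verified answer is one simple rule for when to stop searching, though real tasks rarely come with a perfect verifier.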

Subbarao Kambhampati offered a similar explanation in his post:

My (pure) speculation about what OpenAI o1 might be doing

[Caveat: I don’t know anything more about the internal workings of o1 than the handful of lines about what they are actually doing in that blog post–and on the face of it, it is not more informative than “It uses Python… pic.twitter.com/QgDjLycLif

— Subbarao Kambhampati (కంభంపాటి సుబ్బారావు) (@rao2z) September 12, 2024

OpenAI likely discovered the benefits of inference scaling early on, while academic research has only recently caught up.

While this works well on benchmarks, deploying o1 for real-world reasoning tasks is harder. Deciding when to stop searching, defining reward functions, and managing the compute cost of steps such as running a code interpreter are open problems that must be solved for broader deployment.

o1 can also act as a data flywheel: its search results become new training data, labelled with positive rewards for correct answers and negative rewards for incorrect ones. Training on this data improves the reasoning core over time, much as AlphaGo’s value network was refined on data generated by MCTS, and a better reasoning core in turn produces more valuable data.
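
The labelling step of such a flywheel can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the `Trace` and `LabelledTrace` records, the reference-answer table and the +1/-1 reward scheme are all hypothetical, since the actual o1 training pipeline is not public.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    problem: str    # the question that was posed
    reasoning: str  # the chain of thought that was sampled
    answer: int     # the final answer the trace arrived at


@dataclass
class LabelledTrace:
    trace: Trace
    reward: float   # +1.0 for a correct answer, -1.0 for an incorrect one


def label_traces(traces: list[Trace], reference: dict[str, int]) -> list[LabelledTrace]:
    """Turn search output into training data with positive and negative rewards."""
    return [
        LabelledTrace(t, 1.0 if t.answer == reference[t.problem] else -1.0)
        for t in traces
    ]


# Hypothetical search output for two problems.
traces = [
    Trace("17 * 3", "17 * 3 = 10 * 3 + 7 * 3 = 30 + 21", 51),
    Trace("17 * 3", "17 * 3 is roughly 50", 50),
    Trace("2 ** 5", "2, 4, 8, 16, 32", 32),
]
reference = {"17 * 3": 51, "2 ** 5": 32}

dataset = label_traces(traces, reference)
positives = [ex for ex in dataset if ex.reward > 0]
negatives = [ex for ex in dataset if ex.reward < 0]
print(len(positives), "positive and", len(negatives), "negative examples")
```

In a full flywheel, the positive and negative examples would feed a reinforcement learning update of the reasoning core; that training step is outside the scope of this sketch.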

So perhaps we can say that ChatGPT is now thinking. That is why it gets better the more time it is given to think, and why OpenAI cares less about speed.
