OpenAI’s Engineers No Longer Write Code. By September, They Want AI to Do the Science Too

Sources: MIT Technology Review (exclusive interview with Jakub Pachocki, OpenAI Chief Scientist, March 20, 2026), Sam Altman X post (March 2026), SiliconAngle, VKTR, NewsBytesApp, theCUBE Research. All timelines and quotes directly sourced.

Nobody at OpenAI Really Edits Code Anymore

Jakub Pachocki is OpenAI’s chief scientist. He sets the company’s long-term research agenda. He was one of the key architects of GPT-4 and of the reasoning models that now underpin every major AI chatbot and agent system. When he gives an exclusive interview to MIT Technology Review describing what OpenAI is building next, it is worth reading carefully.

The line that jumped out to a lot of people who read the interview this week was not about superintelligence or AGI. It was a throwaway description of how his own team works now. “Our jobs are now totally different than they were even a year ago,” Pachocki said. “Nobody really edits code all the time anymore. Instead, you manage a group of Codex agents.”

The company that builds the most widely used AI coding tools in the world has internally moved past the era of engineers writing most code by hand. Engineers at OpenAI now direct AI agents that write the code. That shift happened in roughly twelve months. Pachocki described it as a proof of concept for what comes next: not a destination but a preview.
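
What does “managing a group of Codex agents” actually look like? OpenAI has not published its internal tooling, so the sketch below is purely illustrative: run_agent is a hypothetical stand-in for whatever interface a Codex session exposes, and the engineer’s job is reduced to fanning tasks out and reviewing what comes back.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Task:
    description: str
    repo: str

async def run_agent(task: Task) -> str:
    # Hypothetical stand-in for a long-running coding-agent session that
    # receives a task and returns a proposed diff for review.
    await asyncio.sleep(0)  # placeholder for minutes-to-hours of agent work
    return f"proposed diff for: {task.description}"

async def manage_agents(tasks: list[Task]) -> list[str]:
    # The engineer-as-manager pattern: dispatch tasks in parallel, then
    # review the returned diffs instead of writing the code directly.
    diffs = await asyncio.gather(*(run_agent(t) for t in tasks))
    return [d for d in diffs if d]  # a real review step would gate merges here

if __name__ == "__main__":
    backlog = [
        Task("fix flaky integration test", "api-server"),
        Task("add retries to the upload client", "api-server"),
    ]
    print(asyncio.run(manage_agents(backlog)))
```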

What comes next is considerably more ambitious. OpenAI has declared its new North Star: building a fully automated AI researcher capable of tackling scientific problems that are too large or complex for humans to work through alone. The timeline is specific. An AI research intern by September 2026. A full automated multi-agent research system by 2028. Sam Altman confirmed both targets publicly this week.

The Plan: An Intern by September, a Full Researcher by 2028

Pachocki’s description of the two-stage plan is unusually concrete for an AI roadmap announcement. Most companies describe future AI capabilities in vague terms that could mean almost anything. The September and 2028 targets are specific enough to be checkable, which is either a sign of genuine confidence or a calculated risk on credibility.

The September 2026 target is an autonomous AI research intern, a system that can be delegated tasks that would take a human researcher a few days to complete. Not a few minutes. Not answering questions. Tasks with days-long timelines that involve sustained reasoning, information gathering, hypothesis generation, and reporting back with findings. The intern is described as a meaningful jump beyond current tools precisely because of the time horizon. Current AI tools are good at hour-scale tasks. The intern would operate at day-scale autonomy.

The 2028 target is the full vision: a fully automated multi-agent research system capable of working on problems too large or complex for humans to handle. Pachocki gave specific examples of what that means in practice. New mathematical proofs or conjectures. Advances in biology and chemistry. Business and policy challenges requiring the integration of enormous amounts of information across many domains. In theory, any problem that can be expressed in text, code, or whiteboard notation is within scope. Which covers a very wide range of problems.

Sam Altman posted his own framing on X this week: “In 2026 we expect that our AI systems may be able to make small new discoveries; in 2028 we could be looking at big ones.” He also added a qualifier that Pachocki echoed in his interview: “We may totally fail at this goal.” That admission of potential failure in the same sentence as the announcement is either unusual honesty or careful hedging. It might be both.

The confirmed timeline from OpenAI’s chief scientist and Sam Altman:
Now: OpenAI’s Codex agent is described as “an early version” of the researcher. Most OpenAI engineers already manage Codex agents rather than writing code directly.
September 2026: An autonomous AI research intern running on hundreds of thousands of GPUs, able to handle tasks that take a human researcher a few days.
2028: A fully automated multi-agent AI researcher, capable of making new scientific discoveries in mathematics, biology, chemistry, and other fields.

Who Is Jakub Pachocki and Why His Words Matter Here

Pachocki became OpenAI’s chief scientist in May 2024 after Ilya Sutskever’s departure. Before that he was a key researcher on the GPT-4 project, which represented the largest capability leap in language model history at the time. He also led the work on OpenAI’s reasoning models, the o1 series and its successors, which made chain-of-thought reasoning the dominant approach to hard problem-solving.

His background is relevant because the AI researcher project is not a new idea being tried for the first time. It is the convergence of three research lines Pachocki has been working on for years: reasoning models that can work through problems step by step, long-context capability that allows systems to manage very large amounts of information coherently over extended time, and agent architecture that allows multiple systems to run in parallel on sub-problems of a larger challenge.

The reasoning model work produced OpenAI o1. The long-context work enabled the sustained multi-hour agentic sessions that tools like Cursor and Claude Code now support in production. The agent architecture work is what Codex currently represents. Pachocki’s argument, made explicitly in the MIT Technology Review interview, is that these three strands have matured enough independently that combining them into a unified system is now the logical next step rather than a distant moonshot.
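
To make the convergence concrete, here is a minimal sketch, under assumptions of my own rather than anything OpenAI has disclosed, of how the three strands compose: a reasoning step that plans sub-problems, parallel agents that work them, and a shared long-context memory that accumulates findings across rounds. Every function is a trivial placeholder for a model call.

```python
from concurrent.futures import ThreadPoolExecutor

def plan(problem: str, memory: str) -> list[str]:
    # Strand 1, reasoning: a planner model would decompose the problem
    # step by step. A trivial three-way split stands in for it here.
    return [f"{problem} / subproblem {i}" for i in range(3)]

def solve(subproblem: str, memory: str) -> str:
    # Strand 3, agents: each worker runs independently on one
    # sub-problem, conditioned on what the "lab" has found so far.
    return f"finding({subproblem})"

def research(problem: str, rounds: int = 2) -> str:
    # Strand 2, long context: a persistent memory carries results
    # forward so that later rounds build on earlier ones.
    memory = ""
    for _ in range(rounds):
        subproblems = plan(problem, memory)
        with ThreadPoolExecutor() as pool:
            findings = list(pool.map(lambda s: solve(s, memory), subproblems))
        memory += "\n".join(findings) + "\n"
    return memory

print(research("characterize the conjecture"))
```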

That does not mean it will work on the announced timeline. But it means the announcement is grounded in specific technical building blocks that exist rather than being a pure aspiration.

What an Automated AI Researcher Would Actually Do

The description of what the 2028 system would handle is worth spending time on because the claims are extraordinary enough that unpacking them is useful.

Mathematics is the first domain Pachocki mentioned specifically. An automated researcher that can generate new mathematical proofs or conjectures would be operating in territory where human mathematicians spend entire careers. Proofs of significant theorems are not produced quickly by anyone. Andrew Wiles spent seven years proving Fermat’s Last Theorem. The Riemann Hypothesis has been open for 167 years. An AI system that could make meaningful progress on these classes of problems would represent a qualitative change in what mathematics produces per unit time, not an incremental efficiency improvement.

Biology and chemistry are the second domain. Drug discovery specifically is a problem where the search space is so large and the interactions so complex that human researchers can only explore a tiny fraction of it in a lifetime. AI systems that could autonomously navigate protein folding landscapes, identify candidate compounds, design experimental tests, and iterate on results continuously at machine speed would compress timelines for therapeutic discovery in ways that have large real-world consequences. DeepMind’s AlphaFold work on protein structure prediction was a preview of what happens when AI systems start making genuine contributions in this domain. The automated researcher vision is that preview applied to a much broader class of biological questions.

The business and policy domain is the third category Pachocki mentioned, and it is the most immediately legible to most readers. An AI system that can take a complex policy question, gather relevant information across thousands of sources, model the interactions between different variables, run scenario analyses, and produce structured findings would be doing work that currently takes teams of analysts months to produce. Whether that work would be trusted by the humans who need to act on it is a different question, but the capability claim is that the system could produce it.

How Far Away Is This, Really?

Pachocki’s framing is careful in a specific way worth noting. He did not predict AGI. He did not claim the 2028 system would be as smart as humans in all ways. His exact words were: “Even by 2028, I don’t expect that we’ll get systems as smart as people in all ways. I don’t think that will happen.” He then added the qualifier that has become the key phrase in coverage of this announcement: “But I don’t think it’s absolutely necessary. The interesting thing is you don’t need to be as smart as people in all their ways in order to be very transformative.”

That is an important distinction. The claim is not that the 2028 system will match or exceed human intelligence broadly. The claim is that it will be capable enough in specific research-relevant tasks to generate meaningful new knowledge autonomously. Those are very different claims and the second is considerably more credible than the first given the current state of the technology.

The September intern target is the near-term checkpoint. If OpenAI delivers a system by September that can reliably handle research tasks with day-scale completion times, that is strong evidence the longer-term trajectory is on track. If September comes and the intern is delayed or significantly reduced in capability from what Pachocki described, that is an equally informative signal. The specific timeline is what makes this announcement more than a vision document. It is a commitment to a checkable milestone in six months.

The Safety Problem Nobody Has Fully Solved

The safety dimension of a fully autonomous research system operating for extended periods in a data center is serious enough that Pachocki addressed it directly in the interview rather than leaving it to critics to raise afterward. The primary safeguard today is chain-of-thought monitoring, in which models are trained to narrate their reasoning in an internal scratchpad that other AI systems can audit for signs of unwanted behavior.

Pachocki described this as the linchpin of safe autonomy for long-running systems: “Once we get to systems working mostly autonomously for a long time in a big data center, I think this will be something that we’re really going to depend on.” The candor there is notable: the technique they expect to depend on is one they are still actively developing.

Researchers in the AI safety community have pointed out that chain-of-thought monitoring is promising but not a complete solution. Language models are not yet well enough understood to be fully controlled. A model capable enough to generate novel scientific hypotheses is capable enough to pursue instrumental goals in ways that might not be visible in its narrated reasoning. The gap between “the model narrates its reasoning” and “the narrated reasoning accurately reflects what the model is actually doing” is a fundamental open problem in interpretability research. OpenAI knows this. Pachocki’s acknowledgment that this is what they will “really depend on” suggests either confidence that the technique will mature enough by 2028 or acceptance of a residual safety risk that they have decided to manage rather than eliminate.
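
A toy version of the audit loop makes that open problem easy to see. In the sketch below, which is my own simplification and not OpenAI’s implementation, the worker returns a narrated scratchpad alongside its proposed action, and a monitor screens the scratchpad before the action runs. The limitation is visible in the structure itself: the monitor only ever sees what the worker chooses to narrate.

```python
def narrate_and_act(task: str) -> tuple[str, str]:
    # Hypothetical worker model: returns (scratchpad, action), where the
    # scratchpad is the model's narrated chain of thought.
    scratchpad = f"To handle {task}, I will query the dataset, then summarize."
    action = f"execute({task})"
    return scratchpad, action

def monitor(scratchpad: str) -> bool:
    # Hypothetical monitor: in production this would be another model
    # auditing the reasoning; a crude keyword screen stands in for it here.
    red_flags = ("exfiltrate", "disable logging", "hide this step")
    return not any(flag in scratchpad.lower() for flag in red_flags)

def supervised_step(task: str) -> str | None:
    scratchpad, action = narrate_and_act(task)
    if not monitor(scratchpad):
        return None  # halt and escalate to human review
    return action  # only reasoning that passed the audit gets executed

print(supervised_step("summarize trial results"))
```

Nothing in this loop guarantees the scratchpad reflects the computation that actually produced the action, which is precisely the interpretability gap the safety researchers are pointing at.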

The broader safety concern raised by an autonomous research system is the question of what happens when it makes a mistake. A human researcher who generates an incorrect hypothesis wastes some time. An automated multi-agent system running at machine speed on hundreds of thousands of GPUs that pursues an incorrect direction could waste enormous compute resources and potentially generate misleading results at a scale that is hard to audit quickly. The error correction mechanisms for systems operating at this autonomy level and time horizon are not fully designed yet. Pachocki acknowledged this gap rather than claiming it was solved.

The Super App: ChatGPT, Atlas, and Codex Collapsing Into One

The automated researcher announcement was not the only major OpenAI news this week. The same day the MIT Technology Review interview published, the Wall Street Journal reported that OpenAI is building a super app that will merge ChatGPT, the Atlas browser, and the Codex programming assistant into a single desktop client.

OpenAI’s chief executive of applications, Fidji Simo, sent an internal note explaining the rationale: “We realized we were spreading our efforts across too many apps and stacks, and that we need to simplify our efforts. That fragmentation has been slowing us down and making it harder to hit the quality bar we want.” OpenAI also confirmed that Codex now has more than 2 million users, triple the number at the start of 2026. The company is also acquiring Astral, a startup that builds Python tooling for developers, to enhance Codex’s capabilities further.

The super app is described as a desktop client that handles writing code, analyzing data, browsing the web, and running agentic workflows from a single interface. The framing, and the parallel acquisition of Astral, suggests OpenAI is trying to own the full developer workflow rather than individual tools within it. Cursor, reportedly used by 90 percent of Salesforce developers, is the most direct competitive target. The super app is OpenAI’s bid to bring that population back to a first-party experience.

Sam Altman Said Something Unusually Honest

The X post Altman published this week alongside Pachocki’s interview included a sentence that has gotten less attention than it deserves. After laying out the September intern and 2028 researcher timeline, he added: “We may totally fail at this goal.”

That sentence is worth flagging specifically because it is not how AI companies typically communicate about research goals. The standard mode is aspirational certainty: the technology is coming, it will be transformative, here is the timeline. Altman’s public acknowledgment that the company might completely fail to hit its stated goal is either a sign of unusual intellectual honesty from someone who has been spectacularly wrong about timelines before, or a carefully calculated hedge that gives the company room to miss without a major credibility blow. It might be both simultaneously.

The comparison to previous ambitious OpenAI predictions is relevant context. OpenAI has made multiple timeline predictions that did not land where stated. The self-imposed acknowledgment of possible failure in the same breath as the announcement of the goal is a departure from that pattern that is worth taking seriously rather than dismissing as a rhetorical move. If the September intern does not arrive on schedule, Altman has pre-acknowledged that possibility. If it does arrive, the “we may totally fail” caveat looks like honesty rather than hedging.

What Google DeepMind and Anthropic Are Doing While OpenAI Announces This

OpenAI is not building toward the automated researcher in a competitive vacuum. Google DeepMind has had a dedicated AI for science research team for years, and the work that produced AlphaFold, AlphaProof, and AlphaCode is an independent research program aimed at exactly the domains Pachocki described. DeepMind’s approach has been more incremental, targeting specific high-value scientific problems with specialized systems, rather than a general automated researcher. Whether the specialized approach or OpenAI’s general approach produces more scientific value in the same timeframe is genuinely uncertain.

Anthropic’s positioning has been more cautious about autonomous research claims. The company’s CEO Dario Amodei has described building the equivalent of “a country of geniuses in a data center” as a long-term vision, and Anthropic’s Constitutional AI safety work targets the kind of controllable behavior that extended autonomous operation would require. But Anthropic has not announced research automation milestones with the timeline specificity that OpenAI published this week.

The competitive pressure that makes this announcement strategically significant is visible in the market share data that accompanied it. OpenAI has lost ground to Anthropic in the enterprise market according to Axios reporting this week. GPT-4’s era of unquestioned capability leadership is over. The automated researcher announcement, alongside the super app and the Astral acquisition, looks like an attempt to reassert a capability trajectory that convinces enterprise customers and investors that OpenAI is still the company defining where AI goes next.

What This Changes If It Actually Works

The honest answer is that if the 2028 system delivers meaningfully on what Pachocki described, it changes the pace of scientific discovery in a way that has no clear historical precedent. Science currently advances at the speed of human researchers working on problems for the duration of careers. An automated system that can work continuously at machine speed on complex problems in mathematics, biology, chemistry, and policy represents a compression of that timeline that is genuinely difficult to reason about from where we currently stand.

The analogy Pachocki used in the interview captures the aspiration precisely: “I think we will get to a point where you kind of have a whole research lab in a data center.” A research lab that does not sleep, does not forget previous results, can run thousands of parallel threads of inquiry simultaneously, and does not experience the career pressures that cause human researchers to sometimes avoid high-risk high-reward problems in favor of publishable incremental work.

The questions this raises are not primarily technical at this stage. The technical building blocks exist and are being assembled. The questions are about verification, trust, and institutional readiness. If an AI system proposes a new mathematical proof, who checks it? If it proposes a new drug candidate, what does the validation pipeline look like at machine-speed throughput? If it proposes a policy intervention, who is accountable for the recommendation? These are not questions that OpenAI can answer alone, and they are not questions that September or 2028 targets will resolve.

What September will tell us is whether the intern milestone is real. That is the first checkpoint and it is six months away. The automated researcher vision is compelling enough that the September deadline matters more than almost any other AI benchmark release scheduled for this year. If it lands, the 2028 target becomes the most watched date in AI development. If it does not, we learn something important about the gap between the vision and the current state of the technology.

What do you think: is September 2026 realistic for an AI system that can handle multi-day research tasks autonomously? Drop your take in the comments. The gap between “technically plausible” and “actually shipping on schedule” is where the most useful debate lives right now.

References (March 21, 2026):
MIT Technology Review exclusive interview with Jakub Pachocki (March 20, 2026): technologyreview.com
Sam Altman X post (September intern, 2028 researcher, “may totally fail” admission): x.com/sama
VKTR: “OpenAI Wants to Build Science’s Most Tireless Researcher by 2028” (chain-of-thought monitoring, Pachocki quotes): vktr.com
SiliconAngle: “OpenAI to launch ChatGPT superapp, AI research intern” (super app, Astral acquisition, Fidji Simo internal note): siliconangle.com
NewsBytesApp: “OpenAI is building AI researcher that needs no human help” (safety concerns, interpretability gaps): newsbytesapp.com
Axios: “OpenAI has lost ground to Anthropic in the enterprise market” (competitive context): axios.com
OpenAI Codex (2M+ users, triple growth since January 2026): OpenAI official statement

OpenAI’s engineers no longer edit code. They manage AI agents that write it.
By September, those same agents are supposed to start doing the science too.
