The latest ChatGPT is supposed to be "PhD level" smart. It can't even label a map
A version of this story appeared in CNN Business' Nightcap newsletter. To get it in your inbox, sign up for free here.

Sam Altman, the artificial intelligence hype master, is in damage-control mode. OpenAI's latest version of its vaunted ChatGPT bot was supposed to be "PhD-level" smart. It was supposed to be the next great leap forward for a company that investors have poured billions of dollars into. Instead, ChatGPT got a flatter, more terse personality that can't reliably answer basic questions. The resulting public mockery has forced the company to make sweaty apologies while standing by its highfalutin claims about the bot's capabilities. In short: It's a dud.

The misstep on the model, called GPT-5, is notable for a couple of reasons.

1. It highlighted the many existing shortcomings of generative AI that critics were quick to seize on (more on that in a moment, because they were quite funny).

2. It raised serious doubts about OpenAI's ability to build and market consumer products that human beings are willing to pay for. That should be particularly concerning for investors, given that OpenAI, which has never turned a profit, is reportedly worth $500 billion.

Let's rewind a bit to last Thursday, when OpenAI finally released GPT-5 to the world, about a year behind schedule, according to the Wall Street Journal. Now, one thing this industry is really good at is hype, and on that metric, CEO Sam Altman delivered. During a livestream ahead of the launch last Thursday, Altman said talking to GPT-5 would be like talking to "a legitimate PhD-level expert in anything, any area you need." In his typically lofty style, Altman said GPT-5 reminds him of "when the iPhone went from those giant-pixel old ones to the retina display." The new model, Altman said in a press briefing, is "significantly better in obvious ways and subtle ways, and it feels like something I don't want to ever have to go back from."

Then people started actually using it.
Users had a field day testing GPT-5 and mocking its wildly incorrect answers.

"500 billion dollars and the robot can't even count to twelve," read one post on Bluesky.

The journalist Tim Burke said on Bluesky that he prompted GPT-5 to "show me a diagram of the first 12 presidents of the United States with an image of their face and their name under the image." The bot returned an image of nine people instead, with rather creative spellings of America's early leaders, like "Gearge Washingion" and "William Henry Harrtson." A similar prompt for the last 12 presidents returned an image that included two separate versions of George W. Bush. No, not George H.W. Bush, and then Dubya. It had "George H. Bush." And then his son, twice. Except the second time, George Jr. looked like just some random guy.

"At first I thought GPT 5 had got this right then I saw things like 'Tonnessee,' 'Mississipo' and my personal favourite 'West Wigina,'" another post read. "Please do not respond just saying the different typos to me we can all read the joke, we all know about 'Distrricke.'"

Labeling basic maps of the United States also proved tricky for GPT-5 (but again, pretty funny, as tech writer Ed Zitron's post on Bluesky showed). GPT-5 did slightly better when I asked it on Wednesday for a map of the US. Some people can, in fact, label the great state of Vermont correctly without a PhD, but not GPT-5. And this is the first I'm hearing of states named "Yirginia."

The slop coming out of GPT-5 was funny when it was just us nerds trying to find its blind spots. But some regular fans of ChatGPT weren't laughing, especially because users have been particularly alarmed by the new version's personality, or rather, lack thereof. In rolling out the new model, OpenAI essentially retired its earlier models, including the wildly popular GPT-4o that's been on the market for over a year, making it so that even people who loved the previous iteration of the chatbot suddenly couldn't use it.
More than 4,000 people signed a Change.org petition to compel OpenAI to resurrect it.

"I'm so done with ChatGPT 5," one user wrote on Reddit, explaining how they tried to use the new model to run "a simple system" of tasks that an earlier ChatGPT model used to handle. The user said GPT-5 "went rogue," deleting tasks and moving deadlines.

And while OpenAI's defenders could chalk that up to an isolated or even made-up incident, within 24 hours of the GPT-5 launch Altman was doing damage control, seemingly caught off guard by the bad reception. On X, he announced a laundry list of updates, including the return of GPT-4o for paid subscribers. "We expected some bumpiness as we roll out so many things at once," Altman said in a post. "But it was a little more bumpy than we hoped for!"

The CEO's failure to anticipate the outrage suggests he doesn't have a firm grasp on how an estimated 700 million weekly active users are engaging with his product. Perhaps Altman missed all the coverage, from CNN, the New York Times and the Wall Street Journal, of people forming deep emotional attachments to ChatGPT or rival chatbots, having endless conversations with them as if they were real people. A simple search of Reddit could have offered insights into how others are integrating the tool into their workflows and lives. Basic market research should have shown OpenAI that a mass update sunsetting the tools people rely on would be more than just a bit bumpy.

When asked about the backlash to GPT-5, an OpenAI representative pointed CNN to Altman's public statements on social media announcing the return of older models, as well as a blog post about how the company is optimizing GPT-5.
The messy rollout speaks to how the AI industry as a whole is struggling to prove itself as a producer of consumer goods rather than of "labs," as these companies love to call themselves, because the label sounds more scientific and distracts from the fact that they are backed by people trying to make unfathomable amounts of cash for themselves. AI companies often base their fanfare on how a model performs in various behind-the-scenes benchmark tests that show how well a bot can do complex math. For all we know, GPT-5 sailed through those evaluations. But the problem is that OpenAI hyped the thing so far into the stratosphere that disappointment was (or should have been) inevitable.

"I honestly didn't think OpenAI would burn the brand name on something so mid," wrote prominent researcher and AI critic Gary Marcus. "In a rational world, their valuation would take a hit," he added, noting OpenAI still hasn't turned a profit, is slashing prices to keep its user numbers up, and is hemorrhaging talent as competition heats up.

For critics like Marcus, the GPT-5 flop was a kind of vindication. As he noted in a blog post, other models like Elon Musk's Grok aren't faring much better, and the backlash from even AI proponents feels like a turning point.

When people talk about AI, they're talking about one of two things: the AI we have now, chatbots with limited, defined utility, and the AI that companies like Altman's claim they can build, machines that can outsmart humans and tell us how to cure cancer, fix global warming, drive our cars and grow our crops, all while entertaining and delighting us along the way. But the gap between the promise and the reality of AI only seems to widen with every new model.

CNN's Lisa Eadicicco contributed reporting.