Microsoft’s Bing AI made several factual errors during last week’s launch demo; “Completely made up some answers”
First, it was Google. Now it turns out that Microsoft’s Bing AI made even worse factual errors in last week’s launch demo. Microsoft wowed the audience when it showed off its new AI-powered Bing, part of its attempt to challenge Google in search, a market in which Google currently controls over 90%. However, “Bing AI got some answers completely wrong during its demo and no one noticed.”
During last week’s demo, the ChatGPT-like chatbot embedded in Microsoft’s Bing search engine analyzed several earnings reports but produced some incorrect numbers for Gap and Lululemon. The demo received rave reviews from many people on social media. Microsoft also said that more than one million people signed up to try its new AI tool in the first 48 hours.
However, a comparison of Bing AI’s answers with the actual reports shows that the chatbot got some numbers wrong. Other answers appear to have simply been made up. In a Substack post on Monday titled “Bing AI Can’t Be Trusted,” independent search researcher Dmitri Brereton wrote:
“Bing AI got some answers completely wrong during their demo. But no one noticed. Instead, everyone jumped on the Bing hype train.”
Brereton added:
“Google’s Bard got an answer wrong during an ad, which everyone noticed. Now the narrative is ‘Google is rushing to catch up to Bing and making mistakes!’ That would be a fine narrative if Bing didn’t make even worse mistakes during its own demo.”
After carefully comparing answers from the Bing AI demo against the underlying source material, Brereton identified possible factual errors in its responses about vacuum cleaner specifications and travel plans to Mexico, in addition to the financial errors.
In an interview with CNBC, Brereton said he wasn’t initially looking for errors and discovered them only when he looked more closely to write a comparison of the AI unveilings from Microsoft and Google.
Pet Vacuums
Brereton checked how Bing AI fared when it was asked to find the best pet vacuums. Here is what he found; Bing AI’s answers are shown in the screenshot below.
“According to this pros and cons list, the ‘Bissell Pet Hair Eraser Handheld Vacuum’ sounds pretty bad. Limited suction power, a short cord, and it’s noisy enough to scare pets? Geez, how is this thing even a best seller?
“Oh wait, this is all completely made up information,” Brereton wrote.
“Bing AI was kind enough to give us its sources, so we can go to the hgtv article and check for ourselves.
“The cited article says nothing about limited suction power or noise. In fact, the top amazon review for this product talks about how quiet it is.
“The article also says nothing about the ‘short cord length of 16 feet’ because it doesn’t have a cord. It’s a portable handheld vacuum.
“I hope Bing AI enjoys being sued for libel.”
The worst mistake made during the Bing AI demo
According to Brereton, by far the worst mistake made during the demo was in Bing AI’s summary of Gap’s financial statement. Calling it “the most unexpected,” Brereton said, “I thought that summarizing a document would be trivial for AI at this point. But Bing AI manages to take a simple financial document, and make all the numbers wrong.”
Below are Brereton’s findings.
“Gap Inc. reported net sales of $4.04 billion, up 2% compared to last year, and comparable sales were up 1% year-over-year”
Bing AI starts off fine. This statement is totally correct, probably because it is a direct copy paste from the financial document.
“Gap Inc. reported gross margin of 37.4%, adjusted for impairment charges related to Yeezy Gap, and merchandise margin declined 370 basis points versus last year due to higher discounting and inflationary commodity price increases”
Uh…no. That’s the unadjusted gross margin. The gross margin adjusted for impairment charges was 38.7%. And the merchandise margin declined 480 basis points if we’re adjusting for impairment charges.
Don’t worry, it gets much worse.
“Gap Inc. reported operating margin of 5.9%, adjusted for impairment charges and restructuring costs, and diluted earnings per share of $0.42, adjusted for impairment charges, restructuring costs, and tax impacts.”
“5.9%” is neither the adjusted nor the unadjusted value. This number doesn’t even appear in the entire document. It’s completely made up.
The operating margin including impairment is 4.6% and excluding impairment is 3.9%.
The diluted earnings per share is also a completely made up number that doesn’t appear in the document. Adjusted diluted earnings per share is $0.71 and unadjusted is $0.77.
“Gap Inc. reaffirmed its full year fiscal 2022 guidance, expecting net sales growth in the low double digits, operating margin of about 7%, and diluted earnings per share of $1.60 to $1.75.”
No…they don’t expect net sales growth in the low double digits. They expect net sales to be down mid-single digits.
And I didn’t see anything else in this document about the future outlook for operating margin, or diluted earnings per share. So Bing AI either got that from a separate document, or made it up completely.
Conclusion
In Microsoft’s defense, a company spokesperson said, “We recognize that there is still work to be done and are expecting that the system may make mistakes during this preview period, which is why the feedback is critical so we can learn and help the models get better.” Microsoft also said it is aware of Brereton’s findings: “We’re aware of this report and have analyzed its findings in our efforts to improve this experience.”
While both tech giants are rushing to integrate new kinds of generative AI into their respective search engines and are eager to show their advancements following the explosion of ChatGPT, it appears the two companies still have a long way to go.