According to a study, ChatGPT is getting worse and worse

July 21, 2023 Fiona Smith

According to a survey conducted around six months ago, one in four people in Germany already knew ChatGPT or even used it. The number of people working with the AI has probably grown since then. Writing an essay for university, drafting a job application, composing a letter to a customer: the chatbot can be put to many uses.

But if you have been working with ChatGPT lately and have not been entirely satisfied with its performance, you can now be sure that you were not imagining it.


ChatGPT isn’t getting better, it’s getting worse


As researchers from Stanford University and UC Berkeley revealed in a new paper, ChatGPT has not improved over time. On the contrary, the study shows that the current GPT-4 model performed worse over time on several of the tested tasks.

In their research, the scientists analyzed in particular how ChatGPT’s responses changed over time and found that the performance of the underlying models, GPT-3.5 and GPT-4, can “vary greatly”.

They developed rigorous benchmark tests to assess ChatGPT’s proficiency in math, coding, and visual reasoning puzzles. The alarming result: the current GPT-4 model actually shows a drop in performance.

ChatGPT, a math genius? You can’t be serious!

One example: on a task asking the model to identify prime numbers, GPT-4 answered 488 of 500 questions correctly in March, an accuracy of 97.6 percent. In June, it answered only 12 of them correctly, an accuracy of just 2.4 percent. The decline was particularly noticeable in the chatbot’s code-generation capabilities.
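
The accuracy figures follow directly from the raw counts. A short Python sketch of the arithmetic, assuming 500 questions in both test runs as stated above (the function name is hypothetical); note that the gap between the two accuracies is measured in percentage points:

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Share of correctly answered questions, expressed in percent."""
    return 100 * correct / total


march = accuracy_percent(488, 500)
june = accuracy_percent(12, 500)
drop = march - june  # difference of two percentages = percentage points

print(f"{march:.1f}% -> {june:.1f}%, drop of {drop:.1f} percentage points")
# 97.6% -> 2.4%, drop of 95.2 percentage points
```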

“With GPT-4, the proportion of generated code that was directly executable dropped from 52 percent in March to 10 percent in June,” the study said. These results were obtained with the plain models, meaning that no code interpreter plugins were used.
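
What “directly executable” means can be illustrated with a small check. The sketch below assumes that a generated Python snippet counts as executable if it runs unchanged and exits with status 0 within a timeout; the study’s actual test harness may differ, and the function names are hypothetical. A reply that wraps otherwise valid code in Markdown fence markers fails this check, because the fence line is not valid Python:

```python
import subprocess
import sys


def is_directly_executable(source: str, timeout: float = 10.0) -> bool:
    """Run `source` unchanged as a Python script; True if it exits with 0.

    Assumption: "executable" = runs without error within the timeout.
    Real evaluations should run untrusted model output in a sandbox.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def executable_share(samples: list[str]) -> float:
    """Fraction of generated snippets that run as-is."""
    if not samples:
        return 0.0
    return sum(is_directly_executable(s) for s in samples) / len(samples)


if __name__ == "__main__":
    fence = "`" * 3  # Markdown code fence marker
    plain = "print(sum(range(10)))"
    wrapped = f"{fence}python\n{plain}\n{fence}"
    print(executable_share([plain, wrapped]))  # 0.5
```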

One of the questions the researchers asked was whether 17,077 is a prime number. Although the answer is yes, GPT-4’s accuracy on this prime-number task dropped by 95.2 percentage points. By contrast, the hit rate of GPT-3.5, the model behind the free version of ChatGPT, rose from 7.4 to 86.8 percent on the same task.
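
Whether 17,077 is prime can be settled deterministically. A minimal trial-division sketch in Python: since the square root of 17,077 is about 130.7, only odd divisors up to 129 need to be tested, and none of them divides the number:

```python
def is_prime(n: int) -> bool:
    """Deterministic primality test by trial division (fine for small n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    # A composite n must have a divisor d with d * d <= n.
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2  # even divisors were already ruled out above
    return True


print(is_prime(17077))  # True
```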

What explains ChatGPT’s drop in performance?

The researchers suspect that this could be a side effect of optimizations made by OpenAI, the company behind the model. One possible cause: changes introduced to prevent ChatGPT from answering dangerous questions.

However, such safety measures could reduce ChatGPT’s usefulness for other tasks. The scientists also noted that the model now tends to give verbose, indirect answers instead of clear ones.

Experts have their own theory

“GPT-4 is getting worse over time, not better,” AI expert Santiago Valderrama wrote on Twitter. Valderrama also raised the possibility that a “cheaper and faster” mix of models could have replaced the original ChatGPT architecture.

“Rumor has it that they are using several smaller, specialized GPT-4 models that act like one large model but are cheaper to run,” he speculated. This, he believes, could speed up response times for users but reduce the model’s proficiency.
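
The rumored setup resembles what is generally called a sparse “mixture of experts”: a cheap gate picks a few specialized sub-models per input, so only part of the system runs each time. The toy Python sketch below only illustrates that general idea; nothing about OpenAI’s actual architecture is known from the article, and the experts, gate weights, and function names are purely hypothetical:

```python
import math
from typing import Callable, List, Sequence

# Toy "expert": any function from an input number to an output number.
Expert = Callable[[float], float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: turns raw gate scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts(x: float, experts: Sequence[Expert],
                       gate_weights: Sequence[float], k: int = 1) -> float:
    """Route x to the top-k experts chosen by a (hypothetical) linear gate."""
    # The gate scores every expert cheaply: here just one weight per expert.
    probs = softmax([w * x for w in gate_weights])
    # Only the k most probable experts are actually evaluated, which is
    # where a sparse mixture saves compute compared with one large model.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize the selected gate probabilities so they sum to 1.
    norm = sum(probs[i] for i in top)
    return sum(probs[i] / norm * experts[i](x) for i in top)


if __name__ == "__main__":
    experts = [lambda x: 2 * x, lambda x: x + 100]
    gate = [1.0, -1.0]
    print(mixture_of_experts(3.0, experts, gate))   # expert 0 wins: 6.0
    print(mixture_of_experts(-3.0, experts, gate))  # expert 1 wins: 97.0
```

The trade-off Valderrama describes maps onto the parameter `k`: routing to fewer experts is cheaper and faster, but each answer then depends on a smaller part of the overall system.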

Another expert, Dr. Jim Fan, also shared his view in a Twitter thread. “Unfortunately, safety usually comes at the expense of usefulness,” he wrote.

He continued: “My guess (no evidence, just speculation) is that OpenAI spent most of its efforts constraining the model from March to June and did not have time to fully restore the other capabilities that matter.”

And what does OpenAI say about this?

Peter Welinder, VP of Product at OpenAI, responded on Twitter to the claims that ChatGPT was getting worse and worse: “No, we haven’t made GPT-4 dumber. Quite the opposite: we make each new version smarter than the previous one.”
