Chat GPT 4 - Still Not Telling The Whole Truth

Written by Mike James

Wednesday, 15 March 2023

OpenAI has just announced GPT-4. The future may not be here just yet, but it's very close. Is this really the breakthrough moment for AI?

GPT-4 is the next version of the Large Language Model created by OpenAI. GPT-3 created quite a fuss and GPT-4 looks like it will continue the commotion. What makes GPT-4 different from GPT-3 is that it is now "multimodal" - it will work with images and text.

OpenAI seems to be realistic about its new creation:

"...while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks."

On these benchmarks it does a lot better than GPT-3.5:

"For example, it passes a simulated bar exam with a score around the top 10% of test takers; in contrast, GPT-3.5’s score was around the bottom 10%."

No specific training for the exams was provided, so this is an indication of what GPT-4 knows about the world. This is some achievement given that both systems use the same overall approach. However, OpenAI acknowledges that in a more general setting the difference is subtle:

"In a casual conversation, the distinction between GPT-3.5 and GPT-4 can be subtle. The difference comes out when the complexity of the task reaches a sufficient threshold—GPT-4 is more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5."

This isn't too surprising - an expert talking about everyday things doesn't reveal much about their expertise.

From OpenAI's point of view, an important step forward is the ability to predict how much better a model will be after a given amount of training.

What of the visual input? GPT-4 can accept prompts in text or images or a mixture. From the examples given it looks impressive:

User

What is funny about this image? Describe it panel by panel.

[Image: three-panel picture of a "Lightning Cable" adapter]

GPT-4

The image shows a package for a "Lightning Cable" adapter with three panels.

Panel 1: A smartphone with a VGA connector (a large, blue, 15-pin connector typically used for computer monitors) plugged into its charging port.

Panel 2: The package for the "Lightning Cable" adapter with a picture of a VGA connector on it.

Panel 3: A close-up of the VGA connector with a small Lightning connector (used for charging iPhones and other Apple devices) at the end.

The humor in this image comes from the absurdity of plugging a large, outdated VGA connector into a small, modern smartphone charging port.

Apart from this example, OpenAI isn't giving much away about the vision aspects of GPT-4, which aren't going to be made generally available until later. If the behavior of Google's PaLM LLM is anything to go by, this could be the most interesting part of GPT-4.

Of course, being based on the same sort of model, GPT-4 has all of the problems of GPT-3 and similar models:

"Despite its capabilities, GPT-4 has similar limitations as earlier GPT models. Most importantly, it still is not fully reliable (it “hallucinates” facts and makes reasoning errors). Great care should be taken when using language model outputs, particularly in high-stakes contexts, with the exact protocol (such as human review, grounding with additional context, or avoiding high-stakes uses altogether) matching the needs of a specific use-case."

Despite efforts to keep the outputs correct, this is a real limitation of GPT-4 and similar models. It is difficult for it to avoid selecting things that are wrong but statistically plausible. This is the challenge that later models are going to have to meet if they are to be useful outside of situations where being wrong is simply amusing or unimportant.

OpenAI plans to make GPT-4 available as soon as possible:

"ChatGPT Plus subscribers will get GPT-4 access on chat.openai.com with a usage cap. We will adjust the exact usage cap depending on demand and system performance in practice, but we expect to be severely capacity constrained (though we will scale up and optimize over upcoming months)."

There is also an API, which will be charged at $0.03 per 1K prompt tokens and $0.06 per 1K completion tokens.
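To get a feel for what these prices mean in practice, here is a minimal Python sketch that estimates the charge for a single request. The two rates are the ones quoted above; the function name `estimate_cost` and the token counts in the example are purely illustrative assumptions, not part of any OpenAI library.

```python
# Rates quoted in the article, in US dollars per 1,000 tokens.
PROMPT_PRICE_PER_1K = 0.03
COMPLETION_PRICE_PER_1K = 0.06


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the charge in dollars for one request (hypothetical helper)."""
    prompt_cost = (prompt_tokens / 1000) * PROMPT_PRICE_PER_1K
    completion_cost = (completion_tokens / 1000) * COMPLETION_PRICE_PER_1K
    return prompt_cost + completion_cost


# Illustrative token counts: a 1,500-token prompt and a 500-token reply.
cost = estimate_cost(1500, 500)
print(f"Estimated cost: ${cost:.3f}")
```

At these rates a reply token costs twice as much as a prompt token, so long generated outputs dominate the bill more quickly than long prompts do.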

This isn't yet the age of general AI, but it's getting closer. How much you agree with me depends on how you see GPT-like models. Are they just sophisticated auto-complete machines or are they capturing something deep in the structure of language? I know which one I think is the answer, but even if I'm right we need to find a way to keep LLMs honest and true.

If I am right I wonder how long it will take for the information to trickle down. We sometimes speculate on the psychological impact of meeting extraterrestrials - we are not alone and not unique. Meeting a general intelligence based on the statistical properties of language should be equally disturbing. We are no more than them (it) ...

More Information

https://openai.com/research/gpt-4

Related Articles

Google's Large Language Model Takes Control

Runaway Success Of ChatGPT

Open AI And Microsoft Exciting Times For AI

The Unreasonable Effectiveness Of GPT-3

AI Helps Generate Buggy Code

The Robots Are Coming - AlphaCode Can Program!

Does OpenAI's GPT-2 Neural Network Pose a Threat to Democracy?


Last Updated ( Wednesday, 15 March 2023 )
