OpenAI announced the fourth generation of its giant language model (GPT) called (GPT-4), which is expected to make a major breakthrough in the capabilities of generative artificial intelligence, because it is a multi-modal language model (Multi-Modal LLM), meaning that it can accept image and text inputs and generate text outputs.
The latest version of GPT-4 marks a milestone in the field of artificial intelligence, particularly in natural language processing. So today in this article we will review the new possibilities offered by the GPT-4 model and how it differs from previous versions?
What is GPT-4?
The GPT-4 is the latest and most advanced version of the OpenAI script generation language model launched in 2018, the third generation GPT-3 in 2020, and even the GPT-3.5 version it used to create the interactive chatbot ChatGPT launched last November.
If you've used ChatGPT, you're familiar with the previous version GPT-3.5, which is limited to text input, and generates texts in natural language or simple code. But GPT-4 is the latest version and is characterized as multimedia because it accepts image and text inputs, and generates text outputs, it is able to understand and analyze the content of images well, even with complex images, such as: graphic images or worksheets, and long documents, which is a significant improvement over the previous model.
The new features in this model significantly develop the capabilities of interactive chatbots, as the GPT-4 model can perform multiple tasks simultaneously, and increases the likelihood of providing realistic answers by 40%, making it a more useful tool for applications that rely on factual information, such as search engines.
OpenAI indicated that it spent 6 months developing this version of its language model, based on its experience with ChatGPT, and the new model has proven its superior ability to commit to providing the correct facts and information, as well as facing user attempts to take it out of the context of dialogue or provide responses contrary to the company's standards.
The company pointed out that GPT-3.5 can be distinguished from GPT-4, when the complexity of the task reaches its maximum, as the GPT-4 model has become more reliable, creative and able to handle instructions with higher accuracy compared to the previous generation.
What new features does GPT-4 offer?
1-Performance improvements:
The GPT-4 can handle more than 25,000 words, about 8 times more than the previous GPT-3.5 version of just 3,000 words, allowing it to be used for long content creation, extended conversations, and document search and analysis.
The GPT-4 model was developed to improve the Alignment feature, which is the ability to track user intentions while making them clearer and generating less offensive or dangerous outputs.
GPT-4 also improved from the previous version (GPT-3.5) in terms of correctness and realism of answers, as the latest version makes fewer factual or logical errors, and GPT-4 has outperformed GPT-3.5 in OpenAI's internal real-life performance standard.
2- Understanding images:
The new GPT-4 model is able to understand images well, even with complex images, the model can accurately describe images, such as distinguishing between a VGA cable and a smartphone connected to the cable, as shown in the following image:
To ensure the readiness of the image input feature for the new model, OpenAI has collaborated with the Be My Eyes application for the visually impaired, which will receive with the new update the Virtual Volunteer feature, which uses the capabilities of the new model to extrapolate and analyze images, and describe them accurately to users, in addition to what they can accomplish with the components of those images.
This feature also allows the user to enter questions or text commands with images attached, so that text commands ask the GPT-4 model to perform tasks or answer questions related to the content of the images.
Greg Brockman, co-founder and president of OpenAI, explained some of the use cases for the GPT-4 model: the ability to read an image of a manually drawn form of a website, and create the site's code through the image.
3- Improve the ability to direct:
Steerability is one of the superior capabilities offered by the new GPT-4 version, which is the ability to change its behavior according to user requests, for example, the user can guide the model and set a list of instructions that he must adhere to, in the way he responds to requests, and provides various information.
The company gave an example of this feature, which is to develop a set of guidelines and guidelines for the new model to turn into a teacher for a student, so that these guidelines explain the method of education that the model must follow and the process of simplifying information, in order to reach the level of student perception in an easy and clear way.
The company pointed out that this feature will also allow developers to control the way their services work, which provide smart interaction with users, by setting a list of controls and personality traits of the model on which their services depend and the way they deal with users.
Is GPT-4 more secure than previous versions?
According to OpenAI, the researchers incorporated more human feedback, including from ChatGPT users, to improve GPT-4 performance, and the company relied on 50 human experts to provide feedback regarding AI safety.
The company confirmed that it has developed the new model to be able to reject 82% of users' attempts and questions seeking answers related to dangerous topics, such as manufacturing harmful chemicals or attempting to harm themselves, compared to the previous version GPT-3.5.
But she also acknowledged that the GPT-4 model is not infallible, despite the new tremendous capabilities and significant development in its performance, and noted that the model is still unable to provide information on developments after September 2021, nor does it learn from its previous experiences with user questions.
The company added that GPT-4 can sometimes make simple and naïve mistakes that do not seem to be in line with the efficiency of its performance, such as accepting obvious incorrect statements from the user. Sometimes it can fail in difficult problems in the same way that humans do, such as discovering vulnerabilities in the code it produces itself.
Top 5 differences between the GPT-4 and GPT-3.5 model
1- Model (GPT-4) is able to see and understand images:
The biggest change in GPT-4 compared to any previous version is that it is a multi-modal LLM, meaning that it can understand more than one way of the information entered into it.
The previous model (GPT-3) with the ChatGPT robot was limited to text only, as it could only accept text inputs, and generate text responses as well, but in the GPT-4 model you can enter images and the model will process them to find the information in them, in addition to word processing.
You can simply ask him to describe what's in the photos, and his integration into Be My Eyes – an app used by blind and visually impaired people – has led to a tool called Virtual Volunteer that describes what their phone sees by reading the photos and providing text describing them, answering any question about that image and providing immediate help for a variety of tasks.
In a video of the Be My Eyes app, the GPT-4 model was able to describe the pattern on the dress, identify the plants, explain how to get to a specific machine in the gym, translate the label on a food commodity and provide recipes that can be made using it, read the map, and perform many tasks based on images.
For example, if a user sends a photo of what's in their fridge, the virtual volunteer will not only be able to correctly identify what's inside, but also make suggestions about what can be prepared with these ingredients, the tool can provide a number of recipes that can be prepared using these ingredients and send a step-by-step guide on how to Manufacture.
But so far, the GPT-4 model is still limited to text responses and cannot produce images itself.
2- GPT-4 is hard to fool:
Chatbots tend to easily provide incorrect and inaccurate information, despite everything these bots present correctly today. Anyone who talks to her with a little persuasion can convince her to explain what the evil AI can do, play an evil chatbot, or do some other little fantasy that allows the model to say all sorts of strange things.
Many people have already exploited ChatGPT to create malicious content, in a process known as "jailbreaking AI-powered chatbots" using a variety of methods, most notably playing other roles.
Therefore, OpenAI trained the GPT-4 model on many, many malicious claims that the users themselves made to the robot during the past period, so the new model is much better than its predecessor in realistic answers, routing, and refusal to get out of protection barriers.
OpenAI has stated that the GPT-4 model is 40% better than the previous version (GPT-3.5) in providing realistic responses, and also has more advanced thinking capabilities than its predecessor.
The company also confirmed that it has developed the GPT-4 model to be able to reject 82% of users' attempts and questions seeking answers related to dangerous topics, such as manufacturing harmful chemicals or attempting to harm themselves, compared to the previous version GPT-3.5.
However, the company admitted that it is still not immune to Hallucinations, which is the tendency of artificial intelligence to generate error responses or logical errors. Sam Altman, co-founder and CEO of OpenAI, offered his usual admission to minimize objection to the robot's bugs: "It's the most compatible and capable model to date; but it's still flawed and limited, and it still looks more impressive when first used than after spending more time with it".
3-GPT-4 has longer memory:
Large language models practice millions of pages of web, books, and other textual data, but when you have an actual conversation with the user, there's a limit to how much you remember the conversation to preserve context. That limit in ChatGPT's GPT 3.5 model was only about 3,000 words, roughly three to four pages of the book, so it often didn't work efficiently in long dialogues.
But the GPT-4 model can handle more than 25,000 words, about 8 times as much as the previous version, which is equivalent to about 50 pages of text, enough for a full play or short story.
What this means is that in conversation or in text creation, it will be able to keep up to 50 pages, so it will remember what I talked about on page 20 of the chat, or it may refer to events that happened 35 pages ago. So it can be used in long content creation, extended conversations, and document search and analysis.
4- The GPT-4 model is more multilingual:
English speakers dominate the world of artificial intelligence, and everything from data and tests as well as research papers exist in this language. But of course, the capabilities of large language models are applicable in any written language and should be made available in it.
Open AI has taken a step towards doing this in the GPT-4 model by demonstrating its ability to answer thousands of multiple-choice questions with high accuracy across
26 languages, from Italian to Arabic to Turkish, and trained in 5 local Indian languages.
This initial test of language abilities is promising but far from fully supportive of multilingual abilities, as the test criteria have been translated from English for the model to understand, and the multiple choice questions cannot be compared to normal speech. But this step is important in the GPT-4 model because it means it may be friendlier with non-English speakers.
5- GPT-4 has different personalities:
The concept of (steerability) is one of the interesting concepts in artificial intelligence, as it refers to the ability of the model to change its behavior on demand, and this feature is present in the model (GPT-3.5) and was used to convince the ChatGPT robot to play the role of an empathetic listener, and there are those who used it to convince him to play the role of an evil chat robot.
In the GPT-4 model, the company developed this feature to be more useful, as it will allow the model to adopt different characters, albeit with limitations that the company has included in the system to avoid harmful results.
The company gave an example of this feature is to develop a set of guidelines and guidelines for the new model to become a teacher to a student, so that these guidelines explain the method of education that the model must follow and the process of simplifying information, so that it reaches the level of student perception in an easy and clear way.
The company pointed out that this feature will also allow developers to control the way their services work, which provide smart interaction with users, by setting a list of controls and personality traits of the model on which their services depend and the way they deal with users.