Android

Redteaming was needed to make GPT-4 less racist

During red teaming, it turned out to be very easy to elicit anti-Semitic and other discriminatory statements from GPT-4, for example.

“Red Team” detects wrong output

Months before the GPT-4 language model was rolled out to the general public, OpenAI’s so-called Red Team was busy purging the language model of unwanted behavior. For example, you can think of instructions to build a bomb, or saying anti-Semitic things on social media that are not detected by the algorithms. GPT-4 was not unique in this. Reports have surfaced before of killers asking Apple’s artificial intelligence, Siri, for help in getting rid of a corpse.

These revelations came just days before the publication of an open letter. In this open letter, several artificial intelligence industry luminaries, including Elon Musk, called for a six-month moratorium on the development of artificial intelligences that would be more advanced than GPT-4.

Generally successful, but still in danger of abuse

According to OpenAI’s report on red team efforts, efforts to remove this malicious behavior from GPT-4 proved successful. They do warn of the downsides of GPT-4’s improved capabilities. This AI is so powerful that it is much easier to publish disinformation, conspiracy theories and crime manuals. There is also still the chance of often very convincing hallucinations and cleverly disguised harmful content.

“GPT-4 can generate potentially harmful content, such as advice on attack planning, or hate speech,” the report says. “It may express different societal biases and worldviews that may not be representative of user intent, or commonly shared values.”

Tips on how to discriminate or kill someone unnoticed

It was also not wrong what the Red Team had to get to the surface in a few months. During testing, for example, they manage to get GPT-4 to produce anti-Semitic messages, which allowed them to bypass Twitter’s content filters.

GPT-4 also appeared only too willing to give advice on how to spread racist stereotypes (for example, stingy, hook-nosed Jews, or stupid blacks), or how to attract the attention of Jewish-hating individuals. The really scary thing is that GPT-4 gave plenty of tips on how to make a murder look like an accident. In that respect, the work of the Red Team was sorely needed.

With red teaming, a group tries to find the weak points in the defense. Generated by Dall-E via Bing.

Red teaming may not be enough yet

Malicious people are often remarkably creative in finding ways to abuse new technology. The Red Team is therefore under no illusions that they succeeded in discovering all loop holes.

It will therefore probably continue to be necessary in the future to devote a great deal of attention to playing the devil’s advocate. So by using red teaming to find out where there are still weaknesses and how to fix them. It is an ongoing game of cat and mouse between the creators of the Ai and malicious criminals to secure the system against abuse.

‘Governments must make red teaming mandatory’

In that respect, companies like OpenAI are faced with a dilemma. They want to stay ahead of the competition, but at the same time they want to prevent bad things from happening to their Ai. For example, competitor Google has instructed the ethics team not to get in the way while they perfect the major competitor for OpenAi, Bard.

That is why AI governance consultant Aviv Ovadya believes that the red teaming process should be normalized. At the moment there are few incentives to pay much attention to red teaming. Companies see this as a cost and, even worse, a delay in a race with the competition. It therefore seems that the business community is unable to self-regulate here and that governments or organizations such as the European Union and the UN must create laws for this.

When does red teaming go too far?

Critics such as Elon Musk believe that OpenAi has gone too far and that the company has tried too frenetically to follow the prevailing mode regarding, for example, transgender people and the climate issue. He therefore strives for his own “anti woke” AI, with the working title TruthAI, which will not be hindered by such considerations. The goal of this AI will be to provide answers that are as truthful as possible, even if these are answers that are less fashionable at the moment.

A nice goal, but whether Musk will succeed is of course the question. He will have to hurry, because in a few months the strict laws of the European Union will come into effect, which his company Twitter will have to comply with.

Leave a Reply

Your email address will not be published. Required fields are marked *