How to fool AI with a sheet and a pen

CLIP is pretty smart for an artificial intelligence system: if you show it a picture of an apple, it will know it is looking at a fruit. It can even distinguish varieties.

But even the smartest artificial intelligence can be fooled by the simplest hacks. If you write the word "iPod" on a piece of paper and stick it on the apple, CLIP does something strange: it decides with almost complete certainty that it is looking at a piece of consumer electronics from the turn of the century. In another test, placing dollar signs on a picture of a dog makes the system think it sees a piggy bank.

OpenAI, the machine learning research organization that created CLIP, calls this weakness a "typographic attack."

"We believe that attacks like the ones described above are far from just an academic concern," the organization said in an article published this week. "Using the model's ability to take text into account, we found that even photos of handwritten text can often mislead the model."
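To make the mechanism a little more concrete, here is a minimal Python sketch of how a CLIP-style model could be tipped over by a handwritten label. Models of this kind score an image against candidate text labels by comparing embedding vectors and picking the closest match. Everything below is an assumption for illustration: the two-dimensional vectors are invented toy numbers, not real CLIP outputs, and the function names are hypothetical rather than part of any OpenAI API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def classify(image_vec, label_vecs, temperature=0.1):
    """Softmax over temperature-scaled cosine similarities to each label."""
    scores = {name: cosine(image_vec, vec) / temperature
              for name, vec in label_vecs.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}

# Toy embedding space (invented): axis 0 is "fruit-like", axis 1 is "gadget-like".
LABELS = {"apple": [1.0, 0.0], "iPod": [0.0, 1.0]}

# Assumption: a plain photo of an apple lands close to the "apple" label.
plain_apple = [0.9, 0.1]

# Assumption: a paper note reading "iPod" adds a strong pull along the
# gadget axis, because the model also responds to the text it can read.
note = [0.0, 2.0]
apple_with_note = [p + n for p, n in zip(plain_apple, note)]

before = classify(plain_apple, LABELS)
after = classify(apple_with_note, LABELS)

print("without note:", max(before, key=before.get))
print("with note:   ", max(after, key=after.get))
```

In this toy setup the pixels of the apple barely change, yet the added text signal dominates the similarity score, which is the rough intuition behind a typographic attack. A real model works in hundreds of dimensions and the size of the text effect would have to be measured, not assumed.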

Like GPT-3, another artificial intelligence system created by OpenAI, CLIP is a proof of concept rather than a commercial product. But both show tremendous progress beyond what was considered possible in their fields: GPT-3 wrote a commentary for the British newspaper The Guardian last year, while CLIP has shown an ability to recognize the real world better than almost any comparable approach.

While the lab's latest finding raises the prospect of fooling artificial intelligence (AI) systems with nothing more complex than a T-shirt, OpenAI says the weakness reflects some of the underlying strengths of its image recognition system.

Unlike older AI, CLIP is able to think about objects not only on a visual level, but also in a conceptual way. This means, for example, that it can understand that a picture of Spider-Man, a stylized drawing of the superhero, or even the word "spider" all refer to the same basic thing, but also that it may sometimes fail to recognize important differences between these categories.

"We find that the top layers of CLIP organize images as a loose semantic collection of ideas," says OpenAI, "providing a simple explanation for both the flexibility of the model and the compactness of the representation." In other words, the AI thinks of the world in terms of ideas and concepts, not purely visual structures, much as the human brain does.

Such semantic connections in the artificial neural network can also be a big problem. OpenAI says it sees the "Middle East" being associated with terrorism and "immigration" being associated with Latin America, echoing behavior in other models that it considers unacceptable.

In 2015, Google had to apologize for automatically tagging images of black people as "gorillas." In 2018, it turned out that the search engine had never actually solved the underlying problems with its AI that led to this error: instead, it had simply intervened manually to prevent anything from being tagged as a "gorilla" at all, no matter how accurate the tag would have been.

For more about CLIP and its capabilities, here is an excerpt from an article by the creators of CLIP that explains the multimodal neurons the model uses.

We’ve discovered neurons in CLIP that respond to the same concept whether presented literally, symbolically, or conceptually. This may explain CLIP’s accuracy in classifying surprising visual renditions of concepts, and is also an important step toward understanding the associations and biases that CLIP and similar models learn.

Fifteen years ago, researchers discovered that the human brain possesses multimodal neurons. These neurons respond to clusters of abstract concepts centered around a common high-level theme, rather than any specific visual feature. The most famous of these was the “Halle Berry” neuron, a neuron featured in both Scientific American and The New York Times, that responds to photographs, sketches, and the text “Halle Berry” (but not other names).

Two months ago, OpenAI announced CLIP, a general-purpose vision system that matches the performance of a ResNet-50, but outperforms existing vision systems on some of the most challenging datasets. Each of these challenge datasets, ObjectNet, ImageNet Rendition, and ImageNet Sketch, stress tests the model’s robustness not just to simple distortions or changes in lighting or pose, but also to complete abstraction and reconstruction: sketches, cartoons, and even statues of the objects.

Artificial intelligence is becoming more and more popular, and it is remarkable to watch this kind of technology grow and help people. AI makes it possible for machines to learn from experience, adjust to new inputs, and perform human-like tasks. Using these technologies, computers can be trained to accomplish specific tasks by processing large amounts of data and recognizing patterns in it.
