“Today’s neural networks are trained and then deployed in a new environment where they no longer have the opportunity to learn. We would like to create agents that learn throughout their whole lives and adapt to new situations. Those areas are still to be explored,” says Janusz Marecki in an interview with Monika Redzisz

Monika Redzisz: First, let us explain what some expressions mean. What is a single-agent system and what is a multi-agent system?

Janusz Marecki*: A single-agent system is one individual brain which thinks and solves problems within its own capacity. One example of a single-agent system is the human body, which is centrally controlled by a single brain. In a multi-agent system there are several agents, and each of them is controlled by a different brain, for instance by a different neural network. A good example of that is an ant colony. Here, knowledge is dispersed and decisions are not made by any central unit. Agents can communicate with each other or may decide not to. You and I, talking here right now, form a two-agent system; we are two agents trying to communicate with each other.

If a human is a single-agent system, then a society is a multi-agent system.

Yes. But if we had a system with ten agents and each of them had insight into the other brains and was able to control them remotely, it would in fact be one agent capable of seeing everything, and would therefore have to be classified as a single-agent system. A multi-agent system is created only when we deal with swarm intelligence, i.e. many brains which have no insight into one another.

In Poland we have 38 million people who are able to communicate with one another in one way or another, but who do not know everything about one another. If our country had a dictator who, just like Big Brother, had insight into our brains and was capable of controlling us, we would then live in a single-agent system.

Referring to what you work on at DeepMind, could you explain what traditional artificial intelligence is?

From the very beginning, artificial neural networks have been designed as single-agent systems. AI originates from operations research, whose purpose was to solve specific problems. Algorithms were created to find answers to specific questions, much like the DeepMind algorithm that became the world champion in Go.

What is the point of multiplying agents when we have increasingly powerful algorithms that can solve increasingly complicated problems? What is the purpose of engaging a second, a third and a fourth agent?

That is a good question. Centralization makes it possible to find optimal solutions for the whole group of units… Besides, it is easier. If you think about dictatorship, you will find it, logistically speaking, easier to handle than democracy. However, there are two big flaws in central control. First, there is no privacy. Centralization means that we must send all information about ourselves to the central unit. And the lack of privacy affects other things. For example, creativity goes down, because it is safer to follow mainstream ideas.


Why do algorithms need privacy?

When algorithms are deprived of privacy, they can sense that doing something against the central unit may not pay off. Let us assume that we have constructed an artificial general intelligence system consisting of ten agents, each of them being as intelligent as humans. The algorithms, being aware of the fact that the system knows everything about them, will not do anything their own way; they will follow the mainstream.

What about the other problem?

Centralized solutions are usually very fragile. If you eliminate the center, the whole system collapses. A decentralized system is less vulnerable; even if one of the agents is destroyed, others will survive. It is easier for them to adapt to a changing situation. They do not have to send data to the center or follow anyone’s orders. This is why DeepMind is trying to develop multi-agent systems.

What exactly does it look like?

We create environments in which several agents are deployed. We tell them, for example, to pick virtual apples. The agents try to pick as many apples as they can, and we analyze what makes them cooperate and what makes them fight each other. The algorithms are purposely designed to be selfish. We want to know what should be done to make them “good citizens”. If we allow the agents to do whatever they want, we may soon discover that one smart guy takes everything for itself and destroys the whole crop, to the detriment of the entire community. That mechanism is known as the “tragedy of the commons”: a situation in which individuals pursuing their own gain deplete a shared resource and the whole community ends up worse off.


You can compare it to vaccination: individual parents believe that their children are more likely to suffer an adverse reaction to a vaccine than to contract a serious disease, so they do not vaccinate their kids. If everyone in the community acted like that, the probability of an outbreak of the disease would increase manifold.
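To make the dynamic concrete, here is a minimal, illustrative sketch in Python. It is not DeepMind's environment; the numbers and the purely greedy policy are assumptions chosen only to show how a shared apple stock collapses when every agent grabs as much as it can, and survives when agents restrain themselves.

```python
# Illustrative sketch of the "tragedy of the commons" dynamic described above.
# Not DeepMind's code: the regrowth rate and the greedy policy are assumptions.

def simulate(num_agents=4, apples=100.0, regrowth=0.15, steps=50, greed=0.2):
    """Each step, every agent takes a fixed fraction of the remaining stock,
    then the orchard regrows in proportion to what is left."""
    history = []
    for _ in range(steps):
        for _ in range(num_agents):
            apples = max(apples - greed * apples, 0.0)  # selfish harvest
        apples += regrowth * apples                     # resource regrows
        history.append(apples)
    return history

if __name__ == "__main__":
    print(f"apples left with selfish agents:    {simulate()[-1]:.2f}")
    # With a much smaller per-step harvest the orchard stays productive,
    # which is the cooperative outcome contrasted with anarchy above.
    print(f"apples left with restrained agents: {simulate(greed=0.03)[-1]:.2f}")
```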

Is this how agents behave if you do not interfere in their situation?

Yes. The agents destroy one another. Some win, others are left empty-handed. There is anarchy; in an anarchic system some units do perfectly fine, but the living standard of the average unit in the community falls.

Even if you have resources galore? Even if there are enough resources for everybody?

We have already performed simulations in which the agents were picking apples but also had lasers at their disposal which eliminated opponents for a certain period of time. As long as resources were abundant, the agents picked their apples without quarreling. The scarcer the apples became, the more frequently they resorted to the virtual laser.


When we design multi-agent systems, we cannot forget about what happens when there are a lot of agents and resources are limited.

Unfortunately, the more intelligent the agent became, the more efficient it was at eliminating its opponents. The agent with the biggest brain eliminated the other agents even when resources were plentiful.

How do you develop a system that would not allow the strong to destroy the weak?

It is necessary to apply appropriate rules or legal mechanisms. We have been looking, for example, for mechanisms that would make the agents less selfish. You can reward an agent for actions that are beneficial to the whole community. We have been experimenting with different rewards, for instance with something you might call “virtual reputation”. If one of the agents takes all the resources for itself, its reputation is tarnished. One thing is sure: if we do not instill morality, the strongest will take it all. Just as in real life.
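One simple way to picture the “virtual reputation” idea is as a reward-shaping term. The sketch below is only an illustration under assumed names and numbers, not DeepMind's actual mechanism: an agent's payoff shrinks when it takes more than its fair share, and the weight on that penalty decides whether greed still pays.

```python
# Hypothetical reward shaping with a "reputation" penalty; the function name,
# the fair-share rule and the weights are assumptions made for illustration.

def shaped_reward(apples_taken: float, group_total: float, num_agents: int,
                  reputation_weight: float = 0.5) -> float:
    """Individual payoff minus a penalty for exceeding the fair share."""
    fair_share = group_total / num_agents if num_agents else 0.0
    over_harvest = max(apples_taken - fair_share, 0.0)
    return apples_taken - reputation_weight * over_harvest

# A greedy agent takes 8 of the group's 12 apples (fair share: 3),
# a moderate one takes 3. With a small penalty, greed still pays;
# with a larger one, it no longer does.
print(shaped_reward(8.0, 12.0, 4))                         # 5.5
print(shaped_reward(3.0, 12.0, 4))                         # 3.0
print(shaped_reward(8.0, 12.0, 4, reputation_weight=1.5))  # 0.5
```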

But they are intelligent, learning systems. If the agents eat all the apples and eliminate their opponents, they will not go hungry this season, but in the future they will suffer the consequences of their selfishness. In the long term, egoism does not pay off. Why don’t they draw such conclusions?

They sometimes do. The problem is that, in my opinion, we are applying an inappropriate AI technology. We train neural networks on short episodes. And that is the key thing. We show them how apples grow from January to December, not how they grow over ten years. We cannot show them what the next season will look like.

Why? Can’t you give them more data?

Such are the limits of the technology we have today. Our agents have short-term memory. When we train them to play Go, only 369 moves are possible. When we train them to play StarCraft, an episode takes 5 to 10 minutes. They cannot be trained on longer episodes because they have problems with memory. If they were engaged in a long episode, they would remember only the last moves. They are efficient in environments where you do not have to plan far ahead. When an algorithm plays chess or Go, it sees the board and makes decisions on the basis of what it can see. It does not remember (there is no need for it to remember) what happened several hundred steps back.


Besides, we do not know how to teach our agents one thing, then another, and so on. Neural networks are function approximators which are very good at learning one thing at a time. When they move on to something new, they focus only on that and forget what came before. They do not accumulate knowledge. This is called catastrophic interference. People do not have that problem. I am learning how to talk with my six-year-old son. Although I am learning, he is constantly changing, so every day I have to learn new things. But at the same time I do not forget the old ones.
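Catastrophic interference is easy to reproduce in a toy setting. The sketch below (plain NumPy, with two made-up tasks; it is not DeepMind's code) trains a one-layer model on task A, then only on task B, and the accuracy on task A falls because the same weights get overwritten.

```python
# Toy demonstration of catastrophic interference: sequential training on two
# unrelated tasks erases what a small model learned first. Assumed setup.
import numpy as np

rng = np.random.default_rng(0)

def make_task(relevant_dim, n=500, dim=10):
    """Binary task whose label depends on a single input dimension."""
    X = rng.normal(size=(n, dim))
    y = (X[:, relevant_dim] > 0).astype(float)
    return X, y

def train(w, X, y, epochs=200, lr=0.5):
    """Plain logistic-regression updates (a one-layer 'network')."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    return float(((X @ w > 0).astype(float) == y).mean())

task_a, task_b = make_task(relevant_dim=0), make_task(relevant_dim=1)
w = train(np.zeros(10), *task_a)
print("task A accuracy after learning A:", accuracy(w, *task_a))  # near-perfect
w = train(w, *task_b)  # keep training, but only on task B
print("task A accuracy after learning B:", accuracy(w, *task_a))  # falls toward chance
```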

Will the multi-agent systems you have been trying to construct be like that?

Yes. Today’s neural networks are first trained and then deployed in an environment. At that stage they lose their capability to learn. Today we can, say, train an artificial bee and make it fly from one flower to another. But if something bad happens to it, it is over. The next bee will have to start from scratch.

We would like to create agents that learn throughout their whole lives. If they ever come across a dangerous situation, they will learn how to avoid it and adapt to the new scenario. And that, exactly, is going to be AGI – artificial general intelligence. Unfortunately, we are not yet able to create agents that can operate in a changing environment. These areas are yet to be explored. We do not even have a distinctive name for that phenomenon.

In what way is artificial intelligence related to multi-agent systems?

Multi-agent systems create new challenges for themselves in a natural way. There is a hypothesis according to which humans became intelligent because there were so many of them. They had to constantly learn from one another and respond to new problems. A single-agent environment does not necessarily need that: one agent does not have to adapt to anyone. Sometimes it is not even aware of potential problems.

Only in a multi-agent system, where agents learn and adapt, can you see what problems may occur. It is a non-stationary environment which naturally generates new situations and unforeseen events. If a single agent can survive in it, it means that it can adapt to a changing environment.

Is it observable? Can you see agents communicating?

Yes, you can see how they convey information to one another and how they modify the environment. It is a little bit like ants, which convey information by leaving pheromones.


Is that information understandable for humans?

That is an interesting question. It is a code of their own; it is their language. Some time ago I read that twin sisters had developed their own language that no one else could understand. Our agents do something similar. They create their own language of communication which other agents, let alone people, may not understand.

What happens if their intelligence surpasses ours?

That is a good question. Let me put it this way: as long as neural networks experience the problems I have mentioned before, I do not fear that their intelligence will surpass ours.

You said that artificial general intelligence would be created in as little as 10 to 20 years…

Researchers dealing with AI have been saying that since the 1950s! If we said it would happen in less than 20 years, it would be easy to live to see the moment when it turned out we had been wrong. On the other hand, if we prophesied that it would take more than 20 years, nobody would be interested in such a long-term perspective.

But in all honesty, I think it will happen in about 10 years. We already know what we are missing. Once someone constructs a system with the intelligence of a mouse, rescaling it from a mouse to a human will take a year, or perhaps only a month. When we solve the problem of the intelligence of a mouse, things will get scary.

Who knows? Maybe the only option for a human will be to merge with artificial intelligence? For now, brain-computer interfaces are too slow – we are already able to control the movement of a cursor on a screen with signals from our brain, but throughput is still an issue. Besides, that would only give us access to knowledge; the ability to reason is much more than that. So far we still do not understand how our brain, itself a chaotic multi-agent system, works.

But surely, even today, it would be desirable to develop mechanisms that would make the agents of multi-agent systems “good citizens”. Especially if they are meant to be more intelligent than us. That is the DeepMind mission.



*Janusz Marecki joined Google DeepMind in 2015. Since then he has focused on research on deep learning, reinforcement learning and multi-agent systems. Earlier he worked for seven years at IBM Watson. He comes from Bielsko-Biała. He graduated from the Jagiellonian University in Kraków and obtained his doctoral degree in artificial intelligence at the University of Southern California in Los Angeles.
