Jerry Tworek: New solutions at hand – Sztuczna Inteligencja

“At OpenAI we are trying to analyze how one could use a technology to harm other people and what we could do to prevent it,” says Jerry Tworek in an interview conducted by Monika Redzisz.

Monika Redzisz: Last autumn a robot hand trained by OpenAI learned to solve the Rubik’s Cube. What was the purpose of that experiment?

Jerry Tworek*: The robot hand, which is a very complex machine, was used to perform an extremely complicated task: solving the cube. The robot itself worked out how to control the hand in order to manipulate the cube. And since we have succeeded in that, we can also teach the robot how to manipulate other objects, which is a stepping stone toward a robot that would make us a coffee or a sandwich. Precision, dexterity and gentleness: those are the greatest challenges for robots.

We wanted to know if it was possible to train a neural network to solve the cube in a simulation, in the virtual world. Standard networks are trained on data. Our network never had access to any data from the real world; it could only see what was artificially generated in the simulation. Trained in that way, it was connected to the real robot. And what did we discover? That it was able to use that knowledge in the real world to solve the cube. We were the first to use simulation in such a complex project and for manipulation that demands so much skill.

Why would you want to train a robot in the virtual world?

We use reinforcement learning algorithms, i.e. algorithms where a neural network, on its own and by trial and error, tries different things and analyzes whether it succeeds or fails. Starting from scratch, it is able after some time to acquire skills with no human intervention whatsoever. The problem is that the network has to repeat the task a great many times. It is a long-term process; we had been working on it since May 2017. We cannot train on physical robots because they are expensive and break down all the time. Sometimes research robots fail after just a few hours of operation.
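
The trial-and-error loop described here can be illustrated with a toy example. The following is a minimal sketch, not OpenAI's training code: it assumes a made-up problem with three possible actions whose success probabilities are hidden from the learner, and the function name `train_by_trial_and_error` and all of its parameters are hypothetical. The learner mostly repeats whatever has worked best so far and occasionally tries something at random.

```python
import random


def train_by_trial_and_error(success_prob, episodes=5000, epsilon=0.1, seed=0):
    """Learn which action succeeds most often, using only success/failure feedback.

    success_prob is a hypothetical environment: the hidden chance that each
    action succeeds. The learner never reads it directly.
    """
    rng = random.Random(seed)
    n_actions = len(success_prob)
    value = [0.0] * n_actions  # running estimate of each action's success rate
    count = [0] * n_actions    # how many times each action has been tried

    for _ in range(episodes):
        # Occasionally explore a random action; otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: value[a])

        # The environment reports only success (1.0) or failure (0.0).
        reward = 1.0 if rng.random() < success_prob[action] else 0.0

        # Incremental mean: nudge the estimate toward the observed outcome.
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]

    return value, count


values, counts = train_by_trial_and_error([0.2, 0.5, 0.8])
best_action = max(range(len(values)), key=lambda a: values[a])
print("estimated success rates:", [round(v, 2) for v in values])
print("preferred action:", best_action)
```

Even this tiny learner needs thousands of attempts to settle on the best action, which hints at why repeating the process on fragile physical hardware is impractical.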

How long would such training take in the real world, assuming the robot miraculously withstood the trial?

According to our calculations, our system would need about 16 thousand years.

Now that’s what you call evolution!

Almost. Some claim that if, one day, we create artificial general intelligence, we will have to undergo the same process people went through when they evolved from unicellular organisms to Homo sapiens.

But considering the fact that the virtual world can fast-track it… And that quantum computers are already on the horizon…

It’s hard to say when. I wouldn’t like to commit myself on such issues.

YouTube video: https://youtu.be/x4O8pojMF0w

OpenAI robotic hand in action.
Source: OpenAI / YouTube

In your videos we can see that you deliberately made the task harder for the hand; it was trained wearing a glove, with two fingers tied up, covered with a piece of cloth. Why did you do that?

Because a simulation is never the same as reality. A model is always simpler than the real world. If a neural network learns how to solve a problem in the virtual world and reality then turns out to be more complex, it doesn’t know what is going on and is unable to do anything. That’s why we made the task more complicated. We trained the network for as long as it needed to adapt to different situations. We changed everything: the size of the cube, the friction between the hand and the cube, the appearance of the cube, the colors and lighting conditions, the force used by the robot. It is obvious that in some configurations (e.g. with the robot being too weak and the cube being too heavy) the task is impossible to solve; yet we kept increasing the range of each of those values until the network gave up.
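
The idea of randomizing the simulation and widening each range until the network can no longer cope can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the team's actual algorithm: the parameter names, the normalized values (1.0 meaning "nominal"), the thresholds and the stand-in `toy_policy_succeeds` function are all hypothetical, and the training that a real system would perform between expansions is left out.

```python
import random
from dataclasses import dataclass


@dataclass
class ParamRange:
    """Interval from which one simulation parameter is drawn."""
    low: float
    high: float

    def sample(self, rng):
        return rng.uniform(self.low, self.high)

    def widen(self, step):
        self.low -= step
        self.high += step


def sample_environment(ranges, rng):
    """Draw one randomized simulation setup."""
    return {name: r.sample(rng) for name, r in ranges.items()}


def toy_policy_succeeds(env):
    # Hypothetical stand-in for running the trained network in simulation:
    # it "succeeds" only if every parameter stays within 0.3 of nominal.
    return all(abs(value - 1.0) < 0.3 for value in env.values())


def expand_ranges(ranges, evaluate, rng, step=0.05, threshold=0.8,
                  trials=50, max_rounds=20):
    """Widen all ranges while the success rate stays at or above the threshold."""
    for _ in range(max_rounds):
        successes = sum(evaluate(sample_environment(ranges, rng))
                        for _ in range(trials))
        if successes / trials < threshold:
            break  # the policy no longer copes; stop making the task harder
        # A real system would keep training the network here before widening.
        for r in ranges.values():
            r.widen(step)
    return ranges


rng = random.Random(0)
ranges = {
    "cube_size": ParamRange(0.95, 1.05),
    "friction": ParamRange(0.95, 1.05),
    "cube_mass": ParamRange(0.95, 1.05),
}
final = expand_ranges(ranges, toy_policy_succeeds, rng)
for name, r in final.items():
    print(f"{name}: {r.low:.2f} to {r.high:.2f}")
```

In this toy version the ranges stop growing soon after they exceed what the fixed stand-in policy tolerates; with a network that keeps learning, the ranges could continue to expand as it adapts.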

It was trained to toughen up.

Yes. The wider the range, the better the network performed in the real world despite never getting a chance to know it. But it had seen so many different simulations that, somehow, it started to adapt to new conditions. We can now observe the emergence of an ability to learn on an ongoing basis, although it is still very limited. The hand identifies whether this time the cube is heavier or lighter; it tries out different movements, checks what works and what doesn’t, and adapts to the given situation.

The project lasted for a bit more than two years. I joined the team when it was halfway through.

Wherever I go, I meet Polish IT experts. How do you get such a job?

I got mine in the most trivial way you can imagine: on the OpenAI site there is a “Join OpenAI” section with an application form. I clicked the right button, I submitted my CV and I got an answer.

OpenAI is one of those mythical, marvelous companies… What do things look like from the inside? Is it really so fantastic?

It is, although our office buildings are made of brick and people working there are flesh and blood and not clouds and dreams. My robotics team is led by Wojtek Zaremba. There are several more people from Poland. You are right. We are quite numerous considering the size of the company and the fact that the company employs people from all over the world. People are by far the biggest asset of OpenAI. They are unusual and creative but also kind and helpful.

At OpenAI we do not have what you would call a standard management system; there is no boss coming to the office and telling everyone what they should do. Instead, Wojtek spurs us to action and provides us with interesting problems to solve. None of us knows how to solve a problem before we actually start working on it. You always need to try. Frankly speaking, I wouldn’t be able to manage my team that way. How can one be sure that everyone will do what they should do even though they don’t have to? But it is certainly easier to manage a team that is motivated and that, in a way, acts on its own initiative.

And what if somebody has a bad day and wants to stay in bed? Is your company fine with that?

In general, our company doesn’t force anyone to do anything, so if someone wants to stay in or go on holiday, they can do that.

It sounds like a paradise to me…

We have no deadlines, we are not rushed off our feet, there is no pressure whatsoever. It’s all cool.

Maybe your creativity comes from the lack of constraints? How tedious and monotonous is it to train a neural network? How much creativity does it require?

Neural networks are trained automatically. It is done by programs. Our responsibility is to develop them. Surely, not everyone is fond of every aspect of programming. To each his own. Our team is aiming for a balance: we try to give people a bit of what they like and a bit of what is necessary. On the other hand, people like doing what is necessary because they are then praised by others. They say: “Good job! Thanks for doing that!” I used to work in a hedge fund where we would get negative feedback more often. When you did your job well, they said nothing; they only came back to you when something was wrong. That created a toxic working environment, which was completely different from what we experience at OpenAI. Here, everyone has a very positive attitude.

Is it teamwork, interdisciplinary work?

Partially yes. In our team there are people who used to deal with finance or physics but we also have neurobiologists who analyzed human brains. Now they are trying to make use of their expertise in AI.

What did you do before joining OpenAI?

I was thinking about what to do with my future, as any young mathematician does. Right before graduating from Warsaw University I was considering whether I should become an academic and stay at the university.

Although neural networks were already emerging around the world back then, the subject was still unknown in Poland. I heard opinions that it was an obsolete technology which had been a flop and which would never reemerge.

I decided to look for a different type of job. But what else can a mathematician do? In 2012 it was mainly finance. I joined a hedge fund. I worked on algorithms trading on the stock exchange and I was slowly climbing the career ladder. I was appointed head of the research and development department. But I felt that I was going in the wrong direction.

One day, in 2015, a friend of mine showed me the work of researchers from DeepMind, who used reinforcement learning algorithms to create neural networks that played various Atari games. To me that was simply amazing! I was really surprised to see that something like that could be done and could work. I realized there was a spark of intelligence in that. I was intrigued and decided to go in that direction.

To be honest, there aren’t too many places in the world where you could research such things; OpenAI was the only entity where I thought I would fit in. I like that the company tries to preserve its independence. Of course to the extent it is possible. We live in the real world and you don’t get your subsidies and grants out of thin air. But OpenAI is doing its best to act responsibly and to continue its mission.

What mission is that?

To ensure that artificial general intelligence, which will be able to learn and reason in a way similar to how people do, is safe and useful to humanity.

Do you believe in the creation of artificial general intelligence?

Frankly speaking, my view on this has fluctuated. Before I joined OpenAI, I wasn’t fully convinced. But the longer I work here, the more I am certain that one day it will be created.

We have recently developed a game in which several agents play hide and seek. The idea of this experiment is to show what may happen if many different neural networks compete against each other. It turns out that, by competing, they are able to learn quite complex behaviors. At first, the agents were chasing each other on the board. Then, one group built forts to hide behind. Then, the other group learned how to use ramps to jump over the forts. You can see how two competing groups invent new things and strategies, all by themselves, without being programmed to do so.

However, AGI is a concept that will come true only in the future. A lot will depend on when we have enough computing power to calculate something so big in a reasonable period of time. In other words, it will depend on hardware.

How can you evaluate today that an algorithm will have a detrimental or beneficial effect in the future? A tool itself is neither good nor bad; it will become dangerous only if it falls into the wrong hands.

A technology may become dangerous if it gives one group of people power, an opportunity to exert stronger influence on others. This is why it is vital to make a technology available to as many people as possible, so that the benefits it brings do not fall exclusively into the hands of a narrow group of individuals. Besides, if protective measures are implemented, people will not abuse it. For that reason, we are trying to analyze how someone might use a technology to detrimentally affect our lives and what we could do to prevent it.

It always seems to be a double-edged sword: an algorithm that detects fake news will surely be good at producing it. Have you ever created a tool which you decided to scrap because it seemed potentially extremely dangerous? Just in case?

Scrapped? No. If it is so powerful, it should be analyzed in the context of defense against its properties or effects. The question is whether it should be made publicly available.

We have recently created a text generation model called GPT-2. It’s very good. It is capable of things we never expected; they emerged during the training process. The model can be used to create propaganda and fake news; everybody knows that if you repeat a lie many times, people will eventually believe it. We announced that, for the time being, we were not going to make it available because we could not predict what impact it might have. We believed it could be dangerous, so releasing it would not have been responsible. We are working on ways to detect whether a text has been generated by a network or written by a human, and to evaluate whether a given piece of material is credible. Similarly, you can take a hypothetical approach: you can simulate attacks and try to defend yourself against them.

Many criticize our attitude and say that we are overreacting and sparking panic, and that the model is not that good. Surely, every model has its limitations. That goes without saying. But we wanted to set a precedent, to show how to act responsibly in such a situation.

Some time later, we published a smaller model so that people could see for themselves what methods we had used. Then, we contacted several research institutes, which were given a bigger model to assess how it could be used for both attack and defense. Only after a longer period of time did we make the model available on the internet. That process took us one year. This is the way it should be done.

Speaking of responsibility, Wired recently wrote that your algorithm, which trained the robot hand to solve the Rubik’s Cube, used as much energy as three nuclear plants are able to generate within one hour… Artificial intelligence has a lot to offer, but isn’t it too costly as far as the environment, the climate and the Earth are concerned?

Well, it doesn’t cost peanuts for sure and you have to take that into account, but you need to see things in a broader context. Both the Google and the Microsoft data centers we use are fully carbon neutral and use renewable energy sources. We also believe that in the long term our research will bring substantial benefits both to the climate and to our civilization. Responsibility for our planet rests on all of us; yet nobody advocates halting all business activity.

What are you working on at the moment?

In our robotics team we are going to continue working on robotic dexterity. One day we will have a robot to make us coffee and do the laundry; it will do everything we would like to automate but can’t because of the shortcomings of current technology.

And what about care for the elderly?

Yes, that is the future. Although, apart from dexterity, another element plays an important role here: emotional intelligence. The ability to identify emotions is a tremendous challenge. People are very complex beings. It is not difficult to recognize facial expressions. The question is: How to define precisely what they mean and how to understand them in a context?

Some people also have problems with that…

Exactly! It’s challenging for people even though we learn how to do it from an early age and we are genetically adapted. Just imagine how hard it has to be for robots!

Dexterous robots will take our jobs. Have you thought about it? When constructors create increasingly powerful and precise machines, do they take negative social consequences into consideration?

It’s a complicated and complex topic. Obviously, people need to work to be able to function in a society, but nobody knows what it will look like in the future. I think it will be different. But I think that most of us would be glad if we didn’t have to work in mines, on a production line or drive trucks. Many similar changes occurred during the Industrial Revolution or in the recent times of computerization of offices, so we may learn from our experience. You can see clearly that new different jobs are created in lieu of the ones supplanted by machines and algorithms. It is hard for me to imagine a world where we would have all necessary goods and security provided by smart systems and robots. People can be very creative when it comes to thinking of new activities as long as they do not have to bother their heads about making ends meet.

In that context, the biggest problem is a group of people who find it difficult to adapt to a new situation. Much of the difficulty will lie in supporting them.

*Jerry Tworek – after following an individual curriculum program, he graduated from Warsaw University, majoring in mathematics and natural studies and specializing in mathematical applications in finance. As a student, he did an internship at Google in Silicon Valley to learn how to develop early machine learning systems. After obtaining his university degree, he worked for five years on developing algorithms trading in futures contracts on the biggest global markets. Since joining OpenAI in 2019, he has been studying reinforcement learning algorithms, focusing on the impact of memory and the learning plan on the speed and efficiency of training agents. In collaboration with his team he published a paper entitled “Solving Rubik’s Cube with a Robot Hand”.
