Training with million images or how to track down cancer

An algorithm capable of diagnosing breast cancer more accurately than an experienced radiologist – an interview with Krzysztof Geras, PhD, conducted by Monika Redzisz

Monika Redzisz: Your professional focus is on applying deep learning in medical imaging. A neural network created by your team can use mammographic pictures to detect breast cancer at a very early stage. How efficient is it?

Krzysztof Geras, PhD: It is a bit more accurate than a radiologist. To put in simple terms, the accuracy of the algorithm is about 90 percent, while the accuracy of the radiologists is ca. 80 percent. But the most interesting thing is that when we combine the predictions of the algorithm and of the radiologists and average them out, the result is even more accurate. It means that the model is able to see something that is hidden to a human eye and that radiologists can still pick out something that the algorithm cannot discern. It turns out that they contribute to improving the algorithm that is better than they are.

What do they have that the algorithm lacks?

That is a million dollar question. We have been analyzing cases where radiologists have proven superior but we haven’t come to any conclusion yet. It is hard to interpret algorithmic operations. We are trying to train learning models that are able to carry out a classification only if they can explain it by specifying the input elements. This issue will probably be subject to in depth analysis for another couple of years.

Where did you get the training data from?

From our system of hospitals. New York University is not only a university; it has its medical school and therefore is the owner of many hospitals and acute medical care units which have been gathering data for many years. This is the data we work on. We have access to hundreds of thousands of mammograms. The Department of Radiology collects medical images, while the Department of Pathology gathers results from pathology tests. We match the images with the results and label them. To train our model, we used about million images.

You can often hear that some results are off target because an algorithm was trained on specific data, e.g. only on mammograms of women having light skin.

Instead of saying they are off target, we prefer to see them as optimistic. This is the basic feature of learning algorithms: they are the most accurate in the case of the data that are very similar to those on which they were trained. On the other hand, they may prove inaccurate in the case of data coming from a different source. This is why it is so important to have a varied set of data and to test algorithms in a fair manner, i.e. on the data that will be used in practice.

The model is able to see something that is hidden to a human eye, while radiologists can still pick out something that the algorithm cannot discern.

We are lucky to live in New York, which is one of the most ethnically diverse places in the world. We don’t have to bend over backwards to train our model on mammograms of women of different nationalities or races.

Can we say that you have created a completely new model?

Not quite. We made use of past achievements of other research teams who had worked on deep neural networks. Our model is based on ResNet neural networks, which we adapted to the problem we were interested in. A lot of standard models are publicly available; different groups of people may use, improve and adapt them for various new purposes.

In the modern world it is impossible to invent anything from scratch and do your research in isolation. Although currently my team consists of six doctoral students, one postdoc, five master’s students, five radiologists and one medical physician, we can’t do everything on our own. Besides, the fact that we have published our model is not enough. Now, other researchers must test it to obtain the same results as we have, only that this time they have to use their own data. The model has to be tested on data collected from various populations in various countries with the use of various mammography units. Only then will we be able to tell if the model really works.

Research on early detection of breast cancer is conducted around the world by many different centers. I have read about achievements of MIT researchers whose algorithm can detect breast cancer five years before any symptoms occur.

There are many teams that do research on neural networks and medical imaging, but relatively few that have expertise in both of those fields.

There may not be a lot of talk about us in the media, but we are well known for our research in the information and medicine communities. The model that we have made available on GitHub [Editor’s note: one of the internet hosting platforms] is very popular. It has been downloaded by many research teams from all over the world who are now trying to modify it or to apply it with their own data. A draft version of our article was published already in March 2019. One year later, another group put out an article with their results which were generally the same as ours. Of course there is nothing wrong with anyone validating the results coming from a different source.

How did you take interest in this field and how did you get to New York University?

I defended my bachelor’s thesis at the Faculty of Mathematics, Informatics and Mechanics, Warsaw University. During the second year of my master’s studies I went to Edinburgh, under the Erasmus Program. Although I defended my master’s thesis in Warsaw, I decided to pursue my doctoral studies in machine learning back in Edinburgh. Most of doctoral students focused on machine learning deal with standard, well known data sets and with problems that have already been solved. But I was interested in development of machine learning algorithms in the areas where there is still a lot to be done. Medical imaging is one of such areas. After obtaining my PhD title, I applied for the postdoc position at New York University.

Can you imagine doing that sort of research in Poland?

Well, it would be hard to conduct that particular kind of research in Poland, because it requires huge outlays and gigantic data sets. Such projects are possible only in a big organization that has been involved in similar undertakings for many years. Such as NYU. That said, I have to admit that Polish centers carry out a lot of other machine learning research projects and that they are successful, one of the examples being the team of professor Jacek Tabor from Jagiellonian University.

Do our universities know how to properly prepare the students for that?

Their level is very good up to the master’s stage. I would even say that if someone graduates from information technology studies at Warsaw University is better prepared to be a programmer than someone else who graduates from Stanford.

There are many teams that do research on neural networks and medical imaging, but relatively few that have expertise in both of those fields.

I served many internships, I worked in Microsoft, Amazon and J.P. Morgan. I can’t think of any university in the world that would be able to prepare you for the job of a computer scientist considerably better than Warsaw University. Those who graduate from Warsaw University are not only well educated but also toughened up. They are determined and ready to do everything better and faster than the students from the best universities in the United Kingdom or America.

Why?

Our education system is kind of brutal. From the very beginning we are thrown in a typical examination regime: tests, quizzes, exams… The idea of the Polish education system is to examine students on a regular basis and in a very strict manner. Of course, that approach has its advantages and disadvantages. However, if you tough it out and if you don’t get demotivated, you will be able to get along in any other place in the world.

Does that mean that Polish computer scientists could be compared to Soviet ice skaters or Chinese gymnasts?

I guess you can say that…

The question is how many persons will not be able to survive that testing ground…

That’s a good point. As I said, the system is not flawless. In terms of our organizational, promotional and social skills we cannot compare ourselves with American or British students. Their education systems provide them with more soft skills, although that is done at the expense of knowing how to solve integrals. You can also say that Anglo-Saxons are more intellectually creative.

How much time will it take before your tool can be used by doctors?

Our model is accurate enough to be useful for them even today. But that is not what we are shooting for. First, it must be understandable; second, it must be integrated with the whole information system. Otherwise it will become burdensome and will delay the diagnostic process instead of speeding it up. The problem of integrating such tools is much more complicated than training of neural networks. Precise algorithms are already at our disposal. Now, we have to make sure that they will not fail at the most unexpected moment.

Krzysztof Geras, PhD – assistant professor at NYU School of Medicine and Center for Data Science. A graduate of Faculty of Mathematics, Informatics and Mechanics, Warsaw University. He took a PhD from the University of Edinburgh. He served internships in Microsoft, Amazon and J.P. Morgan. His professional focus is on unsupervised learning of neural networks, evaluation of machine learning models and applications of those techniques in medical imaging.