A COVID-19 forecasting model that one man built on his own in just one week has beaten, on accuracy, professional institutions backed by billions of dollars and decades of experience.
He is Youyang Gu, with a master's degree in electrical engineering and computer science from MIT, and a degree in mathematics.
But it is worth noting that he is a complete novice in medicine and epidemiology.
His model has even won high praise from the well-known data scientist Jeremy Howard, founder of fast.ai:
The only model that seems reasonable.
He's the only one who really looks at the data and does it right.
Moreover, his model was adopted by the CDC.
What kind of prediction model is it?
The story goes back to the beginning of last year.
At that time, the epidemic had spread all over the world, and people were turning to modeling to predict how it would unfold next.
However, the forecasts from two institutions were very different:
Imperial College London: by the summer, the number of COVID-19 deaths in the United States would reach 2 million.
IHME: the death toll was expected to reach 60,000 by August.
(It turned out that the death toll was 160,000.)
Why was there such a big gap between the forecasts of two professional institutions?
This attracted the attention of Youyang Gu, who was only 26 years old at that time.
Although he had no medical or epidemiological experience, he firmly believed that data forecasting would be of great use at such a time.
So around mid-April, Youyang Gu spent just one week at home building his own predictor and a website to display the results.
△ Website created by Youyang Gu
The method Gu used, however, was not especially sophisticated; on the contrary, it was relatively simple.
He first considered the relationship between the number of COVID-19 tests, the number of hospitalizations and other factors, but along the way he found that the data provided by the states and the federal government were inconsistent.
Gu concluded that the most reliable data appeared to be the daily death toll:
Other models use a lot of data sources, but I decided to use the past death toll to predict the future death toll.
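To make the idea of forecasting future deaths from past deaths concrete, here is a minimal Python sketch. It is not Gu's actual model, whose internals this article does not describe; it simply fits a constant daily growth rate to recent counts and extrapolates. The function names and the sample numbers are made up for illustration.

```python
import math


def fit_log_linear(daily_deaths):
    """Least-squares fit of log(deaths) = a + b * day over the observed days."""
    n = len(daily_deaths)
    days = list(range(n))
    logs = [math.log(d) for d in daily_deaths]
    day_mean = sum(days) / n
    log_mean = sum(logs) / n
    # Slope b is the average daily growth rate on the log scale.
    b = sum((x - day_mean) * (y - log_mean) for x, y in zip(days, logs)) / sum(
        (x - day_mean) ** 2 for x in days
    )
    a = log_mean - b * day_mean
    return a, b


def forecast(daily_deaths, days_ahead):
    """Extrapolate the fitted trend for the next `days_ahead` days."""
    a, b = fit_log_linear(daily_deaths)
    n = len(daily_deaths)
    return [math.exp(a + b * (n + k)) for k in range(days_ahead)]


# Hypothetical daily death counts, NOT real data.
past_daily_deaths = [100, 110, 121, 133, 146, 161, 177]
projected = forecast(past_daily_deaths, 3)
print([round(p) for p in projected])
```

A single input series keeps the sketch easy to check, which echoes Gu's point about preferring one reliable data source over many inconsistent ones. A real forecast would also need to handle reporting noise and changes in the growth rate.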
So how did the forecasts turn out?
They were quite accurate.
When the model had just been completed, he predicted that the death toll in the United States would reach 80,000 on May 9; the actual toll that day was 79,926.
Gu also predicted that the death toll would reach 90,000 on May 18 and 100,000 on May 27.
Beyond the accurate numbers, Gu also predicted a second wave of large-scale infections and deaths as many states gradually moved from lockdown to reopening.
Perhaps because of the accuracy of Gu's model, more and more people began to pay attention to his work.
Gu reached out to reporters on Twitter and also e-mailed epidemiologists, asking them to check his numbers.
At the end of April last year, Carl Bergstrom, a well-known biologist at the University of Washington, tweeted about Gu's model.
Soon after, the Centers for Disease Control and Prevention began publishing Gu's forecasts on its COVID-19 forecasting website.
As the epidemic developed, Gu, a Chinese immigrant, also took part in regular meetings organized by a team of American experts, with everyone looking to improve their models.
Traffic to his website also exploded, with millions of people coming to check his data every day.
In general, the figures Gu's model predicted for a few weeks ahead turned out to be very close to the actual death toll.
As similar forecasting models multiplied, Nicholas Reich, an associate professor in the Department of Biostatistics and Epidemiology at the University of Massachusetts Amherst, counted 50 of them.
Among these, Gu's model was consistently near the top.
Reich said of Gu:
Youyang Gu is a very humble person. Seeing that other people's models were also doing well, he felt that his work was done.
The month before Gu decided to stop the project, he predicted that the death toll would reach 231,000 on November 1; the actual number was 230,995.
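For a sense of scale, the two forecasts quoted in this article can be turned into relative errors with a few lines of Python. This is only a back-of-the-envelope check on the figures reported here, not an evaluation of the model as a whole; the helper name `relative_error` is ours.

```python
def relative_error(predicted, actual):
    """Absolute forecast error as a fraction of the actual value."""
    return abs(predicted - actual) / actual


# (predicted, actual) cumulative U.S. death tolls as quoted in the article.
forecasts = {
    "May 9": (80_000, 79_926),
    "November 1": (231_000, 230_995),
}

for date, (predicted, actual) in forecasts.items():
    miss = abs(predicted - actual)
    err = relative_error(predicted, actual)
    print(f"{date}: off by {miss} deaths ({err:.3%})")
```

By this measure the May 9 forecast missed by 74 deaths, under a tenth of a percent, and the November 1 forecast missed by 5.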
But Chris Murray of IHME thinks:
Gu did not respond to this assessment of his model. Instead, he said:
I am very grateful to Dr. Chris Murray and his team for their work; without them, I would not be where I am today.
After a break, Gu is back at work.
Gu has always wanted to find a job that would have a huge impact on society while avoiding the politics, bias and burdens that sometimes come with large institutions. He believes that:
In this field, there are many shortcomings that can be improved by people with my background.
Who is Youyang Gu?
Youyang Gu comes from a Chinese American immigrant family and grew up in Illinois and California.
Gu was fond of mathematics and science from childhood, but he did not really get into computer science until after high school. He owes his entry into the field to his father, who worked in the computer industry.
Youyang Gu doing a chemistry experiment (photo from the Clark scholarship program, 2010)
Gu studied at MIT for both his bachelor's and master's degrees, where he received a double bachelor's degree in computer science and mathematics, as well as a master's degree in computer science.
After graduation, he spent a further year doing research in the NLP group at MIT's renowned CSAIL lab and published a paper at EMNLP 2016 that same year.
This was also his first exposure to big data, where he built statistical models to make predictions.
However, instead of continuing his academic research, he entered the industry. After leaving MIT, he joined the financial industry, writing algorithms for high-frequency trading systems.
There, his data modeling skills were further honed, because in financial trading the work must be highly quantitative and as accurate as possible.
After that, he moved into the sports industry and continued to work with big data. This gave him rich interdisciplinary experience, so that he could adapt to a new field and knew how to model more accurately.
In his own words, he specializes in using machine learning to understand data, separate signals from noise, and make accurate predictions.
In building his COVID-19 death model, he initially considered the relationship between the number of confirmed cases, the number of hospitalizations and other factors. He then found that the data reported by the states and the federal government were inconsistent, and that the most reliable figure was the number of deaths per day.
Gu believes that if the quality of input data is very low, the more data there is, the worse the output performance will be.
Within a week, he built a simple model based on the death data and put the prediction website online.
Since last April, Gu has volunteered thousands of hours on this project, all free of charge.
In an interview with Eric Topol, editor-in-chief of the medical website Medscape, Gu said that he is now working full-time on the COVID-19 forecasting website. He has no side job or income and is living on his savings.
Youyang Gu being interviewed by Eric Topol, editor-in-chief of Medscape
Even so, this public-interest project has drawn criticism from some Twitter users, but he has kept at it.
Since December, covid19-projections.com has been supported by donations from readers, and it has now reached its fundraising goal of $50,000.
Beyond the number of infections, Gu's website has a new feature: since last December, covid19-projections.com has been tracking and modeling vaccination progress and the path to herd immunity.
What comes next? What are Gu's career plans after the pandemic?
He says it is too early to tell. Although his current job is to predict the course of the epidemic, he finds it hard to predict what he himself will be doing in three months or a year.
Because of this work, universities and companies around the world have approached him with offers.
- THE END -
Link to the original text: qubit