
Chinese researchers dominate KDD: a Chinese PhD student wins best paper, and Tsinghua, Peking University and the University of Science and Technology of China are among the winners

via: 博客园     time: 2019/8/7 9:37:26

By Qian Ming, Yu Yang, Li Zi, Annie, Yi Pu and Bian Ce, reporting from Aofeisi

Reported by QbitAI (量子位) | WeChat public account: QbitAI


Another top global AI conference has turned into a showcase for Chinese researchers.

KDD, the world's highest-level conference in data mining, was held in Alaska this year. The main awards and the results of the three major competitions have just been announced.

Chinese names are everywhere this year. Dong Kun, a Chinese PhD student at Cornell University, won the best paper award on the research track, and the startup awards and the three KDD Cup competitions were largely swept by Chinese companies.

The details are as follows:

Double-blind review for the first time, acceptance rate around 15%

KDD, whose full name is the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, is the highest-level conference in the field of data mining.

Since 1995, the KDD Conference has been held for more than 20 consecutive years, with an annual acceptance rate of no more than 20%, and this year's acceptance rate is less than 15%.

It is worth mentioning that this year is also the first year for KDD to adopt a double-blind review.


Papers are still divided into a research track and an application track.

According to public information, the research track received 1,179 submissions, of which 111 were accepted as oral papers and 63 as poster papers, for an acceptance rate of 14.8%.

The application track received more than 700 submissions, of which 45 were accepted as oral papers and 100 as poster papers, for an acceptance rate of 20.7%.

By comparison, KDD 2018 accepted 181 papers on the research track, an acceptance rate of 18.4%, and 112 papers on the application track, an acceptance rate of 22.5%.
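The quoted rates can be checked with simple arithmetic. The sketch below uses only the counts reported above. The helper name `acceptance_rate` is ours, and because the article only says "more than 700" for the application track, the submission count there is back-computed from the reported rate rather than taken from any official figure.

```python
def acceptance_rate(accepted: int, submitted: int) -> float:
    """Return the acceptance rate as a percentage."""
    return 100.0 * accepted / submitted


# Research track 2019: 111 oral + 63 poster papers out of 1,179 submissions.
research_rate = acceptance_rate(111 + 63, 1179)

# Application track 2019: 45 oral + 100 poster papers at a reported 20.7%.
# The exact submission count is not given, so estimate it from the rate.
implied_applied_submissions = (45 + 100) / 0.207

print(f"research track: {research_rate:.1f}%")
print(f"application track submissions (implied): about {implied_applied_submissions:.0f}")
```

The research track figure rounds to the reported 14.8%, and the implied application track total comes out at roughly 700 submissions.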

Emphasis on reproducibility

Most importantly, KDD also emphasized reproducibility this year and made it a condition for best paper eligibility: papers must submit supplementary material showing that their results can be reproduced.

This covers experimental methods, empirical evaluations and results. Authors are also encouraged to release their research code and data, and to describe the algorithms and resources used in the paper as completely as possible.

As a result, KDD 2019 has attracted a great deal of attention.

Here are the teams that won each award:

Research track best paper

Network Density of States


The paper is from Cornell University. The first author is Dong Kun, a Ph.D. student in applied mathematics at Cornell who holds a master's degree from UCLA.

The other authors are Austin Reilley Benson, assistant professor of computer science at Cornell University, and David Bindel, associate professor of computer science at Cornell, who is also Dong Kun's doctoral advisor.

Spectral analysis links graph structure to the eigenvalues and eigenvectors of associated matrices. Much of spectral graph theory descends directly from spectral geometry, which studies differentiable manifolds through the spectra of associated differential operators. However, the translation from spectral geometry to spectral graph theory has mainly focused on results involving only a few extreme eigenvalues and their associated eigenvectors.

Unlike in geometry, the study of graphs through the overall distribution of eigenvalues (the spectral density) has mostly been limited to simple random graph models. The interior of the spectrum of real-world graphs remains largely unexplored, and is difficult to compute and interpret.

In this paper, the authors delve into the spectral densities of real-world graphs. They borrow tools developed in condensed matter physics and add new adaptations to handle the spectral signatures of common graph motifs. The resulting methods are efficient, as the paper illustrates by computing the spectral density of graphs with more than one billion edges on a single compute node.

Beyond providing visually compelling fingerprints of graphs, the paper shows how spectral density estimation supports the computation of many common centrality measures, and uses spectral densities to estimate meaningful information about graph structure that cannot be inferred from the extreme eigenvalues and eigenvectors alone.
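To make the idea of summarizing a whole spectrum concrete, here is a minimal Python sketch. The article does not name the physics tool the authors borrowed; the sketch assumes a Chebyshev moment expansion in the style of the kernel polynomial method. To stay deterministic it computes the trace exactly on a tiny graph, whereas a large-scale method would have to approximate it, for example with random probe vectors. All function names are illustrative and none of this is the authors' code.

```python
import math


def normalized_adjacency(edges, n):
    """Sparse rows of D^{-1/2} A D^{-1/2} for an undirected simple graph."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    rows = [dict() for _ in range(n)]
    for u, v in edges:
        weight = 1.0 / math.sqrt(degree[u] * degree[v])
        rows[u][v] = weight
        rows[v][u] = weight
    return rows


def matvec(rows, x):
    """Multiply the sparse matrix (list of row dicts) by the vector x."""
    return [sum(w * x[j] for j, w in row.items()) for row in rows]


def chebyshev_moments(rows, num_moments):
    """Return d_m = trace(T_m(A)) / n for m = 0 .. num_moments - 1."""
    n = len(rows)
    sums = [0.0] * num_moments
    for j in range(n):
        prev = [0.0] * n
        prev[j] = 1.0                  # T_0(A) e_j = e_j
        cur = matvec(rows, prev)       # T_1(A) e_j = A e_j
        for m in range(num_moments):
            if m == 0:
                vec = prev
            elif m == 1:
                vec = cur
            else:
                # Three-term recurrence: T_m = 2 A T_{m-1} - T_{m-2}
                vec = [2.0 * a - b for a, b in zip(matvec(rows, cur), prev)]
                prev, cur = cur, vec
            sums[m] += vec[j]          # diagonal entry contributes to the trace
    return [s / n for s in sums]


# A triangle: its normalized adjacency has eigenvalues 1, -1/2 and -1/2.
triangle = [(0, 1), (1, 2), (0, 2)]
moments = chebyshev_moments(normalized_adjacency(triangle, 3), 4)
print(moments)
```

Each moment is the average of a Chebyshev polynomial over all eigenvalues, so a short list of moments describes the whole spectrum without computing any eigenvalue explicitly. For the triangle the first four moments are 1, 0, 0 and 1.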

Research track runner-up paper

Optimizing Impression Counts for Outdoor Advertising


This study tackles the question of how to place outdoor advertising most cost-effectively. The authors are from the Royal Melbourne Institute of Technology, Singapore Management University, Wuhan University and Huawei.

The specific problem, first posed by this team, is called Impression Counts for Outdoor Advertising (ICOA).

People pass many advertisements along the road, but each one makes only a slight impression and most are forgotten. For many advertisers, the goal is met as long as an ad sticks in your mind. The research is about how to leave a stronger impression on more people.

Thanks to the mobile internet, the trajectory of every trip can be recorded, whether by car, motorcycle or bicycle, so the researchers work from a trajectory database T. The other inputs are a billboard database U and the advertiser's budget B.

In a nutshell, the ICOA problem is this:

Given a set of billboards, a budget and the routes people travel, how can the total impression left on travelers be maximized so that the money is spent to the greatest effect?

There are two questions to answer:

  1. How many times should a passerby see each ad?
  2. How should the billboards be laid out so that more passersby see ads the optimal number of times along their trajectories?

The first question was answered by earlier research with a sigmoid (S-shaped) function: as the number of exposures grows, the impression left on passersby deepens, but beyond a certain point further repetition adds nothing and only has side effects.

The second question, billboard placement, has to be solved algorithmically. The research team found that applying the greedy algorithm directly is not feasible, so they propose a tangent-line-based algorithm that computes a submodular function bounding the objective from above. To improve efficiency, they also designed a θ-termination method and a progressive upper bound estimation method.

Finally, the team validated the approach in experiments on real trajectory and billboard datasets from two cities, New York and Los Angeles.
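The problem statement can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the paper's algorithm: the logistic form of `impression` and its parameters `alpha` and `beta` are hypothetical, the data are made up, and the exhaustive search in `best_placement` only works for a handful of billboards.

```python
import math
from itertools import combinations


def impression(count, alpha=3.0, beta=1.0):
    """S-shaped impression left by seeing ads `count` times (assumed logistic form)."""
    if count == 0:
        return 0.0
    return 1.0 / (1.0 + math.exp(alpha - beta * count))


def total_influence(chosen, coverage, num_trajectories):
    """Sum the impression over all trajectories for a chosen set of billboards."""
    counts = [0] * num_trajectories
    for billboard in chosen:
        for trajectory in coverage[billboard]:
            counts[trajectory] += 1
    return sum(impression(c) for c in counts)


def best_placement(coverage, costs, budget, num_trajectories):
    """Try every affordable subset of billboards (toy sizes only)."""
    best_set, best_value = (), 0.0
    billboards = sorted(coverage)
    for size in range(1, len(billboards) + 1):
        for subset in combinations(billboards, size):
            if sum(costs[b] for b in subset) > budget:
                continue
            value = total_influence(subset, coverage, num_trajectories)
            if value > best_value:
                best_set, best_value = subset, value
    return best_set, best_value


# Made-up instance: billboards "a" and "b" are both seen on trajectory 0,
# billboard "c" is seen on trajectory 1. Every billboard costs 1 unit.
coverage = {"a": {0}, "b": {0}, "c": {1}}
costs = {"a": 1, "b": 1, "c": 1}
best_set, best_value = best_placement(coverage, costs, budget=2, num_trajectories=2)
print(best_set, round(best_value, 4))
```

In this toy instance the best choice puts both billboards on the same trajectory instead of spreading them out: under an S-shaped curve the second exposure can be worth more than the first. That means marginal gains do not always shrink, which is consistent with the team's finding that a plain greedy algorithm cannot be applied directly.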

Application track best paper

The application track best paper is Actions Speak Louder than Goals: Valuing Player Actions in Soccer.


The authors are Tom Decroos and Jesse Davis from the University of Leuven, Belgium, and Lotte Bransen and Jan Van Haaren from SciSports.


Assessing the impact of a soccer player's individual actions on the outcome of a game is a key part of the player recruitment process. However, most traditional metrics fall short on this task because they either focus only on rare actions such as shots and goals, or ignore the context in which the player performs an action.

The paper introduces (1) SPADL, a new language for describing individual player actions on the pitch, and (2) VAEP, a new framework for valuing player actions by their impact on the game outcome while taking the context of each action into account.

By aggregating a player's action values, one can quantify that player's overall offensive and defensive contribution to the team.
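One way to read "valuing actions by their impact on the game outcome" is sketched below in Python. It assumes that some model, not shown here, estimates the team's near-term probability of scoring and of conceding before and after each action. An action's value is then the gain in scoring probability minus the gain in conceding probability. The field names and all numbers are made up for illustration.

```python
from collections import defaultdict


def action_value(score_before, score_after, concede_before, concede_after):
    """Value = change in scoring chance minus change in conceding chance."""
    return (score_after - score_before) - (concede_after - concede_before)


def player_ratings(actions):
    """Sum action values per player."""
    totals = defaultdict(float)
    for action in actions:
        totals[action["player"]] += action_value(
            action["score_before"],
            action["score_after"],
            action["concede_before"],
            action["concede_after"],
        )
    return dict(totals)


# Made-up probabilities for three actions; these are not data from the paper.
actions = [
    {"player": "A", "score_before": 0.02, "score_after": 0.05,
     "concede_before": 0.01, "concede_after": 0.01},   # a useful forward pass
    {"player": "B", "score_before": 0.05, "score_after": 0.15,
     "concede_before": 0.01, "concede_after": 0.01},   # a dribble into the box
    {"player": "A", "score_before": 0.03, "score_after": 0.01,
     "concede_before": 0.01, "concede_after": 0.04},   # a risky back pass
]
ratings = player_ratings(actions)
print(ratings)
```

Because every action gets a value, a pass or a dribble that never leads to a shot still counts, which is the gap in shot-based and goal-based metrics that the paper points out.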

The highlight of the study is that it accounts for contextual information that is usually overlooked. The research team applied the method to Europe's top competitions in the 2016/2017 and 2017/2018 seasons and presents a number of use cases.


Messi really is in a class of his own

Application track runner-up paper

This study, led by Apple, uses wearable devices to detect cognitive impairment (a possible precursor to dementia).

Developing Measures of Cognitive Impairment in the Real World from Consumer-Grade Multimodal Sensor Streams



Wearables and mobile computing devices are now ubiquitous and technologically advanced. Coupled with the variety of sensors available, these advances make it possible to continuously monitor patients and their daily activities.

Such rich longitudinal information can be mined for physiological and behavioral signatures of cognitive impairment, offering a new and timely way to detect mild cognitive impairment (MCI).

MCI is the state between normal cognition and dementia.

The study proposes a platform for remotely and non-invasively monitoring symptoms associated with cognitive impairment, relying solely on several consumer-grade smart devices.

The team shows how the platform was used to collect 16 terabytes of data during the Lilly Exploratory Digital Assessment Study, a 12-week feasibility study that monitored 31 people with cognitive impairment and 82 people without it under free-living conditions.

The researchers also describe how careful data unification, time alignment and imputation can handle the missing data inherent in real-world settings, and finally show that such data is useful for distinguishing people with symptoms from healthy participants.

Test of Time Award

This year's Test of Time Award goes to a paper from CMU and Nielsen BuzzMetrics:

Cost-effective outbreak detection in networks



The award comes 12 years after the paper was first published in 2007, when its six authors, CMU's Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos and Jeanne VanBriesen and Nielsen BuzzMetrics' Natalie Glance, won the best student paper award for it.

In the paper, the researchers show that many realistic outbreak detection objectives (such as detection likelihood and population affected) exhibit the property of submodularity.

The researchers exploited submodularity to develop an efficient algorithm called CELF, which speeds up the greedy algorithm.

The results show that CELF scales to large problems and achieves near-optimal placements while running 700 times faster than a simple greedy algorithm.

They also evaluated CELF on several large real-world problems, including a water distribution network model from the US Environmental Protection Agency and real blog data. The resulting sensor placements are provably near optimal, achieving a constant fraction of the optimal solution. They also showed that the approach scales, with speedups and storage savings of several orders of magnitude.
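The speedup comes from lazy evaluation, which can be sketched in Python. This is a simplified version, assuming unit costs and a plain coverage objective (how many events a set of sensors detects). The function and variable names are ours, and the full algorithm in the paper also handles sensors with different costs. Because the objective is submodular, a sensor's marginal gain can only shrink as more sensors are chosen, so a stale gain is a valid upper bound and most sensors never need to be re-evaluated.

```python
import heapq


def lazy_greedy(detects, k):
    """Pick up to k sensors, re-evaluating marginal gains only when needed."""
    selected, covered = [], set()
    # Max-heap via negated gains. Each entry records the number of sensors
    # already selected when its gain was computed, to tell fresh from stale.
    heap = [(-len(events), sensor, 0) for sensor, events in detects.items()]
    heapq.heapify(heap)
    while heap and len(selected) < k:
        neg_gain, sensor, stamp = heapq.heappop(heap)
        if stamp == len(selected):
            # The gain is up to date and at least as large as every other
            # (possibly stale) upper bound, so this is the greedy choice.
            selected.append(sensor)
            covered |= detects[sensor]
        else:
            # Stale entry: recompute its marginal gain and push it back.
            gain = len(detects[sensor] - covered)
            heapq.heappush(heap, (-gain, sensor, len(selected)))
    return selected, covered


# Made-up example: which outbreak events each candidate sensor would detect.
detects = {
    "s1": {1, 2, 3},
    "s2": {3, 4},
    "s3": {4, 5},
    "s4": {1},
}
selected, covered = lazy_greedy(detects, k=2)
print(selected, sorted(covered))
```

In this example the second pick re-evaluates only `s2` and `s3`; `s4` is never touched again because its old gain already rules it out. On large networks, skipped evaluations of this kind are where a large speedup over the simple greedy algorithm can come from.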

Entrepreneurship Research Award

ACM SIGKDD launched the Entrepreneurship Research Award in 2017 to encourage early-stage start-ups to take part in data science. The winners are chosen by the award committee from a pool of competitive entries.

The four companies that won the award this year are Arkive, Deepair, RealAI (瑞莱智慧) and Tianyancha (天眼查).


Arkive uses machine learning to manage knowledge and experience. The company was founded by two Chinese entrepreneurs.

Deepair offers travel vendors an AI-based retail platform.

RealAI provides artificial intelligence systems for industrial predictive maintenance, industrial inspection and unsupervised anti-fraud.

Tianyancha is well known in China and provides corporate big data to its customers.

Individual awards

At the KDD opening ceremony, IBM Watson researcher Charu Aggarwal received the SIGKDD Innovation Award for his lifetime achievements in data mining. He also published three papers at this conference.


Charu Aggarwal, picture from IBM official website

Charu Aggarwal received his bachelor's degree from the Indian Institute of Technology, Kanpur in 1993 and his Ph.D. from the Massachusetts Institute of Technology in 1996.

He has worked extensively in data mining, with a particular focus on data streams, privacy, uncertain data and social network analysis. He has published 19 books and more than 350 papers, and has applied for or obtained more than 80 patents. He has won several awards for his inventions and has been named an IBM Master Inventor three times.

Balaji Krishnapuram, also from IBM Watson, received the KDD Service Award for his outstanding contributions to data mining.

He served as chair of ACM SIGKDD from 2014 to 2016 and joined IBM Watson Health in 2015 to develop AI solutions for the pharmaceutical industry.


Balaji Krishnapuram, picture from Twitter user Prithwish Chakraborty

This year's Dissertation Award went to Tim Althoff of the University of Washington; the runner-up was the Chinese scholar Chao Zhang of UIUC.


KDD CUP 2019

This year's KDD CUP has 3 tracks:

  • Regular Machine Learning Competition (Regular ML Track)
  • Automated Machine Learning Competition (Auto-ML Track)
  • “Research for Humanity” Reinforcement Learning Competition (Humanity RL Track)

The event has long been known as the “World Cup of big data”, and competition is fierce.

Official KDD statistics show that more than 5,000 individuals from 39 countries made 17,000 submissions this year.


In the final results, Chinese teams stand out, winning most of the awards.

First, the Regular Machine Learning Competition. Sponsored by Baidu, it is divided into two tasks.

The champion and runner-up of Task 1 are both from China. The champion came from Ant Financial; the runner-up team's members came from Shanghai Weimeng, Trend Micro, Didi Chuxing, Beijing University of Posts and Telecommunications, South China University of Technology, JD.com (Jingdong) and other organizations.

The champion of Task 2 came from the Japanese telecommunications company NTT DOCOMO, and the runner-up came from Southeast University.

In addition, the PaddlePaddle Special Award went to the University of Science and Technology of China.

Next is the Automated Machine Learning Competition, sponsored by 4Paradigm.

The champion team is from China's Shenlan Technology and Peking University; the runner-up is from the National University of Singapore; third place went to Alibaba and the Georgia Institute of Technology.

Finally, the “Research for Humanity” Reinforcement Learning Competition, sponsored by IBM and Hexagon ML.

The champion came from National Cheng Kung University in Taiwan; the runner-up came from Tsinghua University, JD.com (Jingdong) and Beijing University of Aeronautics and Astronautics; and the third came from the seeds.

Competition details address:


One more thing

Papers and competitions are not the only areas where Chinese participants stand out.

Chinese companies are also especially prominent among the KDD 2019 sponsors.


Baidu, Tencent, Didi, Alibaba, Kuaishou, Inspur, ByteDance and Squirrel AI are all on the sponsor list.

Hence the joke going around: it is time for top AI conferences to be held in China. After all, that would be closer to the core players, with no visa worries.

What do you think?

