Tencent AI defeats an Honor of Kings professional team through self-learning, training the equivalent of 440 years a day

via: 博客园     time: 2019/8/3 13:52:36

Wen Geng, reporting from Maihao Temple

QbitAI public account

In the King's Canyon, the winds have shifted.

A fierce match is under way, with a team of five human professional e-sports players on the left and, on the other side ...

No one. Their opponents are all AI.

This was last night in Kuala Lumpur, at the highest-level Honor of Kings e-sports event: a regional team made up of professional players faced Tencent's Honor of Kings AI, Juewu.

In the final 5v5 battle, the AI team, with its unconventional approach, needed 16 minutes and 15 seconds to wipe out the professional team and destroy all 9 towers and the base crystal.

This means that Tencent's Juewu has reached the professional level in Honor of Kings.

Of course, against non-professional players, it cut through them with ease.

On the same day, at ChinaJoy in Shanghai, Juewu opened four days of 1v1 experience testing for top amateur players. In 504 tests on the first day, its win rate was 99.8%, losing only one game (the opponent was the top-ranked Hou Yi player on the Honor of Kings national server).

First win over a professional team

In this match, five professional e-sports players formed the regional team. The roster was: ESTARPRO. XIXIXI, EMC. SUN, NOVA. SEEK, KZ. NIGHT and M8HEXA. MIKE.

Tencent's AI Juewu chose the lineup: Da Mo (AI_001), Athena (AI_011), Wang Zhaojun (AI_100), Yu Ji (AI_000), Niu Mo (AI_010).

At the start of the game, the human team's crystal is in the lower left corner.

At the beginning, Juewu did not choose the traditional human laning strategy, but first gave up a lane. Its two carry heroes, Yu Ji and Wang Zhaojun, cleared the first minion wave together to pressure the enemy in the middle lane, and then turned to wear down Cao Cao's health.

This distribution does not favor either hero economically. With two heroes sharing a minion wave, the total gold gained is maximized, because each of them receives 80%. The on-site commentators noted that the AI had a thorough understanding of mid-lane priority.

In the first two minutes, Juewu took the lead by destroying the regional team's first tower, widening its gold advantage to 5.1k against 4.3k. At two and a half minutes, the regional team's Cao Cao killed Juewu's Yu Ji for first blood, and the two sides' gold was level at 6.4k.

At 4 minutes 24 seconds, four of the AI heroes chased Nakoruru. Da Mo kicked Nakoruru back into the AI group, which focused her down together. Da Mo took the AI's first kill.

During this period, Athena split-pushed alone while the other four AI heroes stayed grouped. At 7 minutes 20 seconds, Athena successfully stole the enemy blue buff. At this point, Juewu led with 3 towers, 4 kills and 20.9k gold against the regional team's 2 towers, 3 kills and 19.7k gold.

The on-site commentators considered Juewu's efficiency and teamwork excellent.

The two sides then entered a period of fierce confrontation.

At 8 minutes 48 seconds, Juewu lost a team fight 0 for 2 and, with its whole team at low health, still took the initiative to chase the regional team's Cao Cao, who was at healthy HP. In the exchange Juewu lost Da Mo, its hero with the least health, making it 1 for 1. Next, Juewu went for the Overlord. The regional team's respawned players then arrived, the fight ended in a team wipe, the second mid-lane tower was taken, and a blue-buff steal succeeded.

The on-site commentators pointed out that the AI's strategy is ...

At 9 minutes 48 seconds, Juewu had 5 towers, 8 kills and 28.2k gold, and the regional team had 4 towers, 8 kills and 28.9k gold.

A minute later, Juewu took four kills in a row, including one at 10 minutes 25 seconds, when AI Yu Ji killed Cao Cao in lane, showing that Juewu also has good real-time decision making in 1v1 situations.

Another minute later, Juewu destroyed the mid-lane high-ground tower. However, after the regional team's Cao Cao flanked from behind, the team immediately counter-attacked and killed four. On the AI team, only Athena escaped.

However, the regional team failed to keep pushing towers and did not take the Overlord.

At 14 minutes, Juewu killed the Overlord. At this point, Juewu had 7 towers, 13 kills and 45.1k gold, and the regional team had 6 towers, 12 kills and 43.3k gold. Juewu then began steadily clearing the minion waves in every lane.

At 15 minutes 20 seconds, four AI heroes grouped in one lane and, supported by the Overlord's vanguard minions, started attacking the tower. A fierce team fight broke out. Thanks to the ultimates of AI Wang Zhaojun and Niu Mo, Juewu won the fight 1 for 5 and wiped out the regional team.

However, with the opposing team wiped out and two Overlord vanguards on the high ground, Juewu did not push the crystal directly, but put on a show instead ...

Four Juewu heroes took turns tanking the tower without minion support and destroyed the last high-ground tower, which still had two-thirds of its health. The live commentators shouted: "That's too much."

At 16 minutes 15 seconds, Juewu destroyed the crystal and defeated the regional team.

In the end, Juewu had 9 towers, 18 kills and 56.2k gold, and the regional team had 6 towers, 13 kills and 48.0k gold.

The item builds and statistics of both sides are as follows:

Q&A with the team

After this historic match, QbitAI talked further with the Juewu team.

QbitAI: What can you tell us about the opponents this time?

Tencent Juewu: The 5v5 match took place at the highest-level e-sports event, a special session of the World Champion Cup semi-finals, against a joint team from the Chinese mainland, Hong Kong, South Korea and Malaysia divisions. This test at the World Champion Cup special session is the first time the 5v5 version has reached the professional level.

The 1v1 version is significantly less difficult to develop than the 5v5 version. ChinaJoy is the first public test of the 1v1 version, and against top amateur players the AI's overall strength is very high.

QbitAI: How many heroes does Juewu know now? Does it also handle bans and picks (BP) itself?

Tencent Juewu: The 5v5 version uses the ten heroes fixed for this match, and the professional players are free to pick. In the future, we hope to keep expanding the hero pool.

QbitAI: What is the limit on Juewu's action speed?

Tencent Juewu: It is set close to the limit of human hand speed. Because the game itself limits both attacks and skills, the test is relatively fair overall.

QbitAI: How long was this version of Juewu trained? What computing resources were invested?

Tencent Juewu: Training used 384 GPUs and 85,000 CPU cores. The average number of self-play games per day is equivalent to 440 years of human training, and the training lasted more than half a month.

QbitAI: What network and computing resources are needed to run it in a match?

Tencent Juewu: Running the network does not require many resources; an ordinary server is enough. The 1v1 version already has a mobile version, which is currently open for testing to top players at ChinaJoy.

QbitAI: What are Juewu's weaknesses? Are there problems it has not yet solved?

Tencent Juewu: There are some things we would not call weaknesses, but they are very interesting behaviors.

For example, in this test it did not push the crystal immediately at the end. Was that to maximize reward? At the end of the match, after the human team was wiped out, Juewu did not push the crystal directly. After calculating the overall return, it chose to destroy the last high-ground tower first and then push the crystal to win. Humans generally would not do this, but it is consistent with the AI's value setting, which is to maximize economic gain.

QbitAI: How do human opponents, especially professional players, rate Juewu?

Tencent Juewu: In the early game, the AI often groups up very early, and is even willing to sacrifice minion waves in exchange for a health advantage. In the mid game, it has a very strong macro strategy. In the late game, its strategy is to keep the initiative. Its target selection and crowd-control chaining in team fights are also near perfect, which reflects strong teamwork.

QbitAI: Please introduce the team.

Tencent Juewu: It is a team that has long been committed to game AI and multi-agent research, and some members come from the Go AI team.

Juewu's development combines algorithms with computing power. It requires a highly optimized computing platform and continually improved algorithms. The team brings together AI Lab's research and engineering talent and works with the infrastructure platform departments of Tencent's Technology and Engineering Group (TEG). The main work covers optimizing models, features, computing power and data, machine virtualization, and building and optimizing the data processing, parallel computing and machine learning training platforms.

Tencent AI Lab has always been a pioneer in this type of agent research. In 2016 it began developing the Go AI Fine Art, which is now the dedicated training AI of the Chinese national Go team. In 2017 it started developing Juewu, which reached top amateur level in 2018. Tencent also won the championship in VizDoom, a top AI competition for shooting games, and was the first to develop an agent that beat the built-in AI in StarCraft II.

QbitAI: How can ordinary players play against Juewu?

Tencent Juewu: Juewu is currently only at an experimental stage and is not available in the game.

The 1v1 version will be tested for short periods on specific occasions. For example, at ChinaJoy, the international digital interactive entertainment exhibition held in Shanghai from August 2, it is open to top amateur players for a four-day experience test.

The road to Juewu

Juewu is a frontier research project of Tencent AI Lab and Honor of Kings: a strategic collaboration AI.

The name Juewu means "excellent comprehension". Development of this AI began in December 2017. In December 2018, Juewu played 250 5v5 games against human players of King rank in Honor of Kings and achieved a 48% win rate. Now Juewu has surpassed King rank and reached the level of professional e-sports players.

The version of Juewu that appeared in Kuala Lumpur and Shanghai builds a deep reinforcement learning model based on "observation-action-reward". It uses no human data and learns from a blank slate (tabula rasa) by playing against itself.

The AI's daily training intensity is equivalent to up to 440 years of human play.

According to Tencent, the AI explored successful experience from scratch and, through hard practice, learned common sense such as positioning, jungling, supporting and protecting teammates, and avoiding damage. Moreover, the AI has also explored new strategies that differ from normal human practice. In the match above, we can already sense what makes Juewu different.

The Juewu R&D team also created a "One Model" design to improve training efficiency, optimize communication efficiency, and improve the AI's teamwork. A zero-sum reward and punishment mechanism lets the AI maximize the team's interest and play decisively.
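The article does not describe Juewu's actual reward function. As a minimal sketch of what "zero-sum" means in general, the snippet below assumes each team has some scalar gain for a time step (the function name and the numbers are hypothetical) and gives each team its own gain minus the opponent's, so the two rewards always add up to zero.

```python
from typing import Tuple


def zero_sum_rewards(team_a_gain: float, team_b_gain: float) -> Tuple[float, float]:
    """Return (reward_a, reward_b) so that reward_a + reward_b == 0.

    The gains are hypothetical per-step scores (for example gold or kills);
    the real reward used by Juewu is not given in the article.
    """
    reward_a = team_a_gain - team_b_gain
    return reward_a, -reward_a


# Hypothetical step: team A gains 5.0, team B gains 3.0.
reward_a, reward_b = zero_sum_rewards(5.0, 3.0)
print(reward_a, reward_b)
```

Under this kind of reward, whatever helps one team counts equally against the other, which is one way to push every agent toward the team's overall advantage rather than its own score.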

The difficulty of this game as a test is that the AI needs to make complex decisions quickly, with incomplete information and high complexity.

On a large map with incomplete information, 10 participants face a large number of continuous, immediate choices in strategic planning, hero selection, skill use, path exploration and teamwork, which makes the situation extremely complicated. There are estimated to be up to 10^20000 operational possibilities, while the total number of atoms in the universe is only about 10^80.
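To get a feel for the gap between these two figures, the short sketch below compares them by their exponents. It only uses the two numbers quoted in the text; the variable names are illustrative.

```python
# Exponents of the two figures quoted in the article.
state_space_exp = 20000  # about 10^20000 operational possibilities
atoms_exp = 80           # about 10^80 atoms in the universe

# Dividing powers of ten subtracts the exponents.
ratio_exp = state_space_exp - atoms_exp

# Python integers have arbitrary precision, so the exact check is also possible.
assert 10**state_space_exp // 10**atoms_exp == 10**ratio_exp

print(f"The possibility space is 10^{ratio_exp} times the number of atoms.")
```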

If an AI can learn, analyze, understand, reason and make decisions in real time in such a complex environment, it can play a greater role in the changing and complex real world.

Yao Xing, vice president of Tencent, said that e-sports will become the main application scenario of the strategic collaboration AI Juewu. In the long term, Juewu will be a key step for Tencent toward artificial general intelligence (AGI).

Previously, another Tencent AI, Fine Art, had swept all opponents in Go. Of course, for artificial intelligence, Honor of Kings is a far more complicated problem than Go.

The technology behind Juewu

For this version of Juewu, Tencent AI Lab said it will share further technical details through papers and other forms, and help and inspire more researchers through open research.

Here we recall that Tencent has previously published a paper on Honor of Kings. In that paper, Tencent said that Juewu is a learning-based Hierarchical Macro Strategy model. Trained with this model, the agent controlling each hero can make decisions independently while still communicating with its teammates, and so become a top player.

The "hierarchical" in the name refers to the model being divided into an attention layer and a phase layer. The former predicts where the hero should go. The latter identifies the stage of the game: is it the early game, the laning phase, or the late game?

Let's look first at the attention layer, that is, how the AI judges where its hero should go.

To cultivate this ability, the right training data is needed first. In Honor of Kings, the most suitable criterion for judging where a hero should be is where the fighting is.

Therefore, when Tencent labeled the training data, the location where the next attack occurred was defined as the place the hero should be heading now.

For example, the figure above uses Han Xin to show where the hero should go when the game starts. The left side shows the state of the game at the initial stage s-1, and the red boxes in the middle and on the right mark y_s and y_{s+1}, the locations of Han Xin's first and second attacks, that is, where he should go in stages s-1 and s respectively.

The goal of the AI is to learn to go to position y_s in stage s-1 and to position y_{s+1} in stage s.
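The labeling rule described above can be sketched in a few lines. This is only an illustration of the idea, not Tencent's pipeline: the frame format, the `Region` type and the function name are assumptions made for the example. Each frame of a replay is labeled with the region of the hero's next attack after that frame.

```python
from typing import List, Optional, Tuple

Region = Tuple[int, int]  # hypothetical (row, col) cell on a coarse map grid


def label_next_attack(attack_regions: List[Optional[Region]]) -> List[Optional[Region]]:
    """Label each frame with the region of the next attack strictly after it.

    attack_regions[t] is the region where the hero attacked at frame t,
    or None if the hero did not attack in that frame.
    """
    labels: List[Optional[Region]] = [None] * len(attack_regions)
    upcoming: Optional[Region] = None
    # Walk backwards so each frame sees the nearest attack in its future.
    for t in range(len(attack_regions) - 1, -1, -1):
        labels[t] = upcoming
        if attack_regions[t] is not None:
            upcoming = attack_regions[t]
    return labels


# Hypothetical replay: first attack at frame 2, second attack at frame 4.
frames = [None, None, (3, 4), None, (7, 2), None]
print(label_next_attack(frames))
```

In this toy replay, the frames before the first attack are labeled with the first attack's region (the role of y_s), and the frames from the first attack up to the second are labeled with the second attack's region (the role of y_{s+1}). Frames with no later attack get no label.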

Training the attention layer on this kind of data lets the AI learn the reasoning behind hero movement.

Knowing where to go is not enough. To reach King rank, a player also has to read the situation and adjust strategy. This is the job of the phase layer.

The AI needs to know whether the game is in the early game, the laning phase, or the late game. Fortunately, the state of the major resources in the game is closely tied to the stage. For example, if the heroes are still fighting the Tyrant (the small dragon), the game must have just started; if they are attacking the enemy's base, it is of course the late game.

Therefore, the AI is taught to judge the situation from attacks on the enemy's major resources, including towers, the Tyrant, the Overlord (the big dragon) and the base.

The figure above shows the major enemy resources that the phase layer tracks. The model learns from them: based on the state of the resources, it judges which major resource should be attacked now, and from that determines which minor objectives to accomplish.

For example, the figures below show that stealing the blue buff (a jungle monster) and clearing minion waves are minor objectives in this phase.
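The link between major resources and game stage can be illustrated with a very small lookup. This is a toy sketch, not the learned phase layer: only the two examples given in the article (Tyrant means the game has just started, base means late game) are encoded, and the function name and string keys are hypothetical.

```python
def stage_from_target(target: str) -> str:
    """Map the major resource currently under attack to a coarse game stage."""
    stages = {
        "tyrant": "early",  # from the article: fighting the Tyrant means the game just began
        "base": "late",     # from the article: attacking the enemy base means late game
    }
    # The article assigns no stage to towers or the Overlord,
    # so anything else is reported as "unknown" here.
    return stages.get(target, "unknown")


for target in ["tyrant", "tower", "base"]:
    print(target, "->", stage_from_target(target))
```

The real model learns this judgment from data rather than from a fixed table, and it also covers the cases this lookup leaves open.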

Once the AI can analyze the situation, set goals and know where to go, what remains is communication and coordination among teammates.

But for learning how to communicate, there is really no human match data that can be used for training. After all, communication among human teammates is full of complaints.

So Tencent designed a new cross-agent communication mechanism that trains the AI with its teammates' attention labels, so that it learns to predict where teammates will go and makes decisions accordingly.

In this way, the five agents in a team can work together.
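As a rough illustration of the idea, the sketch below gathers, for each hero, the attention labels of its teammates so they could be used as extra input alongside the hero's own observation. The data layout, the region numbers and the function name are assumptions for this example; the article does not give the details of the actual mechanism.

```python
from typing import Dict, List


def teammate_attention_features(attention: Dict[str, int]) -> Dict[str, List[int]]:
    """For each hero, list the attention labels (target region ids) of its teammates.

    Teammates are visited in sorted name order so the feature layout is stable.
    """
    heroes = sorted(attention)
    return {h: [attention[o] for o in heroes if o != h] for h in heroes}


# Hypothetical attention labels: the region id each hero is heading to.
team = {"Da Mo": 3, "Athena": 7, "Yu Ji": 3}
features = teammate_attention_features(team)
print(features["Athena"])
print(features["Da Mo"])
```

With this layout, Athena's input would show that both teammates are heading to region 3, which is the kind of signal an agent needs in order to decide whether to join the group or keep split-pushing.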

One More Thing

Finally, the match video is available in the original article.
