
Google Brain publishes Concept Activation Vectors, a new way to understand how neural networks think

via: 博客园     time: 2019/7/23 20:21:05


Big Data Digest

Compiled by: Li Ke, Zhang Qiuzhen, Liu Junyu

Interpretability remains one of the biggest challenges in modern deep learning applications. Recent advances in computing and in deep learning research have made it possible to build extremely complex models, with thousands of hidden layers and tens of millions of neurons. Cutting-edge deep neural networks are relatively simple to build, but understanding how these models create and use knowledge remains a challenge.

Recently, researchers on the Google Brain team published a paper proposing a new method called Concept Activation Vectors (CAV), which offers a new perspective on the interpretability of deep learning models.

Interpretability vs accuracy

To understand CAV, you first need to understand the nature of the interpretability problem in deep learning. In the current generation of deep learning techniques, there is a persistent tension between a model's accuracy and its interpretability: the ability to accomplish complex knowledge tasks on one side, and the ability to understand how those tasks are accomplished on the other. Knowledge versus control, performance versus verifiability, efficiency versus simplicity: each of these choices is ultimately a trade-off between accuracy and interpretability.

Do you care more about getting the best results, or about how those results are produced? This is a question data scientists have to answer in every deep learning scenario. Many deep learning techniques are inherently complex, and although they are accurate in many scenarios, they are very difficult to interpret. Plotting some of the best-known deep learning models on an accuracy-interpretability chart gives the following picture:

[Figure: accuracy-interpretability chart of well-known deep learning models; image not preserved in the source]

Interpretability in a deep learning model is not a single concept. It can be understood at multiple levels:

[Figure: levels of interpretability in a deep learning model; image not preserved in the source]

Achieving interpretability at each of these levels requires several basic building blocks. In a recent paper, Google researchers outlined the building blocks they consider fundamental to interpretability.

Google summarizes the following interpretability principles:

  • Understand the role of hidden layers: Most of the knowledge in a deep learning model is formed in its hidden layers. Understanding what the different hidden layers do at a macro level is critical to explaining the model.
  • Understand how nodes are activated: The key to interpretability is not understanding the function of individual neurons in the network, but understanding the groups of interconnected neurons that fire together at the same spatial location. Segmenting a network into groups of interconnected neurons lets us understand its function at a simpler level of abstraction.
  • Understand how concepts are formed: Understanding how a deep neural network forms the individual concepts that are then assembled into the final output is another key building block of interpretability.

These principles are the theoretical foundation behind Google's new CAV technology.

Concept activation vectors

Following the ideas discussed above, the usual approach to interpretability is to describe a deep learning model's predictions in terms of its input features. A logistic regression classifier is a typical example: its coefficient weights are often interpreted as the importance of each feature. However, most deep learning models operate on features such as pixel values, which do not correspond to the high-level concepts that humans readily understand. The model's internal values (for example, neuron activations) are just as opaque. Techniques such as saliency maps can effectively measure the importance of specific pixel regions, but they cannot relate that importance to higher-level concepts.

The core idea behind CAV is to measure how relevant a concept is to the model's output. The CAV for a concept is simply a vector pointing in the direction of the values (e.g., activations) of that concept's set of examples. In the paper, the Google research team outlines a linear interpretability method called Testing with CAV (TCAV), which uses directional derivatives to quantify how sensitive a prediction is to the high-level concept represented by a CAV. They designed TCAV with four goals in mind:

  • Accessible: Users need little to no machine learning expertise.
  • Customizable: Adapts to any concept (e.g., gender) and is not limited to the concepts considered during training.
  • Plug and play: Works without retraining or modifying the machine learning model.
  • Global quantification: A single quantitative measure can explain entire classes or sets of examples, rather than just a single data input.


To achieve the above objectives, the TCAV method is divided into three basic steps:

1) Define relevant concepts for the model.

2) Understand the sensitivity of predictions to these concepts.

3) Infer a global quantitative interpretation of the relative importance of each concept to each of the model's prediction classes.


The first step in the TCAV approach is to define the concept of interest (the CAV). To do this, TCAV selects a set of examples that represent the concept, or finds a separate dataset labeled with that concept. The CAV is then learned by training a linear classifier to distinguish the activations that the concept's examples produce at a given layer from the activations that other examples produce at that layer. The CAV is the vector orthogonal to the classifier's decision boundary.
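As a rough illustration of this first step, the sketch below learns a CAV from synthetic data. It is not the authors' implementation: the function name `train_cav`, the plain gradient-descent logistic regression, and the 8-dimensional "activations" are all assumptions made for the example. In practice the activations would come from a chosen layer of a real network.

```python
import numpy as np

def train_cav(concept_acts, random_acts, lr=0.1, steps=500):
    """Fit a logistic-regression classifier (concept = 1, random = 0) on
    layer activations and return the unit vector normal to its decision
    boundary. Hypothetical helper, for illustration only."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(concept)
        w -= lr * (X.T @ (p - y)) / len(y)       # gradient of the log-loss w.r.t. w
        b -= lr * np.mean(p - y)                 # gradient of the log-loss w.r.t. b
    return w / np.linalg.norm(w)                 # the CAV: unit normal to the boundary

# Synthetic stand-in for activations: concept examples are shifted along axis 0.
rng = np.random.default_rng(0)
shift = np.zeros(8)
shift[0] = 3.0
concept_acts = rng.normal(size=(200, 8)) + shift
random_acts = rng.normal(size=(200, 8))

cav = train_cav(concept_acts, random_acts)
print(np.round(cav, 2))  # the first component should dominate
```

Because the synthetic concept differs from the random examples only along the first axis, the learned CAV should point mostly in that direction.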

The second step is to generate a TCAV score, which quantifies how sensitive the prediction is to a particular concept. TCAV uses directional derivatives to measure how much the model's prediction changes when the activations at a given layer are moved in the direction of the concept.
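A minimal sketch of this second step, under stated assumptions: the toy `head_logit` function stands in for the part of a network above the chosen layer, and the score is taken as the fraction of class examples whose directional derivative along the CAV is positive. The names `head_logit`, `head_gradient`, `directional_derivatives`, and `tcav_score` are hypothetical, and a real model would obtain the gradient through automatic differentiation rather than by hand.

```python
import numpy as np

def head_logit(acts, u):
    """Toy stand-in for the layers above the chosen layer:
    maps activations to the logit of one class."""
    return np.tanh(acts) @ u

def head_gradient(acts, u):
    """Analytic gradient of head_logit with respect to the activations."""
    return (1.0 - np.tanh(acts) ** 2) * u

def directional_derivatives(acts, cav, u):
    """Sensitivity of the class logit to a small step along the CAV,
    one value per example."""
    return head_gradient(acts, u) @ cav

def tcav_score(acts, cav, u):
    """Fraction of examples whose logit increases along the concept direction."""
    return float(np.mean(directional_derivatives(acts, cav, u) > 0))

rng = np.random.default_rng(1)
u = np.array([2.0, -1.0, 0.5, 0.0])      # assumed weights of the toy head
cav = np.array([1.0, 0.0, 0.0, 0.0])     # assumed unit-length concept direction
class_acts = rng.normal(size=(100, 4))   # activations of examples from one class

score = tcav_score(class_acts, cav, u)
print(score)
```

In this toy setup the logit always increases along the first axis, so a CAV aligned with that axis gets the maximum score and the opposite direction gets the minimum. Scores from real models generally fall between those extremes.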

The final step is to assess the global relevance of the learned CAVs and to avoid relying on irrelevant ones. One pitfall of TCAV is that it can learn meaningless CAVs: a CAV can still be obtained from a randomly chosen set of images, and testing against such a random concept is unlikely to be meaningful. To address this, TCAV introduces a statistical significance test in which the CAV is trained many times (typically 500 runs) rather than once. The basic idea is that a meaningful concept should yield consistent TCAV scores across training runs.
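The article does not spell out the test itself, so the sketch below swaps in a simple permutation test as one plausible stand-in. It compares TCAV scores from repeated training runs of a concept against scores from runs that use random image sets in place of the concept. The function name `permutation_p_value`, the synthetic score distributions, and the use of 50 runs instead of 500 are all assumptions for illustration.

```python
import numpy as np

def permutation_p_value(concept_scores, random_scores, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of mean TCAV scores.
    Hypothetical helper; a small p-value suggests the concept's scores
    differ consistently from those of random 'concepts'."""
    rng = np.random.default_rng(seed)
    concept_scores = np.asarray(concept_scores, dtype=float)
    random_scores = np.asarray(random_scores, dtype=float)
    observed = abs(concept_scores.mean() - random_scores.mean())
    pooled = np.concatenate([concept_scores, random_scores])
    n = len(concept_scores)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)       # relabel the runs at random
        diff = abs(shuffled[:n].mean() - shuffled[n:].mean())
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)             # add-one correction avoids p = 0

# Synthetic scores standing in for repeated training runs.
rng = np.random.default_rng(42)
meaningful = np.clip(rng.normal(0.85, 0.05, size=50), 0.0, 1.0)  # consistent, high
baseline = np.clip(rng.normal(0.50, 0.15, size=50), 0.0, 1.0)    # random concepts

p_meaningful = permutation_p_value(meaningful, baseline)
print(p_meaningful)
```

A concept whose scores are tightly clustered away from the random baseline yields a small p-value, while a concept that behaves like the baseline does not.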

TCAV in practice

The team ran several experiments to evaluate how well TCAV performs compared with other interpretability methods. In one of the most compelling tests, the team used saliency maps to try to judge whether the caption concept or the image concept was more relevant to a model's prediction of the taxi class. The saliency map outputs are shown below:

[Figure: saliency map outputs; image not preserved in the source]

Using these images as the test set, the Google Brain team ran an experiment with 50 participants on Amazon Mechanical Turk. Each participant completed a series of six tasks in random order for a single model (3 object classes x 2 saliency map types).

In each task, the participant first saw four images and their corresponding saliency masks. They then rated how important the image was to the model (10-point scale), how important the caption was to the model (10-point scale), and how confident they were in their answers (5-point scale). In total, participants rated 60 unique images (120 unique saliency maps).

The ground truth in this experiment is that the image concept is more relevant than the caption concept. Yet when looking at saliency maps, participants either judged the caption concept to be more important (for the model with 0% noise) or could not discern a difference (for the model with 100% noise). In contrast, the TCAV results correctly indicate that the image concept is more important.


TCAV is one of the most innovative approaches to neural network interpretability of the past few years. The initial code is available on GitHub. Many mainstream deep learning frameworks may adopt these ideas in the near future.
