PaperWeekly Original · Author | Shi Zhuangwei

School | Nankai University (Master's student)

Research direction | Machine learning, graph neural networks

**Paper title:**

GMNN: Graph Markov Neural Networks

ICML 2019

**Paper link:**

https://arxiv.org/abs/1905.06214

**Code link:**

https://github.com/DeepGraphLearning/GMNN

This paper [1] studies semi-supervised node classification on graphs. In prior literature, both statistical relational learning methods (such as Markov networks) and graph neural networks (such as graph convolutional networks) have been widely applied to this problem. Statistical relational learning methods model the dependencies among object labels, while graph neural networks learn effective node representations through end-to-end training.

In this paper, the authors propose the Graph Markov Neural Network (GMNN). GMNN models the joint distribution of labels with a conditional random field (CRF) and trains it efficiently with a variational EM algorithm. In the E-step, one GNN is learned to fit the posterior distribution of the labels; in the M-step, another GNN is used to model the dependencies among labels. Experimental results show that GMNN achieves superior performance.

**Related work**

Consider semi-supervised node classification on a graph $G = (V, E, x_V)$, where $V$ is the set of nodes, $E$ is the set of edges between nodes, and $x_V$ is the set of all node features. Given the labels $y_L$ of a subset of nodes $L \subset V$, the task is to predict the labels $y_U$ of the remaining nodes $U = V \setminus L$.

Statistical relational learning

Statistical relational learning (SRL) models the joint distribution of labels as a conditional random field:

$$p_\phi(y_V \mid x_V) = \frac{1}{Z(x_V)} \prod_{(i,j) \in E} \psi_{i,j}(y_i, y_j, x_V) \qquad (1)$$

where $\psi_{i,j}$ is a potential function on edge $(i,j)$, generally a linear combination of hand-crafted feature functions.

In this setting, predicting the unknown labels is treated as an inference problem: we must compute the posterior distribution of the unknown labels, $p(y_U \mid y_L, x_V)$. [2] is a typical method of this kind, based on Gaussian Markov random fields and label propagation. However, because of the complex structural dependencies among labels, the posterior is generally intractable to compute exactly.

Compared with SRL, GNNs ignore the dependencies among labels and attend only to node features. Since a GNN treats the labels as conditionally independent, the joint distribution of labels in this case factorizes as:

$$p_\theta(y_V \mid x_V) = \prod_{n \in V} p_\theta(y_n \mid x_V) \qquad (2)$$

The label of each node is predicted by aggregating node features over its neighborhood.
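As a concrete illustration, a minimal single-layer GNN classifier of this kind can be sketched with NumPy; the graph, features, and weights below are purely illustrative, not the paper's model:

```python
import numpy as np

def normalize_adj(adj):
    """Row-normalize an adjacency matrix with self-loops (mean aggregation)."""
    adj = adj + np.eye(adj.shape[0])
    return adj / adj.sum(axis=1, keepdims=True)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def gnn_layer(adj_norm, x, w):
    """One message-passing layer: aggregate neighbor features, then classify."""
    return softmax(adj_norm @ x @ w)

# Toy graph: a 4-node chain, 3 features, 2 classes (illustrative values).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = np.random.default_rng(0).normal(size=(4, 3))
w = np.random.default_rng(1).normal(size=(3, 2))
probs = gnn_layer(normalize_adj(adj), x, w)  # (4, 2) class distributions
```

Each row of `probs` is a per-node class distribution obtained purely from aggregated features, with no use of neighboring labels.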

**GMNN**

GMNN uses a CRF to model the joint distribution of labels conditioned on the object attributes (node features), $p_\phi(y_V \mid x_V)$, and optimizes it with a pseudolikelihood variational EM algorithm. In the E-step, one GNN learns node representations to predict the labels; in the M-step, another GNN models the dependencies among labels. As shown in Figure 1.

The authors model the labels with a CRF, $p_\phi(y_V \mid x_V)$, where $\phi$ is the model parameter. What we need to do is optimize this parameter to maximize the log-likelihood of the known labels, $\log p_\phi(y_L \mid x_V)$. Because there are a large number of unknown labels, directly maximizing this log-likelihood is difficult. We therefore adopt variational inference, using a variational distribution $q_\theta(y_U \mid x_V)$ to approximate the true posterior $p_\phi(y_U \mid y_L, x_V)$, and maximize the evidence lower bound (ELBO):

$$\log p_\phi(y_L \mid x_V) \ge \mathbb{E}_{q_\theta(y_U \mid x_V)}\big[\log p_\phi(y_L, y_U \mid x_V) - \log q_\theta(y_U \mid x_V)\big] \qquad (3)$$
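For reference, this lower bound follows from Jensen's inequality applied to the log of an expectation under $q_\theta$:

```latex
\log p_\phi(y_L \mid x_V)
  = \log \sum_{y_U} p_\phi(y_L, y_U \mid x_V)
  = \log \mathbb{E}_{q_\theta(y_U \mid x_V)}
      \left[\frac{p_\phi(y_L, y_U \mid x_V)}{q_\theta(y_U \mid x_V)}\right]
  \ge \mathbb{E}_{q_\theta(y_U \mid x_V)}
      \left[\log p_\phi(y_L, y_U \mid x_V) - \log q_\theta(y_U \mid x_V)\right]
```

The bound is tight when $q_\theta(y_U \mid x_V)$ equals the true posterior $p_\phi(y_U \mid y_L, x_V)$.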

Eq. (3) can be optimized with the variational EM algorithm [3][4]. In the M-step, this is equivalent to optimizing

$$Q(\phi) = \mathbb{E}_{q_\theta(y_U \mid x_V)}\big[\log p_\phi(y_L, y_U \mid x_V)\big] \qquad (4)$$

However, it is difficult to optimize Eq. (4) directly, because it involves the full joint conditional distribution of all labels and therefore requires computing the partition function, i.e. the denominator $Z(x_V)$ in Eq. (1). Based on the local dependency structure of the CRF, we can instead convert Eq. (4) into optimizing

$$\hat{Q}(\phi) = \mathbb{E}_{q_\theta(y_U \mid x_V)}\Big[\sum_{n \in V} \log p_\phi(y_n \mid y_{\mathrm{NB}(n)}, x_V)\Big] \qquad (5)$$

where $\mathrm{NB}(n)$ denotes the neighbors of node $n$. Eq. (5) is called the pseudolikelihood. In the likelihood of Eq. (4), the label of a node is related to all other nodes on the graph; in the pseudolikelihood of Eq. (5), the label of a node is related only to its neighboring nodes. Maximizing the pseudolikelihood to predict a node's label therefore requires only the information of its neighborhood.

The significance of Eq. (5) is that maximizing the pseudolikelihood aggregates the label and feature information of each node's neighborhood. Since a GNN is exactly a message-passing process that aggregates neighborhood information, $p_\phi$ can be implemented with a GNN.
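To make this concrete, here is a minimal NumPy sketch of such a label-propagating $p_\phi$ and the pseudolikelihood it defines. The architecture and all numeric values are illustrative, not the paper's exact model:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def p_phi(adj, y_onehot, w):
    """A minimal label-propagating GNN: predict each node's label
    distribution from the mean one-hot labels of its neighbors."""
    deg = adj.sum(axis=1, keepdims=True)
    neighbor_labels = adj @ y_onehot / np.maximum(deg, 1)
    return softmax(neighbor_labels @ w)

def log_pseudolikelihood(adj, y_onehot, w):
    """Sum of log p(y_n | y_NB(n), x_V) over all nodes, as in Eq. (5)."""
    probs = p_phi(adj, y_onehot, w)
    return float(np.sum(np.log(np.sum(probs * y_onehot, axis=1))))

# Toy chain 0-1-2-3 with labels [0, 0, 1, 1] (illustrative values).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
y = np.eye(2)[[0, 0, 1, 1]]
w = 2.0 * np.eye(2)  # hand-set weight; in GMNN this is learned
ll = log_pseudolikelihood(adj, y, w)
```

Maximizing `ll` over `w` fits $p_\phi$ to the observed label assignment using neighborhood information only, with no partition function over the whole graph.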

Next consider $q_\theta$. Under the mean-field assumption, $q_\theta$ factorizes over the unlabeled nodes:

$$q_\theta(y_U \mid x_V) = \prod_{n \in U} q_\theta(y_n \mid x_V) \qquad (6)$$

Likewise, $q_\theta$ can be implemented with another GNN.

In the E-step, $q_\theta$ is trained by maximizing a likelihood derived from the fixed-point condition of the optimal variational distribution (the proof of Eq. (8) is given in the appendix of [1], and a similar derivation appears in [4]):

$$\log q_\theta(y_n \mid x_V) = \mathbb{E}_{q_\theta(y_{\mathrm{NB}(n)} \mid x_V)}\big[\log p_\phi(y_n \mid y_{\mathrm{NB}(n)}, x_V)\big] + \mathrm{const.} \qquad (8)$$

In Eq. (8), the expectation is approximated by sampling: for each unlabeled neighbor, a label is drawn from $q_\theta$, while labeled neighbors keep their ground-truth labels, which gives the target distribution

$$p_\phi(y_n \mid \hat{y}_{\mathrm{NB}(n)}, x_V) \qquad (9)$$

The E-step objective (10) then trains $q_\theta$ to match this target:

$$O_\theta = \sum_{n \in U} \mathbb{E}_{p_\phi(y_n \mid \hat{y}_{\mathrm{NB}(n)}, x_V)}\big[\log q_\theta(y_n \mid x_V)\big] \qquad (10)$$

In Eq. (10), $q_\theta$ is a GNN that propagates features, learning a mapping from node features to labels; $p_\phi$ is a GNN that propagates labels, learning a mapping from the labels of a node's neighbors to the node's own label. To train GMNN, $q_\theta$ is first pre-trained: using the features of all nodes as input and the labels of the labeled nodes as supervision, it learns "pseudo-labels" for all nodes, optimizing the objective

$$\max_\theta \sum_{n \in L} \log q_\theta(y_n \mid x_V)$$
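A toy sketch of this pre-training step, with $q_\theta$ reduced to a plain softmax classifier on node features; the data and hyperparameters are illustrative, not the paper's setup:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def pretrain_q(x, y_onehot, labeled, lr=0.5, steps=200, seed=0):
    """Fit a softmax classifier (a stand-in for the GNN q_theta) by
    cross-entropy gradient descent on the labeled nodes only, then
    return pseudo-label distributions for every node."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=(x.shape[1], y_onehot.shape[1]))
    for _ in range(steps):
        probs = softmax(x @ w)
        grad = x[labeled].T @ (probs[labeled] - y_onehot[labeled]) / len(labeled)
        w -= lr * grad
    return softmax(x @ w)  # pseudo-label distributions for all nodes

# Two separable feature clusters; only nodes 0 and 2 carry labels.
x = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
y_onehot = np.eye(2)[[0, 0, 1, 1]]
pseudo = pretrain_q(x, y_onehot, labeled=[0, 2])
```

Even though only two nodes supervise the training, the classifier assigns sensible pseudo-labels to the unlabeled nodes because their features resemble those of the labeled ones.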

Then the generated pseudo-labels are fed to $p_\phi$, whose training goal is to make the labels it generates as close as possible to the pseudo-labels, which is exactly the significance of Eq. (5). Using Eqs. (8) and (9), Eq. (5) simplifies to

$$O_\phi = \sum_{n \in V} \mathbb{E}_{q_\theta}\big[\log p_\phi(\hat{y}_n \mid \hat{y}_{\mathrm{NB}(n)}, x_V)\big]$$

where $\hat{y}_n$ is the ground-truth label for $n \in L$ and a pseudo-label sampled from $q_\theta$ for $n \in U$.

Finally, the node features are fed to $q_\theta$ again; the training goal is to make the labels generated by $q_\theta$ as close as possible to those generated by $p_\phi$, per Eq. (10), and at this point the labels output by $q_\theta$ are used as the prediction results. The E-step and M-step alternate until convergence.

The pseudo-code is as follows:
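The original pseudo-code figure is not reproduced here; the overall loop can be sketched schematically as follows. The function names are placeholders standing in for the two GNNs' training routines, not the paper's API:

```python
def train_gmnn(pretrain_q, e_step, m_step, em_iters=10):
    """Schematic of GMNN's variational EM training loop.
    pretrain_q: trains q_theta on the labeled nodes (initialization).
    m_step:     fits p_phi to pseudo-labels drawn from q_theta (Eq. 5).
    e_step:     fits q_theta to the labels predicted by p_phi (Eq. 10)."""
    q = pretrain_q()          # pre-train q_theta, yielding initial pseudo-labels
    for _ in range(em_iters):
        p = m_step(q)         # M-step: update p_phi from q_theta's pseudo-labels
        q = e_step(p)         # E-step: update q_theta toward p_phi's predictions
    return q                  # q_theta's output labels are the final predictions
```

The structure makes explicit that the two GNNs never interact directly: they communicate only through the labels each produces for the other.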

**Experiments and applications**

In addition to semi-supervised node classification, GMNN can also be applied to unsupervised learning and link prediction problems.

In the unsupervised setting there are no labeled nodes, so the task is changed to predicting the neighbors of each node. This idea of using the neighborhood as supervision is widely used in earlier unsupervised learning algorithms such as DeepWalk [5].

In link prediction, the problem is converted into a node classification problem on the dual (line) graph [6]. The construction of the dual graph is illustrated below:
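The dual (line) graph can be built directly: each edge of the original graph becomes a node, and two such nodes are adjacent when the original edges share an endpoint. A small self-contained sketch:

```python
from itertools import combinations

def line_graph(edges):
    """Build the dual (line) graph: each original edge becomes a node;
    two such nodes are connected if the original edges share an endpoint.
    Link prediction on the original graph then becomes node classification
    on this graph, as in [6]."""
    edges = [tuple(sorted(e)) for e in edges]
    dual_edges = [(e1, e2) for e1, e2 in combinations(edges, 2)
                  if set(e1) & set(e2)]
    return edges, dual_edges

# A triangle: every pair of edges shares an endpoint.
nodes, dual = line_graph([(0, 1), (1, 2), (0, 2)])
```

For the triangle, the dual graph has three nodes (one per original edge) that are all mutually adjacent.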

Experiments on semi-supervised node classification (using the Cora, Citeseer, and Pubmed node classification datasets):

Experiments on unsupervised learning:

Experiments on link prediction:

Experiments on few-shot learning: for each dataset, 5 labeled nodes per class are randomly selected as training data. GMNN significantly outperforms GCN and GAT, and the improvement is even larger than in the standard semi-supervised setting (20 labeled nodes per class). This observation demonstrates the effectiveness of GMNN even when labeled objects are very limited.

**References**

[1] Meng Qu, Yoshua Bengio, and Jian Tang. GMNN: Graph Markov Neural Networks. In ICML, 2019.

[2] Jingdong Wang, Fei Wang, Changshui Zhang, Helen C Shen, and Long Quan. Linear neighborhood propagation and its applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(9):1600–1615, 2009.

[3] R. M. Neal and G. E. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. In Learning in Graphical Models, pp. 355–368. Springer, 1998.

[4] D. M. Blei, A. Kucukelbir, and J. D. McAuliffe. Variational Inference: A Review for Statisticians. Journal of the American Statistical Association, 112(518):859–877, 2017.

[5] B. Perozzi, R. Al-Rfou, and S. Skiena. DeepWalk: Online learning of social representations. In KDD, 2014.

[6] B. Taskar, M. Wong, P. Abbeel and D. Koller. Link prediction in relational data. In NeurIPS, 2004.
