Effective Graph Learning with Adaptive Knowledge Exchange
Summary
This paper introduces a framework for exchanging information among GNN models trained on different graph views. The graph views are generated via stochastic graph augmentation. The key knowledge-exchange mechanism distinguishes informative channels from redundant channels via entropy estimation and exchanges the corresponding output channels.
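For concreteness, the entropy-based channel discrimination could look like the following minimal sketch. All function names and the histogram-based entropy estimator are my assumptions for illustration, not the paper's actual implementation:

```python
import numpy as np

def channel_entropy(acts, bins=16):
    """Estimate the Shannon entropy of each output channel via a
    histogram over its activations. acts: (num_nodes, num_channels).
    Low entropy suggests a redundant channel."""
    ents = []
    for c in range(acts.shape[1]):
        hist, _ = np.histogram(acts[:, c], bins=bins)
        p = hist / hist.sum()
        p = p[p > 0]  # drop empty bins before taking the log
        ents.append(-np.sum(p * np.log(p)))
    return np.array(ents)

def select_redundant(acts, frac=0.25):
    """Indices of the lowest-entropy channels, i.e. the candidates
    whose weights could be overwritten with knowledge from a peer
    model trained on a different graph view."""
    ents = channel_entropy(acts)
    k = max(1, int(frac * ents.shape[0]))
    return np.argsort(ents)[:k]
```

Under this sketch, a constant (zero-information) channel receives entropy 0 and is selected for replacement first.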
Strengths And Weaknesses
Strengths:
1. This paper proposes a novel method to exchange information learned by models on different graph views.
2. The framework is evaluated on diverse tasks, with adequate experiments on problems such as over-smoothing and few-shot learning.
Weaknesses:
- The proposed framework does not seem restricted to GNN models (apart from graph-view generation, which is specific to graph data and is not the focus of this paper), since it only operates on layer weight matrices and channels. Are there knowledge-exchange methods in other domains (e.g., image data), and should they be included as baselines?
- Since the quality and choice of graph-view generation may strongly affect the effectiveness of knowledge exchange, this relationship should be discussed comprehensively and quantitatively.
- The graph datasets used in the experiments are relatively small and are typically used for full-graph training, which raises concerns about the scalability of the proposed method. Experiments with stochastic (mini-batch) training on graphs would strengthen the paper.
- Minor: The presentation could be improved. Although the paper is generally understandable, some terms should be defined explicitly before use to avoid confusion, e.g., "weight matrix" (a layer's parameter matrix, not a weighted adjacency matrix). Algorithm 1 would also be clearer as pseudo-code rather than natural language.
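To make the stochastic-training suggestion above concrete: a standard approach is GraphSAGE-style neighbor sampling, so that each mini-batch touches only a bounded subgraph instead of the full graph. This is an illustrative sketch, not code from the paper; the function name and adjacency-dict representation are my choices:

```python
import random

def sample_neighbors(adj, batch, fanout=5, hops=2, rng=None):
    """Neighbor sampling for mini-batch GNN training: starting from
    the seed nodes in `batch`, sample up to `fanout` neighbors per
    node at each hop. `adj` maps a node to its neighbor list.
    Returns the set of nodes in the sampled computation subgraph."""
    rng = rng or random.Random(0)
    frontier, nodes = set(batch), set(batch)
    for _ in range(hops):
        nxt = set()
        for u in frontier:
            nbrs = adj.get(u, [])
            k = min(fanout, len(nbrs))
            nxt.update(rng.sample(nbrs, k))
        frontier = nxt - nodes  # only expand newly reached nodes
        nodes |= nxt
    return nodes
```

Running the proposed method on sampled subgraphs like this would show whether the channel-exchange mechanism survives the noise introduced by mini-batching.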
Questions
Limitations
> "...adaptively exchanges knowledge from the multiple views generated from the original input graph. To capture different views of the original graph, following previous work [18, 17], we apply stochastic augmentation functions to generate multiple views of the original graph and then feed them into GNNs. Formally, a different view of the original graph $(X, A)$ is obtained by $\tilde{X}, \tilde{A} = \mathcal{C}(X, A)$, where $\mathcal{C}(\cdot)$ is an augmentation function."
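For reference, an augmentation function $\mathcal{C}(\cdot)$ of the kind quoted above is commonly implemented as random edge dropping plus feature masking in the graph contrastive-learning literature. The sketch below assumes that form; the paper's actual augmentations may differ:

```python
import numpy as np

def augment(X, A, p_edge=0.2, p_feat=0.1, rng=None):
    """Stochastic augmentation C(X, A) -> (X_tilde, A_tilde):
    randomly drop edges from the adjacency matrix A and zero out
    whole feature dimensions of the node-feature matrix X."""
    rng = rng or np.random.default_rng()
    A_t = A * (rng.random(A.shape) > p_edge)  # drop edges at random
    A_t = np.triu(A_t, 1)
    A_t = A_t + A_t.T                         # restore symmetry
    mask = rng.random(X.shape[1]) > p_feat    # mask feature columns
    X_t = X * mask
    return X_t, A_t
```

Each call yields a different view of the same graph, which is what the multiple peer GNNs are then trained on.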