GraphNetworks

This is a very detailed and clear intro to Graph Networks byDeepmind.

Graph Definition (Box 3 &3.2.1)

We define a graph to have node, edge, and global attributes. Globalattribute, u, for example canbe the gravitational field in a three body problem setting.

Global attribute is there to give a chance to any local node / edgeto know what’s happening in a global perspective (mostly places far awyfrom it). If we exclude the global <spanclass="math inline">u</span> (which aggregates information fromacross the nodes and edges), the information that a node has access toafter m steps of propagation is determined by the set of nodes and edgesthat are at most m hops away (Figure 7). This can be interpreted asbreaking down a complex computation into smaller elementary steps.

Graph Update (3.2.2 & 3.2.3& 4.2)

Graph Network Algorithm

The whole GN consists of the following 6 functions:

\(\begin{array}{l}{\mathbf{e}_{k}^{\prime}=\phi^{e}\left(\mathbf{e}_{k},\mathbf{v}_{r_{k}},\mathbf{v}_{s_{k}},\mathbf{u}\right)}\\ {\mathbf{v}_{i}^{\prime}=\phi^{v}\left(}}_{i}^{\prime},\mathbf{v}_{i},\mathbf{u}\right)}\\ {\mathbf{u}^{\prime}=\phi^{u}\left({\bar{\mathbf{e}}}^{\prime},{\bar{\mathbf{v}}}^{\prime},\mathbf{u}\right)}\end{array}\begin{array}{l}{\bar}_{i}^{\prime}=\rho^{e\to v}\left(E_{i}^{\prime}\right)}\\ {\bar}^{\prime}=\rho^{e\to u}\left(E^{\prime}\right)}\\ {\bar}^{\prime}=\rho^{v\to u}\left(V^{\prime}\right)}\end{array}\)<p>ϕe isan edge specific function and is applied to every edge, same for nodeand global. ϕ can be anyfunction, people usually use Neural Networks.</p><p>On the other hand, ρ takesin a set as input, so needs to be an aggregate function invariant topermutations, like sum, mean, max, min, etc. These functions are notaffected by the order they are sent in.</p><h3 id="composing-multi-graph-block-architecture-4.3">Composing MultiGraph Block Architecture (4.3)</h3><p>A GN is simply three aggregate functions and three update functionsfor node, edge, global respectively. It always takes in a graphcomprised of edge, node, and global attributes as input, and returning agraph comprised of edge, node, and global attribute. The GN operates onany graph without caring about size of the graph, so it is extremelyeasy to stack multiple GN block together.</p><p>Figure 6 shows 3 commons ways of stacking GN blocks.</p><h3 id="graph-networks-advantage-3.2.4">Graph Networks Advantage(3.2.4)</h3><ol type="1"><li>First, graphs can express arbitrary relationships among entities, sothe way input interacts is not limited to a fixed modelarchitecture.</li><li>Graphs represent entities and their relations as sets, which areinvariant to permutations. However, we can still impose ordering byencoding the indices in the node or edge attribute.</li><li>A GN’s per-edge and per-node functions are reused across all edgesand nodes, respectively. Therefore, GN can easily operate on graphs ofany size.</li></ol><h3 id="limitations-5.2">Limitations (5.2)</h3><p>Notions like recursion, control flow, and conditional iteration arenot straightforward to represent with graphs.</p><h2 id="graphcast"><ahref=”https://www.science.org/doi/10.1126/science.adi2336”>GraphCast</a></h2><p>GraphCast by Deepmind is a perfect example to understandencode-process-decode GN structure and it is clearly written. Note themodel architecture is inside the <ahref=”https://www.science.org/doi/suppl/10.1126/science.adi2336/suppl_file/science.adi2336_sm.pdf”>supplementarymaterial</a>. The following notes mostly come from Section3 GraphCastmodel.</p><p>They defined two space for this task. One on the grid space based onearth’s latitude and logitude, where the atmospheric information comesfrom. The other on a “mesh space” based on a regular sphere, where theactual processing happens. The design was made on the observation that“Using the latitude-longitude grid is not an advisable representationdue to its spatial inhomogeneity, and high resolution at the poles whichdemands disproportionate compute resources.”</p>

<imgsrc=”https://www.science.org/cms/10.1126/science.adi2336/asset/98816fa7-3765-4307-bbe7-29ebbd5e26a3/assets/images/large/science.adi2336-f1.jpg”alt=”Mesh Space” /><figcaption aria-hidden="true">Mesh Space</figcaption>
<p>As for the mesh space, it is not only 1 mesh space, but 7 mesh spacelayered together. The first mesh space (M0) is a regular icosahefron (12nodes and 20 faces). The second (M1) is the first layer divided 6 times.The third (M2) is the second divided 6 times and so forth. The M0 layeris the base and the coarsest one. Among all layers, It has the fewestpoints and the greatest distance between each point. On the other hand,M6 layer has the finest resolution, most points, and each point is onlya few units away from each other.</p><p>When we update a particular point on the net, we look at each layer:M0, M1, …, M6. For each layer Mi, we look at this point’s neighbor onthis mesh and use these edges to update this point’s value. On M0, theneighbors are far away, so you get information from away. On M6, theneighbors are very close, so you get information from close points. Thisis exactly what depicted on paper’s Fig1 e) - every point has input fromdifferent distance.</p><p>What’s most important is that since the finer resolution layer isobtained by dividing the coarser layer, every point and edge on thecoarser layer can be directly found on the finer layer. Therefore, inmemory we only have to store a single layer M6.</p><p>You might ask what if a point on M6 is not originally on M0? Will itstill have neighbors on M0 properly defined? Yes (I think), because youcan think of the mesh construction process in reverse: if you start withan arbitrary point on the finest mesh and make the resolution coarser,you will always be able to get back to some regular icosahefron with thesame strcuture as the M0 we started with (though not the exact same onebecause this point you started with might indeed not be on the M0 webuilt the multi-mesh upon).</p><p>The encoder is a graph mapping grid nodes to mesh nodes. Theprocesser happens on mesh nodes only. The decoder maps mesh nodes backto the grid nodes.</p>