GNN学习记录_1
Traditional Feature-based Methods(Graphs)
Traditional ML Pipieline
Hand-crafted feature+ML model
Hand-crafted features for graph data
Node level:
Node degree,centrality,clustering coefficient,graphlets
Link-level:
Distance-based feature
local/global neighborhood overlap
Graph-level:
Graphlet kernel ,WL kernel
Node
Centrality
Eigenvector centrality:
$$
c_{v}=\tfrac{1}{\lambda }\sum_{u\in N(v)}^{}c_{u}
$$
$$
\lambda c=Ac
$$
A:Adjacency matrix
$$
A_{uv}=1\ \ \ if\ \ u\in N(v)
$$
Betweenness centrality:
$$
c_{v}=\sum_{s\neq v\neq t}^{}\frac{(shortest\ paths\ between\ s and\ t\ that\ contains\ v)}{(shortest\ path\ between\ s\ and\ t ) }
$$
A node is important if it lies on many shortest paths between other nodes.
Closeness centrality:
$$
c_{v}=\sum_{u\neq v}^{}\frac{1}{shortest\ path\ length\ between\ u\ and\ v }
$$
A node is important if it has small shortest path lengths to all other nodes.
Clustering Coefficient
$$
e_{v}=\frac{(edges\ among\ neighboring\ nodes)}{\binom{k_{v}}{2}}\in [0,1]
$$
Graphlet
Graphlet Degree Vector (GDV):A count vector of graphlets rooted at a given node.
Link
Two formulations of the link prediction task:
Links missing at random: Remove a random set of links and then aim to predict them
Links over time: Given G[to, to] a graph on edges up to timet, output a ranked list L of links (not in G[to, tó]) that are predicted to appear in G[t, t!]
How to compute paths between two nodes?
Adjacency Matrix
Distance-based feature
Uses the shortest path length between two nodes but does not capture how neighborhood overlaps
local/global neighborhood overlap
Local:
Captures how many neighboring nodes are shared by two nodes.
Becomes zero when no neighbor nodes are shared
Global:
Uses global graph structure to score two nodes.
Katz index counts #paths of all lengths between two nodes.
(Discrete Mathematics Final Homework)
Graph
Graphlet kernel :
Graph is represented as Bag-of-graphlets.
Counting the graphlets that combine into graph is difficult.
WL kernel:
Apply K-step color refinement algorithm to enrich node color.