Original | Structural Entropy Theory and Its Applications (Part 4)

数据派THU · WeChat Official Account · Big Data · 2025-01-18 17:00

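Author: Wang Jiaxin
This article runs to about 7,500 characters; suggested reading time is 10+ minutes.
This article focuses on applications of structural entropy to different network types, graph neural networks, and reinforcement learning, as well as to other domains such as healthcare and transportation.

The previous article, 《Structural Entropy Theory and Its Applications (Part 3)》, covered applications of structural entropy to unsupervised/semi-supervised clustering and community detection, social bot detection, and role identification. This article turns to different network types, graph neural networks, reinforcement learning, and further applications in healthcare and transportation.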

Table of Contents

IV. Structural Entropy Adapted to Different Network Types

1.《Multi-Relational Structural Entropy》

2.《Incremental Measurement of Structural Entropy for Dynamic Graphs》

V. Combining Structural Entropy with Graph Neural Networks and Reinforcement Learning

1.《USER: Unsupervised Structural Entropy-Based Robust Graph Neural Network》

2.《Minimum Entropy Principle Guided Graph Neural Networks》

3.《LSEnet: Lorentz Structural Entropy Neural Network for Deep Graph Clustering》

4.《SE-GSL: A General and Effective Graph Structure Learning Framework through Structural Entropy Optimization》

5.《Effective Reinforcement Learning Based on Structural Information Principles》

VI. Other Applications in Healthcare and Transportation

1.《Hierarchical State Abstraction Based on Structural Information Principles》

2.《Multispans: A multi-range spatial-temporal transformer network for traffic forecast via structural entropy optimization》

3.《Unsupervised skin lesion segmentation via structural entropy minimization on multi-scale superpixel graphs》

IV. Structural Entropy Adapted to Different Network Types

1.《Multi-Relational Structural Entropy》

Cao Y, Peng H, Li A, et al. Multi-Relational Structural Entropy[C]//The 40th Conference on Uncertainty in Artificial Intelligence.

Abstract: Structural Entropy (SE) measures the structural information contained in a graph. Minimizing or maximizing SE helps to reveal or obscure the intrinsic structural patterns underlying graphs in an interpretable manner, finding applications in various tasks driven by networked data. However, SE ignores the heterogeneity inherent in the graph relations, which is ubiquitous in modern networks. In this work, we extend SE to consider heterogeneous relations and propose the first metric for multi-relational graph structural information, namely, multi-relational structural entropy (MrSE). To this end, we first cast SE through the novel lens of the stationary distribution from random surfing, which readily extends to multi-relational networks by considering the choices of both nodes and relation types simultaneously at each step. The resulting MrSE is then optimized by a new greedy algorithm to reveal the essential structures within a multi-relational network. Experimental results highlight that the proposed MrSE offers a more insightful interpretation of the structure of multi-relational graphs compared to SE. Additionally, it enhances the performance of two tasks that involve real-world multi-relational graphs, including node clustering and social event detection.

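Structural entropy (SE) measures the structural information contained in a graph. Minimizing or maximizing SE helps reveal or obscure the intrinsic structural patterns underlying a graph in an interpretable way. However, SE ignores the heterogeneity inherent in graph relations: it assumes that only a single type of relation exists between nodes, and a multi-relational network cannot simply be flattened into a single-relation graph. This paper extends SE to heterogeneous relations and proposes the first metric of structural information for multi-relational graphs, multi-relational structural entropy (MrSE):

First, SE is recast through the lens of the stationary distribution of a random walk (random surfing), which extends to multi-relational networks by choosing both the next node and the relation type at each step.
Random-walk interpretation of SE: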
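As a rough illustration of the random-walk view for the ordinary single-relation case (not the MrSE formula itself), the sketch below assumes a connected undirected graph, for which the stationary probability of a node is its degree divided by the graph volume, and computes the one-dimensional SE as the Shannon entropy of that distribution. The function names and the toy graph are hypothetical.

```python
import math

def stationary_distribution(adj):
    """Stationary distribution of a simple random walk on an undirected graph.

    adj maps each node to {neighbor: weight}. For a connected undirected
    graph the stationary probability of v is degree(v) / vol(G).
    """
    degree = {v: sum(nbrs.values()) for v, nbrs in adj.items()}
    volume = sum(degree.values())
    return {v: d / volume for v, d in degree.items()}

def one_dim_structural_entropy(adj):
    """H^1(G) = -sum_v pi_v * log2(pi_v), the Shannon entropy of pi."""
    pi = stationary_distribution(adj)
    return -sum(p * math.log2(p) for p in pi.values() if p > 0)

def walk_step(adj, dist):
    """Push a distribution over nodes through one random-walk step."""
    out = {v: 0.0 for v in adj}
    for u, nbrs in adj.items():
        d_u = sum(nbrs.values())
        for v, w in nbrs.items():
            out[v] += dist[u] * w / d_u
    return out

# Hypothetical toy graph: a triangle (a, b, c) with a pendant node d.
toy = {
    "a": {"b": 1, "c": 1},
    "b": {"a": 1, "c": 1},
    "c": {"a": 1, "b": 1, "d": 1},
    "d": {"c": 1},
}
pi = stationary_distribution(toy)
h1 = one_dim_structural_entropy(toy)
```

Applying `walk_step(toy, pi)` returns `pi` again, which is the defining property of a stationary distribution.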

From the multi-relational random walk to MrSE:

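The resulting MrSE is then optimized with a new greedy algorithm to reveal the essential structure within a multi-relational network:

Definition of the MERGE operation:

The operation MERGE(αo1, αo2) removes two non-root nodes αo1 and αo2 from the encoding tree and adds a new node αn, whose children are the union of the children of αo1 and αo2.

Algorithm steps:

Initially, each node is assigned to its own cluster.
At each step, greedily merge the two encoding-tree nodes that yield the largest |ΔMrSE| (that is, the largest decrease in MrSE), until no further merge gives ΔMrSE < 0.

Output the optimized encoding tree T associated with the minimum 2D MrSE.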
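The greedy merging loop can be sketched on an ordinary single-relation graph. The code below is a minimal illustration that swaps in the standard two-dimensional structural entropy in place of MrSE (whose formula is not reproduced here), recomputes the entropy from scratch for each candidate merge, and uses hypothetical function names. It is meant to show the control flow, not the paper's implementation.

```python
import math
from itertools import combinations

def two_dim_se(adj, partition):
    """Standard 2D structural entropy; adj is {node: {neighbor: weight}}."""
    degree = {v: sum(nbrs.values()) for v, nbrs in adj.items()}
    total = sum(degree.values())  # vol(G)
    entropy = 0.0
    for block in partition:
        vol = sum(degree[v] for v in block)
        if vol == 0:
            continue
        # Weight of edges with exactly one endpoint inside the block.
        cut = sum(w for v in block for u, w in adj[v].items() if u not in block)
        entropy -= (cut / total) * math.log2(vol / total)
        for v in block:
            if degree[v] > 0:
                entropy -= (degree[v] / total) * math.log2(degree[v] / vol)
    return entropy

def merge_blocks(partition, i, j):
    """Return a new partition in which blocks i and j are replaced by their union."""
    rest = [b for k, b in enumerate(partition) if k not in (i, j)]
    return rest + [partition[i] | partition[j]]

def greedy_merge(adj):
    """Start from singletons and repeatedly apply the merge with the largest drop."""
    partition = [{v} for v in adj]
    current = two_dim_se(adj, partition)
    while len(partition) > 1:
        best_delta, best_pair = -1e-12, None  # tolerance guards float noise
        for i, j in combinations(range(len(partition)), 2):
            delta = two_dim_se(adj, merge_blocks(partition, i, j)) - current
            if delta < best_delta:
                best_delta, best_pair = delta, (i, j)
        if best_pair is None:  # no merge lowers the entropy, so stop
            break
        partition = merge_blocks(partition, *best_pair)
        current += best_delta
    return partition, current

# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
toy = {v: {} for v in range(6)}
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    toy[u][v] = toy[v][u] = 1
clusters, entropy = greedy_merge(toy)
```

Evaluating every pair from scratch is quadratic in the number of clusters per step; it is kept here only because it makes the stopping rule (no merge with a negative change) easy to read.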
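Experimental results show that, compared with SE, MrSE offers a more insightful interpretation of the structure of multi-relational graphs.
It also improves performance on two tasks: node clustering and social event detection.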

2.《Incremental Measurement of Structural Entropy for Dynamic Graphs》

Yang R, Peng H, Liu C, et al. Incremental measurement of structural entropy for dynamic graphs[J]. Artificial Intelligence, 2024: 104175.

Abstract: Structural entropy is a metric that measures the amount of information embedded in graph structure data under a strategy of hierarchical abstracting. To measure the structural entropy of a dynamic graph, we need to decode the optimal encoding tree corresponding to the best community partitioning for each snapshot. However, the current methods do not support dynamic encoding tree updating and incremental structural entropy computation. To address this issue, we propose Incre-2dSE, a novel incremental measurement framework that dynamically adjusts the community partitioning and efficiently computes the updated structural entropy for each updated graph. Specifically, Incre-2dSE includes incremental algorithms based on two dynamic adjustment strategies for two-dimensional encoding trees, i.e., the naive adjustment strategy and the node-shifting adjustment strategy, which support theoretical analysis of updated structural entropy and incrementally optimize community partitioning towards a lower structural entropy. We conduct extensive experiments on 3 artificial datasets generated by Hawkes Process and 3 real-world datasets. Experimental results confirm that our incremental algorithms effectively capture the dynamic evolution of the communities, reduce time consumption, and provide great interpretability.

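Structural entropy measures the amount of information embedded in graph-structured data under a strategy of hierarchical abstraction. To measure the structural entropy of a dynamic graph, the optimal encoding tree corresponding to the best community partitioning has to be decoded for each snapshot. However, current methods support neither dynamic encoding-tree updates nor incremental structural entropy computation.
The paper proposes Incre-2dSE, a new incremental measurement framework that dynamically adjusts the community partitioning and efficiently computes the updated structural entropy for each updated graph. Incre-2dSE includes incremental algorithms based on two dynamic adjustment strategies for two-dimensional encoding trees, the naive adjustment strategy and the node-shifting adjustment strategy, which support theoretical analysis of the updated structural entropy and incrementally optimize the community partitioning towards lower structural entropy.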

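Challenge 1: Rebuilding the encoding tree for every updated graph is very time-consuming. To address this, the authors propose two dynamic adjustment strategies for two-dimensional encoding trees: the naive adjustment strategy and the node-shifting adjustment strategy. The former keeps the old community partitioning and supports theoretical analysis of the structural entropy, while the latter follows the principle of structural entropy minimization and dynamically adjusts the community partitioning by moving nodes between communities.
Challenge 2: Computing structural entropy from its conventional definition has high time complexity. To address this, the changes in key statistics from the original graph to the updated graph are collected first, and the updated structural entropy is then computed with newly designed incremental formulas.
The incremental method is also generalized to undirected and directed weighted graphs.
Definition of a dynamic graph: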

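The naive adjustment strategy quickly adjusts the community partitioning and recomputes the structural entropy when the graph structure changes. Its main idea is to update only the part of the graph structure affected by the increment, avoiding recomputation over the whole graph and thereby improving efficiency (two-dimensional encoding trees only, k = 2).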
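A minimal sketch of this idea for unweighted, undirected graphs follows. It is not the paper's incremental formula: it simply rewrites the standard 2D structural entropy as H = (-Σ_i d_i log2 d_i + Σ_j (V_j - g_j) log2 V_j + (Σ_j g_j) log2 V) / V, where d_i is a node degree, V_j and g_j are the volume and cut of community j, and V is the graph volume, and keeps the three sums up to date as edges arrive. The class name `IncrementalSE` and its methods are hypothetical, and self-loops are assumed not to occur.

```python
import math

def xlog2(x):
    """x * log2(x), with the usual convention that 0 * log2(0) = 0."""
    return x * math.log2(x) if x > 0 else 0.0

class IncrementalSE:
    """Tracks the statistics needed for 2D SE while edges are added."""

    def __init__(self):
        self.degree, self.community = {}, {}
        self.volume, self.cut = {}, {}
        self.sum_deg = 0.0   # sum_i d_i * log2(d_i)
        self.sum_comm = 0.0  # sum_j (V_j - g_j) * log2(V_j)
        self.total_cut = 0   # sum_j g_j
        self.total_vol = 0   # vol(G)

    def _comm_term(self, c):
        vol = self.volume[c]
        return (vol - self.cut[c]) * math.log2(vol) if vol > 0 else 0.0

    def add_node(self, node, community):
        self.degree[node] = 0
        self.community[node] = community
        self.volume.setdefault(community, 0)
        self.cut.setdefault(community, 0)

    def add_edge(self, u, v):
        """Naive rule: a new node joins the community of its existing neighbor."""
        if u not in self.community and v not in self.community:
            raise ValueError("at least one endpoint must already exist")
        if v not in self.community:
            self.add_node(v, self.community[u])
        if u not in self.community:
            self.add_node(u, self.community[v])
        cu, cv = self.community[u], self.community[v]
        touched = {cu, cv}
        for c in touched:  # retract the old community terms
            self.sum_comm -= self._comm_term(c)
        for n in (u, v):   # only the two endpoint degrees change
            self.sum_deg -= xlog2(self.degree[n])
            self.degree[n] += 1
            self.sum_deg += xlog2(self.degree[n])
            self.volume[self.community[n]] += 1
        if cu != cv:       # a cross-community edge adds to both cuts
            self.cut[cu] += 1
            self.cut[cv] += 1
            self.total_cut += 2
        for c in touched:  # add the refreshed community terms
            self.sum_comm += self._comm_term(c)
        self.total_vol += 2

    def entropy(self):
        if self.total_vol == 0:
            return 0.0
        top = -self.sum_deg + self.sum_comm + self.total_cut * math.log2(self.total_vol)
        return top / self.total_vol

# Hypothetical usage: two seed nodes, then edges streamed one at a time.
tracker = IncrementalSE()
tracker.add_node(0, "A")
tracker.add_node(3, "B")
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    tracker.add_edge(u, v)
```

Each `add_edge` call touches only two nodes and at most two communities, so the entropy after an update is available without revisiting the rest of the graph.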

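Limitations of the naive adjustment strategy: First, it cannot handle multiple incremental edges at once, for example a new node that connects to two different existing nodes. A workaround is to arrange all incremental edges at a given timestamp into a sequence along which edges are added one at a time while keeping the graph connected; the community of a newly introduced node then necessarily depends on the input order of the incremental edges. Second, it cannot handle edge or node deletion. Third, the communities of existing nodes stay unchanged, which is suboptimal in most cases. These limitations motivate the node-shifting adjustment strategy.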

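The core idea of the node-shifting adjustment strategy is that, when the graph structure changes, nodes are moved and their incident edges reassigned so as to minimize the structural entropy of the graph. By moving the affected nodes, the strategy dynamically adjusts the community partitioning for a better-optimized result.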
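A simplified sketch of node shifting is given below. It assumes an unweighted or weighted undirected graph, recomputes the standard 2D structural entropy from scratch for each trial move (the paper uses incremental formulas instead), moves one node at a time, and caps the number of rounds with a hypothetical `max_iter` parameter. All names are illustrative.

```python
import math

def two_dim_se(adj, community):
    """Standard 2D SE; adj is {node: {neighbor: weight}}, community is {node: label}."""
    degree = {v: sum(nbrs.values()) for v, nbrs in adj.items()}
    total = sum(degree.values())
    vol, cut = {}, {}
    for v, nbrs in adj.items():
        c = community[v]
        vol[c] = vol.get(c, 0) + degree[v]
        cut[c] = cut.get(c, 0) + sum(w for u, w in nbrs.items() if community[u] != c)
    h = 0.0
    for v, d in degree.items():
        if d > 0:
            h -= d / total * math.log2(d / vol[community[v]])
    for c in vol:
        if vol[c] > 0:
            h -= cut[c] / total * math.log2(vol[c] / total)
    return h

def node_shifting(adj, community, touched, max_iter=5):
    """Move affected nodes to the neighboring community that lowers SE the most."""
    community = dict(community)  # work on a copy
    queue = list(touched)
    for _ in range(max_iter):
        moved = []
        for v in queue:
            best_label, best_h = community[v], two_dim_se(adj, community)
            for label in sorted({community[u] for u in adj[v]}, key=str):
                trial = dict(community)
                trial[v] = label
                h = two_dim_se(adj, trial)
                if h < best_h - 1e-12:  # accept strict improvements only
                    best_label, best_h = label, h
            if best_label != community[v]:
                community[v] = best_label
                moved.append(v)
        if not moved:
            break
        # Moved nodes and their neighbors are re-examined in the next round.
        queue = list(dict.fromkeys(moved + [u for v in moved for u in adj[v]]))
    return community

# Hypothetical example: node 6 attaches to nodes 4 and 5 but starts in community "A".
toy = {v: {} for v in range(7)}
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (6, 4), (6, 5)]:
    toy[u][v] = toy[v][u] = 1
before = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B", 6: "A"}
after = node_shifting(toy, before, touched=[6])
```

In this toy case both neighbors of node 6 sit in community "B", so the only candidate move is into "B", and it is accepted because it lowers the entropy.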

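Limitations of the node-shifting adjustment strategy: First, it is hard to bound the gap between the global invariant and the updated structural entropy for further theoretical analysis. Second, the strategy may fail to converge in some cases (Figure 7 of the paper gives an example), which forces a maximum number of iterations to be set.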

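Similarities:

(1) Both dynamic adjustment strategies incrementally modify the original two-dimensional encoding tree to accommodate incremental edges and nodes in dynamic scenarios.

(2) For both strategies, the time complexity of computing the updated structural entropy is significantly lower than that of the original formula (see Section 4.3 of the paper for a detailed analysis).

(3) Neither strategy can handle the birth of new communities or the dissolution of existing ones.

Differences:

(1) The two strategies have different emphases. The naive adjustment strategy emphasizes theoretical analysis, such as boundedness and convergence analysis, and serves as a fast incremental baseline in the experimental evaluation. The node-shifting adjustment strategy instead focuses on addressing the limitations of the naive strategy (Section 3.1.5 of the paper) and on dynamically optimizing existing communities towards lower structural entropy.

(2) The two strategies update the encoding tree, or the community partitioning, in different ways. In the naive adjustment strategy, new edges do not change the communities of existing nodes, and a new node is assigned to the community of its direct neighbor. In the node-shifting adjustment strategy, the effect of new edges on community adjustment is taken into account, and the community of a new node is also determined by the incremental edges.

(3) The time complexity of the naive adjustment strategy is fixed, whereas that of the node-shifting strategy grows almost linearly with the number of iterations N. Experiments show that in most cases the naive strategy is faster than the node-shifting strategy when N ≥ 5 (Figure 13 of the paper).

Extensive experiments were conducted on three artificial datasets generated by a Hawkes process and three real-world datasets. The results confirm that the incremental algorithms effectively capture the dynamic evolution of communities, reduce time consumption, and provide good interpretability.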

V. Combining Structural Entropy with Graph Neural Networks and Reinforcement Learning

1.《USER: Unsupervised Structural Entropy-Based Robust Graph Neural Network》

Wang Y, Wang Y, Zhang Z, et al. User: Unsupervised structural entropy-based robust graph neural network[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2023, 37(8): 10235-10243.

Abstract: Unsupervised/self-supervised graph neural networks (GNN) are susceptible to the inherent randomness in the input graph data, which adversely affects the model’s performance in downstream tasks. In this paper, we propose USER, an unsupervised and robust version of GNN based on structural entropy, to alleviate the interference of graph perturbations and learn appropriate representations of nodes without label information. To mitigate the effects of undesirable perturbations, we analyze the property of intrinsic connectivity and define the intrinsic connectivity graph. We also identify the rank of the adjacency matrix as a crucial factor in revealing a graph that provides the same embeddings as the intrinsic connectivity graph. To capture such a graph, we introduce structural entropy in the objective function. Extensive experiments conducted on clustering and link prediction tasks under random perturbation and meta-attack over three datasets show that USER outperforms benchmarks and is robust to heavier perturbations.

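Unsupervised/self-supervised graph neural networks (GNNs) are susceptible to the inherent randomness in the input graph data, which adversely affects model performance on downstream tasks.
The authors identify two main challenges:

How to define an operational criterion for alleviating the interference of graph randomness.
How to develop an unsupervised method that learns a model capable of producing such a graph.

The paper proposes USER, an unsupervised robust GNN based on structural entropy, which alleviates the interference of graph perturbations and learns appropriate node representations without label information:

To mitigate the effects of undesirable perturbations, the authors analyze the property of intrinsic connectivity and define the intrinsic connectivity graph. They also identify the rank of the adjacency matrix as a crucial factor in revealing a graph that yields the same embeddings as the intrinsic connectivity graph.
Finding an innocuous graph is cast as the optimization problem of minimizing the loss L_N, which accounts for both graph structure (through NPSI) and node features (through DBI). To capture such a graph, structural entropy is introduced into the objective function.
NPSI (Network Partition Structural Information): NPSI is a structural entropy measure that captures the intrinsic information in the graph structure. It is defined over a partition of the graph and reflects the graph's community structure. Computing it involves the node partition, the distribution of edges, and information entropy. In the paper, NPSI is written in matrix form so that it can be used inside a GNN model. Minimizing NPSI helps learn an adjacency matrix that satisfies the necessary conditions of an innocuous graph.

DBI (Davies-Bouldin Index): DBI is a widely used clustering evaluation metric. In this paper it is used to analyze the similarity of node features within the same partition. DBI takes into account both within-cluster compactness and between-cluster separation; a lower DBI indicates a better clustering, meaning that node features within the same partition are more similar. In the USER framework, DBI is used to satisfy the group-level feature smoothness of Assumption 1 in the paper.
Combining the two measures lets USER consider both the structural information of the graph (through NPSI) and node feature information (through DBI). By minimizing an objective that contains both components, USER can learn an innocuous graph that preserves structural properties while accounting for node feature similarity, which improves the robustness of the GNN model.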
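To make the DBI component concrete, the sketch below computes the standard Davies-Bouldin index with Euclidean distances in plain Python. It illustrates the metric only; how USER embeds it in a differentiable objective is not shown. The function name and the toy points are hypothetical, and the code assumes at least two clusters with distinct centroids.

```python
import math

def davies_bouldin(points, labels):
    """Davies-Bouldin index: mean over clusters of the worst (s_i + s_j) / d_ij."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    # Centroid of each cluster, coordinate by coordinate.
    centroids = {
        label: [sum(coord) / len(members) for coord in zip(*members)]
        for label, members in clusters.items()
    }
    # Scatter s_i: average distance from cluster members to their centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }
    keys = list(clusters)
    worst_ratios = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in keys if j != i
        )
        for i in keys
    ]
    return sum(worst_ratios) / len(keys)

# Hypothetical node features: two tight groups far apart along the first axis.
features = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = davies_bouldin(features, [0, 0, 1, 1])   # groups match the geometry
mixed = davies_bouldin(features, [0, 1, 0, 1])  # groups cut across it
```

The partition that follows the geometry gets the lower score, which is the sense in which a low DBI indicates similar features within each partition.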

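Extensive experiments on clustering and link prediction tasks under random perturbation and meta-attack over three datasets show that USER outperforms the benchmarks and is robust to heavier perturbations.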

2.《Minimum Entropy Principle Guided Graph Neural Networks》

Yang Z, Zhang G, Wu J, et al. Minimum entropy principle guided graph neural networks[C]//Proceedings of the sixteenth ACM international conference on web search and data mining. 2023: 114-122.

Abstract: Graph neural networks (GNNs) are now the mainstream method for mining graph-structured data and learning low-dimensional node- and graph-level embeddings to serve downstream tasks. However, limited by the bottleneck of interpretability that deep neural networks present, existing GNNs have ignored the issue of estimating the appropriate number of dimensions for the embeddings. Hence, we propose a novel framework called Minimum Graph Entropy principle-guided Dimension Estimation, i.e. MGEDE, that learns the appropriate embedding dimensions for both node and graph representations. In terms of node-level estimation, a minimum entropy function that counts both structure and attribute entropy, appraises the appropriate number of dimensions. In terms of graph-level estimation, each graph is assigned a customized embedding dimension from a candidate set based on the number of dimensions estimated for the node-level embeddings. Comprehensive experiments with node and graph classification tasks and nine benchmark datasets verify the effectiveness and generalizability of MGEDE.

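GNNs are now the mainstream method for mining graph-structured data and learning low-dimensional node-level and graph-level embeddings for downstream tasks. However, limited by the interpretability bottleneck of deep neural networks, existing GNNs have ignored the problem of estimating an appropriate number of embedding dimensions.
At present, the embedding dimension, a key hyperparameter of GNNs, has to be chosen by hand, which raises two challenges:

Existing theoretical work overlooks how to estimate an appropriate embedding dimension; in practice it is usually chosen by enumerative search, which is inefficient.
Different graphs may need different embedding dimensions, but existing graph-level GNNs ignore this diversity and use one uniform embedding dimension for all graphs.

The paper proposes a new framework, Minimum Graph Entropy principle-guided Dimension Estimation (MGEDE), which learns appropriate embedding dimensions for both node and graph representations.

For node-level estimation, a minimum entropy function that accounts for both structure entropy and attribute entropy appraises the appropriate number of dimensions.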

The appropriate node representation dimension 𝑑 is the one that makes 𝐻𝐺 = 0.

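For graph-level estimation, each graph is assigned a customized embedding dimension from a candidate set, based on the number of dimensions estimated for the node-level embeddings.

Comprehensive experiments on node and graph classification tasks with nine benchmark datasets verify the effectiveness and generalizability of MGEDE.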

3.《LSEnet: Lorentz Structural Entropy Neural Network for Deep Graph Clustering》

Sun L, Huang Z, Peng H, et al. LSEnet: Lorentz Structural Entropy Neural Network for Deep Graph Clustering[C]//Forty-first International Conference on Machine Learning.

Abstract: Graph clustering is a fundamental problem in machine learning. Deep learning methods achieve the state-of-the-art results in recent years, but they still cannot work without predefined cluster numbers. Such limitation motivates us to pose a more challenging problem of graph clustering with unknown cluster numbers. We propose to address this problem from a fresh perspective of graph information theory (i.e., structural information). In the literature, structural information has not yet been introduced to deep clustering, and its classic definition falls short of discrete formulation and modeling node features. In this work, we first formulate a differentiable structural information (DSI) in the continuous realm, accompanied by several theoretical results. By minimizing DSI, we construct the optimal partitioning tree where densely connected nodes in the graph tend to have the same assignment, revealing the cluster structure. DSI is also theoretically presented as a new graph clustering objective, not requiring the pre-defined cluster number. Furthermore, we design a neural LSEnet in the Lorentz model of hyper-bolic space, where we integrate node features to structural information via manifold-valued graph convolution. Extensive empirical results on real graphs show the superiority of our approach.

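Graph clustering is a fundamental problem in machine learning. Deep learning methods have achieved state-of-the-art results in recent years, but they still cannot work without a predefined number of clusters. Empirical approaches such as the elbow method or the Bayesian information criterion require training the deep model repeatedly, which is computationally too expensive. This limitation motivates the more challenging problem of graph clustering with an unknown number of clusters.
The paper approaches this problem from the fresh perspective of graph information theory (i.e., structural information). In the literature, structural information had not yet been introduced to deep clustering, and its classic definition is limited by its discrete formulation and by not modeling node features [Second, the discrete formulation prevents the gradient backpropagation, posing a fundamental challenge to train a deep model. Third, the classic definition neglects the node features, which are often equally important to graph clustering.]
The authors first formulate a differentiable structural information (DSI) in the continuous realm, together with several theoretical results. By minimizing DSI, they construct the optimal partitioning tree, in which densely connected nodes in the graph tend to receive the same assignment, revealing the cluster structure.

Theoretically, DSI is a new graph clustering objective that does not require a predefined number of clusters.
DSI is a new formulation of structural information in the continuous realm. Classic structural information is defined in discrete form, which makes differentiation and continuous optimization difficult. DSI allows structural information to be optimized and modeled in a continuous domain, so it fits more naturally into deep learning models.

Continuous definition: structural information is defined as a differentiable function over a continuous space, so that optimization techniques can be used to minimize or maximize it.
Optimal partitioning tree construction: minimizing DSI yields an optimal partitioning tree that reveals the cluster structure among densely connected nodes without knowing the number of clusters in advance.

In addition, the authors design a neural network, LSEnet, in the Lorentz model of hyperbolic space, which integrates node features into structural information via manifold-valued graph convolution.

In graph clustering, Euclidean space and Euclidean distance often fail to capture the complex structure of a graph and the nonlinear relationships between nodes. To address this, the authors adopt the Lorentz (hyperboloid) model, which represents relationships between nodes better in hyperbolic space.
The Lorentz model is a model of hyperbolic space, a space of constant negative curvature, and it is well suited to handling non-Euclidean structured data in deep learning. In this model, nodes are mapped onto a hyperboloid, and the similarity between nodes is measured by the geodesic distance on the hyperboloid rather than by the usual Euclidean distance.
LSEnet is a neural network architecture built on the Lorentz model for graph clustering. Its main components are:

Manifold-Valued Graph Convolution: graph convolution layers that process graph-structured data in the Lorentz model. These layers capture complex relationships between node features directly on the hyperboloid.
Integration of Node Features and Structural Information: node features are integrated with structural information through manifold-valued graph convolution in the Lorentz model, which improves the modeling of nonlinear relationships between nodes.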
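The geodesic distance used in the Lorentz model can be written down in a few lines. The sketch below assumes curvature -1, where a point x on the hyperboloid satisfies ⟨x, x⟩_L = -1 under the Lorentz inner product ⟨x, y⟩_L = -x_0 y_0 + Σ_{i≥1} x_i y_i, and the distance is arccosh(-⟨x, y⟩_L). The function names are hypothetical, and nothing here reproduces LSEnet's convolution layers.

```python
import math

def lorentz_inner(x, y):
    """Lorentz inner product: minus the time-like term plus the spatial terms."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def lift_to_hyperboloid(v):
    """Map a Euclidean vector v to the point (sqrt(1 + |v|^2), v) on the hyperboloid."""
    return [math.sqrt(1.0 + sum(a * a for a in v))] + list(v)

def lorentz_distance(x, y):
    """Geodesic distance on the hyperboloid of curvature -1."""
    # Clamp to 1.0 so rounding error cannot push acosh outside its domain.
    return math.acosh(max(1.0, -lorentz_inner(x, y)))

origin = lift_to_hyperboloid([0.0, 0.0])          # the point (1, 0, 0)
point = lift_to_hyperboloid([math.sinh(1.0), 0.0])  # (cosh 1, sinh 1, 0)
d = lorentz_distance(origin, point)
```

Lifting `[sinh(1), 0]` gives a point at hyperbolic distance 1 from the origin, which is a quick sanity check on the formula.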

Extensive empirical results on real graphs show the superiority of the approach.

4.《SE-GSL: A General and Effective Graph Structure Learning Framework through Structural Entropy Optimization》

Zou D, Peng H, Huang X, et al. Se-gsl: A general and effective graph structure learning framework through structural entropy optimization[C]//Proceedings of the ACM Web Conference 2023. 2023: 499-510.

Abstract: Graph Neural Networks (GNNs) are de facto solutions to structural data learning. However, it is susceptible to low-quality and unreliable structure, which has been a norm rather than an exception in real-world graphs. Existing graph structure learning (GSL) frameworks still lack robustness and interpretability. This paper proposes a general GSL framework, SE-GSL, through structural entropy and the graph hierarchy abstracted in the encoding tree. Particularly, we exploit the one-dimensional structural entropy to maximize embedded information content when auxiliary neighborhood attributes are fused to enhance the original graph. A new scheme of constructing optimal encoding trees is proposed to minimize the uncertainty and noises in the graph whilst assuring proper community partition in hierarchical abstraction. We present a novel sample-based mechanism for restoring the graph structure via node structural entropy distribution. It increases the connectivity among nodes with larger uncertainty in lower-level communities. SE-GSL is compatible with various GNN models and enhances the robustness towards noisy and heterophily structures. Extensive experiments show significant improvements in the effectiveness and robustness of structure learning and node representation learning.

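GNNs are the de facto solution for learning on structured data. However, they are susceptible to low-quality and unreliable structure [GNNs are easy to attack because the raw graph topology is decoupled from node features, so an attacker can easily fabricate links between entirely different nodes], which is the norm rather than the exception in real-world graphs. Existing graph structure learning (GSL) frameworks still lack robustness and interpretability [how to fuse node and topological features].
Using structural entropy and the graph hierarchy abstracted in the encoding tree [the encoding tree represents a multi-granularity division of the graph into hierarchical communities and sub-communities, which offers a path to better interpretability], the paper proposes, for the first time, a general GSL framework, SE-GSL. In particular, one-dimensional structural entropy is used to maximize the embedded information content when auxiliary neighborhood attributes are fused to enhance the original graph.
A new scheme for constructing optimal encoding trees is proposed to minimize the uncertainty and noise in the graph while ensuring a proper community partition in the hierarchical abstraction.
The authors present a new sample-based mechanism that restores the graph structure via the node structural entropy distribution. It increases the connectivity among nodes with larger uncertainty in lower-level communities.
SE-GSL is compatible with various GNN models and improves robustness to noisy and heterophilous structures. Extensive experiments show significant improvements in the effectiveness and robustness of both structure learning and node representation learning.

An edge reweighting mechanism based on similarity measured by the Pearson correlation coefficient is used [combining topological information with vertex attributes and proximity]. Guided by one-dimensional structural entropy maximization, auxiliary edge information is structured as a kNN graph and the most suitable k is selected: the objective is to maximize the structural entropy H1(G) to guide the choice of k, so that the enhanced graph contains optimal structural information.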
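The selection of k described above can be sketched as follows. This is an illustration under several assumptions that are not taken from the paper: edges connect each node to its k most Pearson-correlated peers, a correlation r is shifted to the non-negative weight (1 + r) / 2, only the kNN edges are scored (fusion with the original graph is omitted), and k is picked by a plain argmax of the one-dimensional SE over a candidate list. All function names and the toy features are hypothetical, and every candidate k must be smaller than the number of nodes.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def knn_graph(features, k):
    """Undirected kNN graph as {(i, j): weight} with i < j."""
    n = len(features)
    # Self-similarity is set to -2.0 so a node never selects itself.
    sim = [
        [pearson(features[i], features[j]) if i != j else -2.0 for j in range(n)]
        for i in range(n)
    ]
    weights = {}
    for i in range(n):
        nearest = sorted(range(n), key=lambda j: sim[i][j], reverse=True)[:k]
        for j in nearest:
            # Assumed reweighting: shift r from [-1, 1] to [0, 1].
            weights[(min(i, j), max(i, j))] = (1.0 + sim[i][j]) / 2.0
    return weights

def one_dim_se(weights, n):
    """H^1(G) computed from weighted degrees."""
    degree = [0.0] * n
    for (i, j), w in weights.items():
        degree[i] += w
        degree[j] += w
    total = sum(degree)
    if total == 0:
        return 0.0
    return -sum(d / total * math.log2(d / total) for d in degree if d > 0)

def select_k(features, candidates):
    """Return the candidate k whose kNN graph has the largest H^1, plus all scores."""
    scores = {k: one_dim_se(knn_graph(features, k), len(features)) for k in candidates}
    return max(scores, key=scores.get), scores

# Hypothetical node features for six nodes.
features = [[1, 2, 3], [1, 2, 4], [2, 4, 7], [3, 1, 0], [4, 1, 1], [5, 2, 0]]
best_k, scores = select_k(features, candidates=[1, 2, 3])
```

The one-dimensional SE is bounded above by log2 of the number of nodes, reached when all weighted degrees are equal, so the scores for different k are directly comparable.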

5.《Effective Reinforcement Learning Based on Structural Information Principles》

Zeng X, Peng H, Su D, et al. Effective Reinforcement Learning Based on Structural Information Principles[J]. arXiv preprint arXiv:2404.09760, 2024.

Abstract: Although Reinforcement Learning (RL) algorithms acquire sequential behavioral patterns through interactions with the environment, their effectiveness in noisy and high-dimensional scenarios typically relies on specific structural priors. In this paper, we propose a novel and general Structural Information principles-based framework for effective Decision-Making, namely SIDM, approached from an information-theoretic perspective. This paper presents a specific unsupervised partitioning method that forms vertex communities in the state and action spaces based on their feature similarities. An aggregation function, which utilizes structural entropy as the vertex weight, is devised within each community to obtain its embedding, thereby facilitating hierarchical state and action abstractions. By extracting abstract elements from historical trajectories, a directed, weighted, homogeneous transition graph is constructed. The minimization of this graph’s high-dimensional entropy leads to the generation of an optimal encoding tree. An innovative two-layer skill-based learning mechanism is introduced to compute the common path entropy of each state transition as its identified probability, thereby obviating the requirement for expert knowledge. Moreover, SIDM can be flexibly incorporated into various single-agent and multi-agent RL algorithms, enhancing their performance. Finally, extensive evaluations on challenging benchmarks demonstrate that, compared with SOTA baselines, our framework significantly and consistently improves the policy’s quality, stability, and efficiency up to 32.70%, 88.26%, and 64.86%, respectively.

VI. Other Applications in Healthcare and Transportation

1.《Hierarchical State Abstraction Based on Structural Information Principles》

Zeng X, Peng H, Li A, et al. Hierarchical state abstraction based on structural information principles[C]//Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence. 2023: 4549-4557.

Abstract: State abstraction optimizes decision-making by ignoring irrelevant environmental information in reinforcement learning with rich observations. Nevertheless, recent approaches focus on adequate representational capacities resulting in essential information loss, affecting their performances on challenging tasks. In this article, we propose a novel mathematical Structural Information principles-based State Abstraction framework, namely SISA, from the information-theoretic perspective. Specifically, an unsupervised, adaptive hierarchical state clustering method without requiring manual assistance is presented, and meanwhile, an optimal encoding tree is generated. On each non-root tree node, a new aggregation function and condition structural entropy are designed to achieve hierarchical state abstraction and compensate for sampling-induced essential information loss in state abstraction. Empirical evaluations on a visual grid world domain and six continuous control benchmarks demonstrate that, compared with five SOTA state abstraction approaches, SISA significantly improves mean episode reward and sample efficiency up to 18.98 and 44.44%, respectively. Besides, we experimentally show that SISA is a general framework that can be flexibly integrated with different representation-learning objectives to improve their performances further.

2.《Multispans: A multi-range spatial-temporal transformer network for traffic forecast via structural entropy optimization》

Zou D, Wang S, Li X, et al. Multispans: A multi-range spatial-temporal transformer network for traffic forecast via structural entropy optimization[C]//Proceedings of the 17th ACM International Conference on Web Search and Data Mining. 2024: 1032-1041.

Abstract: Traffic forecasting is a complex multivariate time-series regression task of paramount importance for traffic management and planning. However, existing approaches often struggle to model complex multi-range dependencies using local spatiotemporal features and road network hierarchical knowledge. To address this, we propose MultiSPANS. First, considering that an individual recording point cannot reflect critical spatiotemporal local patterns, we design multi-filter convolution modules for generating informative ST-token embeddings to facilitate attention computation. Then, based on ST-token and spatial-temporal position encoding, we employ the Transformers to capture long-range temporal and spatial dependencies. Furthermore, we introduce structural entropy theory to optimize the spatial attention mechanism. Specifically, The structural entropy minimization algorithm is used to generate optimal road network hierarchies, i.e., encoding trees. Based on this, we propose a relative structural entropy-based position encoding and a multi-head attention masking scheme based on multi-layer encoding trees. Extensive experiments demonstrate the superiority of the presented framework over several state-of-the-art methods in real-world traffic datasets, and the longer historical windows are effectively utilized. The code is available at https://github.com/SELGroup/MultiSPANS.

3.《Unsupervised skin lesion segmentation via structural entropy minimization on multi-scale superpixel graphs》

Zeng G, Peng H, Li A, et al. Unsupervised skin lesion segmentation via structural entropy minimization on multi-scale superpixel graphs[C]//2023 IEEE International Conference on Data Mining (ICDM). IEEE, 2023: 768-777.

Abstract: Skin lesion segmentation is a fundamental task in dermoscopic image analysis. The complex features of pixels in the lesion region impede the lesion segmentation accuracy, and existing deep learning-based methods often lack interpretability to this problem. In this work, we propose a novel unsupervised Skin Lesion segmentation framework based on structural entropy and isolation forest outlier Detection, namely SLED. Specifically, skin lesions are segmented by minimizing the structural entropy of a superpixel graph constructed from the dermoscopic image. Then, we characterize the consistency of healthy skin features and devise a novel multi-scale segmentation mechanism by outlier detection, which enhances the segmentation accuracy by leveraging the superpixel features from multiple scales. We conduct experiments on four skin lesion benchmarks and compare SLED with nine representative unsupervised segmentation methods. Experimental results demonstrate the superiority of the proposed framework. Additionally, some case studies are analyzed to demonstrate the effectiveness of SLED.

Editor: Huang Jiyan

About the author

Wang Jiaxin is a PhD student at the School of Information Management, Nanjing University.
