DuConTE: Dual-Granularity Text Encoder with Topology-Constrained Attention for Text-attributed Graphs

arXiv cs.CL / 4/21/2026


Key Points

  • The paper introduces DuConTE, a dual-granularity text encoder designed for text-attributed graphs that combine node text semantics with graph topology.
  • DuConTE uses a cascaded architecture of two pretrained language models to encode semantics first at the word-token level and then at the node level.
  • It introduces topology-constrained attention by dynamically adjusting each LM’s attention mask according to node connectivity, so semantic interactions reflect structural dependencies.
  • When aggregating token embeddings into node representations, the method separately assesses token importance under both center-node and neighborhood contexts to improve context relevance.
  • Experiments on multiple benchmarks show DuConTE achieving state-of-the-art results on most datasets for text-attributed graph tasks.
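The paper itself does not ship code, but the topology-constrained attention idea can be sketched concretely. The snippet below builds a block-structured attention mask from a graph adjacency matrix, under the simplifying assumption that every node contributes a contiguous block of the same number of tokens; the paper's actual masking scheme may differ in detail.

```python
import numpy as np

def topology_constrained_mask(adj, tokens_per_node):
    """Build a boolean attention mask where token i may attend to token j
    only if their source nodes are the same node or are connected in the
    graph. `adj` is a dense (N, N) 0/1 adjacency matrix (a hypothetical
    input format chosen for illustration)."""
    n = adj.shape[0]
    # Node-level visibility: neighbors plus self-loops.
    node_vis = adj.astype(bool) | np.eye(n, dtype=bool)
    # Expand node-level visibility to token level: each (node, node) entry
    # becomes a (tokens_per_node, tokens_per_node) block.
    block = np.ones((tokens_per_node, tokens_per_node), dtype=bool)
    return np.kron(node_vis, block)

# Example: 3 nodes forming a path 0-1-2, with 2 tokens per node.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
mask = topology_constrained_mask(adj, 2)
# Tokens of node 0 can attend to node 1's tokens but not node 2's.
```

A mask like this would be added (as `-inf` on disallowed positions) to the attention logits inside each LM layer, so that semantic interaction is restricted to structurally related text.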

Abstract

Text-attributed graphs integrate the semantic information of node texts with topological structure, offering significant value in applications such as document classification and information extraction. Existing approaches typically encode textual content using language models (LMs), followed by graph neural networks (GNNs) to process structural information. However, during the LM-based text encoding phase, most methods restrict semantic interaction to the word-token granularity and neglect the structural dependencies among texts from different nodes. In this work, we propose DuConTE, a dual-granularity text encoder with topology-constrained attention. The model employs a cascaded architecture of two pretrained LMs, encoding semantics first at the word-token granularity and then at the node granularity. During the self-attention computation in each LM, we dynamically adjust the attention mask matrix based on node connectivity, guiding the model to learn semantic correlations informed by the graph structure. Furthermore, when composing node representations from word-token embeddings, we separately evaluate the importance of tokens under the center-node context and the neighborhood context, enabling the capture of more contextually relevant semantic information. Extensive experiments on multiple benchmark datasets demonstrate that DuConTE achieves state-of-the-art performance on the majority of them.
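The dual-context aggregation step described above can also be illustrated. The sketch below scores each token of a node's text against two context vectors, one summarizing the center node and one summarizing its neighborhood, and averages the two attention-weighted pools. This is one plausible reading of the paper's aggregation, not its exact formulation; the function and variable names are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def dual_context_pool(token_emb, center_ctx, neigh_ctx):
    """Pool token embeddings into one node vector by weighting each token
    under two contexts (illustrative sketch, not the paper's exact method).

    token_emb:  (T, d) token embeddings for one node's text
    center_ctx: (d,) summary of the center node (e.g., a [CLS]-style vector)
    neigh_ctx:  (d,) summary of the node's neighborhood
    """
    w_center = softmax(token_emb @ center_ctx)  # importance under center-node context
    w_neigh = softmax(token_emb @ neigh_ctx)    # importance under neighborhood context
    # Average the two weighted pools into a single node representation.
    return 0.5 * (w_center @ token_emb + w_neigh @ token_emb)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))
node_vec = dual_context_pool(tokens, rng.normal(size=8), rng.normal(size=8))
```

The resulting node vectors would then feed the second, node-granularity LM in the cascade, again under the topology-constrained attention mask.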