Generating Hierarchical JSON Representations of Scientific Sentences Using LLMs

arXiv cs.CL / 3/26/2026

Key Points

  • The paper examines whether structured, hierarchical JSON representations can preserve the meaning of scientific sentences.
  • It fine-tunes a lightweight LLM with a novel structural loss function to generate hierarchical JSON from sentences sourced from scientific articles.
  • The generated hierarchical JSON is then used as input to a generative model to reconstruct the original scientific text.
  • Experiments compare original vs. reconstructed sentences using semantic and lexical similarity metrics, concluding that hierarchical formats retain scientific-text information effectively.
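To make the idea of a hierarchical JSON representation concrete, here is a minimal sketch of what one might look like for a single sentence. The schema is an assumption made for illustration: the field names ("claim", "subject", "predicate", "object", "modifiers") and the example sentence are hypothetical and are not taken from the paper, which does not specify its format in this summary.

```python
import json

# Hypothetical example sentence (not from the paper's data).
sentence = "Fine-tuning improved accuracy on the benchmark."

# Hypothetical nested schema: every field name here is an illustrative
# assumption, not the format used by the authors.
representation = {
    "claim": {
        "subject": {"text": "Fine-tuning"},
        "predicate": {"text": "improved"},
        "object": {
            "text": "accuracy",
            "modifiers": [{"relation": "on", "text": "the benchmark"}],
        },
    }
}


def leaf_texts(node):
    """Collect every "text" value in depth-first, insertion order."""
    if isinstance(node, dict):
        found = []
        for key, value in node.items():
            if key == "text":
                found.append(value)
            else:
                found.extend(leaf_texts(value))
        return found
    if isinstance(node, list):
        return [text for item in node for text in leaf_texts(item)]
    return []


def max_depth(node):
    """Count nested dict levels; lists are transparent containers."""
    if isinstance(node, dict):
        return 1 + max((max_depth(v) for v in node.values()), default=0)
    if isinstance(node, list):
        return max((max_depth(v) for v in node), default=0)
    return 0


# Round-trip through a JSON string, as a generated output would arrive.
serialized = json.dumps(representation, indent=2)
restored = json.loads(serialized)
```

Calling `leaf_texts(restored)` recovers the content-bearing fragments of the sentence, and `max_depth(restored)` gives a rough measure of how much nesting the representation uses.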

Abstract

This paper investigates whether structured representations can preserve the meaning of scientific sentences. To test this, a lightweight LLM is fine-tuned with a novel structural loss function to generate hierarchical JSON structures from sentences collected from scientific articles. A generative model then uses these JSON structures to reconstruct the original text. Comparing the original and reconstructed sentences with semantic and lexical similarity metrics, we show that hierarchical formats can effectively retain the information in scientific texts.
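The comparison step can be sketched for the lexical side only. The abstract does not name the metrics used, so the two below (token-set Jaccard overlap and a token-level `difflib.SequenceMatcher` ratio) are stand-ins chosen for illustration, and the reconstructed sentence is invented example data rather than model output. Semantic similarity would typically need an embedding model and is left out.

```python
from difflib import SequenceMatcher


def tokens(sentence):
    """Lowercase whitespace tokenization; punctuation stays attached."""
    return sentence.lower().split()


def jaccard(a, b):
    """Share of distinct tokens that the two sentences have in common."""
    set_a, set_b = set(tokens(a)), set(tokens(b))
    if not set_a and not set_b:
        return 1.0  # two empty sentences are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def sequence_ratio(a, b):
    """Order-aware similarity over token sequences (2 * matches / total)."""
    return SequenceMatcher(None, tokens(a), tokens(b)).ratio()


# Hypothetical pair: the second sentence is made up for this example.
original = "Fine-tuning improved accuracy on the benchmark."
reconstructed = "Fine-tuning increased accuracy on the benchmark."

lexical_scores = {
    "jaccard": jaccard(original, reconstructed),
    "sequence_ratio": sequence_ratio(original, reconstructed),
}
```

Both scores lie in the range 0 to 1, with 1 meaning the token content (Jaccard) or token sequence (ratio) is identical. A single substituted word lowers both without changing the meaning much, which is one reason a lexical score is usually paired with a semantic one.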