Towards Linguistically-informed Representations for English as a Second or Foreign Language: Review, Construction and Application

arXiv cs.CL / 4/13/2026


Key Points

  • The paper argues that English as a Second or Foreign Language (ESFL) should be treated as a distinct linguistic system, not just a deviation from standard English, requiring specialized representations.
  • It surveys existing ESFL resources, highlights their limitations, and proposes a construction-based framework grounded in constructivist theory to model the syntax–semantics interface.
  • The approach aims to capture ESFL phenomena via syntactico-semantic mappings to English while preserving ESFL-specific characteristics.
  • The authors introduce a gold-standard syntactico-semantic dataset (“sembank”) containing 1,643 annotated ESFL sentences.
  • They demonstrate the dataset’s utility with a pilot study testing the Linguistic Niche Hypothesis, positioning the sembank as a tool for Second Language Acquisition research.

Abstract

The widespread use of English as a Second or Foreign Language (ESFL) has sparked a paradigm shift: ESFL is no longer seen merely as a deviation from standard English but as a distinct linguistic system in its own right. This shift highlights the need for dedicated, knowledge-intensive representations of ESFL. In response, this paper surveys existing ESFL resources, identifies their limitations, and proposes a novel solution. Grounded in constructivist theories, the paper treats constructions as the fundamental units of analysis, which allows it to model the syntax–semantics interface of both ESFL and standard English. This design captures a wide range of ESFL phenomena by referring to the syntactico-semantic mappings of English while preserving ESFL's unique characteristics. The result is a gold-standard syntactico-semantic resource (a sembank) comprising 1,643 annotated ESFL sentences. To demonstrate the sembank's practical utility, we conduct a pilot study testing the Linguistic Niche Hypothesis, which highlights its potential as a valuable tool in Second Language Acquisition research.