Language ambiguity causes misunderstandings in political, commercial, and cultural contexts. We use translation behavior as a diagnostic to detect it automatically.
Ambiguous sentences are everywhere. They cause misunderstandings that range from everyday confusion to geopolitical conflict. The United Nations, operating across six official languages, faces this challenge at every session of every committee. Commercial contracts, medical instructions, and software specifications all suffer from the same fundamental problem: natural language is deeply, structurally ambiguous.
Automatic detection of ambiguous sentences is, despite appearances, an unsolved problem. Existing approaches require large annotated datasets or rely on narrow heuristics. Neither scales.
We propose using multilingual LLM translation behavior as a diagnostic tool. The key hypothesis is straightforward:
We investigate whether multilingual LLMs encode ambiguity in their internal representations or whether they collapse to a single meaning for ambiguous inputs. The answer determines both the feasibility of the approach and what it tells us about how LLMs handle meaning.
Create algorithms that determine whether a given text contains ambiguity, using cross-lingual consistency as the signal.
Improve existing multilingual translators' handling of ambiguous input — either by flagging ambiguity or by producing all valid interpretations.
Develop components for a global human language that is structurally free of ambiguity, as a contribution to the Global Human Language Optimization (GHLO) project.
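The cross-lingual consistency signal mentioned in the first goal can be illustrated with a minimal sketch. It assumes that an ambiguous sentence, translated into several languages and back, yields back-translations that disagree with one another more than those of an unambiguous sentence. Everything named here is hypothetical and not part of the project: `round_trip` stands in for whatever multilingual translator is used, token-level Jaccard overlap is a deliberately crude stand-in for a real semantic similarity measure, and the threshold of 0.6 is arbitrary.

```python
from itertools import combinations
from typing import Callable, Iterable, Sequence


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (1.0 means identical token sets)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def consistency_score(back_translations: Iterable[str]) -> float:
    """Mean pairwise similarity of back-translations; lower means more divergence."""
    texts = list(back_translations)
    pairs = list(combinations(texts, 2))
    if not pairs:
        return 1.0  # fewer than two texts: nothing to disagree with
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def is_ambiguous(
    sentence: str,
    round_trip: Callable[[str, str], str],  # hypothetical: (sentence, lang) -> back-translation
    languages: Sequence[str],
    threshold: float = 0.6,  # arbitrary cut-off, an assumption of this sketch
) -> bool:
    """Flag a sentence whose round-trip translations diverge across languages."""
    back = [round_trip(sentence, lang) for lang in languages]
    return consistency_score(back) < threshold


# Canned back-translations used only for illustration; a real system would
# call a multilingual model here instead.
_CANNED = {
    ("I saw her duck", "de"): "I watched her crouch",
    ("I saw her duck", "fr"): "I saw the duck she owns",
    ("I saw her duck", "es"): "I saw her duck",
}


def stub_round_trip(sentence: str, lang: str) -> str:
    """Return a canned back-translation, or the sentence unchanged if none exists."""
    return _CANNED.get((sentence, lang), sentence)


if __name__ == "__main__":
    langs = ["de", "fr", "es"]
    print(is_ambiguous("I saw her duck", stub_round_trip, langs))   # True
    print(is_ambiguous("The cat sleeps", stub_round_trip, langs))   # False
```

The translator is passed in as a callable so the scoring logic can be tested without any model. In practice the overlap measure would likely be replaced by an embedding-based or entailment-based comparison, since surface token overlap penalizes harmless paraphrases.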
Language ambiguity causes misunderstandings and conflicts in political, commercial, and cultural contexts.
We identify the United Nations as a major potential beneficiary of this work, given the complexity of its multilingual negotiation environment.
This project is a foundational component of the Global Human Language Optimization (GHLO) initiative. An AI-generated global language must, by design, be ambiguity-free. The detection algorithms developed here will be used to validate GHLO's generated language outputs.