Language Ambiguity Detection

The Problem

Ambiguous sentences are everywhere. They cause misunderstandings that range from everyday confusion to geopolitical conflict. The United Nations, operating across six official languages, faces this challenge at every session of every committee. Commercial contracts, medical instructions, and software specifications all suffer from the same fundamental problem: natural language is deeply, structurally ambiguous.

Automatic detection of ambiguous sentences is, despite appearances, an unsolved problem. Existing approaches require large annotated datasets or rely on narrow heuristics. Neither scales.

The Approach

We propose using multilingual LLM translation behavior as a diagnostic tool. The key hypothesis is straightforward:

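Ideally, a sentence translates consistently across all target languages: an ambiguous source either stays ambiguous everywhere or is resolved to the same reading everywhere. When the translations instead disagree, with different languages committing to different readings, that inconsistency is a signal of underlying ambiguity in the source.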

We investigate whether multilingual LLMs encode ambiguity in their internal representations or whether they collapse to a single meaning for ambiguous inputs. The answer determines both the feasibility of the approach and what it tells us about how LLMs handle meaning.
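One way to make the consistency signal concrete is sketched below. It is only an illustration under stated assumptions, not the project's method: translations in different languages are compared by translating each one back into the source language, and disagreement is measured with a simple token-overlap (Jaccard) similarity. The `translate` callable, the function names, and the example strings are all hypothetical; the strings are hand-written for illustration and are not model output.

```python
from itertools import combinations
from typing import Callable, Dict, Iterable


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity; 1.0 means identical token sets."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def round_trip(
    sentence: str,
    languages: Iterable[str],
    translate: Callable[[str, str, str], str],  # hypothetical: (text, src, tgt) -> text
    source_lang: str = "en",
) -> Dict[str, str]:
    """Translate into each target language and back into the source language."""
    result = {}
    for lang in languages:
        forward = translate(sentence, source_lang, lang)
        result[lang] = translate(forward, lang, source_lang)
    return result


def inconsistency_score(back_translations: Dict[str, str]) -> float:
    """Mean pairwise dissimilarity of the back-translations, in [0, 1]."""
    pairs = list(combinations(back_translations.values(), 2))
    if not pairs:
        return 0.0
    return sum(1.0 - jaccard(a, b) for a, b in pairs) / len(pairs)


# Hand-written back-translations of "I saw the man with the telescope",
# standing in for what round_trip() might return (assumption, not real output).
back = {
    "fr": "I saw the man using the telescope",
    "de": "I saw the man who had the telescope",
    "es": "I saw the man using the telescope",
}
print(inconsistency_score(back))
```

A score of zero means every language round-tripped to the same wording; higher scores mean the languages diverged. Token overlap is a crude proxy, since paraphrases also lower it, so a semantic similarity measure would likely be needed in practice.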

Research Goals

Detection Algorithm

Create algorithms that determine whether a given text contains ambiguity, using cross-lingual consistency as the signal.

Translation Improvement

Improve existing multilingual translators' handling of ambiguous input — either by flagging ambiguity or by producing all valid interpretations.
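The two options named here, flagging ambiguity or producing all valid interpretations, could be combined in a thin wrapper around an existing translator. The sketch below is a hypothetical illustration: `enumerate_readings` (which would return one unambiguous paraphrase per reading of the source) and `translate` are assumed callables supplied by the caller, and `TranslationResult` is a name invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TranslationResult:
    source: str
    translations: List[str]  # one entry per distinct translated reading
    ambiguous: bool          # True if the source has more than one reading


def translate_with_flag(
    sentence: str,
    target_lang: str,
    enumerate_readings: Callable[[str], List[str]],  # hypothetical disambiguator
    translate: Callable[[str, str], str],            # hypothetical: (text, tgt) -> text
) -> TranslationResult:
    """Translate every reading of the sentence and flag ambiguous sources."""
    # Fall back to the sentence itself if no readings are returned.
    readings = enumerate_readings(sentence) or [sentence]
    # Translate each reading; drop duplicate outputs while keeping order.
    translations = list(dict.fromkeys(translate(r, target_lang) for r in readings))
    return TranslationResult(
        source=sentence,
        translations=translations,
        ambiguous=len(readings) > 1,
    )
```

The flag is based on the number of source readings rather than the number of distinct translations, so a source whose readings happen to collapse to one target sentence is still reported as ambiguous.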

Ambiguity-Free Language

Develop components for a global human language that is structurally free of ambiguity, as a contribution to the Global Human Language Optimization (GHLO) project.

Real-World Significance

Language ambiguity causes misunderstandings and conflicts in:

Political negotiations — where both parties may ratify the same text understanding different things
Commercial contracts — where ambiguous clauses create legal disputes
Cultural exchange — where nuance is lost in translation, creating offense or confusion
AI systems — where ambiguous input leads to inconsistent outputs and unreliable performance

We identify the United Nations as a major potential beneficiary of this work, given the complexity of its multilingual negotiation environment.

Connection to GHLO

This project is a foundational component of the Global Human Language Optimization (GHLO) initiative. An AI-generated global language must, by design, be ambiguity-free. The detection algorithms developed here will be used to validate GHLO's generated language outputs.