Multilingual corpora: translation and version hierarchy
Translating a canon does not consist only in moving a text from one language to another. It consists in preserving an authority perimeter, exclusions, temporality, scope, and sometimes a jurisdiction. In other words: what must survive translation is not merely the general meaning. It is the normative structure of what can be asserted.
A multilingual corpus therefore does not naturally produce a single stable truth. It produces several linguistic surfaces that can be equivalent, partially equivalent, locally adapted, or temporarily desynchronized. Without a declared hierarchy, a synthesis system often treats this plurality as one shared reservoir available for recomposition.
This page does not require perfect simultaneity or worldwide uniformity. It establishes something more demanding: in multilingual environments, one must govern what may be combined, what must prevail, and what must not travel without conditions.
1. A translated canon is not necessarily line-by-line identical
Two language versions can be canonically compatible without being textually symmetrical. A wording may need adaptation to preserve a legal nuance, a local usage, a sector distinction, or a readability level.
The doctrinal requirement is therefore not verbal identity. The requirement is the stability of:
- the boundary of what may be deduced;
- the hierarchy between assertion, condition, and exclusion;
- the date or version of validity;
- the relation between general rule and local variant.
A text can be lexically well translated and still be doctrinally false if it weakens an exclusion, universalizes an exception, or suggests that a secondary language prevails for an attribute it does not govern.
2. The main multilingual drifts
The most visible case is already documented in "Multilingual and temporality: when FR and EN versions do not age together". But temporal desynchronization is only one case among others.
The most consequential multilingual drifts are usually these:
- hybrid recomposition: an answer combines fragments from several languages without declaring their status;
- scope shift: a local adaptation is read as a universal rule;
- erased exclusions: the translation keeps the assertion but loses the negation;
- asymmetric archiving: one language becomes the unintended archive of the other;
- implicit primacy: the most detailed or easiest-to-summarize language wins, even if it should not govern the attribute at stake.
In all these cases, the problem is not merely lexical. It is hierarchical.
3. What a multilingual corpus must declare
A governable multilingual corpus must be able to answer a few simple questions for its critical attributes:
- which language is the reference for which type of information;
- when a local version prevails over the general version;
- which temporal gaps are tolerated;
- which information is pending translation and must not be combined;
- which elements must remain strictly synchronized.
This discipline matters especially for attributes that engage the real scope of an entity: offering, availability, location, compliance, lead times, prices, exclusions, contact procedures, roles, and perimeters.
This discipline connects directly with the product source hierarchy: FR documentation and EN pricing can each be perfectly accurate in isolation, yet together produce a description that exists nowhere.
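One way to make such declarations concrete is a per-attribute rule table. The sketch below is illustrative only: `AttributeRule`, `Statement`, `is_admissible`, `RULES`, and the example attribute names are assumptions introduced here, not part of the doctrine. It admits a statement into a synthesis only when the statement is not pending translation and either comes from the attribute's reference language or lags the reference version by no more than the tolerated gap.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class AttributeRule:
    """Hypothetical declaration for one critical attribute."""
    reference_language: str   # language that governs this attribute
    max_gap: timedelta        # tolerated lag of other languages behind the reference


@dataclass(frozen=True)
class Statement:
    """One assertion about an attribute, as published in one language."""
    language: str
    attribute: str
    updated: date
    pending_translation: bool = False


def is_admissible(stmt: Statement, rule: AttributeRule, reference_updated: date) -> bool:
    """Decide whether a statement may be used in a multilingual synthesis."""
    if stmt.pending_translation:
        return False  # pending content must not be combined
    if stmt.language == rule.reference_language:
        return True   # the reference language governs its own attribute
    # Other languages are accepted only within the declared temporal gap.
    return reference_updated - stmt.updated <= rule.max_gap


# Illustrative rules (assumed): EN governs pricing with no tolerated lag,
# FR governs documentation with a 30-day tolerance.
RULES = {
    "pricing": AttributeRule(reference_language="en", max_gap=timedelta(days=0)),
    "documentation": AttributeRule(reference_language="fr", max_gap=timedelta(days=30)),
}
```

Under these assumed rules, an FR pricing statement older than the current EN pricing is rejected, while an EN documentation page two weeks behind the FR reference remains usable. Local overrides and strictly synchronized elements would need additional fields.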
4. Translate negations, silences, and conditions too
A canon is not made only of positive statements. It is also made of intentional silences, boundaries, inference prohibitions, and response conditions.
This is why canonical silence must never be treated as a translation omission. What is left unsaid in one language cannot automatically be filled in from another. A governed multilingual synthesis must be able to distinguish:
- what is truly equivalent across languages;
- what remains unspecified in all languages;
- what is stated locally but not exportable;
- what is temporarily absent and must not be completed.
Translating assertions without translating exclusions and conditions simply opens a wider inference space in one language than in another.
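The four-way distinction above can be sketched as an explicit status attached to each attribute in each language. This is a minimal illustration under stated assumptions: the `Status` enum, the `Entry` record, the `corpus[language][attribute]` layout, and the `answer` function are all hypothetical names chosen for this example. The point it shows is that a gap in one language is completed from another only when the source entry is declared equivalent, and never when the silence itself is declared.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Status(Enum):
    EQUIVALENT = "equivalent across languages"
    UNSPECIFIED = "unspecified in all languages"
    LOCAL_ONLY = "stated locally, not exportable"
    PENDING = "temporarily absent, must not be completed"


@dataclass(frozen=True)
class Entry:
    status: Status
    value: Optional[str] = None


# Hypothetical layout: corpus[language][attribute] -> Entry
Corpus = Dict[str, Dict[str, Entry]]


def answer(corpus: Corpus, language: str, attribute: str,
           other_language: str) -> Optional[str]:
    """Return a value for `attribute` in `language`, or None when silence must be kept."""
    own = corpus.get(language, {}).get(attribute)
    if own is not None and own.value is not None:
        return own.value
    if own is not None and own.status in (Status.UNSPECIFIED, Status.PENDING):
        return None  # declared silence: never completed from another language
    other = corpus.get(other_language, {}).get(attribute)
    if other is not None and other.status is Status.EQUIVALENT:
        return other.value  # only declared-equivalent content may travel
    return None  # local-only or undeclared content stays where it is
```

Returning `None` here stands for "do not assert"; a real system would likely also report which status caused the refusal.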
5. Temporality, memory, and persistence of old versions
In multilingual environments, system memory is not limited to internal archives. It also includes former language versions, captures, partially reused translations, indexed excerpts, and external citations.
This is why interpretive remanence is often stronger in bilingual or multiregional corpora. A correction in one language does not automatically shift the memory of the other.
Version power must therefore be read together with memory governance: what remains accessible in a secondary language can continue to govern synthesis, even if the primary language has been corrected.
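As a rough sketch of what such a joint reading could look like in practice, the function below lists secondary-language versions that are still accessible but predate the latest correction in the primary language. The `Version` record, its fields, and `remanent_versions` are assumptions made for this example; the doctrine does not prescribe any particular data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Version:
    """Hypothetical record of one published version of an attribute."""
    language: str
    attribute: str
    updated: date
    still_accessible: bool = True  # indexed, cached, cited, or otherwise reachable


def remanent_versions(versions: List[Version], primary_language: str,
                      attribute: str) -> List[Version]:
    """Return accessible non-primary versions older than the latest primary correction."""
    primary_dates = [
        v.updated for v in versions
        if v.language == primary_language and v.attribute == attribute
    ]
    if not primary_dates:
        return []  # no primary version known: nothing to compare against
    corrected_on = max(primary_dates)
    return [
        v for v in versions
        if v.attribute == attribute
        and v.language != primary_language
        and v.still_accessible
        and v.updated < corrected_on
    ]
```

Each version returned is a candidate for correction, withdrawal, or an explicit "superseded" marking, since it can still feed a synthesis after the primary language has moved on.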
6. What governing multilingual corpora does not mean
Governing multilingual corpora does not mean:
- imposing absolute textual symmetry;
- assuming that one language must always prevail over everything;
- forbidding local adaptations;
- requiring instantaneous translation of every change.
It means making visible the priority rules, the admissible gaps, and the non-combinable zones. Without those rules, translation stops being an equivalence device and becomes a reserve of fragments for opportunistic synthesis.
7. Doctrinal scope
This page extends the doctrine to an object that is often under-governed: the coexistence of partially aligned linguistic truths. It does not replace a translation policy, a legal process, or a localization architecture.
It establishes only this: a multilingual site does not merely manage multiple languages. It manages multiple jurisdictions of interpretation.
Canonical connectors
- Multilingual and temporality: when FR and EN versions do not age together
- Endogenous governance: canonizing the entity on-site
- Version power in a web interpreted by AI
- Documentation, help center, pricing, and changelog: product source hierarchy
- Synthesis surfaces and silent authority reallocation
- Reading the SSA-E + A2 + Dual Web doctrine