Language Id

Soluling works with languages, so it must be able to identify them. It does this with a language id that combines a language, a script, a country, and a variant. The format is:

ll[-Ssss][-CC[-variant]]
Part | Description | Required
ll | ISO-639 (Wikipedia) language code. Two or three lower case characters. Most languages use a two-character code (ISO-639-1). | Yes
Ssss | ISO 15924 (Wikipedia) script code. Four characters, the first in upper case and the rest in lower case. Included only if the script differs from the default script of the language. | Yes when not using the primary script
CC | ISO-3166 (Wikipedia) country code. Two upper case characters. | No
variant | Variant code. If a variant is given but no country, two hyphens must be used between the language and the variant. For example, de--phoneb. | No

The format uses the IETF language tag specification (Wikipedia).
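
As a rough illustration, the format above can be matched with a regular expression. This is a minimal sketch, not Soluling's own implementation: the names LanguageId and parse_language_id are hypothetical, and the variant is assumed to consist of letters and digits, which the description above does not state.

```python
import re
from typing import NamedTuple, Optional


class LanguageId(NamedTuple):
    """Hypothetical container for the four parts of a language id."""
    language: str
    script: Optional[str] = None
    country: Optional[str] = None
    variant: Optional[str] = None


# ll[-Ssss][-CC[-variant]], plus the ll--variant form used when
# a variant is given without a country.
# Assumption: a variant is made of ASCII letters and digits.
_PATTERN = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:-(?P<script>[A-Z][a-z]{3}))?"
    r"(?:"
    r"-(?P<country>[A-Z]{2})(?:-(?P<variant>[A-Za-z0-9]+))?"
    r"|--(?P<variant_only>[A-Za-z0-9]+)"
    r")?$"
)


def parse_language_id(value: str) -> LanguageId:
    """Split a language id into its parts, or raise ValueError."""
    match = _PATTERN.match(value)
    if match is None:
        raise ValueError(f"not a valid language id: {value!r}")
    return LanguageId(
        language=match["language"],
        script=match["script"],
        country=match["country"],
        # Only one of the two variant groups can take part in a match.
        variant=match["variant"] or match["variant_only"],
    )
```

With this sketch, parse_language_id("de--phoneb") yields the language de and the variant phoneb with no country, while an id such as de-phoneb (single hyphen, no country) is rejected.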

Examples

Some language id examples:

Value | Description | Notes
de | German |
de-DE | German (Germany) |
de-CH | German (Switzerland) |
de--phoneb | German with phonebook sort order | Special German used in phonebooks.
de-DE-phoneb | German (Germany) with phonebook sort order | Special German (Germany) used in phonebooks.
zh-Hans | Simplified Chinese |
zh | Simplified Chinese | The script part (Hans) is not strictly needed because the default Chinese is Simplified Chinese.
zh-CN | Simplified Chinese (PRC) | The script part (Hans) is not needed because only Simplified Chinese is used in the PRC.
zh-Hant | Traditional Chinese | The script part (Hant) is added to zh because the default Chinese is Simplified Chinese.
zh-TW | Traditional Chinese (Taiwan) | The script part (Hant) is not needed because only Traditional Chinese is used in Taiwan.

Learn more about languages that need special care.

Soluling uses the above language id internally. However, when it reads or writes localized files, or exports files that contain language ids, it can use a different id format: the separator can be "_" instead of "-", the script part can be omitted, and the case can differ (for example, all lower case or all upper case). The XML, TMX, XLIFF, and TBX file formats use language ids. To configure the language id format, use the Write options sheets (TMX, XLIFF, XML) of a source dialog, or the properties sheets (TMX, XLIFF, TBX) of the export, create translation package, and translate wizards.
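
To make the differences concrete, the following sketch converts an id found in a file back to the internal form (hyphen separator, lower case language, title case script, upper case country). It is only an illustration under stated assumptions: normalize_language_id is a hypothetical name, the parts are recognized by position and length, and variants are assumed to be written in lower case.

```python
def normalize_language_id(value: str) -> str:
    """Convert an external id such as 'ZH_tw' to the internal form 'zh-TW'."""
    parts = value.replace("_", "-").split("-")
    result = [parts[0].lower()]  # language part
    rest = parts[1:]
    # Assumption: a four-letter part right after the language is a script.
    if rest and len(rest[0]) == 4 and rest[0].isalpha():
        result.append(rest.pop(0).title())
    # Assumption: a two-letter part in the next position is a country.
    if rest and len(rest[0]) == 2 and rest[0].isalpha():
        result.append(rest.pop(0).upper())
    # Whatever is left is treated as the variant. An empty part is kept so
    # that the double hyphen of ids like de--phoneb survives.
    result.extend(part.lower() for part in rest)
    return "-".join(result)
```

For example, under these assumptions the ids ZH_tw, zh-tw, and ZH-TW all normalize to zh-TW.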

Language id format

Specifies how to format language ids used in xml:lang attributes. Possible values are:

Value | Description
Standard: la[-Ssss][-CC] | Standard language id that contains an ISO 639-1 language id (la), an optional script id (Ssss), and an optional ISO-3166 country id (CC).
Legacy: la[-CC] | Legacy language id that contains an ISO 639-1 language id (la) and an optional ISO-3166 country id (CC). If the id uses the secondary script of the language (e.g., Traditional Chinese), the country id is required.
Underline: ll[_CC] | Same as Legacy, but the separator character is an underscore, "_", instead of a hyphen, "-".

It is recommended to use the default language ids (either Use the same format as in the original file or Standard). Use other formats only if your files are consumed by a system that requires legacy ids.
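
The three formats differ only in which parts are written and which separator joins them. The sketch below shows this under the assumption that the parts are already known. The function name format_language_id and the keys "standard", "legacy", and "underline" are hypothetical stand-ins for the option values, and the sketch does not enforce the rule that a secondary script requires a country id.

```python
from typing import Optional


def format_language_id(
    language: str,
    script: Optional[str] = None,
    country: Optional[str] = None,
    id_format: str = "standard",
) -> str:
    """Join the given parts according to the selected language id format."""
    if id_format == "standard":
        parts, separator = [language, script, country], "-"
    elif id_format == "legacy":
        # The legacy format has no script part.
        parts, separator = [language, country], "-"
    elif id_format == "underline":
        # Same parts as legacy, joined with an underscore.
        parts, separator = [language, country], "_"
    else:
        raise ValueError(f"unknown language id format: {id_format!r}")
    # Skip parts that were not given.
    return separator.join(part for part in parts if part)
```

For Traditional Chinese in Taiwan, the three formats give zh-Hant-TW, zh-TW, and zh_TW respectively.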

Language id case

Specifies the case of the language attributes that are used in the file. Possible values are:

Value | Description
Standard (la-CC) | The language part is written in lower case. The optional country part is written in upper case. For example, en-US is English in the United States. This is the default value.
Lower (la-cc) | Both the language and country parts are written in lower case. For example, en-us.
Upper (LL-CC) | Both the language and country parts are written in upper case. For example, EN-US.
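
The case option can be pictured as a small transformation of the id. The following sketch covers only ids made of a language part and an optional country part, because the descriptions above do not say how a script part is cased. The name apply_id_case and the keys "standard", "lower", and "upper" are hypothetical.

```python
def apply_id_case(language_id: str, case: str = "standard") -> str:
    """Apply a case option to an id of the form la, la-CC, or la_CC."""
    if case == "lower":
        return language_id.lower()
    if case == "upper":
        return language_id.upper()
    if case != "standard":
        raise ValueError(f"unknown language id case: {case!r}")
    # Standard: lower case language, upper case country.
    separator = "_" if "_" in language_id else "-"
    language, found, country = language_id.partition(separator)
    return language.lower() + found + country.upper()
```

For example, apply_id_case("EN-us") gives en-US, while the "lower" and "upper" options give en-us and EN-US.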

Include script into ids

Specifies when the script part is included in the language id. This option is visible when the language id format is set to Standard. Possible values are:

Value | Description
Standard | The script id is included only if it is needed to make the id unambiguous. For example, English is en, Simplified Chinese is zh, and Traditional Chinese is zh-Hant.
On multiple scripts | The script id is included if the language can use more than one script. For example, English is en, Simplified Chinese is zh-Hans, and Traditional Chinese is zh-Hant.
Always | The script id is always included. For example, English is en-Latn, Simplified Chinese is zh-Hans, and Traditional Chinese is zh-Hant.
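
The three values can be read as three rules for deciding whether to write the script. The sketch below reproduces the examples from the table. It assumes a small, hypothetical script table (SCRIPTS) that lists the scripts of each language with the default script first; the function name and the keys "standard", "multiple", and "always" are likewise made up for illustration.

```python
# Hypothetical data: the scripts of each language, default script first.
SCRIPTS = {
    "en": ["Latn"],
    "zh": ["Hans", "Hant"],
}


def with_script(language: str, script: str, mode: str = "standard") -> str:
    """Return the id with or without its script part, depending on mode."""
    scripts = SCRIPTS[language]
    if mode == "standard":
        # Only a non-default script needs to be written out.
        include = script != scripts[0]
    elif mode == "multiple":
        # Written whenever the language has more than one script.
        include = len(scripts) > 1
    elif mode == "always":
        include = True
    else:
        raise ValueError(f"unknown script mode: {mode!r}")
    return f"{language}-{script}" if include else language
```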

Include country into multi-script ids

Specifies when the country part is included in the language id to indicate the script. This option is visible when the language id format is set to Legacy or Underline. Possible values are:

Value | Description
No | A country id is added only if it is needed to imply the script. For example, English is en, Simplified Chinese is zh, and Traditional Chinese is zh-TW.
Yes | A country id is included if the language has more than one possible script. For example, English is en, Simplified Chinese is zh-CN, and Traditional Chinese is zh-TW.
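
Because the legacy formats have no script part, the country stands in for the script. The following sketch reproduces the examples from the table under stated assumptions: the lookup tables DEFAULT_SCRIPT, SCRIPT_COUNT, and IMPLIED_COUNTRY are hypothetical and hold only the entries needed here, and legacy_id is not a Soluling function.

```python
# Hypothetical lookup tables, filled in only for the examples above.
DEFAULT_SCRIPT = {"en": "Latn", "zh": "Hans"}
SCRIPT_COUNT = {"en": 1, "zh": 2}
IMPLIED_COUNTRY = {("zh", "Hans"): "CN", ("zh", "Hant"): "TW"}


def legacy_id(
    language: str,
    script: str,
    include_country: bool = False,
    separator: str = "-",
) -> str:
    """Build a legacy id in which a country implies the script."""
    if include_country:
        # "Yes": add a country whenever the language has several scripts.
        needs_country = SCRIPT_COUNT[language] > 1
    else:
        # "No": add a country only for a non-default script.
        needs_country = script != DEFAULT_SCRIPT[language]
    if not needs_country:
        return language
    return f"{language}{separator}{IMPLIED_COUNTRY[(language, script)]}"
```

Passing separator="_" gives the Underline form, for example zh_TW for Traditional Chinese.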