
XML Localization and Internationalization

Extensible Markup Language (XML) is a popular text-based open standard used to store and exchange data. The Soluling localization tool and service support XML. You might have a standalone XML file that needs to be localized, or another file (e.g., an application) might contain embedded XML data that needs to be localized. Soluling can handle both.

Localization process

If you want to localize an XML file, you most likely have text data in it. Take a look at the following simple XML file.

<?xml version="1.0" encoding="utf-8"?>
<simple>
  <product>Skim milk</product>
  <price>1.15</price>
  <onsale>0</onsale>
</simple>

You can find the file in <data-dir>\Samples\XML\Simple\Simple.xml. The file has a product element that contains the English text "Skim milk". This text needs to be translated. If we translate the file above into German, we get:

<?xml version="1.0" encoding="utf-8"?>
<simple>
  <product>Magermilch</product>
  <price>1.15</price>
  <onsale>0</onsale>
</simple>

The structure of the German XML file is identical to the structure of the original English XML file; only the text in the product element has been translated into German.

When you add an XML file to a Soluling project, you need to select which elements are localized. You do this on the XML items sheet of the Project Wizard or the Source dialog.

Localization method

You have three choices to select what to localize:

Localize all items of selected types

If you leave Localize all items of selected types selected, Soluling extracts every item that contains text. In the XML file above, the product element would be translated, and the price and onsale elements would be ignored. By default, Soluling translates all string data, but you can add other data types too.

When you click Next, the Project Wizard shows the data types sheet, which lets you select the data types that are localized. By default, the string type is checked and all others are unchecked. If you want to localize numbers too, check Float, Integer, or both. You can also configure whether and how the localize, context and language attributes are used.

Even though all XML data is text, the actual value can be, and in most cases is, something other than text. It can be a number, a color value, a boolean value, or even an image encoded as a base64 string. Soluling detects the format automatically. Of course, this detection is not always 100% accurate. For example, the string "0" can mean the number 0, the boolean value false, or the string "0". Likewise, the string "true" can be either the boolean value true or the string "true". If Soluling's default detection gets the type wrong, or you need a more precise way to choose which elements are translated and how their data is interpreted, use the Select items you want to localize method.
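The ambiguity described above can be illustrated with a small Python sketch. It is not Soluling's detection algorithm, which this page does not document: the candidate types, the regular expressions, and the "first candidate wins" rule are assumptions chosen only to show why a value such as "0" fits several types, and how the selected data types then decide what is extracted from Simple.xml.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Assumed, simplified patterns; Soluling's real heuristics may differ.
INT_RE = re.compile(r"[+-]?\d+")
FLOAT_RE = re.compile(r"[+-]?\d+\.\d+")
BOOL_VALUES = {"true", "false", "0", "1"}

SIMPLE = """<simple>
  <product>Skim milk</product>
  <price>1.15</price>
  <onsale>0</onsale>
</simple>"""


def candidate_types(text: str) -> List[str]:
    """Return every type the text could be, in the order this sketch prefers."""
    types = []
    if text.lower() in BOOL_VALUES:
        types.append("boolean")
    if INT_RE.fullmatch(text):
        types.append("integer")
    elif FLOAT_RE.fullmatch(text):
        types.append("float")
    types.append("string")  # any XML text can always be read as a string
    return types


def detect_type(text: str) -> str:
    """Hypothetical rule: the first candidate wins."""
    return candidate_types(text)[0]


def extract(xml_text: str, selected_types=frozenset({"string"})) -> List[Tuple[str, str]]:
    """Return (tag, text) for leaf elements whose detected type is selected."""
    root = ET.fromstring(xml_text)
    items = []
    for elem in root.iter():
        text = (elem.text or "").strip()
        if len(elem) == 0 and text and detect_type(text) in selected_types:
            items.append((elem.tag, text))
    return items


print(candidate_types("0"))  # ['boolean', 'integer', 'string']
print(extract(SIMPLE))       # [('product', 'Skim milk')]
```

Passing {"string", "float"} as selected_types would also extract the price element, which roughly corresponds to checking Float on the data types sheet.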

Select items you want to localize

If you select the Select items you want to localize radio button, Soluling shows a selection tree that contains the structure of the XML file, where you can check the elements you want to localize. To check or uncheck an element, double-click it. To specify the data type of an element, right-click it and set the format.

The Data types sheet is not visible when item selection is turned on. It is not needed, because the selection also specifies the data type of each selected element.

Use rules to select elements

If you select the Use rules to select elements radio button, Soluling shows the element selection rule editor. Use the editor to add one or more XPath-based element rules. Each rule selects all the elements that match the XPath expression given in the rule.

Click the Add button to show the Select Rule dialog. Enter product in the Value expression field and click OK.

Click OK to add the rule. Now you have told Soluling to localize all product elements in your XML file. The Id expression field is optional. If the XML element has an attribute or a sibling element that specifies its id, enter the XPath to that attribute or element. Soluling then uses its value as the context of the localized value.
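To make the Value expression and Id expression fields more concrete, here is a Python sketch built on the standard xml.etree.ElementTree module. ElementTree implements only a subset of XPath, so this is not how Soluling evaluates rules. The apply_rule function and its id convention are hypothetical: an id expression starting with @ names an attribute of the matched element, and any other id expression names a sibling element. The STRINGS sample file is also made up for illustration.

```python
import xml.etree.ElementTree as ET

SIMPLE = """<simple>
  <product>Skim milk</product>
  <price>1.15</price>
  <onsale>0</onsale>
</simple>"""

# Hypothetical file where the id lives in a sibling <name> element.
STRINGS = """<strings>
  <string><name>greeting</name><text>Hello</text></string>
  <string><name>farewell</name><text>Goodbye</text></string>
</strings>"""


def apply_rule(xml_text, value_expr, id_expr=None):
    """Return (id, text) pairs for every element matched by value_expr."""
    root = ET.fromstring(xml_text)
    # ElementTree elements do not know their parent, so build a lookup table once.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    items = []
    for elem in root.findall(value_expr):  # limited XPath, relative to the root
        if id_expr is None:
            key = None
        elif id_expr.startswith("@"):
            key = elem.get(id_expr[1:])  # attribute of the matched element
        else:
            key = parent_of[elem].findtext(id_expr)  # sibling element
        items.append((key, elem.text))
    return items


print(apply_rule(SIMPLE, "product"))               # [(None, 'Skim milk')]
print(apply_rule(STRINGS, "string/text", "name"))  # [('greeting', 'Hello'), ('farewell', 'Goodbye')]
```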

Localize attribute

Sometimes you need finer control over which elements are localized than the selection tree can offer. For example, you might have several elements with the same name, but one of them should not be localized. The following XML file contains one value element that should be localized and another that should not.

<?xml version="1.0" encoding="UTF-8"?>
<sample> 
  <value>Translate this</value> 
  <value>Do not translate this</value>
</sample>

How can you solve this? The answer is to use a localize attribute. It is a boolean attribute that sets either a positive or a negative localize flag. A positive attribute contains a positive value (e.g., "true", "yes" or "1"). A negative attribute contains a negative value (e.g., "false", "no" or "0"). If you add a negative localize attribute, Soluling does not localize that value element even if you check it in the selection tree.

<?xml version="1.0" encoding="UTF-8"?>
<sample> 
  <value>Translate this</value> 
  <value localize="false">Do not translate this</value>
</sample>

Here is a positive localize attribute.

<?xml version="1.0" encoding="UTF-8"?>
<sample> 
  <value localize="true">Translate this</value> 
  <value>Do not translate this</value>
</sample>

Of course, you can add both negative and positive attributes in the same XML file.

<?xml version="1.0" encoding="UTF-8"?>
<sample> 
  <value localize="true">Translate this</value> 
  <value localize="false">Do not translate this</value>
</sample>

Use the Options sheet to specify which values make positive and negative localize attributes. The sheet contains a Localize attribute group box where you specify the positive and negative localize attribute values.

The localize attribute takes precedence over the selection. Even if you have not checked an element, it is localized if it contains a positive localize attribute. If you check the Localize only those elements that have positive localize attribute checkbox, Soluling localizes only the elements that have a positive localize attribute. So even if an element has been checked in the selection, it is not localized unless it has a positive localize attribute. By default, Soluling handles "localize" and "translate" attributes as localize attributes. You can turn any attribute into a localize attribute by using the selection tree.
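The precedence rules above can be summarized as a small decision function. This Python sketch is an illustration, not Soluling's code: the value sets mirror the examples given earlier ("true", "yes", "1" and "false", "no", "0"), the attribute names are the default localize and translate, and the case-insensitive comparison is an assumption.

```python
import xml.etree.ElementTree as ET

POSITIVE = {"true", "yes", "1"}
NEGATIVE = {"false", "no", "0"}
LOCALIZE_ATTRS = ("localize", "translate")  # default localize attribute names


def localize_flag(attrib):
    """True for a positive attribute, False for a negative one, None if absent."""
    for name in LOCALIZE_ATTRS:
        value = attrib.get(name)
        if value is None:
            continue
        value = value.strip().lower()  # assumption: comparison ignores case
        if value in POSITIVE:
            return True
        if value in NEGATIVE:
            return False
    return None


def should_localize(checked, attrib, only_positive=False):
    """Combine the selection tree state with any localize attribute."""
    flag = localize_flag(attrib)
    if only_positive:
        # Mirrors "Localize only those elements that have positive localize attribute".
        return flag is True
    if flag is not None:
        return flag  # the attribute wins over the selection in both directions
    return checked


SAMPLE = """<sample>
  <value localize="true">Translate this</value>
  <value localize="false">Do not translate this</value>
</sample>"""

root = ET.fromstring(SAMPLE)
# Both value elements are treated as checked in the selection tree.
print([v.text for v in root.findall("value") if should_localize(True, v.attrib)])
# ['Translate this']
```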

Custom localize attribute (screenshot): Soluling did not detect loc as a localize attribute, so the user has set the loc attribute as the localize attribute manually.

Context

Context is a very important issue in XML localization. Whenever Soluling extracts an item from a source file, it has to give the item a unique context. The context is used to identify the extracted elements. Let's study how this context is generated. By default, the context is a combination of the element names (element, parent, parent's parent, and so on) and the index of the element. The following XML file (<data-dir>\Samples\XML\Sport\SportSimple.xml) contains sport names.

<?xml version="1.0" encoding="utf-8"?>
<sports>
  <sport>Soccer</sport>
  <sport>Ice hockey</sport>
  <sport>Basketball</sport>
</sports>

When Soluling extracts the names it gives each value a unique context. The following table shows the values and context values.

Value        Context
Soccer       sports.sport[0].value
Ice hockey   sports.sport[1].value
Basketball   sports.sport[2].value
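The default context can be approximated with a short recursive Python function. This is a sketch that reproduces the table above for SportSimple.xml; the exact format Soluling uses for other structures (for example, whether an element that occurs only once still gets an [0] index) is an assumption.

```python
import xml.etree.ElementTree as ET

SPORTS = """<sports>
  <sport>Soccer</sport>
  <sport>Ice hockey</sport>
  <sport>Basketball</sport>
</sports>"""


def default_contexts(xml_text):
    """Map context -> text, where a context is element names plus sibling indexes."""
    root = ET.fromstring(xml_text)
    result = {}

    def walk(elem, path):
        children = list(elem)
        if not children:
            text = (elem.text or "").strip()
            if text:
                result[path + ".value"] = text
            return
        counts = {}
        for child in children:
            # The index counts only earlier siblings that have the same tag.
            index = counts.get(child.tag, 0)
            counts[child.tag] = index + 1
            walk(child, f"{path}.{child.tag}[{index}]")

    walk(root, root.tag)
    return result


for context, value in default_contexts(SPORTS).items():
    print(value, context)
```

Because the index depends on position, any element inserted in front of an existing one shifts that element's context, which is exactly the problem the next part of this section describes.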

If you never change your XML file, the default context works just fine. However, if you modify the XML file, the context values of the existing elements may change. If we add a new Skiing element between Soccer and Ice hockey, the XML file looks just fine.

<?xml version="1.0" encoding="utf-8"?>
<sports>
  <sport>Soccer</sport>
  <sport>Skiing</sport>
  <sport>Ice hockey</sport>
  <sport>Basketball</sport>
</sports>

However, the context values change in a dangerous way. Only Soccer keeps its old context value, but all others get new context values.

Value        New context              Old context
Soccer       sports.sport[0].value    sports.sport[0].value
Skiing       sports.sport[1].value    -
Ice hockey   sports.sport[2].value    sports.sport[1].value
Basketball   sports.sport[3].value    sports.sport[2].value
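To see why this is dangerous, the following Python sketch takes the old and new context maps from the table above and compares them the way a naive, purely context-keyed tool would. Soluling itself tries harder to keep translations in sync, so treat this as an illustration of the problem rather than of Soluling's behavior. The classify function is hypothetical.

```python
# Context maps copied from the table above (context -> original text).
OLD = {
    "sports.sport[0].value": "Soccer",
    "sports.sport[1].value": "Ice hockey",
    "sports.sport[2].value": "Basketball",
}
NEW = {
    "sports.sport[0].value": "Soccer",
    "sports.sport[1].value": "Skiing",
    "sports.sport[2].value": "Ice hockey",
    "sports.sport[3].value": "Basketball",
}


def classify(old, new):
    """Label each context as unchanged, changed, new or removed."""
    report = {}
    for context, value in new.items():
        if context not in old:
            report[context] = "new"
        elif old[context] == value:
            report[context] = "unchanged"
        else:
            report[context] = "changed"  # same context, different original text
    for context in old.keys() - new.keys():
        report[context] = "removed"
    return report


for context, status in classify(OLD, NEW).items():
    print(f"{context:25} {status}")
```

Only Soccer keeps both its context and its original text. Under naive matching, the translations stored for sports.sport[1].value and sports.sport[2].value would now be attached to Skiing and Ice hockey.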

Such a context shift leads to a situation where Soluling notices that either the original value or the context has changed. In that case, Soluling does everything it can to keep the translations in sync. If it cannot, it has to invalidate or drop the existing translations, which may cause a massive loss of existing translations. How can we prevent this? Fortunately, there is a simple way: use context values. Our original XML file did not contain any context values, so we need to add them. The following XML file (<data-dir>\Samples\XML\Sport\SportSimpleId.xml) is a modified file that contains context values in id attributes.

<?xml version="1.0" encoding="utf-8"?>
<sports>
  <sport id="soccer">Soccer</sport>
  <sport id="hockey">Ice hockey</sport>
  <sport id="basketball">Basketball</sport>
</sports>

Most real XML files contain context values. In most cases, the context value is stored in an id or context attribute. However, a file might use some other attribute name, or a sub-element instead of an attribute.

The following table shows the values and context values.

Value        Context
Soccer       sports.sport[soccer].value
Ice hockey   sports.sport[hockey].value
Basketball   sports.sport[basketball].value

If we now add the Skiing element, it won't change the existing context values.

<?xml version="1.0" encoding="utf-8"?>
<sports>
  <sport id="soccer">Soccer</sport>
  <sport id="skiing">Skiing</sport>
  <sport id="hockey">Ice hockey</sport>
  <sport id="basketball">Basketball</sport>
</sports>

The following table shows the values, old and new context values.

Value        New context                      Old context
Soccer       sports.sport[soccer].value       sports.sport[soccer].value
Skiing       sports.sport[skiing].value       -
Ice hockey   sports.sport[hockey].value       sports.sport[hockey].value
Basketball   sports.sport[basketball].value   sports.sport[basketball].value

As you can see, the existing context values remain unchanged. Soluling detects context attributes automatically if the attribute is named either id or context. If you use some other attribute, you have to specify it manually on the Items sheet of the Project Wizard or Source dialog.
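The following Python sketch builds contexts from id or context attributes and falls back to the element index when neither is present. It only handles direct children of the root element, and the context format simply mirrors the tables above; both are simplifying assumptions, and id_contexts is a hypothetical name. The check at the end confirms that inserting Skiing leaves the existing contexts untouched.

```python
import xml.etree.ElementTree as ET

CONTEXT_ATTRS = ("id", "context")  # attribute names detected automatically

BEFORE = """<sports>
  <sport id="soccer">Soccer</sport>
  <sport id="hockey">Ice hockey</sport>
  <sport id="basketball">Basketball</sport>
</sports>"""

AFTER = """<sports>
  <sport id="soccer">Soccer</sport>
  <sport id="skiing">Skiing</sport>
  <sport id="hockey">Ice hockey</sport>
  <sport id="basketball">Basketball</sport>
</sports>"""


def id_contexts(xml_text):
    """Map context -> text for the root's children, preferring context attributes."""
    root = ET.fromstring(xml_text)
    result = {}
    counts = {}
    for child in root:
        index = counts.get(child.tag, 0)
        counts[child.tag] = index + 1
        key = next((child.get(a) for a in CONTEXT_ATTRS if child.get(a) is not None), None)
        selector = key if key is not None else str(index)  # fall back to the index
        result[f"{root.tag}.{child.tag}[{selector}].value"] = (child.text or "").strip()
    return result


before, after = id_contexts(BEFORE), id_contexts(AFTER)
# Every context that existed before still points at the same text.
assert all(after[context] == value for context, value in before.items())
print(after["sports.sport[skiing].value"])  # Skiing
```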

If you use the Localize all items of selected types method, you can specify whether attributes are handled as context values and which ones. The defaults are the id and context attributes.

If you use the Select items you want to localize method, you can make any attribute a context attribute by right-clicking it and selecting Context value.

Multilingual XML

The final topic of XML localization is XML's language attribute, xml:lang. It specifies the language used in the element. The xml:lang attribute is defined in the XML specification, so it is an integral part of XML. If you have a monolingual XML file, there is no need to use the xml:lang attribute. However, if the same XML file contains data in more than one language, you should use the xml:lang attribute on every text element. The following sample is the product XML modified so that the product element contains the language attribute.

<?xml version="1.0" encoding="utf-8"?>
<simple>
  <product xml:lang="en">Skim milk</product>
  <price>1.15</price>
  <onsale>0</onsale>
</simple>

Using the language attribute, you can create multilingual XML files. Here is an English-German XML file:

<?xml version="1.0" encoding="utf-8"?>
<simple>
  <product xml:lang="en">Skim milk</product>
  <product xml:lang="de">Magermilch</product>
  <price>1.15</price>
  <onsale>0</onsale>
</simple>
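In a multilingual file, the language variants are told apart only by their xml:lang values. Python's xml.etree.ElementTree reports the predefined xml prefix under the namespace URI http://www.w3.org/XML/1998/namespace, so xml:lang appears as the attribute key shown below. This is a sketch, not Soluling's implementation, and the helper names are hypothetical. It reads the languages of the product element and adds a German sibling, producing a file like the English-German sample above.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the predefined xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

ENGLISH = """<simple>
  <product xml:lang="en">Skim milk</product>
  <price>1.15</price>
  <onsale>0</onsale>
</simple>"""


def texts_by_language(root, tag):
    """Return {language: text} for every <tag> element that has xml:lang."""
    return {e.get(XML_LANG): e.text for e in root.iter(tag) if e.get(XML_LANG)}


def add_language_variant(root, tag, source_lang, target_lang, translation):
    """Insert a translated sibling right after the source-language element."""
    for parent in root.iter():
        for index, elem in enumerate(list(parent)):
            if elem.tag == tag and elem.get(XML_LANG) == source_lang:
                variant = ET.Element(tag, {XML_LANG: target_lang})
                variant.text = translation
                variant.tail = elem.tail  # keep the original indentation
                parent.insert(index + 1, variant)
                return variant  # stop before the tree iterator sees the change
    raise ValueError(f"no <{tag}> with xml:lang={source_lang!r}")


root = ET.fromstring(ENGLISH)
add_language_variant(root, "product", "en", "de", "Magermilch")
print(texts_by_language(root, "product"))  # {'en': 'Skim milk', 'de': 'Magermilch'}
print(ET.tostring(root, encoding="unicode"))
```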

Normally, Soluling creates localized XML files: one translated XML file for each target language. If the original file is C:\Files\Sample.xml, the German XML file is created as C:\Files\de\Sample.xml and the French one as C:\Files\fr\Sample.xml. Using the language attribute, it is also possible to create one multilingual output file, as shown above. Soluling lets you choose the output files: localized files, a multilingual file, or both. Use the Output sheet of the Source dialog to specify the output files. The following picture shows the default settings, where localized files are turned on and multilingual files are turned off. If you want Soluling to also create multilingual files, check the Multilingual checkbox.

Output
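The localized output location described above follows a simple rule: a subfolder named after the target language, next to the original file. Here is a Python sketch of that rule using pathlib.PureWindowsPath, so it behaves the same on any operating system. The function name is hypothetical, and language codes are passed in as plain strings.

```python
from pathlib import PureWindowsPath


def localized_output_path(original, language):
    """Return <folder>\\<language>\\<file name> for a localized copy of original."""
    path = PureWindowsPath(original)
    return path.parent / language / path.name


for lang in ("de", "fr"):
    print(localized_output_path(r"C:\Files\Sample.xml", lang))
```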

If your original file contains no xml:lang attribute or contains only one language, you can choose localized output files, multilingual output files, or both. However, if the original XML file contains data in more than one language, only the multilingual output file is enabled.

In addition to XML's default language attribute, xml:lang, you can use any attribute as a language attribute. Use the Items sheet to set an attribute as a language attribute.

XML data in component properties

If you have a property that contains XML instead of plain text, you might have to select which XML elements are localized and which are left intact. Scan rules let you do that. First, set the format to XML string, and then add the elements you want to localize. Let's look at an example. We have a Windows Forms resource file (.resx) that contains the following property:

<data name="c1TrueDBGrid1.PropBag" xml:space="preserve">
  <value>...</value>
</data>

The property is the PropBag property of the ComponentOne True DBGrid component. The simplified property data looks like this:

<?xml version="1.0"?>
<Blob>
  <DataCols>
    <C1DataColumn Caption="One" DataField="">
      <FilterCancelText>Close</FilterCancelText>
      <FilterClearText>Clear</FilterClearText>
    </C1DataColumn>
    <C1DataColumn Caption="Two" DataField="">
      <FilterCancelText>Close</FilterCancelText>
      <FilterClearText>Clear</FilterClearText>
    </C1DataColumn>
  </DataCols>
</Blob>

The actual XML is much more complex, but for this example, we use only the items that contain text: the Caption attributes and the FilterCancelText and FilterClearText elements. To select an item, you specify the full path of the element, excluding the root element Blob. Separate the elements with the @ character, and add the # character before an attribute name. The path for Caption is "DataCols@C1DataColumn@#Caption", the path for FilterCancelText is "DataCols@C1DataColumn@FilterCancelText", and the path for FilterClearText is "DataCols@C1DataColumn@FilterClearText". A single rule selects all items that match the same path, so in the sample above, a single rule for Caption selects both Caption attributes.
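The path notation can be read mechanically: split on @, treat a final #name part as an attribute, and look up the remaining element names below the root. The Python sketch below does this with xml.etree.ElementTree on the simplified PropBag data. It illustrates the notation only; it is not Soluling's implementation, and select_by_scan_path is a hypothetical name.

```python
import xml.etree.ElementTree as ET

BLOB = """<Blob>
  <DataCols>
    <C1DataColumn Caption="One" DataField="">
      <FilterCancelText>Close</FilterCancelText>
      <FilterClearText>Clear</FilterClearText>
    </C1DataColumn>
    <C1DataColumn Caption="Two" DataField="">
      <FilterCancelText>Close</FilterCancelText>
      <FilterClearText>Clear</FilterClearText>
    </C1DataColumn>
  </DataCols>
</Blob>"""


def select_by_scan_path(xml_text, scan_path):
    """Resolve a path such as "DataCols@C1DataColumn@#Caption" (root excluded)."""
    root = ET.fromstring(xml_text)
    parts = scan_path.split("@")
    attribute = parts.pop()[1:] if parts[-1].startswith("#") else None
    # The remaining parts are element names below the root element.
    elements = root.findall("/".join(parts))
    if attribute is None:
        return [e.text for e in elements]
    return [e.get(attribute) for e in elements if e.get(attribute) is not None]


print(select_by_scan_path(BLOB, "DataCols@C1DataColumn@#Caption"))  # ['One', 'Two']
print(select_by_scan_path(BLOB, "DataCols@C1DataColumn@FilterCancelText"))  # ['Close', 'Close']
```

As the text says, one path matches every column, so a single rule returns both Caption values.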

Samples

GitHub and <data-dir>\Samples\XML contain the following XML sample directories:

Directory      Description
Simple         A simple XML sample file. Study this first.
Data           A sample file that contains base64-encoded binary data (text, image and sound).
Entity         A sample file that uses custom entities.
Image          A sample file that contains base64, base32, base16/hex and URL encoded images.
Localize       Sample files that show how to use the localize attribute.
Multilingual   Sample files that show how to localize multilingual XML files.
Pair           A sample file that shows how to localize a file that contains original/target language pairs.
Sport          Sample files that contain text, numbers and images.
Types          A sample file that contains various data types.

Configuring XML Localization

You can configure how your XML file or data is localized by selecting the item in the project tree, right-clicking, and choosing Options. A Source dialog appears that lets you edit the options. This source uses the following option sheets:

Settings

Read more about other data files such as XML, XSL, JSON, YAML, INI, Excel, SVG, TMX, XLIFF, text and binary files.