BOM Tools

Byte Order Mark

BOM stands for Byte Order Mark. It is a special marker placed at the beginning of a text file to indicate how its multi-byte characters are encoded. Detailed information is available in the Wikipedia article on the byte order mark.

Many programs use one specific character encoding, so their users never need to know or care which one is used.

Programs that allow saving text in different encodings need a hint in the content of the file that indicates how the characters are encoded.

For example, the notepad.exe program shipped with Windows supports the following multi-byte encodings: UTF-8, Unicode (UTF-16 little endian), and Unicode Big Endian (UTF-16 big endian).

Notepad reads the BOM when it loads a file and writes the same BOM when it saves the file, so the user needs to choose it only once. The BOM can be changed during the "Save as..." operation. However, the BOM cannot be removed from a text file using notepad.exe.
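As an illustration of how a program might recognize a BOM when loading a file, here is a minimal Python sketch. It is not Notepad's actual logic; the function name detect_bom and the signature table are assumptions made for this example, and only the standard constants from Python's codecs module are used.

```python
import codecs

# Known BOM signatures. The UTF-32 entries come first because the
# UTF-32 LE signature (FF FE 00 00) begins with the UTF-16 LE
# signature (FF FE), so the longer match must be tried first.
BOM_SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is found.

    detect_bom is a hypothetical helper written for this illustration.
    """
    for signature, name in BOM_SIGNATURES:
        if data.startswith(signature):
            return name, len(signature)
    return None, 0
```

Only the first few bytes of the file are needed, for example detect_bom(open("sample.txt", "rb").read(4)), where sample.txt is a placeholder file name.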

In the absence of a BOM, a program may rely on a default encoding or try to guess the encoding from the content using ad hoc heuristics. Such guesses are not reliable and may result in incorrect interpretation of the file content.

Like Notepad, Gnumeric allows the user to specify the encoding to be used for data in files, but it does not write the marker.

BOM Manipulation Tools

To manipulate the BOM of a file, the following tools may be used:

killbom.exe removes the BOM from a file.
setbom.exe writes the BOM back to a file.

Note that neither of these tools changes the encoding of the file content: it is copied verbatim. Only the BOM is affected.
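The idea of changing only the BOM while copying the content verbatim can be sketched in Python as follows. This is not the source code of the tools described here; the function names are hypothetical, and the sketch assumes that only the UTF-8 BOM (bytes EF BB BF) is of interest.

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def remove_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM if present; all other bytes are untouched."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


def add_utf8_bom(data: bytes) -> bytes:
    """Prepend a UTF-8 BOM unless the data already starts with one."""
    if data.startswith(UTF8_BOM):
        return data
    return UTF8_BOM + data


def rewrite_file(path: str, transform) -> None:
    """Apply transform to the raw bytes of the file and save the result.

    The file is opened in binary mode so that no decoding or newline
    translation takes place: the content is never re-encoded.
    """
    with open(path, "rb") as source:
        data = source.read()
    with open(path, "wb") as target:
        target.write(transform(data))
```

For example, rewrite_file("sample.txt", remove_utf8_bom) would strip the marker from a placeholder file sample.txt, and rewrite_file("sample.txt", add_utf8_bom) would put it back.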

Both programs are Windows command-line tools. To open a command prompt, press Win+R on your keyboard, type cmd, and press Enter.

It is easiest to work when all the necessary files are in the current directory; otherwise, specify full paths. If a path contains a space, enclose it in double quotes.

Use of BOM in Survey Solutions

Survey Solutions requires the prefill sample files to be saved in UTF-8 encoding. If the BOM confuses the program, remove it with the killbom.exe tool described above. To restore the files to their original state, use the setbom.exe tool.
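To see why a BOM may confuse a program that reads delimited data, consider the following Python sketch. It does not describe how Survey Solutions parses files; the tab-delimited sample data and the helper first_header are made up for this illustration. When the bytes are decoded as plain UTF-8, the BOM survives as an invisible U+FEFF character glued to the first column name, whereas Python's utf-8-sig codec skips it.

```python
import csv
import io

# Hypothetical tab-delimited sample that starts with a UTF-8 BOM.
raw = b"\xef\xbb\xbfid\tname\n1\tAnna\n"


def first_header(data: bytes, encoding: str) -> str:
    """Decode the data and return the name of the first column."""
    text = data.decode(encoding)
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return next(reader)[0]


print(repr(first_header(raw, "utf-8")))      # '\ufeffid'
print(repr(first_header(raw, "utf-8-sig")))  # 'id'
```

A program that looks up a column called id would not find it in the first case, because the actual name starts with the invisible marker character.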

Downloads

The following versions are from November 06, 2014:

killbom.exe v1.0, size: 7,168 bytes, md5=a4af4230c04002f8a43c81c1711f0ea8
setbom.exe v1.0, size: 6,144 bytes, md5=8fe37fa54a038ec0d78cc3c2c185b7aa

Author and support

These tools were written by Sergiy Radyakin, Economist, The World Bank. For questions and feedback, write to sradyakin/at/worldbank.org.