usexmlex - import XML files to Stata by Sergiy Radyakin

usexmlex will import XML data to Stata.

Note that there are multiple ways to store data in XML format, and this program reads only one particular layout, the one compatible with Microsoft Excel XML queries. Stata itself provides the command xmluse, which imports data stored in a different layout.

Description

The usexmlex program will import XML-formatted data into Stata. This is a very inefficient format, in which every value is wrapped in a tag named after its field, such as (fragment):

<PRESIDENTS>

  <PRESIDENT>
  <lastname>Washington</lastname>
  </PRESIDENT>

  <PRESIDENT>
  <lastname>Adams</lastname>
  </PRESIDENT>

  <PRESIDENT>
  <lastname>Jefferson</lastname>
  </PRESIDENT>

</PRESIDENTS>

This format, however, is used as an exchange format between various software packages and as a source for queries to XML data in Microsoft Excel.

This converter makes several assumptions regarding the input data:

Installation instructions

The minimum requirement for usexmlex is Stata 13.0. usexmlex is implemented in Mata and should work on any platform on which Stata itself can run, but testing was done on the MS Windows platform only.

usexmlex is part of Statistical Software Components (SSC). Individual files are available from this RePEc page. Most users don't need to download individual files and can install with a single command.

To install -usexmlex- type in Stata:

findit usexmlex
then follow the link to install the program.
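
Because the package is distributed through SSC, the standard ssc command should also install it directly. This is a sketch that assumes the package is listed on SSC under the name usexmlex:

```stata
* Install usexmlex from SSC (assumes the package name on SSC is usexmlex)
ssc install usexmlex

* Confirm that Stata can find the installed command
which usexmlex
```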

Check with SSC regularly to receive further updates:

adoupdate usexmlex

Syntax

The main syntax is trivial:

usexmlex

will show a dialog for selecting the file to be imported and setting any additional parameters.

usexmlex using "datafile.xml"

will import the data from the specified data file. Specify the full path if necessary. The word using is mandatory and cannot be omitted from the syntax.

usexmlex about

will display program version and author information.

To load only a subset of the data, a variable list may be passed in the varloadlist option. For example, to load only the two variables id and name from a file test.xml, one can write:

usexmlex using "test.xml", varloadlist(id name)

If necessary, the clear option can be added to allow the data in memory to be replaced even when it contains unsaved changes.
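
Both options can be combined in one call. A minimal sketch, reusing the test.xml file name and the id and name variables from the varloadlist example:

```stata
* Load only id and name, replacing any data currently in memory
usexmlex using "test.xml", varloadlist(id name) clear
```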

Note that many XML files produced with modern software are stored in a Unicode encoding. Stata does not work with Unicode, so import such files with caution. Non-Latin characters in UTF-8 encoding will appear as unrecognizable text in Stata, and for other Unicode encodings the program may produce unexpected results.

Tutorial and Examples

In the following example we instruct Stata to import an example XML data file:

usexmlex using "http://www.radyakin.org/transfer/usexmlex/testdata/test.xml"

The last observation contains a missing value for the name variable. Review the xml file to understand how missing values are stored.
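
The imported data can then be inspected with standard Stata commands. This is a sketch, assuming the import succeeded and produced a variable called name as described above:

```stata
* Import the example file, replacing any data in memory
usexmlex using "http://www.radyakin.org/transfer/usexmlex/testdata/test.xml", clear

* Show variable names and storage types
describe

* Show the last observation, where name is expected to be missing
list in l

* Count observations with a missing value for name
count if missing(name)
```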

Author and support

In case you are experiencing a problem importing a valid dataset, and you think the error is mine, kindly let me know.

usexmlex was written by Sergiy Radyakin. To contact the author send email to
sradyakin/at/worldbank.org.

Or post your questions to Statalist with the tag usexmlex (preferred).