savespss - a module to export Stata data into SPSS system file format (*.sav) by Sergiy Radyakin

Description

savespss exports data from Stata's memory into an SPSS system file (also known as a binary data file, *.sav).

If you instead want to import SPSS data into Stata, you need the -usespss- program.

savespss facilitates the exchange of data between Stata and SPSS. SPSS does not support direct import of datasets in recent Stata formats. Reportedly, SPSS's import facility supports formats up to Stata 9. SPSS versions prior to 14.0 could not import Stata datasets at all.

savespss solves this problem by allowing Stata users and programmers to export their data directly in the SPSS system file format.

The description of assumptions and limitations is organized into the following sections:

Supported features

savespss supports all the data attributes that the users normally expect from a statistical data conversion package, but be sure to also check the non-supported features.

List of supported features:

  • numeric variables - exported as doubles without loss of precision
  • short string variables (<=244 chars in all versions of Stata)
  • long string variables (Stata 13.0 or newer)
  • long string variables of strL type (up to the maximum length permitted by SPSS)
  • long variable names (>8 chars)
  • variable labels
  • date/time formats (%td, %tc) and date/time correction (for the difference in zero-hour)
  • value labels for date/time formats
  • variable characteristics
  • dataset characteristics
  • system missing and extended missing values (.a, .b, .c); all other missing values are mapped to the SPSS system-missing value
  • autodetection of a 'convenient' per-variable mapping of extended missing values (the user can specify exact values to be used for extended missing values in all variables if necessary; versions before 1.73 could also switch off extended missing values altogether)
  • value labels (including, tentatively, value labels for extended missing values)
  • variable formatting (only the %X.Yf format is supported for numeric variables and is converted to the SPSS FX.Y format; all others are written in the SPSS F8.0 format)
  • variable names starting with an underscore (not supported by SPSS): _xyz is renamed to @xyz
  • codepages for non-ASCII characters, e.g. Russian, German, Greek, Turkish, Arabic, and some other languages
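
The date/time correction listed above is needed because the two packages count time from different zero points: Stata stores %td values as days and %tc values as milliseconds from 01 Jan 1960, while SPSS stores dates as seconds from 14 Oct 1582. The following is only a sketch of that arithmetic in Stata, not the code savespss uses internally; the scalar names are made up for illustration.

```stata
* Sketch only: the arithmetic behind the zero-hour correction.
* Assumption: SPSS counts seconds from 14oct1582 00:00:00.
scalar offset_days = td(01jan1960) - td(14oct1582)
display "offset in days: " offset_days

* A %td value (days since 01jan1960) expressed as SPSS seconds
scalar d_stata = td(01sep2014)
scalar d_spss  = (d_stata + offset_days) * 86400
display "SPSS seconds for 01sep2014: " %15.0f d_spss

* A %tc value (milliseconds since 01jan1960) expressed as SPSS seconds
scalar t_stata = tc(01sep2014 12:00:00)
scalar t_spss  = t_stata/1000 + offset_days * 86400
display "SPSS seconds for 01sep2014 12:00:00: " %15.0f t_spss
```

The two SPSS values differ by exactly half a day in seconds, which is a quick way to check that the day and millisecond scales were combined consistently.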

Not supported features and limitations

savespss supports all the data attributes that the users normally expect from a statistical data conversion package, but there are naturally some limitations.

List of currently unsupported features:

  • variable name collisions due to the case insensitivity of SPSS are detected by savespss, but it is up to the user to rename the variables.
  • savespss does not compress the data, and there are no plans to implement data compression. Use zip to pack the output file if necessary; the resulting file will be of comparable or smaller size than a datafile in the SPSS compressed format. Note that recent versions of Stata have a built-in zip compression command, zipfile. Alternatively, once the data is exported in the SPSS system file format (*.sav), it can be opened in SPSS and resaved; compression is often the default option there.
  • Stata 13 introduced long strings (strLs) with potential use as storage for various binary resources. It is not clear whether any sequence of bytes will be treated as valid by SPSS. Furthermore, long content may be truncated on export. An attempt to recover a binary resource from truncated content may lead to unforeseen consequences. What this means is that if you store a digital photograph in a strL, and it gets truncated during export, there will be only half of the picture in SPSS. An attempt to open that half in your image editor may crash the image editor. Naturally.
  • savespss was written when Stata did not support Unicode (before Stata 14.0), hence it does not expect Unicode content and may produce incorrect output if the user attempts to export such content. Your best option is to convert all content to a single extended ASCII codepage, such as Arabic or Cyrillic, and then export the data in that codepage.
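
Since renaming colliding variables is left to the user, it can help to find such names before exporting. The loop below is a minimal sketch of one way to do that in plain Stata; it is not how savespss performs its own check. The variable Price is added to the auto dataset only to produce a collision, and the scalar n_collisions is a hypothetical name.

```stata
* Sketch only: list variable names that differ only by letter case.
sysuse auto, clear
* Deliberately add a name that clashes with price once case is ignored
generate Price = price

scalar n_collisions = 0
* seen holds the lower-cased names encountered so far
local seen
foreach v of varlist _all {
    local lv = lower("`v'")
    local dup : list lv in seen
    if `dup' {
        display as error "`v' clashes with another variable when case is ignored"
        scalar n_collisions = n_collisions + 1
    }
    local seen `seen' `lv'
}
display "collisions found: " n_collisions
```

Any variable reported here can be renamed with Stata's rename command before savespss is called.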

Installation

savespss is part of the Statistical Software Components (SSC) archive. Individual files are available from this RePEc page. Most users don't need to download individual files and can install the package with a single command.

To install -savespss- type in Stata:


findit savespss

then follow the link to install the program.

Note: check with SSC regularly to receive further updates!


adoupdate savespss
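
Because the package is hosted on SSC, installation and updating can also be done entirely from the command line with Stata's built-in ssc and adoupdate commands. This is a minimal sketch that assumes the package name on SSC is savespss, as the text above implies, and that the machine has internet access. Note that adoupdate without the update option only reports whether a newer version exists.

```stata
* Install savespss from SSC (requires an internet connection)
ssc install savespss

* Confirm which copy of the command Stata will run
which savespss

* Later on: fetch a newer version from SSC if one exists
adoupdate savespss, update
```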


If you earlier installed a beta version of savespss from this site rather than from SSC, uninstall the beta version and reinstall from SSC.

Requirements

savespss is implemented in the Mata programming language and requires Stata version 10.0 or later. It is platform-independent and designed to work equally well in Windows, MacOS, and Linux environments.

savespss is a genuine writer: it writes the data into the SPSS binary format without any need for additional packages or converters. In particular, it doesn't require SPSS (PASW) or StatTransfer to be installed on the user's machine.

Usage

The main syntax is trivial:


savespss

will show information about the program and ask for a filename to which the current data in memory will be exported.


savespss "datafile.sav"

will export the current data to the specified file. Specify the full path if necessary.


savespss "datafile.sav", extmiss("97 98 99")

will export the current data to the specified file, recoding the extended missing values .a, .b, and .c to 97, 98, and 99 respectively for all numeric variables. If option extmiss is not specified, the 'convenient' values will be detected automatically, based on the user data. This takes a bit of time.
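
When choosing explicit codes for extmiss(), the codes should not already occur as real values, otherwise genuine data and recoded missing values become indistinguishable in SPSS. The sketch below checks this on a small made-up dataset; the variables score and age and the scalar n_clash are hypothetical, and the check is an illustration rather than the autodetection that savespss itself performs.

```stata
* Sketch only: check that candidate codes are not already used as real values.
clear
input score age
 5 34
97 41
.a 29
 3 99
end

scalar n_clash = 0
foreach v of varlist _all {
    * Skip string variables
    capture confirm numeric variable `v'
    if _rc continue
    quietly count if inlist(`v', 97, 98, 99)
    if r(N) > 0 {
        display as text "`v' already contains 97, 98 or 99 in " r(N) " observation(s)"
        scalar n_clash = n_clash + 1
    }
}
display "variables with a clash: " n_clash
```

If no variable is reported, the codes can be passed as extmiss("97 98 99"); otherwise other codes can be chosen, or the option can be omitted so that savespss picks values itself.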

Starting with version 1.73, extended missing values can no longer be suppressed altogether by specifying the option extmiss("off"), as was possible in earlier versions. This option is also not available in the dialog. It is likely to return in a future version.

To convert a datafile the following sequence can be used:


use "C:\DATA\mydata.dta"
savespss "C:\DATA\mydata.sav"

If desired, the replace option can be added to allow overwriting an existing file.
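
The conversion sequence can be combined with the replace option and, because savespss writes uncompressed files, with Stata's built-in zipfile command. This is a minimal sketch that assumes savespss is already installed; the file names are examples only.

```stata
* Sketch only: export with overwrite, then pack the uncompressed .sav file.
* Requires savespss to be installed; file names are examples.
sysuse auto, clear
savespss "auto.sav", replace

* zipfile is Stata's built-in command for creating zip archives
zipfile "auto.sav", saving("auto_sav.zip", replace)
```

Both files are written to the current working directory unless a path is given.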

For non-ASCII content, option codepage can be used:


savespss "C:\Data\mydata.sav", codepage(1254)

In Stata 13 and newer, a strlmax() option may be added to limit the length of strings. This option may be necessary to prevent the output file from growing uncontrollably, or when it is known that the long content will not be required.
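
To pick a sensible limit, it helps to know how long the strL content actually is. The following sketch (Stata 13 or newer) measures the longest value in each strL variable of a made-up dataset; the variable notes and the scalar longest are hypothetical. The final savespss call is left commented out because it assumes that strlmax() takes the maximum length as a plain number, which the text above does not spell out.

```stata
* Sketch only (Stata 13 or newer): find the longest strL value before exporting.
clear
set obs 3
generate strL notes = "short"
* Double the text in observation 2 ten times: 5 * 2^10 = 5120 characters
forvalues i = 1/10 {
    quietly replace notes = notes + notes in 2
}

scalar longest = 0
foreach v of varlist _all {
    local t : type `v'
    if "`t'" == "strL" {
        tempvar len
        quietly generate long `len' = strlen(`v')
        quietly summarize `len', meanonly
        display "`v': longest value has " r(max) " characters"
        scalar longest = max(longest, r(max))
        drop `len'
    }
}

* Assumption: strlmax() takes the maximum length as a number.
* savespss "notes.sav", strlmax(1000) replace
```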

Options if and in were frequently requested. I am still investigating, but for now a second, extended version of the dialog exists, which provides these options:


db savespsssome

Note that SPSS locks the data files it has open, blocking write access by other programs. You will get an error if you attempt to write (export) data to a file that is currently open in SPSS on your system. This is normal, by-design behavior of SPSS and is not a bug on either side.

Codepages

Stata as of version 13.1 does not support Unicode for the strings that it processes, including variable and value labels, and variable names. Hence the string content is saved as ASCII content.

The codepage option can be specified to instruct savespss to apply a certain codepage to the strings in the output. This option must be specified by the user directly because Stata does not store the codepage, either for the dataset in memory or for files saved to disk.

The term codepage refers to interpretation of characters with codes 128-255.

Codepages list

Option codepage takes values from the following list:

To specify the codepage, simply add the codepage option to the syntax of savespss, as in the following example:

savespss "C:\Data\mydata.sav", codepage(1254)

Only one codepage may be specified per file, and it will be applied to all strings in that file.

Note that the default codepage is 1252 (Latin-1).
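
Users of Stata 14 or newer hold their strings in UTF-8, so the advice to convert all content to a single codepage before exporting has to be applied by hand. The sketch below shows one possible way to re-encode string variables with Stata's ustrto() function, using the Turkish codepage 1254 from the example above. The dataset and the variable city are made up, and whether savespss then handles the converted bytes correctly under Stata 14 or newer is an assumption that has not been verified here, so the export line is left commented out.

```stata
* Sketch only (Stata 14 or newer): re-encode UTF-8 strings to Windows-1254 bytes.
clear
set obs 1
generate str20 city = "Muğla"
* "ğ" takes two bytes in UTF-8, so the string is 6 bytes long
display strlen(city[1])

quietly ds, has(type string)
foreach v in `r(varlist)' {
    * mode 1: characters missing from the target encoding are substituted
    quietly replace `v' = ustrto(`v', "windows-1254", 1)
}
* After conversion every character occupies one byte: 5 bytes
display strlen(city[1])

* Variable labels and value labels would need the same treatment.
* savespss "cities.sav", codepage(1254) replace
```

After the conversion Stata itself will display the affected characters incorrectly, so it is safer to work on a copy of the data.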

Examples

Example: auto.dta-->auto.sav


Let's consider the standard example dataset auto.dta. Here is how to open and view it in Stata 10:


sysuse auto
browse

Save it to SPSS system file format (auto.sav):


savespss "T:\SAVESPSS\auto.sav"

And view the result in PSPP.

[Screenshot: the data]

[Screenshot: the data with value labels]

[Screenshot: the variables]

[PSPP output: description of the dataset]

Author and support

In case you are experiencing a problem exporting a dataset, and you think the error is mine, kindly let me know.

savespss was written by Sergiy Radyakin.

To contact the author send email to: sradyakin/at/worldbank.org.

Or post your questions to Statalist with the tag savespss (preferred).

Version and updates history

Date Version Description
01 Sep 2014 1.77 this revision fixes the problem with dialog file not installed, fixes problem with declared but missing value labels reported by Ellen Van Loo, and behaves more intelligently with respect to extended missing values in the original file.
30 Jul 2014 1.73 the first version published to SSC; this revision includes support of strLs, dates conversion, better reporting on the recoding of missing values, enhanced dialog, and other improvements; it also fixes the issue with long value labels reported by Daniel Bela
07 Jul 2014 1.61 experimental support of output in codepages (requested by Alexander Staudt)
28 Jan 2014 1.51 experimental support of long string variables (Stata 13)
27 Jan 2014 1.22 fixed problem reported by P.H.E. van Rooij (thank you) with exporting datasets containing numeric variables that are completely missing observations.
25 Oct 2013 - fixed program freezes under an identifiable rare condition.
22 Oct 2013 - added autodetection of 'convenient' values for extended missings
21 Oct 2013 - fixed problem reported by Dr. Dirk Enzmann regarding buffer overflow under some conditions. Thank you!
21 Oct 2013 - fixed recurrently appearing dialog