Applied Panel Data Analysis for Economic and Social Surveys


Welcome to the website of our textbook on panel data analysis!


Hans-Jürgen Andreß

Katrin Golsch

Alexander W. Schmidt


Introduction to the website

All computations, estimations, and most of the figures in this textbook were produced with the statistical software package Stata. The book's website provides all data sets and Stata syntax files needed to replicate our findings. For readers who are not familiar with this software, we also include the printed output, so that they can follow the computations without running the software themselves. In the future, the website may also provide syntax files for other statistical software packages.


·        Download a zip file including all data sets and Stata syntax files!


We are also interested in your feedback and would therefore like to encourage you to send comments to the email address mentioned on the website. We have made every effort to keep the textbook as error-free as possible. However, if you think that you have found an error, please send us an email, and we will include it in a list of errors that is also provided on the website.


·        Report any errors and comments to hja<at>

·        Download a list of errors.

Data from the German Socio-Economic Panel Study (SOEP)

With respect to the data provided on the website, a special note is necessary for the examples based on data from the German Socio-Economic Panel Study (SOEP). Due to German data protection regulations, scientists can only use SOEP data if they have signed a data distribution contract. We tried to provide the data for our SOEP examples in such a way that this requirement does not apply. The Research Data Center SOEP allowed us to disseminate the data, provided that we ensure that individual persons or households cannot be identified. As indicated in the descriptions of the data sets (see “Description of the data sets” below), we applied various measures to anonymize the SOEP data. All SOEP identification numbers were replaced by arbitrary numbers, and in some cases we took random samples of the original data. Moreover, sensitive information was altered either by aggregating categories or by adding random numbers. All anonymized data sets can be identified by the term “lehrversion” in the data set name.


Unfortunately, anonymizing the SOEP data prevents an exact replication of some of the journal articles on which our application examples are based. Users will obtain similar, but not identical, estimates of the model parameters. Therefore, we include the computer output that was produced with the original data. By reading this output, you can check whether and how the findings of the journal articles could be replicated with access to the original data (as we had).

Organization of the zip file

All general information, including this description, can be found in the root directory of the zip file that contains all data sets and Stata syntax files. Information pertaining to the individual chapters and the appendix is stored in separate folders. Among other things, these chapter-specific folders contain the Stata syntax files. The syntax files read information from a subfolder called “input” and save information to a subfolder called “output”. The only exception is the folder “chapter 2”, which includes a folder “chapter 2\data” with the anonymized SOEP data and an empty folder “chapter 2\input”, which we used for the original SOEP data that we cannot make public. If you only want to read the computer output that the syntax files produce, you will find the corresponding Stata log-files in the subfolder “output”. This is especially interesting for the examples using SOEP data, because these log-files show the results derived from the original (non-anonymized) SOEP data, to which only we had access.

How to use the data and syntax files on your computer

We suggest copying the contents of the zip file to your local computer, keeping the same folder structure. At the beginning of each Stata syntax file, you will find suggestions on how to adapt the paths for input and output. Syntax files using SOEP data refer to the original non-anonymized data (we used this syntax to produce the log-files). Since these data are not supplied in the zip file, you also have to change the file names to those of the corresponding “lehrversion”, which is supplied in the zip file.
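
As an illustration of this kind of adaptation, the following minimal Stata sketch defines the paths once and then loads an anonymized data set from the folder “chapter 2\data”. It is not taken from the supplied syntax files: the root folder C:/panelbook, the macro names, the log-file name, and the data set name example_lehrversion.dta are hypothetical placeholders and must be replaced by the names you actually find in the zip file.

```stata
* Hypothetical root folder: the place where you unpacked the zip file.
global root   "C:/panelbook"

* Chapter 2 keeps the anonymized SOEP data in "data";
* the "input" folder supplied for this chapter is empty.
global data   "$root/chapter 2/data"
global output "$root/chapter 2/output"

* Use a log-file name of your own so that the supplied log-files,
* which were produced with the original data, are not overwritten.
log using "$output/my_replication.log", replace text

* Hypothetical file name: substitute the "lehrversion" data set
* that corresponds to the original file named in the syntax file.
use "$data/example_lehrversion.dta", clear
describe

log close
```

Forward slashes in paths are accepted by Stata on Windows as well as on other systems, so the same definitions can be reused across platforms after changing the root folder.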

Description of the data sets

A description of the following data sets can be found in the document “data sets.pdf”:

·        garmit (Garrett and Mitchell 2001)

·        genderdiff (SOEP)

·        soep 2004-06 (SOEP)

·        hank (SOEP)

·        heineck (BHPS)

·        johnson_wu (Johnson and Wu 2002)

·        postmat (SOEP)

·        wagepan (NLS Youth Sample)

·        wpgen (SOEP)