# Economics Statistical Services (ESS)

The Economics Statistical Services (ESS) unit provides data analysis assistance to students enrolled in the economics department working on Junior Independent Work (JIW), Senior Thesis, or dissertation work.

The ESS team can help with any aspect of what we call the *Anatomy of Data Analysis*: data collection, data preparation, data cleaning, data merge/append, data visualization, descriptive statistics, linear or non-linear models, panel data analysis, time series, model selection, output interpretation, data presentation, and/or issues related to the use of Stata or R/RStudio for data analysis.

**On crafting a research question, finding data, and beyond…**

Probably one of the most difficult parts of writing an academic paper is crafting the research question. A well-defined research question will guide the rest of the process: What data do you need? What statistical and econometric procedures do you need to apply? What do you expect (or not) to find? And, eventually, what did you actually find (or not)?

There are two interconnected components to the process of crafting a research question. 1) What is it that you like? What calls your attention? What general topics or issues are you drawn to: inequality, poverty, environmental issues, health issues, housing issues, economic growth, economic development, etc.? 2) What do we know, so far, about the issue(s) that matter to you? What has academic research revealed about the topic(s)? What have been the main findings? What are the controversial points? What is the consensus? What is missing? To answer these questions, look for academic papers that either address the topic or issue you want to know more about, or work on something close to it. On the latter: suppose you are interested in an issue that affects country *A*, but no research has been done for country *A*. If somebody did something similar for country *B*, you could apply some of that research process to your analysis of country *A*.

Searching for academic papers may also help you narrow down your topic. For example, if you are interested in a broad topic like inequality but are not sure what in particular draws your attention to it, it helps to learn more. Finding papers on that general topic will show you which sub-topics or angles have caught the attention of other researchers, and one of those sub-topics may catch your eye too. Once you have found your topic of interest, that same literature will show you how others did it: What research questions did they ask? What data did they use? Where did they find the data? How did they process the data? What statistical and/or econometric methods did they use? How did they interpret their results? What did they conclude? What were the limitations of their findings? What else needs to be done?

For those of you just starting to do academic research, there is no expectation that you know everything; there is, however, the expectation that you will use the resources available to you so that you can learn and, eventually, know everything. When you do reach that point, you will realize that there is more to be learned, but this time you will know how to go about it. For additional information on crafting a research question and beyond, see here.

Princeton University provides a wealth of resources for academic research.

- To find academic papers on a myriad of topics: EBSCO, JSTOR, EconLit, and more (see here).
- To find data, some sources that may be of interest are Data Planet, Social Explorer, Global Insight, Wharton Research Data Service, World Development Indicators, World Values Survey, ICPSR (for more see here).

**Online Introduction to Data Analysis using Stata.**

Index/Do-file – The videos below follow this do-file. Browse through the do-file until you find a section of interest, then open the corresponding video according to the line number where the code is.

Part 1 – Overview. The importance of starting early.

Part 2 – Description of Stata’s Graphical User Interface (GUI), setting working directory, opening the do-file editor, setting the log file.

Part 3 – Import data into Stata (*.csv and *.xls*), save data in Stata format, open Stata files, browse data, rename variables, looping using *foreach*, convert string to numeric, drop rows of cases, drop selected cases, drop variables, replace values within variables, use frequencies to check variables, reshape data wide to long.

Part 4 – Merge 1:1 (one-to-one), merge m:1 (many-to-one), generate a variable from other variables, common tasks: open a Stata file, get frequencies, save data in Stata format.

Part 5 – Describing data using *describe* (general information about the data), summarizing data using *summarize* (basic descriptive statistics), declaring data as panel data using *xtset*, assigning numeric values to a categorical variable using *encode*, plotting time series graphs (line graphs), estimating population-adjusted variables, estimating daily values using time series operators, estimating 7-day moving averages, estimating growth rates (percentage change from one time period to the next).

Part 6 – Plotting scatterplots, adding a linear fit or trendline to the scatter, correlation matrix.

Part 7 – Summary statistics, recoding variables, one-way ANOVA test, t-tests, linear regression, histograms, log transformation, regression output interpretation.

Part 8 – Producing nice regression tables using *outreg2*, producing nice Stata output using *asdoc*.
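As a rough orientation for Parts 3 to 5, the steps can be sketched in a short do-file. This is only an illustration and not the do-file the videos follow: the file names (`cases_wide.csv`, `population.dta`, `regions.dta`) and the variables (`country`, `cases2019` to `cases2021`, `pop`) are hypothetical, and `country` is assumed to be a string variable.

```stata
* --- Part 3: import, convert strings to numeric, reshape wide to long ---
* Hypothetical file with columns: country, cases2019, cases2020, cases2021
import delimited "cases_wide.csv", clear
foreach var of varlist cases2019 cases2020 cases2021 {
    destring `var', replace force      // non-numeric entries become missing
}
drop if country == ""                   // drop cases with no country name
reshape long cases, i(country) j(year)  // one row per country and year

* --- Part 4: merge and generate a variable from other variables ---
* Hypothetical population.dta: one row per country and year, with pop
merge 1:1 country year using "population.dta"
tabulate _merge                         // check how many cases matched
keep if _merge == 3                     // keep matched cases only
drop _merge

* Hypothetical regions.dta: one row per country
merge m:1 country using "regions.dta"
keep if _merge == 3
drop _merge

generate cases_100k = cases / pop * 100000   // population-adjusted variable

* --- Part 5: describe, summarize, declare panel, growth rate ---
describe
summarize cases pop cases_100k
encode country, generate(country_id)    // numeric id for the string variable
xtset country_id year                   // panel unit first, then time
generate growth_pct = (cases - L.cases) / L.cases * 100
save "cases_panel.dta", replace
```

The growth rate line uses the lag operator `L.`, which is only available after the data have been declared as panel or time series. Keeping matched cases only after each merge is a choice made for this sketch; in your own work, inspect the unmatched cases before dropping them.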

**Tutorials**

There are a number of tutorials covering basic procedures in Stata at this link.

A quick guide:

**Data preparation/descriptives**

- If you are collecting your own data, make sure variables are in columns and cases in rows, see examples on page 3 in this document.
- To import Excel or *.csv data to Stata, see page 20 in this document (Stata 17 can also import SPSS data, extension *.sav or *.por, and SAS data, extension *.sas7bdat, *.xport, or *.xpt).
- To merge or append data, see this document.
- Descriptive statistics, page 22 in this document.
- To run frequencies, see page 23 in this document.
- To run crosstabulations see pages 25-27 in this document.
- Visualization, in this document.

**Regression models**

- For OLS regression, page 6 in this document.
- For a logit regression, page 4 in this document.
- For a logit regression (odds ratio), page 6 in this document.
- For marginal effects or predicted probabilities, in this document.
- Nice regression outputs, in this document.

**Panel data**

- Cross-sectional time series data look like the example on page 2 in this document.
- In Stata, you first need to set the data as panel, see page 5 in this document.
- After setting the data as panel you can run a fixed or random effects regression, see pages 19 and 27 in this document.
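A minimal sketch of the two steps above, setting the data as panel and then running a fixed or random effects regression, might look like this. The file `mypanel.dta` and the variables `country` (string), `year`, `y`, `x1`, and `x2` are hypothetical placeholders, not names taken from the tutorials.

```stata
* Hypothetical panel file: one row per country and year
use "mypanel.dta", clear

* xtset needs a numeric panel identifier, so encode the string variable
encode country, generate(country_id)
xtset country_id year          // panel unit first, then time variable

xtreg y x1 x2, fe              // fixed effects
xtreg y x1 x2, re              // random effects
```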

**Time series**

- Basic time series procedures (date conversion, lag operators) in this document.
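As an illustration of date conversion and lag operators, here is a short sketch under assumed names: `datestr` is a hypothetical string variable holding dates such as "2020-03-15", and `y` is a hypothetical numeric series with one observation per day.

```stata
* Convert the string date to a Stata daily date and give it a readable format
generate date = date(datestr, "YMD")
format date %td

* Declare the data as time series so that time series operators work
tsset date

generate y_lag1 = L.y          // value in the previous period
generate y_lag2 = L2.y         // value two periods back
generate y_diff = D.y          // change from the previous period
```

The string `"YMD"` tells `date()` the order of the components in `datestr`; it would need to change (for example to `"MDY"`) if your dates are written differently.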

The documentation above will be reviewed and updated in the coming months, and the links will be updated as well. In their current format, the tutorials help with most of the common tasks needed for data analysis.

If you have any questions, contact Oscar Torres-Reyna at otorres@princeton.edu, who will try to reply as soon as possible.

Note that if you need help with homework or problem sets please refer to your teaching assistants (AI).