Data management made easy
Data management and data cleaning are critically important steps in any data analysis. Many of us learned this lesson the hard way. Have you ever fit a model that includes age as a covariate and forgotten to convert the missing value codes of -99 to missing values? I have. Or maybe you overlooked a data entry error that resulted in an age of 354 that should have been 54. I’ve done that too. Careful data management and cleaning can help us avoid these kinds of embarrassing mistakes.
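As a minimal sketch of how those two mistakes might be caught, the commands below build a small made-up dataset and clean it. The variable names id and age, the data values, and the plausibility cutoff of 120 are assumptions for illustration only.

```stata
* Hypothetical toy data: id, age, and all values are made up
clear
input id age
1 34
2 -99
3 354
4 61
end

* Convert the missing value code -99 to Stata's system missing value (.)
mvdecode age, mv(-99)

* List implausible ages. Missing values are treated as larger than any
* number, so exclude them explicitly (the cutoff of 120 is an assumption)
list id age if age > 120 & !missing(age)

* Correct the entry only after checking it against the source record
replace age = 54 if id == 3

* Confirm that the range of age now looks reasonable
summarize age
```

Running summarize before fitting a model is a cheap way to spot leftover missing value codes and out-of-range values.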
I recently recorded a series of data management videos for the Stata YouTube channel. You can click on the links below to watch the videos. I included topics that I think are important, but the list is far from exhaustive. If you would like to see videos on additional topics, please leave your suggestions in the comments below.
- How to merge files into a single dataset
- How to append files into a single dataset
- How to label variables
- How to label the values of categorical variables
- How to add notes to a variable
- How to convert a string variable to a numeric variable
- How to convert categorical string variables to labeled numeric variables
- How to create a date variable from a date stored as a string
- How to convert missing value codes to missing values
- How to identify and replace unusual data values
- How to round a continuous variable
- How to change the display format of a variable
- How to create a new variable that is calculated from other variables
- How to create a categorical variable from a continuous variable
- How to identify and remove duplicate observations
- How to reshape data from wide format to long format
- How to reshape data from long format to wide format
- How to optimize the storage of variables
You can learn more about these topics and many others in the Data Management Reference Manual.