I'm introducing psidR, a small helper package for R that makes constructing panels from the PSID a bit easier.
One potential difficulty with the PSID is constructing a longitudinal dataset, i.e. one where individuals are followed over several survey waves. There are several solutions.
- In the so-called data center, users can use drill-down menus to select relevant variables from each wave. If the user wants only recent waves, there is a subsetting mechanism (e.g. only household heads younger than 55). As the required dataset gets larger, this becomes unwieldy: the interface gets slower and slower, and the clicking procedure is rather error-prone. The main motivation for this package is that I've spent too many hours clicking on cryptic variable names, only to realize after I was done that I had forgotten a variable. Unacceptable.
- Users may download the data and attempt to merge the annual interview files in order to obtain the desired panel. Though conceptually not very difficult (there is an individual index file, which provides a link for individuals across years), it is a cumbersome accounting exercise to find the right variable names for each year and do the right merges.
- One can use psidR. The main function is inspired by the Stata add-on package psiduse. Here is the function's signature.
build.panel(datadir,fam.vars,ind.vars=NULL,fam.files=NULL,ind.file=NULL,heads.only=TRUE,core=TRUE,design="balanced",verbose=FALSE)
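As a rough illustration of how this signature might be called, here is a minimal sketch. The layout of fam.vars (a "year" column plus one column per variable), the use of NA to flag a variable that is missing in a given year, the data directory, and all variable codes are assumptions made for the example and are not taken from the package documentation; look up the real codes in the PSID codebook.

```r
# Hypothetical sketch: the variable codes below are placeholders, NOT real
# PSID variable names. The column layout is an assumption as well.
fam.vars <- data.frame(
  year        = c(2001, 2003),
  house.value = c("HVAL_2001", "HVAL_2003"),
  income      = c("INC_2001", NA),  # assumed way to flag "missing in 2003"
  stringsAsFactors = FALSE
)

# Assumed location of the downloaded family files and individual index.
datadir <- "~/data/PSID"

# Not run: needs psidR installed and the PSID files present in datadir.
if (FALSE) {
  panel <- build.panel(datadir    = datadir,
                       fam.vars   = fam.vars,
                       heads.only = TRUE,
                       core       = TRUE,
                       design     = "balanced",
                       verbose    = TRUE)
}
```

Only the argument names come from the signature above; everything else should be checked against the package's own examples.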
- There is a default behaviour where the user only points to a data directory. Otherwise, one can specify custom locations for the family files and the individual index.
- You can supply the PSID data in Stata format or as csv files.
- The user has to supply a data.frame "fam.vars" which lists the variable names for all required waves.
- It's possible to tell the function that a certain variable is missing in a given year (without the variable getting dropped, so you can impute it later on).
- One can subset the data to household heads only.
- There is a switch to get only the core sample.
- There are 3 different sample designs to choose from: balanced panel (all individuals must be present in all waves), k-period panel (individuals must be present for at least k periods) and unbalanced (all individuals included).
- With "verbose=TRUE" the function prints comments as it goes along.
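To make the three sample designs concrete, here is a small base-R sketch on toy data. It is not psidR's implementation: the helper select.design and the option value "all" are hypothetical names introduced only for this illustration.

```r
# Toy long-format data: one row per individual per wave in which they appear.
obs <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3),
  year = c(2001, 2003, 2005, 2001, 2003, 2005)
)

# Hypothetical helper (not part of psidR): keep individuals by design.
# `design` is "balanced", "all" (unbalanced), or a number k.
select.design <- function(obs, design) {
  counts <- table(obs$id)  # number of waves each individual is observed
  min.waves <- if (identical(design, "balanced")) {
    length(unique(obs$year))  # must be present in every wave
  } else if (identical(design, "all")) {
    1                         # keep everyone
  } else {
    as.numeric(design)        # present in at least k waves
  }
  keep <- names(counts)[counts >= min.waves]
  obs[as.character(obs$id) %in% keep, , drop = FALSE]
}

balanced   <- select.design(obs, "balanced")  # individual 1 only
two.period <- select.design(obs, 2)           # individuals 1 and 2
unbalanced <- select.design(obs, "all")       # everyone
```

The balanced design is the strictest filter and the unbalanced one the loosest, with the k-period design in between.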
awesome. i've been working through the psid for a post for http://asdfree.com -- mind if i build on this work? please contact me, i can't find your e-mail. ajdamico@gmail.com
This is what I've been looking for. I tried to implement the build.panel function, but received the following error message:
object '.SD' not found
when the code tried to use the lapply command. Any thoughts?
Hi Kris,
don't know why this could happen. I use the .SD idiom in only three places and they all seem to be pretty standard. .SD is short for "sub datatable", so it's part of the data.table package. Maybe updating data.table would help? To get a handle on this, can you send your sessionInfo() to florian.oswald@gmail.com? Also, did you try out one of the examples I posted or did you put together your own? A lot depends on where the files are stored and on all the paths being correctly specified, so maybe that's where the error comes from? Anyway, let me know.
thanks!
this has been a lifesaver! the PSID is such a frustrating dataset, and the website is horrible! This has made the impossible possible for a lowly undergrad researcher.
Hi Justin,
good stuff, I'm glad you were successful. Seems that you are exactly the person I wrote this up for - I've been there myself. I'm releasing the next version these days with better examples and an option to directly download the data from the server into R (not using SAS or Stata in the middle).
Excellent! I look forward to future releases.