Economists frequently use public datasets. One of them is the Panel Study of Income Dynamics (PSID), maintained by the Institute for Social Research at the University of Michigan.

Here I introduce psidR, a small helper package for R that makes constructing panels from the PSID a bit easier.

One potential difficulty with the PSID is constructing a longitudinal dataset, i.e. one in which individuals are followed over several survey waves. There are several ways to do this.

  1. In the so-called Data Center, users can use drill-down menus to select relevant variables from each wave. If the user wants only recent waves, there is a subsetting mechanism (e.g. only household heads younger than 55). As the required dataset gets larger, this becomes unwieldy: the interface gets slower and slower, and the clicking procedure is rather error-prone. The main motivation for this package is that I've spent too many hours clicking on cryptic variable names, only to realize after I was done that I had forgotten a variable. Unacceptable.
  2. Users may download the data and attempt to merge the annual interview files to obtain the desired panel. Though this is conceptually not very difficult (there is an individual index file, which links individuals across years), it is a cumbersome accounting exercise to find the right variable names for each year and do the right merges.
  3. One can use psidR. The main function is inspired by the Stata add-on package psiduse. Here is the function's signature:
build.panel(datadir, fam.vars, ind.vars = NULL, fam.files = NULL, ind.file = NULL, heads.only = TRUE, core = TRUE, design = "balanced", verbose = FALSE)
  • By default, the user only points the function to a data directory; otherwise, one can specify custom locations for the family files and the individual index.
  • You can supply the PSID data in Stata format or as CSV files.
  • The user has to supply a data.frame "fam.vars" that lists the variable names for all required waves.
  • It is possible to tell the function that a certain variable is missing in a given year without the variable being dropped, so you can impute it later on.
  • One can restrict the data to household heads only (heads.only).
  • There is a switch (core) to keep only the core sample.
  • There are 3 different sample designs to choose from: balanced panel (all individuals must be present in all waves), k-period panel (individuals must be present in at least k periods) and unbalanced (all individuals included).
  • With verbose=TRUE, the function prints comments as it goes along.
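To make the bookkeeping behind option 2 concrete, here is a minimal base-R sketch on toy data of the kind of steps build.panel automates. It is not psidR's actual code: the person id pid, the int.YEAR interview-number columns, the placeholder variable codes and the use of NA to flag a variable that is missing in a given wave are all assumptions for illustration. The sketch links each wave's family file to persons through the individual index, keeps a variable as NA in a wave where it does not exist, and then applies a "present in at least k waves" rule, which covers all three sample designs.

```r
# Hypothetical fam.vars: one row per wave, mapping a clean name to that wave's
# variable code. Codes are placeholders, NOT real PSID names; NA is assumed to
# mean "not available in this wave".
fam.vars <- data.frame(
  year   = c(2001, 2003, 2005),
  income = c("A01", NA, "A05"),
  stringsAsFactors = FALSE
)

# Toy individual index: one row per person, one interview number per wave.
ind <- data.frame(
  pid      = c(1, 2, 3),
  int.2001 = c(10, 11, NA),   # NA: person not interviewed in that wave
  int.2003 = c(20, 21, NA),
  int.2005 = c(30, NA, 31)
)

# Toy family files, one per wave, keyed by that wave's interview number.
fam <- list(
  "2001" = data.frame(interview = c(10, 11), A01 = c(100, 200)),
  "2003" = data.frame(interview = c(20, 21)),
  "2005" = data.frame(interview = c(30, 31), A05 = c(120, 300))
)

# Link each wave's family file to persons via the index, rename the
# wave-specific code to a common name, and stack the waves into a long panel.
waves <- lapply(names(fam), function(yr) {
  key <- data.frame(pid = ind$pid, interview = ind[[paste0("int.", yr)]])
  key <- key[!is.na(key$interview), ]
  out <- merge(key, fam[[yr]], by = "interview")
  code <- fam.vars$income[fam.vars$year == as.integer(yr)]
  # Keep the column even when the variable is missing in this wave.
  out$income <- if (is.na(code)) NA_real_ else out[[code]]
  out$year <- as.integer(yr)
  out[, c("pid", "year", "income")]
})
panel <- do.call(rbind, waves)

# Keep persons observed in at least k waves: k = number of waves gives a
# balanced panel, k = 1 keeps everyone (unbalanced).
keep_k_periods <- function(panel, k) {
  n_obs <- table(panel$pid)
  keep <- names(n_obs)[n_obs >= k]
  panel[as.character(panel$pid) %in% keep, ]
}
balanced   <- keep_k_periods(panel, length(fam))  # pid 1 only
two_period <- keep_k_periods(panel, 2)            # pids 1 and 2
unbalanced <- keep_k_periods(panel, 1)            # everyone
```

With real PSID files you would let build.panel do this work; going by the signature above, a call might look like build.panel(datadir = "~/psid", fam.vars = fam.vars, design = "balanced"), where the directory is a placeholder.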
Memory could be an issue. The dataset is quite big. I use data.table to keep things manageable, but it's hard to get around a data.table of 628MB, which is the size of the individual index file. The verbose option prints the memory load at various points, so you may be able to intervene and throw out some things if you hit a limit.
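Here is a minimal base-R sketch of the kind of intervention I mean, on a hypothetical toy table: check an object's size, drop columns you don't need, and trigger a garbage collection. With data.table you would drop columns in place with DT[, (cols) := NULL] instead. This is an illustration only, not what psidR does internally.

```r
# Hypothetical toy table standing in for a large PSID extract.
big <- data.frame(pid = 1:1e5, x = rnorm(1e5), y = rnorm(1e5),
                  notes = "placeholder", stringsAsFactors = FALSE)

size_before <- object.size(big)
print(format(size_before, units = "MB"))   # current footprint

# Keep only the columns needed downstream.
drop_cols <- c("y", "notes")
big <- big[setdiff(names(big), drop_cols)]
size_after <- object.size(big)
print(format(size_after, units = "MB"))

invisible(gc())   # trigger a garbage collection
```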

  1. awesome. i've been working through the psid for a post for http://asdfree.com -- mind if i build on this work? please contact me, i can't find your e-mail. ajdamico@gmail.com

  2. This is what I've been looking for. I tried to implement the build.panel function, but received the following error message:

    object '.SD' not found

    when the code tried to use the lapply command. Any thoughts?

    1. Hi Kris,

      I don't know why this could happen. I use the .SD idiom in only three places, and they all seem pretty standard. .SD is short for "sub datatable", so it's part of the data.table package. Maybe updating data.table would help? To get a handle on this, can you send your sessionInfo() to florian.oswald@gmail.com? Also, did you try one of the examples I posted, or did you put together your own? A lot depends on where the files are stored and on all paths being correctly specified, so maybe that's where the error comes from. Anyway, let me know.
      thanks!

  3. this has been a lifesaver! the PSID is such a frustrating dataset, and the website is horrible! This has made the impossible possible for a lowly undergrad researcher.

    1. Hi Justin,
      good stuff, I'm glad you were successful. It seems you are exactly the person I wrote this for; I've been there myself. I'm releasing the next version in the coming days, with better examples and an option to download the data directly from the server into R (without using SAS or Stata in the middle).

    2. Excellent! I look forward to future releases.

UPDATE: Following up on a comment below, I used another data source, ONS JOBS02, for the labor market statistics. I report the findings below.

For lack of a better place, I'll store my recipe for homemade pizza dough here. This makes dough for 14 people.

I've got some questions regarding this issue; maybe someone out there has a clue.

The Austrian contingent was 300 out of 1000 soldiers.

source: http://www.guardian.co.uk/world/2013/jun/06/israel-angry-austria-golan-heights.

Inspired by the Institute for Fiscal Studies' "Where do you fit in" application, where people can find out their position in the UK's income distribution, I wanted to find out what the picture looks like in London. Quite different.

I just pushed the most recent version of the PSID panel data builder I introduced a little while ago. I got some user feedback and made some improvements. The package is hosted on GitHub.

News:

I added a reproducible example using artificial data which you can run by calling 'example(build.panel)'.

I got intrigued by the numbers presented in this news article about the retrial in the Amanda Knox case. The defendants, accused of murder and initially convicted, were acquitted on appeal when the judge ruled that the forensic evidence was insufficiently conclusive.

I just finished reading the extraordinary book Tomorrow's Table by P. Ronald and Raoul Adamchak (I linked to Ronald's blog).

A friend recently asked me whether it's worth buying a house in the UK. That is, assuming they could put down the money, whether buying was worth it as opposed to renting.

So Paul Krugman laments in his post that policymakers across Europe have blindly signed up to the "austerity only" ticket. He cites some evidence which I find fairly convincing. I just want to point out that his argument cannot be used as a critique of the Monti government.

File under: getting data.

Oh, the irony. Exactly one day after I started reading the great book on Open Government by O'Reilly Media (which they released in tribute to Aaron Swartz), I found myself in need of time series data on unemployment rates in the United States.

Update 2: I put up a second graph showing Spain's debt.

Update: I just read Stiglitz's most recent article. I mostly agree. However, I'm not quite sure what he means when he says that

Spain and Ireland had fiscal surpluses and low debt/GDP ratios before the crisis.

I recently read something about the feminist movement, its past, its future, and so on. It all started with a documentary about the way women are portrayed on Italian TV, which is quite amazing. It tells a sobering tale about the position of women in Italian society. Here's the link.

Here is my suggestion: why doesn't Ryanair's confirmation email have a subject line that actually conveys some information about the flight concerned? Currently, the subject line of such a confirmation email is

Ryanair Travel Itinerary - Don't Forget You MUST Check-in Online and Print O

Some time ago I got inspired by a post on r-bloggers.com, showing the housing bubble in several US cities, nicely done with ggplot.

Code readability is maybe the most important part of producing reproducible research. If it's impossible (i.e.

The Deferred Acceptance Algorithm (DAA) goes back to Gale and Shapley (1962). They introduced a rather simple algorithm that finds a stable matching, for example in college admissions or in a marriage market.
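A compact base-R sketch of the proposer-proposing version, assuming equally many proposers and receivers (say, students and colleges with one seat each) and complete, strict preference lists stored as matrices; the function and argument names are my own. Each free proposer proposes to the best receiver they have not yet tried; a receiver tentatively holds the best proposal so far and rejects the rest, which is where "deferred acceptance" comes from.

```r
# Deferred acceptance (Gale and Shapley, 1962), proposer-proposing version.
# prop_pref[i, ] lists receiver indices in proposer i's order of preference;
# recv_pref[j, ] lists proposer indices in receiver j's order of preference.
deferred_acceptance <- function(prop_pref, recv_pref) {
  n <- nrow(prop_pref)
  # recv_rank[j, i] = position of proposer i in receiver j's list (lower = better)
  recv_rank <- t(apply(recv_pref, 1, order))
  next_choice <- rep(1L, n)      # next list position each proposer will try
  held <- rep(NA_integer_, n)    # held[j] = proposer tentatively held by receiver j
  free <- seq_len(n)             # queue of proposers without a tentative match
  while (length(free) > 0) {
    i <- free[1]
    j <- prop_pref[i, next_choice[i]]
    next_choice[i] <- next_choice[i] + 1L
    if (is.na(held[j])) {
      # j was unmatched: hold i's proposal for now
      held[j] <- i
      free <- free[-1]
    } else if (recv_rank[j, i] < recv_rank[j, held[j]]) {
      # j prefers i: the previously held proposer becomes free again
      free <- c(free[-1], held[j])
      held[j] <- i
    }
    # otherwise j rejects i, who stays at the front of the queue
  }
  match <- integer(n)
  match[held] <- seq_len(n)      # match[i] = receiver assigned to proposer i
  match
}

prop_pref <- matrix(c(1, 2, 3,
                      1, 3, 2,
                      2, 1, 3), nrow = 3, byrow = TRUE)
recv_pref <- matrix(c(2, 1, 3,
                      1, 2, 3,
                      1, 2, 3), nrow = 3, byrow = TRUE)
result <- deferred_acceptance(prop_pref, recv_pref)
# result is c(2, 1, 3): proposer 1 gets receiver 2, proposer 2 gets
# receiver 1, proposer 3 gets receiver 3
```

The matching produced this way is proposer-optimal: no proposer does better in any other stable matching.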

According to the latest study by uswitch.com, 46% of Britons have thought about emigrating.

flo.

The latest census data are out, and they don't look very good. Median household income has fallen for the third year in a row. Adjusted for inflation, it now stands at 1996 levels.

I just finished reading The Big Short, a rather vivid account of the events surrounding the 2008 financial crisis, as told by Michael Lewis, a former employee of Salomon Brothers, a big and now defunct Wall Street bank. I'll have more to say about the book.

I've started to feed my R-related entries to www.r-bloggers.com. It's a great place to learn R and see what others are up to. The blog has a very nice mix of topics, ranging from extremely simple (stuff that I would post) to extremely advanced, at the very vanguard of R development.

flo.

As a first step towards usable code for spline interpolation/approximation in R, I set out to do polynomial interpolation to see how I get along. It's not that there is no spline interpolation software for R, but I find it a bit limited.
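For reference, a minimal sketch of plain polynomial interpolation in base R (the function names are my own): the unique polynomial of degree n-1 through n points is found by solving the Vandermonde system V a = y, with V[i, k] = x_i^(k-1).

```r
# Coefficients a of the interpolating polynomial p(z) = sum_k a_k z^(k-1)
# through the points (x_i, y_i), from the Vandermonde system V a = y.
poly_interp_coef <- function(x, y) {
  V <- outer(x, seq_along(x) - 1, "^")
  solve(V, y)
}

# Evaluate the polynomial with coefficients a at the points z.
poly_eval <- function(a, z) {
  as.vector(outer(z, seq_along(a) - 1, "^") %*% a)
}

x <- c(0, 1, 2)
y <- x^2 + 1
a <- poly_interp_coef(x, y)   # c(1, 0, 1) up to rounding
p <- poly_eval(a, 1.5)        # 3.25
```

Vandermonde systems become badly conditioned as the number of points grows, and high-degree interpolants tend to oscillate between the nodes (Runge's phenomenon), which is one reason to move on to splines.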

I have data on user access to a website. This log file (helpdesk log.csv) just contains the date of access and how many accesses were counted. It looks like this:

Date        hits
13-07-2011  2
14-07-2011  1
16-07-2011  3
17-07-2011  4
...
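A minimal sketch of reading data in this format with base R. The excerpt above is whitespace-separated, so read.table() is used on the sample rows (for a truly comma-separated file, read.csv() would be the call), and the dates are day-month-year. Filling days that are absent from the log (such as 15-07-2011) with zero hits is my assumption, not something the log itself states.

```r
# Sample rows from the log, read from a string here; for the real file,
# pass its path to read.table() instead.
log_txt <- "Date hits
13-07-2011 2
14-07-2011 1
16-07-2011 3
17-07-2011 4"
hits <- read.table(text = log_txt, header = TRUE, stringsAsFactors = FALSE)
hits$Date <- as.Date(hits$Date, format = "%d-%m-%Y")   # day-month-year

# Assumption: days missing from the log had zero hits.
all_days <- data.frame(Date = seq(min(hits$Date), max(hits$Date), by = "day"))
daily <- merge(all_days, hits, by = "Date", all.x = TRUE)
daily$hits[is.na(daily$hits)] <- 0
print(daily)
```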

Have a look at this video, which tries to explain how much a trillion dollars is. The sum has been in the media for some time now, as experts debate whether more or less than that should be spent on bailing out ailing banks. Interesting question, but not for today.

Have a look at this, nicely done by the NYTimes.

flo.

Following the last post, a couple of questions came up:

How could it be that bad quality mortgages in the US could have such an effect on the global economy? Is it because we are so tightly connected to the US economy or is it because something similar happened everywhere?

I would say it's a mixtu

Hi guys.

This week we have some "light entertainment". I found this little film about how all the threads (could) come together to produce what we call "the credit crunch". Needless to say, a 10-minute animation can hardly explain anything in great detail.

An interested reader asks:

And what about the government? Does it make sense to pump billions of dollars into the auto industry, as is happening in the US at the moment? In the end this is money that is taken away from taxpayers, because sooner or later they will have to pay it back - regardless i

I hope you are doing fine. The snow is gone and London is as grey/green as ever - everything back to normal.

As normal as it can get these days, that is. And apart from the weather, there are few things you would label "normal" around here.

This week has seen some spectacular action in the UK. We started off with heavy snowfalls on Sunday night, covering London and the southeast of the country with as much as 20 cm of snow.

I hope this message finds you well.

Hi guys!

Look at what I see on my desktop today! July 20, 18 degrees is quite a shame, yes, but look at the next couple of days. Seems impossible, but it must be that somebody up there is finally feeling pity for us...thanks, Peter.

I am leaving for 10 days of vacation. Exams were horrible. I'll tell you more about it soon. Will get sunburned first.

flo.

If you are interested in prose, you can find a lot of it (and other stories) on Pablo's blog. He is based in Buenos Aires. This link goes to one of my favourite stories:

pablejacio

flo.

I bet you knew that (and thank heavens, I like it as well, which makes survival here much easier). Some clever people took notice of this fact and subsequently turned it into an amazing marketing strategy. Below you can see just one example I came across recently.
