Economists frequently use public datasets. One frequently used dataset is the Panel Study of Income Dynamics, PSID for short, maintained by the Institute for Social Research at the University of Michigan.

Here I introduce psidR, a small helper package for R that makes constructing panels from the PSID a bit easier.

One potential difficulty with the PSID is constructing a longitudinal dataset, i.e. one where individuals are followed over several survey waves. There are several solutions.

  1. In the so-called data center, users can use drill-down menus to select relevant variables from each wave. If the user wants only recent waves, there is a subsetting mechanism (e.g. only household heads younger than 55). As the required dataset gets larger, this becomes unwieldy: the interface gets slower and slower, and the clicking procedure is rather error-prone. The main motivation for this package is that I've spent too many hours clicking on cryptic variable names only to realize, after I was done, that I had forgotten a variable. Unacceptable.
  2. Users may download the data and attempt to merge the annual interview files in order to obtain the desired panel. Though conceptually not very difficult (there is an individual index file, which provides a link for individuals across years), it is a cumbersome accounting exercise to find the right variable names for each year and do the right merges.
  3. One can use psidR. The main function is inspired by the Stata add-on package psiduse. Here is the function's signature.
build.panel(datadir,fam.vars,ind.vars=NULL,fam.files=NULL,ind.file=NULL,heads.only=TRUE,core=TRUE,design="balanced",verbose=FALSE)
  • There is a default behaviour, where the user only points towards a data directory. Otherwise one can specify custom locations for the family files and the individual index.
  • You can supply the PSID data as Stata or csv files.
  • The user has to supply a data.frame "fam.vars" which lists the variable names for all required waves.
  • It's possible to tell the function that a certain variable is missing in a given year (without the variable getting dropped, so you can impute it later on).
  • One can subset the data to household heads only.
  • There is a switch to get only the core sample.
  • There are 3 different sample designs to choose from: balanced panel (all individuals must be present in all waves), k-period panel (individuals must be present in at least k periods) and unbalanced (all included).
  • With "verbose=TRUE" the function prints comments as it goes along.
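As a rough illustration of the "fam.vars" argument described above, here is a minimal sketch in base R. The layout (one row per wave, a year column, and one column per variable holding that wave's PSID name) and all of the variable codes are assumptions made up for this example; NA marks a variable that is missing in a given year. Check the package documentation for the exact layout it expects.

```r
# Hypothetical layout: one row per wave, one column per variable of interest.
# The codes below are invented placeholders, not real PSID variable names.
fam.vars <- data.frame(
  year   = c(2001, 2003, 2005),
  age    = c("AGE01", "AGE03", "AGE05"),
  income = c("INC01", NA, "INC05"),   # NA: income not available in 2003
  stringsAsFactors = FALSE
)

# Waves in which some variable is missing and would need imputing later.
waves.with.gaps <- fam.vars$year[!complete.cases(fam.vars)]

print(fam.vars)
print(waves.with.gaps)
```

An object like this would then be passed on as the fam.vars argument of build.panel, together with the data directory.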
Memory could be an issue. The dataset is quite big. I use data.table to keep things manageable, but it's hard to get around a data.table of 628MB, which is the size of the individual index file. The verbose option prints the memory load at various points, so you may be able to intervene and throw out some things if you hit a limit.
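To make the three sample designs listed above concrete, here is a small sketch on a toy long-format panel in base R. It is not the package's implementation: the helper select.design, its design labels "kperiod" and "unbalanced", and the toy data are all hypothetical and only illustrate the selection rules.

```r
# Toy long-format panel: person id and the wave in which they were interviewed.
panel <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3),
  year = c(2001, 2003, 2005, 2001, 2003, 2005)
)

# Hypothetical helper: keep rows according to the chosen sample design.
select.design <- function(panel, design, k = NULL) {
  # Number of waves in which each row's person appears.
  n.waves <- ave(panel$year, panel$id, FUN = length)
  # Total number of distinct waves in the data.
  total <- length(unique(panel$year))
  keep <- switch(design,
    balanced   = n.waves == total,        # present in every wave
    kperiod    = n.waves >= k,            # present in at least k waves
    unbalanced = rep(TRUE, nrow(panel)),  # everybody is included
    stop("unknown design: ", design)
  )
  panel[keep, , drop = FALSE]
}

balanced   <- select.design(panel, "balanced")
kperiod    <- select.design(panel, "kperiod", k = 2)
unbalanced <- select.design(panel, "unbalanced")
```

In this toy data only person 1 survives the balanced design, persons 1 and 2 survive the 2-period design, and all six rows are kept in the unbalanced one.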

UPDATE: Following up on a comment below, I used another data source, ONS JOBS02 for the labor market statistics. I report the findings below.

I read a series of articles about the goings-on in the UK housing market, the likely effects of the new Help To Buy scheme, the 10% increase in the mean London house price over the last year, and employment statistics. I failed to reproduce some numbers cited in The Economist (below). This post is about that.

It all starts with this blog post on The Economist:

http://www.economist.com/blogs/buttonwood/2013/09/house-prices

It talks about many things, amongst which employment and housing completions, and how the UK seems likely to be embarking on another round of debt-fueled growth.

For lack of a better place, I'll store my recipe for homemade pizza dough here. This will make dough for 14 people.

  • 2kg strong white flour (no self-raising or other extras)
  • 4 sachets of yeast, 7g each (not the super fast bicarbonate stuff)
  • Salt
  • Sugar
  • Olive Oil
  • Water

The main problem is to get the right consistency, i.e. how much water to add. You'll have to do some experiments here.

I've got some questions regarding this issue, maybe someone out there has a clue.

The Austrian contingent was 300 out of 1000 soldiers

source: http://www.guardian.co.uk/world/2013/jun/06/israel-angry-austria-golan-heights.

Inspired by the Institute for Fiscal Studies' "Where do you fit in" application, where people can find out their position in the UK's income distribution, I wanted to find out what the picture in London looks like. Quite different. If you are in a very high percentile nationwide, the high incomes of mainly financial sector employees in London make sure that you find yourself a couple of ranks further down.

I just pushed the most recent version of the PSID panel data builder introduced a little while ago. I got some user feedback and made some improvements. The package is hosted on GitHub.

News:

I added a reproducible example using artificial data which you can run by calling 'example(build.panel)'. This means you can try out the package before bothering to download anything and it provides a simple test of the main function.

I got intrigued by the numbers presented in this news article about the retrial in the Amanda Knox case. The defendants, accused and initially convicted of murder, were acquitted on appeal when the judge ruled that the forensic evidence was insufficiently conclusive. The appeals judge ignored the forensic scientist's advice to retest a DNA sample, because

"The sum of the two results, both unreliable… cannot give a reliable result," he wrote.

I just finished reading the extraordinary book Tomorrow's Table by Pamela Ronald and Raoul Adamchak. (I linked to Ronald's blog). In this post I wanted to quickly redo a calculation Adamchak does on page 16, where he explains to his students how much energy is required to produce the fertilizer used to grow one acre of corn using conventional agriculture (as opposed to organic methods).


I was recently asked by a friend whether it's worth buying a house in the UK. That is, assuming they could put down the money, whether it was worth buying as opposed to renting. Apart from obvious things like the expected length of stay in one place, the interest on mortgages and how prices might develop, they were interested in particular in the amount of transaction costs they were likely to face: fees, taxes and so forth.

So Paul Krugman laments in his post that policy makers across Europe have blindly signed up to the "Austerity only" ticket. He cites some evidence which I find fairly convincing. I just want to raise the point that what he says cannot be used as a critique of the Monti government.

Basically what he's saying is that Monti was installed as a puppet of European creditor nations to make sure that austerity would be imposed and the country's government debt would continue to be serviced.