Ok this thread is for all you Stata junkies out there that use reshape regularly.
Sometimes I have to do “unbalanced” reshapes, which is a task that is super inefficient using the standard reshape command.
So here's a hack you might find useful! 1/
I often reshape panels with many missing values from wide to long.
E.g. I have a panel of scientists with a wide list of PubMed identifiers for their papers. (In this case, these came from space-delimited strings that I split into a wide set of variables with the stub “pmids”.) 2/
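For anyone who wants to follow along, here is a minimal sketch of that setup on toy data, using the standard reshape as the baseline. Only the stub pmids comes from my data; the names scientist_id, pmid_list and paper_num and all the values are made up for illustration.

```stata
* Toy data: one row per scientist, PMIDs in a space-delimited string.
* Variable names and values are hypothetical.
clear
input int scientist_id str40 pmid_list
1 "11111 22222 33333"
2 "44444"
3 "55555 66666"
end

* split parses on spaces by default and creates pmids1, pmids2, ...
split pmid_list, generate(pmids)
drop pmid_list

* Standard wide-to-long reshape: one row per scientist per slot,
* including slots that are empty for that scientist
reshape long pmids, i(scientist_id) j(paper_num)

* Drop the empty slots created for scientists with fewer papers
drop if missing(pmids)
list, sepby(scientist_id)
```

The reshape step builds a row for every scientist-slot pair before the empty ones are dropped, which is where the wasted work comes from.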
I call this panel “unbalanced” because the distribution of publications across the sample of scientists is very skewed: 37% of them have only one published paper, 0.15% have more than 200, and a handful have more than 1,000. The mean is 7 and the median is 2. 3/
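To see why this matters: the wide data need one pmids variable per paper of the most prolific scientist, so with a mean of 7 papers and more than 1,000 variables, fewer than 1% of the cells hold a PMID. A rough sketch of how you might check this on your own data follows, assuming string variables named pmids1, pmids2, and so on. The toy values and the names scientist_id and n_papers are hypothetical.

```stata
* Toy wide data; names and values are hypothetical
clear
input int scientist_id str8 pmids1 str8 pmids2 str8 pmids3
1 "11111" "22222" "33333"
2 "44444" "" ""
3 "55555" "66666" ""
end

* Papers per scientist = number of nonempty pmids* cells in the row
* (strok lets rownonmiss count string variables)
egen n_papers = rownonmiss(pmids*), strok
summarize n_papers, detail

* Width of the wide layout = number of pmids* variables
unab pmvars : pmids*
local width : word count `pmvars'

* Share of wide cells that hold a PMID = mean papers / width
quietly summarize n_papers
display "filled share of cells: " %5.3f r(mean)/`width'
```

The lower that share, the more rows of pure missings a standard reshape long creates before you can drop them.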
Economists have long characterized the reward system for innovation (patents, academic papers, etc.) as winner-take-all races. This extreme allocation of credit affects how we think about R&D investment, innovation strategy, and the pace and direction of science.