Nick HK Profile picture
Econ prof @SeattleU. Book The Effect https://t.co/3F41M6EzEb out now! Check my pinned thread for all my projects. Substack https://t.co/YYrIUc8JPC
7 subscribers
Dec 17 12 tweets 3 min read
New working paper out today with Eleanor Murray, "Do LLMs Act as Repositories of Causal Knowledge?"

Can LLMs (like ChatGPT) build for us the causal models we need to identify an effect? There are reasons to expect they could. But can they? Well, not really, no. Paper here:

Why might we expect LLMs could help with this task? At first blush you might expect this to require LLMs to have a real-world causal understanding of how the world works.arxiv.org/html/2412.1063…
Mar 30, 2023 9 tweets 2 min read
Various thoughts on LLMs:

1. Exam performance is super interesting, but I think is misinterpreted. Exams are almost pure signal. They're not the actual thing you want to do, they're designed to be something that a *human* could only do if they could actually do the actual thing. i.e. on a good exam, cost of performing well on the exam should drop sharply as your skill on the actual important thing improves.

Example: your ability to answer "what are the proper safety checks to run before letting a plane fly?" is easier for a good plane mechanic...
Oct 10, 2022 6 tweets 2 min read
If you teach students to work with data, you're doing them a great disservice if you just teach them how to run models/analyses and not how to clean and manipulate data. If you haven't tested them directly on this, you'll be surprised how unintuitive this is to new users! In my data viz class, the week 3 assignment has them:

1. Read some data docs
2. Recreate some variables based on the docs ("count the number of peers of each gender in each class, not counting someone as their own peer")
3. Make some tables of the form "average X by group"
Oct 10, 2022 5 tweets 1 min read
I have recently been following two sources of 90s music reevaluation: (1) The Number Ones and (2) the Woodstock 99 documentary and the things these, respectively, have most noticeably revised upwards my opinions of:

1. Mariah Carey
2. The Limp Bizkit song "Break Stuff" i think i'm previously on record on twitter as rejecting any limp bizkit reevaluation, and i largely stand by that, but break stuff is an exception in the catalogue and dang did you see that crowd
Oct 10, 2022 4 tweets 1 min read
The steel nerve of teaching behavioral economics and saying mid-lecture "huh I don't remember the sample size on this study let me check real quick" It was in the 40s
Aug 30, 2022 12 tweets 3 min read
Announcement and invitation! A new project that aims to improve the quality of research in applied microeconomics by examining researcher choices. I am hoping to recruit up to *200 researchers* of all kinds (with pay) and hope you will join me! (Thread) nickch-k.github.io/ManyEconomists/ This project, with Claus Portner, is a follow-up to this paper onlinelibrary.wiley.com/doi/full/10.11…, where multiple researchers each replicated the same studies (a “many-analyst study”). Analytic and data-cleaning choices were different, and this really impacted results.
Aug 11, 2022 7 tweets 2 min read
scales, which formats numbers for presentation, is such an undersung package. I use it all the time. If you make anything in R that is intended for an audience to see - graphs, tables, RMarkdown/Quarto, it's perfect. Neat scales functions, among others:
number() and comma(): format regular ol' numbers
comma(100000) becomes 100,000
number(1.324, accuracy = .1) becomes 1.3 (in a way that's much more reliable for this purpose than round())
number(1000, scale = 1/1000, suffix = 'k') becomes 1k
Jun 15, 2022 16 tweets 5 min read
I've updated my Data Wrangling in the Tidyverse course material and am uploading a 17-part video series. This assumes little previous R knowledge (although some). Covers tidying, manipulating variables, cleaning factors, dates, and strings. Enjoy! Episode 2: What is data wrangling actually about? What are we trying to do with it? Forget specific languages or codes. How should we be *thinking* about data wrangling and how to do it right?
Jun 14, 2022 10 tweets 2 min read
I have a new paper out today in the Journal of Economic Methodology, on Judea Pearl's Structural Causal Modeling approach. I ask what the *marginal* (not absolute!) value of SCM is for economists, given we're already familiar with potential outcomes tandfonline.com/doi/abs/10.108… SCM mixes together a few things:
1. A graphical representation of causal relationships in causal diagrams
2. A set of structural equations describing those relationships more fully
3. A do-calculus that lets you derive causal estimands
Mar 22, 2022 4 tweets 1 min read
In regression, there are several things that students are eternally concerned about but are actually Just Fine:

1. Your coefficients don't need to be significant
2. Your R^2 doesn't need to be huge
3. Your predictors can be correlated
4. Your variables don't need to be normal I continue not to know where students learned these things, if anywhere. Every one of them I say ahead of time that they're not a problem (with 1 and 2, repeatedly), and yet there are always still students giving themselves a hard time trying to "fix" them
Jan 24, 2022 60 tweets 20 min read
Today I'm releasing the first video designed to accompany my book The Effect, about research design and causal inference. There are ~70 videos planned for the series. These can be used to accompany the book, as classroom material, or just on their own.

The first video follows Chapter 1 on why we want to HAVE a research design, and how it can be that every car insurance company is cheaper than every other car insurance company. I'll keep posting videos in this thread as they come out. Find the book here: theeffectbook.net
Sep 13, 2021 8 tweets 1 min read
Neat LaTeX packages and things I learned about while trying to do the formatting things an actual book editor wanted in obstinate tufte-LaTeX:

Avoid a page break between the first/last line of a para and the rest: \usepackage[all]{nowidow} Avoid a line break before the last word of a~paragraph.
Jun 10, 2021 9 tweets 3 min read
I have a new short working paper: Linear Rescaling to Accurately Interpret Logarithms.

Do you use logs? Do you interpret increases in ln(X) in terms of percentage changes in X?

Did you know there are a lot of problems in the way we're told to do that? arxiv.org/abs/2106.03070 The natural logarithm trans... "A p increase in ln(X) is a p*100% increase in X" is an approximation, not a literally true. And it has problems:

1. Approx quality drops quickly in p
2. People don't realize (1)
3. We must treat it unlike all other regression variables (nonunit scales, interpret outside table)
Apr 14, 2021 6 tweets 2 min read
It's fully-featured and ready to go! The did Stata package allows Stata access to the Callaway and @pedrohcgs R package "did" for estimating difference-in-differences with staggered treatment and covariates. Check it out, and installation instructions, at github.com/NickCH-K/did In this package you can: (1) use the Callaway and Sant'Anna (2020) estimator to get group-time treatment effects, (2) aggregate those effects, (3) plot the results, and (4) do a test of the conditional parallel trends assumption doi.org/10.1016/j.jeco…
Mar 22, 2021 15 tweets 6 min read
I have a new paper out in Economic Inquiry with a buncha coauthors: The Influence of Hidden Researcher Decisions in Applied Microeconomics. What's it about?... onlinelibrary.wiley.com/doi/full/10.11… Anyone who has ever done research knows that a whole bunch of decisions go into the process. More than you could ever write up. For many of those decisions, even ignoring the possibility of bad decisions or errors, there's often more than one right way to do it.
Dec 28, 2020 5 tweets 1 min read
I have a new paper in the Journal of Causal Inference about IVs!

You can greatly improve the finite-sample performance of your IV by simply modeling first-stage effect heterogeneity. Super easy! Even if you do a bad job, it helps! Big error reductions.

degruyter.com/view/journals/… The estimator for doing so is super simple: literally just identify groups along effect heterogeneity lines, and interact that group identifier with the instrument. In the paper I do it "naively" and with causal forests, but it's a flexible concept.
Sep 24, 2020 40 tweets 7 min read
I don't post about my personal life a lot but today we finally had our court date to finalize the adoption. I've found that academics, who make up a lot of my followers, are above-average curious about adoption. So here's a little about the process, and how it went. This is specifically about adopting an unborn child and becoming the parents upon birth. I don't know anything about adoption from foster care, international adoption (which I understand is way way harder to do these days anyway), or adopting older kids
Sep 14, 2020 6 tweets 2 min read
Finished the first volume of my accessible causal inference textbook (even my mom got it, and this is *not her thing*). Please check it out, and if you have read it even a little I have some questions for you below...

Assuming the price were reasonable (or free), would you do any of: read the completed book, recommend someone else read it, or assign this book in a classroom?

(be honest, the "No"s are valuable too)
Aug 24, 2020 4 tweets 1 min read
Hell yeah, my Safeway started carrying fresh anchovies. I don't think I've ever seen them outside a restaurant, and I've looked in a number of seafood markets if you don't like canned anchovies i get it, but if you've never tried fresh ones get on that ASAP
Aug 23, 2020 4 tweets 1 min read
I think game theory is immensely cool and fun. Taking it undergrad made me an econ major. Got first-in-class marks on it in grad school. Love teaching it in principles, Would gladly teach a whole UG or G class of it. I've also never once had a good idea for a game theory paper. it is literally the only field i am interested in where my interest completely stops cold long before the current research frontier
Aug 16, 2020 5 tweets 1 min read
Finishing my econometrics prep. Gonna make a hard sell for my vtable package in your class if you're using R.

- Easily explore variable characteristics and values with vtable
- Super easy summary tables with sumtable
- Balance tables with sumtable

nickch-k.github.io/vtable/index.h… If you're making a switch from Stata it can really ease the transition too with some stuff R lacks:

- vtable takes the place of the Variable Browser (incl. find-in-page if you open in browser)
- summarize -> sumtable, including super easy use and both numeric/factor support