Matt Salganik (@msalganik) and I are looking for a joint postdoc at Princeton to explore the fundamental limits of machine learning for prediction. We welcome quantitatively minded candidates from many fields including computer science and social science. [Thread]
This is an unusual position. Here's how it came to be. Last year I gave a talk on AI snake oil. Meanwhile Matt led a mass collaboration that showed the limits of machine learning for predicting kids’ life outcomes. Paper in PNAS: pnas.org/content/117/15…
We realized we were coming at the same fundamental question from different angles: given enough data and powerful algorithms, is everything predictable? So we teamed up and taught a course on limits to prediction. We're excited to share the course pre-read cs.princeton.edu/~arvindn/teach…
During the course, we read papers on prediction from a dozen domains ranging from computer vision to civil wars. We developed a basic understanding of the limits to prediction in different domains and for different problems. We hope to share a synthesis soon.
But there’s much more that we can’t answer yet. Join us and help us figure it out! Our Center at Princeton is also hiring postdocs to work on many other tech policy topics. We’ll start reviewing applications soon. Apply here: citp.princeton.edu/programs/fello…
• • •
Many online education platforms track and profit from student data, but universities can use their bargaining power to negotiate contracts with vendors that provide much better privacy. That’s one of the findings in our new paper “Virtual Classrooms and Real Harms” arxiv.org/abs/2012.05867
We analyzed 23 popular tools used for online learning: their code, their privacy policies, and 50 “Data Protection Addenda” that they negotiated with universities. We studied 129 (!) U.S. state privacy laws that affect ed tech. We also surveyed 105 educators and 10 administrators.
A major reason for poor privacy by default is that the regulations around traditional educational records aren’t well suited to the ‘data exhaust’ of online communication. This echoes arguments by @elanazeide & @HNissenbaum here: papers.ssrn.com/sol3/papers.cf…
Job alert: At Princeton we’re hiring emerging scholars who have Bachelor’s degrees for 2-year positions in tech policy. The program combines classes, 1-on-1 mentoring, and work experience with real-world impact. Apply by Jan 10. More details: citp.princeton.edu/programs/citp-… [Thread]
This is a brand new program. Emerging scholars are recruited as research specialists: staff, not students. This comes with a salary and full benefits. We see it as a stepping stone to different career paths: a PhD, government, nonprofits, or the private sector.
Who are we? At Princeton’s Center for Information Technology Policy (@PrincetonCITP), our goal is to understand and improve the relationship between technology and society. Our work combines expertise in technology, law, social sciences, and humanities. citp.princeton.edu
One of the most ironic predictions ever made about research comes from mathematician G.H. Hardy’s famous essay “A Mathematician’s Apology,” written in 1940. He defends pure mathematics (which he called real mathematics) on the grounds that even if it can’t be used for good, at least it can’t be used for harm.
Hardy’s examples of such harmless mathematics were number theory and relativity. Number theory later turned out to be a key ingredient of modern cryptography, and relativity is necessary for GPS to work properly. Cryptography and GPS both have commercial applications and not just military ones, which I suspect Hardy would have found even more detestable.
Hardy’s examples weren’t merely unfortunate in retrospect. I think they undercut the core of his argument, which is a call to retreat to the realm of the mind, concerned only with the beauty of knowledge, freed from having to think about the real-world implications of one’s work.
When I was a student I thought professors were people who knew lots of stuff. Then they went and made me a professor. After getting over my terror of not knowing stuff, I realized I had it all wrong. Here are a bunch of things that are far more important than how much you know.
- Knowing what you know and what you don’t know.
- Being good at teaching what you know.
- Being comfortable with saying you don’t know.
- Admitting when you realize you got something wrong.
- Effectively communicating uncertainty when necessary.
- Spotting BS.
- Recognizing others with expertise.
- Recognizing that there are different domains of expertise.
- Recognizing that there are different kinds of expertise including lived experience.
- Drawing from others’ expertise without deferring to authority.
Many face recognition datasets have been taken down due to ethical concerns. In ongoing research, we found that this doesn't achieve much. For example, the DukeMTMC dataset of videos was used in 135 papers published *after* it was taken down in June 2019. freedom-to-tinker.com/2020/10/21/fac…
A major challenge comes from derived datasets. In particular, the DukeMTMC-ReID dataset is a popular dataset used for person re-identification and continues to be free for anyone to download. 116 of the 135 papers that used DukeMTMC after its takedown actually used a derived dataset.
This is a widespread problem. MS-Celeb was removed due to criticism but lives on through MS1M-IBUG, MS1M-ArcFace, MS1M-RetinaFace… all still public. The original dataset is also available via Academic Torrents. One popular dataset, LFW, has spawned at least 14 derivatives.
At Princeton CITP, we were concerned by media reports that political candidates use psychological tricks in their emails to get supporters to donate. So we collected 250,000 emails from 3,000 senders from the 2020 U.S. election cycle. Here’s what we found. electionemails2020.org
Let me back up: this is a study by @aruneshmathur, Angelina Wang, @c_schwemmer, Maia Hamin, @b_m_stewart, and me. We started last year by buying a list of all candidates running for federal and state office in the U.S. We also acquired lists of PACs and other orgs.
Next, the key bit for data collection: we created a bot that could find these candidates’ websites through search engines, look for email sign-up forms, fill them in, and collect the emails in a giant inbox. We manually verified that each step worked fairly accurately.