Ben Schmidt / @benmschmidt@sigmoid.social
VP of Information Design @nomic_ai, building new ways to interpret and shape embedding models. Onetime history/digital humanities prof. @bschmidt.bsky.social
Aug 11, 2023 4 tweets 2 min read
Public administration, English, languages, mathematics… one of the things emerging about the programs where WVU is firing faculty/cutting programs is that they have no facilities and are cheap to run: the faculty themselves represent the entire infrastructure. Eating seed corn. This stunted idea of the university--that the core asset is buildings, not people--is the *same idea* that led Gordon Gee to create this mess over the last decade, as he loaded on millions in debt to increase WVU's footprint. (h/t @SCrichlow) https://t.co/YApBnMWa0s facultysenate.wvu.edu/files/d/9737a8…
Apr 13, 2023 8 tweets 6 min read
Read and explore this rich interactive of 20 *million* research articles from PubMed, a project we're releasing today with @ritagonmar and @hippopedoid. It's a *beautiful* embedding structure, a fascinating, complete corpus. Some highlights (thread) static.nomic.ai/pubmed.html 1. This presents some really interesting ways to take an *entire* digital library and make it searchable and browsable. Showing ten search results at a time in a browser is *not* the right way to see what's in a corpus: here you can start from global structure first and zoom in.
Apr 6, 2023 16 tweets 5 min read
Big day for the Web: Chrome just shipped WebGPU without flags. Someone on @nomic_ai's GPT4All discord asked me to ELI5 what this means, so I'm going to cross-post it here—it's more important than you'd think for both visualization and ML people. (thread)
developer.chrome.com/blog/webg So: GPUs are processors on basically every computer/phone. Individually they're weaker than CPUs, but they run in packs of little ones that run in parallel. The G is for 'graphics,' but it's turned out they're good for anything involving lots of math.
Mar 14, 2023 9 tweets 5 min read
I think we can call it shut on 'Open' AI: the 98-page paper introducing GPT-4 proudly declares that they're disclosing *nothing* about the contents of their training set. This report focuses on the ... Why should you care? Every piece of academic work on ML datasets has found consistent and problematic ways that training data conditions what the models output. (@safiyanoble, @merbroussard, @emilymbender, etc.) Indeed, that's the whole point! That's what training data is!
Nov 2, 2022 8 tweets 3 min read
We've just released from @nomic_ai a new map for exploring 6+ million AI-generated images and the prompts used to create them, collected by @krea_ai. atlas.nomic.ai/map/809ef16a-5… The most exciting thing for me here is how this changes full-text search: here's why (thread). In my life as a researcher, search engines have gotten worse and worse on some axes. UX studies targeted at users push institutions towards single search-box interfaces with ordered lists of results that show barely anything: NYU's search engine only shows 3 results for 'maps'!
Sep 6, 2022 7 tweets 3 min read
There's a zombie idea that humanities majors somehow remain 'only for elites' as they fall nationwide. It's not true. Here are four majors at Yale over the last 35 years. Yale history made great hullabaloo a few years ago about reclaiming the 'largest major' title. But look: I hate to talk about this because it contributes to the insane conflation of "the Ivies" with "higher education" that the NYT lives on. But inside the historical profession I *still* periodically hear people trying to draw lessons from the Yale comeback. historians.org/publications-a…
Aug 30, 2022 5 tweets 3 min read
@AaronRHanlon @Ted_Underwood But I see this as the core of my disagreement. I see a lot of people trying to yoke the humanities to the non-applied sciences as shared practitioners of 'pure research,' and that's what I see as 'liberal arts-ism'. The belief that humanities vs STEM is incorrect, or that there's a new ordering of things that could bring others around... I see these not as new ideas but as doomed attempts to bring about the 1990s/2000s, when the humanities were ok. (I can't access your old Chronicle article, which I think is where you spelled your thoughts out)
Aug 23, 2022 10 tweets 4 min read
@ipeds_nces just released new data on degree completions for the 2021 class (the first class with a full semester during the pandemic). History and Religion have both joined English in being down to half their 2000s peak; philosophy's rebound persists, while area studies falls. Here's the raw size of all the fields (just BAs). The downtick in cultural, ethnic, and gender studies is notable--those had been the only fields *not* to get pulled down by the collapse of humanities majors. Also sharper-than-normal drops in English, Comp Lit, languages...
Jun 1, 2022 11 tweets 3 min read
In 1980 the median age of history authors published by Yale University Press was 40; by 2013 it was nearly 60. It's striking how *different* the age profiles are across different presses. Here's the same chart for *all* university press books.
Jun 1, 2022 5 tweets 3 min read
Guess what this is a chart of. @TheHigherFriar nails the color dimension (actual codes in image). And % paying a mortgage is a good guess because unlike--say--homeownership it captures the falloff. But it's not real estate.
Nov 2, 2021 21 tweets 7 min read
I shot from the hip against an article claiming to trace a worldwide outbreak of maladaptive thinking: @spiantado, @kmahowald and I now have a letter in PNAS with more details about why you just *can't* pretend Google ngrams is a static corpus. (thread) I suspected the changes in terms they found came from more fiction in the corpus. Although Google has no metadata, we came up with a neat way to test that--using the relative predominance in the Google Fiction corpus as a marker of a word's fictionality. /+ pnas.org/content/118/45…
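One way to read "relative predominance in the Google Fiction corpus" is as a log ratio of a word's frequency in fiction to its frequency in the general corpus. What follows is a rough sketch under that assumption, not the computation from the PNAS letter: the function name `fictionality`, the smoothing `floor`, and the frequency values are all invented for illustration.

```python
import math

def fictionality(word, fiction_freq, general_freq, floor=1e-9):
    """Log ratio of a word's relative frequency in a fiction corpus to its
    relative frequency in a general corpus. Positive values mean the word
    is relatively more common in fiction. `floor` avoids division by zero
    for unseen words (an assumption of this sketch)."""
    f = fiction_freq.get(word, 0.0) + floor
    g = general_freq.get(word, 0.0) + floor
    return math.log(f / g)

# Hypothetical relative frequencies, made up for illustration only.
fiction_freq = {"whispered": 2e-4, "therefore": 5e-5}
general_freq = {"whispered": 4e-5, "therefore": 2e-4}

scores = {w: fictionality(w, fiction_freq, general_freq) for w in fiction_freq}
```

If words whose usage rises over time also tend to have high scores like this, a growing share of fiction in the corpus is a plausible explanation for the trend.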
Sep 28, 2021 14 tweets 5 min read
For historians who want to better understand their job market, here's an interactive that directly embeds every ad posted since 2002 from the AHA's own jobs board. observablehq.com/@bmschmidt/aha… You can slice down to see--e.g.--how many jobs list any country, like "Japan". This data is also good for things like tracking how Assistant professor jobs, specifically, have collapsed more than open-rank and tenured searches:
Aug 26, 2021 6 tweets 3 min read
Making dot-density population maps is making me appreciate more just how dense urban areas get. This here looks like a vaguely reasonable map of population distribution: Wyoming is empty, the Texas metros are obvious, the northeast is crowded. (1/) But that also makes Atlanta and Minneapolis look about the same size as NYC, because both are saturated. Jittering the points far enough that NYC ones *aren't* over-saturated makes metros like Portland, Kansas City, even Phoenix nearly vanish. (2/)
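The saturation trade-off can be sketched with a toy simulation. This is an illustrative model, not the code behind the maps: the point counts, jitter radii, and helper names (`occupied_pixels`, `saturation`) are assumptions, and each "metro" is reduced to dots scattered uniformly around a single center.

```python
import random

def occupied_pixels(n_points, jitter, seed=0):
    """Scatter n_points uniformly within +/- jitter pixels of a center and
    count how many distinct pixels receive at least one dot."""
    rng = random.Random(seed)
    pixels = set()
    for _ in range(n_points):
        x = round(rng.uniform(-jitter, jitter))
        y = round(rng.uniform(-jitter, jitter))
        pixels.add((x, y))
    return len(pixels)

def saturation(n_points, jitter, seed=0):
    """Fraction of the (2*jitter + 1)^2 reachable pixels that are filled."""
    side = 2 * jitter + 1
    return occupied_pixels(n_points, jitter, seed) / (side * side)

# Hypothetical dot counts: a "small" and a "large" metro.
small, large = 5_000, 100_000

tight = (saturation(small, 10), saturation(large, 10))    # small jitter
wide = (saturation(small, 100), saturation(large, 100))   # large jitter
```

With the tight jitter both metros fill nearly every available pixel and so look the same size; with the wide jitter the large metro stays mostly filled while the small one thins out to a sparse scatter.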
Aug 23, 2021 4 tweets 1 min read
I'm interested in the significant number of people reporting their race as "other" in Orthodox areas of NYC. (Williamsburg; Borough Park). Is this a described phenomenon? A changing one? Here's Kiryas Joel, the Hasidic enclave in Orange County--half white, half 'other.'
Jul 26, 2021 15 tweets 6 min read
This PNAS article claiming to find a world-wide outbreak of depression since 2000 is shockingly bad. The authors don't bother to understand the 2019 Google Books "corpus" a tiny bit; everything they find is explained by Google ingesting different books.

pnas.org/content/118/30… The background here is that in February 2020, Google generated a new version of their corpus for the ngrams browser. The 2012 and 2009 corpora were accompanied by academic articles, but the 2020 one seems to have been basically undocumented. (I welcome correction here?).
Mar 8, 2021 16 tweets 6 min read
Couldn't find a decent blog post about why it's not so bad (really!) that Javascript is poised to displace R and Python (really!) as the indispensable language of data programming, touching on WebGPU/WebGL, @observablehq, and @ApacheArrow. So I wrote one. benschmidt.org/post/2020-01-1… I know that nowadays this sort of thing only exists as a Medium post or a Twitter thread, so I'll hit a couple highlights here:

Computers have gotten much, much faster in the last couple decades, but our languages for data analysis have failed to keep up. (+)
Sep 18, 2020 4 tweets 3 min read
I've got a keen interest in this particular discussion, but I'm finding it hard to catch up--can you all help me collect some of the more substantive takes you've seen from historians and economists here? Or give me your own? I'm not making a snarky one-liner so the Twitter algo is going to bury this question. Forgive me, @snaidunl @jamesfeigenbaum @A_NeedhamNYU @historying @abbymullen @rebeccawingo @danbouk, for tagging you all to see if you know anyone who's given/should give an interesting take.
Sep 2, 2020 4 tweets 2 min read
Every year, I run the @EdNCES IPEDS numbers to see how humanities majors are changing in the US; here's a quick rundown for 2019. Blog post here: benschmidt.org/post/2020-08-2…. Most fields are falling, but philosophy shows the first uptick for any large major since 2009. These numbers represent the second-to-last year of the Great Recession fallout, and trends are gonna be different in the post-COVID era. It's now clear what the full story of the 2010s was: universities expanded STEMM education at the expense of all other learning.
Sep 1, 2020 4 tweets 2 min read
Unsurprisingly, this is shaping up as the worst year ever on the academic history job market; fewer than half as many TT jobs listed through August 31 as even in 2009, and a quarter of what there were last year. As @rbthisted says, the 2009 crunch may have been dampened by the rising history numbers through 2008: that's not the case this time around.
Dec 13, 2019 22 tweets 28 min read
Here's a great work of digital history in the new issue of the AHR: "Networks and Opportunities: A Digital History of Ireland’s Great Famine Refugees in NYC," by @TylerAnbinder, Cormac Ó Gráda, and Simone Wegge. Instantly adding to spring syllabi. academic.oup.com/ahr/article/12… This dovetails perfectly with a conversation about quantification and digital history that I've been having with @Ted_Underwood, @jtheibault, @Zoe_LeBlanc, et al., so let me draft a blog post as a thread here. (+)
Jul 27, 2018 5 tweets 3 min read
I recant my 2013 view that there's no humanities crisis in the US. The last five years of degree data have been brutal, and all humanists need to worry about how to deal with the attendant changes to our disciplines.

Post: sappingattention.blogspot.com/2018/07/mea-cu… I'm somewhat surprised to see that there's no apparent drop in humanities enrollments at HBCUs.