This was the last week for my network epistemology class (sad emoji). We focused on models of citation networks: the networks formed by papers citing other papers.

This is an enormous literature, so we focused on one question: why might citations fail to track quality?
This question is incredibly important because scientists, journals, and papers are increasingly evaluated using various citation metrics. Hiring, firing, promotion, tenure, subscription, and search all depend on these metrics. If they don't track quality... things are bad.
We started with this paper by @mollymking, @CT_Bergstrom, Correll, @jenniferjacquet, and @jevinwest.

Through the analysis of a giant data set, they show that men cite themselves significantly more than women do. (And the gap isn't decreasing.)

journals.sagepub.com/doi/full/10.11…
This connects to our theme because it's hard to see how this pattern of self-citation could be due to quality differences. So it suggests that, in at least one respect, we should be cautious about comparing men's and women's work using citation metrics.
The empirical analysis is really top-notch, but so too is some of the discussion.

One important point: this problem can't simply be solved by removing self-citations. Early citations generate more later citations, and so the problem compounds itself beyond just self-cites.
While the paper doesn't aim to test for mechanisms, it has a very good discussion of them as well. Some possible mechanisms leap to mind (maybe men are taught to be more self-promoting). But others are more subtle: maybe men hyper-specialize more.
An important point brought out in the class is how people often respond to data like this: "Women should be encouraged to self-cite more."

This response contains a subtle type of sexism: it presumes that the male behavior is the normative ideal. And (surprise!) it might not be.
We spent some time in class talking about how one might even think about what the normative ideal of self-citation could be, and why we might think that self-citation could be good or bad.

There are both epistemic considerations and political ones, and those interact.
(There's much more to the paper than I'm discussing here, so I encourage you to go and read it.)
We then shifted gears a little to talk about models of citation networks. Many scholars have observed that the degree distributions of citation networks broadly follow a power law, which could be generated by "preferential attachment" networks.
In preferential attachment networks, you start with a small set of nodes that are connected to one another. Then you add a new node, and it connects to existing nodes. The new node "preferentially" chooses those existing nodes that have more connections.
In this model, the rich get richer: nodes with more connections are more likely to get new connections. And this happens entirely at random; there is no notion of quality.
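If you want to see this for yourself, here's a minimal sketch of a preferential attachment simulation in Python. It's my own toy illustration (function names and parameters are invented), not code from any of the papers we read:

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, seed_size=3):
    """Grow a network in which each new node attaches to one existing
    node, chosen with probability proportional to its current degree."""
    # Start with a small fully connected seed.
    edges = [(i, j) for i in range(seed_size) for j in range(i + 1, seed_size)]
    # Each node appears in this list once per edge it touches, so a
    # uniform draw from it is a degree-proportional draw over nodes.
    endpoints = [node for edge in edges for node in edge]
    for new_node in range(seed_size, n_nodes):
        target = random.choice(endpoints)
        edges.append((new_node, target))
        endpoints.extend([new_node, target])
    return edges

degrees = Counter(n for e in preferential_attachment(10_000) for n in e)
print(degrees.most_common(5))
# A handful of early nodes end up with hundreds of links while most
# nodes have one or two -- pure rich-get-richer, with no quality anywhere.
```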
The point about citation networks is this: we *could* generate the observed patterns without reference to quality. So, we would need some additional reason to think that the pattern we observed had anything to do with quality.
In class we read this paper by @katycns, Maru, and @RobertGoldston5. They investigate various ways that the preferential attachment model does not fit the data, and present a more complicated model that includes searching for papers and coauthors.

pnas.org/content/101/su…
Importantly for the purposes of this class, however, they still don't include any notion of paper *quality.* Despite that, they are able to fit the data remarkably well. As with the preferential attachment model, this gives us some reason to be skeptical that citations track quality.
In class, we spent time discussing what conclusions we should draw from this model. One possibility is that, since the model fits the data, one should conclude that paper quality is irrelevant. Another is that it's more a "proof of possibility": quality *might* be irrelevant.
This is a broader question than just this model, but interesting nonetheless. What should we make of a model that generates a pattern without appealing to particular underlying features?
Finally, we turned to this paper by @RemcoHeesen. Heesen points to one incredibly important wrinkle: even if citations track *paper* quality, a scholar's total citations may not track *that person's* quality.

link.springer.com/article/10.100…
Heesen develops a model that shows that even if citations track paper quality perfectly, it still might be the case that total citation counts misrepresent the quality of an individual scientist.

There are two important mechanisms.
Mechanism 1: Based on a plausible model of how citations work, a paper that is only slightly better may get many, many more citations than another. So doubling a citation count may signal nothing like doubling the quality.
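Here's a toy illustration of that kind of cumulative advantage (again a sketch of my own with made-up parameters; Heesen's actual model is more careful than this). Each new citation goes to a paper with probability proportional to quality times citations so far:

```python
import random

def cumulative_advantage(q_a=1.0, q_b=1.1, n_citations=10_000):
    """Toy model: each new citation picks a paper with probability
    proportional to quality * (citations so far + 1)."""
    cites = {"A": 0, "B": 0}
    quality = {"A": q_a, "B": q_b}
    for _ in range(n_citations):
        w_a = quality["A"] * (cites["A"] + 1)
        w_b = quality["B"] * (cites["B"] + 1)
        pick = "A" if random.uniform(0, w_a + w_b) < w_a else "B"
        cites[pick] += 1
    return cites

print(cumulative_advantage())
# Paper B is only 10% better, yet under this reinforcement dynamic it
# typically ends up with many times more citations than paper A.
```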
Mechanism 2: Scientists can get lucky. A paper might be better by luck rather than ability. And if there are a lot of mediocre scientists, the superstars (in citation count) may just be the lucky ones, not the good ones.
In science -- like in poker -- I'd rather be lucky than good.
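A toy illustration of the luck mechanism (my own sketch with invented numbers, not Heesen's model): suppose citations track paper quality *perfectly*, and just look at who ends up on top of the quality ranking.

```python
import random

random.seed(1)

# Hypothetical population: mediocre scientists vastly outnumber good ones.
population = [("mediocre", 0.5)] * 1900 + [("good", 0.7)] * 100

papers = []
for label, ability in population:
    # Each scientist writes one paper; its quality is ability plus luck.
    quality = ability + random.gauss(0, 0.25)
    papers.append((quality, label))

# Citations track paper quality perfectly: rank papers by quality.
papers.sort(reverse=True)
print([label for _, label in papers[:10]])
# With luck this noisy and mediocre scientists this numerous, the top of
# the ranking is often dominated by lucky mediocre scientists.
```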
Heesen doesn't claim his model is totally accurate (or anything like that). But it raises a very important proof of possibility: citation counts may be radically misleading unless we make additional assumptions about the distribution of the quality of scientific work.
The class finished with a broader discussion of "what does scientific quality mean anyway?" And we tried to think of ways to judge whether citations track this amorphous idea of quality. (Like what's the relationship between citations and replication, for example?)
Overall, we concluded that one has good reason to be skeptical of using citation counts to track quality. That doesn't mean they don't track quality, but we shouldn't uncritically assume that they do.