@SfPRocur, 58 tweets, 14 min read

Today I'd like to talk about issues with respect to #openscience and specifically sharing code.

But first an intro to what open science is...

I am sure you all know a bit about #openscience already but essentially it's a very broad community/movement that aims to make science more accessible, transparent, inclusive, etc.

@daniellecrobins and @rchampieux have this great umbrella infographic!

https://twitter.com/daniellecrobins/status/1072936157493649408

#Openscience boils down to making science more free and open, in the same general way that the related open source and free software movements/communities pushed for change: by making the outputs of science more accessible to both the general public and other scientists.

In the last few years, #openscience notions and tools have really pushed sciences to self-reflect and change age-old ways of doing things at a pretty fast pace. Relatedly, social media has had a great, and sometimes not so great, catalytic effect on OS.

Even though #openscience can be argued to be decades old (even centuries if you look at Wikipedia en.wikipedia.org/wiki/Open_scie…), it's only recently managed to have mainstream uptake and make changes to how science is done.

#OpenScience has also coincided with a push in some (sub)fields, esp ones adjacent to my research like within psychology, for better research, which is known by the catchy name "replication crisis".

What happened with respect to replication is that back in the early 2010s a few different researchers, including, e.g., David Shanks at UCL where I am, noticed that certain studies (in his case in social psych, even though he himself is not in that subfield) do not replicate.

What this means is that even though the published science says A and B happen when you run a certain experiment, when a lab uninvolved in the original study tries to run it they get different results or no results.

From a personal perspective, this was very interesting to me because I was literally involved in replication as it is actually very common practice in my sub-field (computational (cognitive and/or neuroscientific) modelling) to first reproduce other people's models from scratch.

I had tons of experience doing this and it was quite surprising to me that others in adjacent sub-fields didn't. It hadn't crossed my mind they don't/didn't explicitly do this, but I guess I had also come to expect some diffs as I had moved from compsci to cogsci/psych anyway.

Check out @ReScienceEds for a journal dedicated to replications of computational/cog/neuro models (more of what those are later in this thread!) because even though modellers do this a lot, often we don't publish these replications and just keep them as useful exercises.

Anyway back to crises and #openscience — both these communities/movements/groups of researchers (speaking especially from the perspective of my (sub)fields) emerged as self-reinforcing forces for reform in how to do science.

Openness (all meanings as touched on above) and replicability have become for better or worse intertwined. And both have at their core the idea that science can be done better, which of course is great.

As always of course with any complex system/issue like the enterprise of doing science there is nuance & this is what I want to touch on ultimately in this thread — bear with me! 😅

[Just need to video call a student to explain some cog modelling I did for their exp! BRB!]

[Please feel free to interrupt or ask me any questions at any time!]

Time to get back to this thread and link it up to what I wanted to get to — before I have to run off again for a talk! 😂

So on to, as I mentioned in the OP, sharing code and, as I promised I would explain, computational modelling. All will be clear, hopefully!

Now all the intro stuff is out of the way... #openscience in part aims to help scientists make their data and code open.

Code is used for a variety of things in science. In neuroscience and psychology specifically it is used in three main ways/steps:

1) coding up the experiment itself, the data collection is done using some kind of program, e.g., check out @psychopy;

2) analysing the data using inferential stats of some kind or another, from the massive pipelines used to analyse fMRI/neuroimaging data (which often need high performance computing clusters) to the much smaller/more tractable analyses done for behavioural datasets;

3) cognitive/neuroscientific computational/mathematical modelling.

This last kind of coding is the work I mostly do, and it is an area where a large number of people routinely replicate other people's work as part of the learning process.

What is modelling? Modelling is basically creating code/maths that reflects the theoretical understanding or hunches we have for a complex system in order to further understand both the system and our formalism/model for the system. In my case the system is humans or the brain.

[Gah! Gotta run again, catch you all later — if I get too exhausted, it's getting close to end of the working day in London, this might have to roll-over into tomorrow. 😁]

OK, so back on track to get to the whole sharing code idea and why it's more nuanced than one might think — for 1 & 2 mentioned above it's more clear cut that it's almost always yes, for 3, modelling, it's a little more of a complex issue!

Why is sharing code for models not ALWAYS a good idea then? Isn't this idea contra to #openscience #reproducibility?

Well...

Even in the spirit of #openscience and #reproducibility (i.e., the spirit = an intention of improving science) sharing the code COULD in specific cases, if we are careless, actually impair understanding & undermine reproducibility.

But how? 😕

Let's start with how models (the ideas behind models) are evaluated and how their implementations (the code behind models) are evaluated. Models are evaluated in a variety of ways but at the highest level what we do is look at how models capture some aspect of a complex system.

We need to inspect a computational model's built-in assumptions, i.e., we need to run it. That's the whole point of having a computational model, we can hit the run button and see the code/implementation "live out" all the assumptions and inputs we have given it.

When we run it, does it look like the world (as measured by specific metrics, each case/model is different of course)? If yes, great, we have a model which does something useful.
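As a rough illustration of "as measured by specific metrics", here is a minimal Python sketch that scores model output against observations using root-mean-square error. RMSE is only one possible metric, the function name `rmse` is my own choice, and the numbers are made up purely for illustration; they are not data from any study.

```python
import math


def rmse(model_output, observed):
    """Root-mean-square error between model output and observations."""
    if len(model_output) != len(observed):
        raise ValueError("model output and observations must have equal length")
    squared_errors = [(m - o) ** 2 for m, o in zip(model_output, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical, made-up values: NOT real data.
observed = [0.2, 0.4, 0.6, 0.8]
model_output = [0.1, 0.5, 0.5, 0.9]

error = rmse(model_output, observed)
print(f"RMSE = {error:.3f}")
```

A lower value means the model output sits closer to the observations; what counts as "close enough" depends on the specific case/model.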

Furthermore, we can now understand our model. What parts of our model give rise to capturing the effects we want? This is the core of modelling: what does this model teach us about the system modelled? To answer this question we play around with the model in systematic ways.

And we do so to understand it. We do science, in other words, on the model. Just like we do experiments on the empirical world, we can do experiments on the model.
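To make "experiments on the model" a bit more concrete, here is a small Python sketch. It assumes a toy delta-rule learner (value moves a fraction of the way towards a constant reward on each trial), which is not taken from any particular paper; `simulate_learning` and `trials_to_criterion` are hypothetical names. The "experiment" is to vary one parameter systematically and measure how the model's behaviour changes.

```python
def simulate_learning(learning_rate, n_trials, reward=1.0):
    """Toy delta-rule learner: value <- value + learning_rate * (reward - value)."""
    value = 0.0
    history = []
    for _ in range(n_trials):
        value += learning_rate * (reward - value)
        history.append(value)
    return history


def trials_to_criterion(history, criterion=0.9):
    """Return the first (1-indexed) trial at which value reaches the criterion."""
    for trial, value in enumerate(history, start=1):
        if value >= criterion:
            return trial
    return None  # criterion never reached


# The manipulation: vary the learning rate, hold everything else fixed.
results = {
    rate: trials_to_criterion(simulate_learning(rate, n_trials=100))
    for rate in (0.1, 0.3, 0.5)
}

for rate, trials in results.items():
    print(f"learning rate {rate}: criterion reached on trial {trials}")
```

In this sketch a higher learning rate reaches the criterion in fewer trials, which tells us that this one parameter is what controls speed of learning in the toy model.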

This is important for many reasons, primarily because to increase our understanding of our model we need to rule out implementation details being the cause of the model's success.

In other words, we want the theory/ideas behind the model to be what drives the model's behaviour and NOT any superficial details we never assumed were part of the model/theory.

In the same way that, in empirical non-modelling experimental work, we do not want nuisance variables or confounds to be driving the effects we see, we don't want something like a for-loop specific to our unique implementation of the model to be driving the useful results.
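Here is a hedged toy example of how a for-loop can quietly become a confound. Assume a spec that only says "each unit's new value is the weighted sum of the other units' values". The two functions below (hypothetical names, not from any published model) both look like reasonable readings of that sentence, yet one updates all units from the old state and the other overwrites the state inside the loop.

```python
def step_synchronous(state, weights):
    """Update every unit from the PREVIOUS state."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


def step_in_place(state, weights):
    """Update units one by one in a for-loop, overwriting the state as we go."""
    state = list(state)  # copy so the caller's list is left untouched
    for i, row in enumerate(weights):
        # Later units already see the NEW values of earlier units.
        state[i] = sum(w * s for w, s in zip(row, state))
    return state


# Two units that each simply copy the other unit's value.
weights = [[0.0, 1.0],
           [1.0, 0.0]]
start = [1.0, 0.0]

synchronous_result = step_synchronous(start, weights)
in_place_result = step_in_place(start, weights)

print("synchronous:", synchronous_result)
print("in place:   ", in_place_result)
```

The two results differ even though the "model" as described in words is identical, so if the interesting behaviour depends on the update order, the update order belongs in the spec.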

We want to be sure that when we describe our model using a formal or informal specification, what we are saying is sufficient for another person to be able to implement, i.e., replicate, our original code and thus results.

What does this have to do with releasing code for the model, you might ask? Surely releasing the code will help find bugs and issues with, e.g., confounding for-loops?

Yes and no. If one releases code as part and parcel of a modelling account, others might naively use the code and assume the model does what is claimed, when really there's something special in the code not reflected in the understanding.

The only way this can be rigorously checked is to rewrite the model's code from scratch based on the specification, where specification can mean the original paper and/or a more formalised spec given in supplemental materials.
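A minimal sketch of what such a check can look like in Python, assuming a toy spec of my own invention: "value starts at 0 and on each trial moves a fraction `learning_rate` of the way towards a constant reward". The two functions are written independently from that sentence (one as a loop, one from the closed-form solution), and `replicates` is a hypothetical helper that compares their results within a tolerance.

```python
import math


def spec_iterative(learning_rate, n_trials, reward=1.0):
    """First implementation: apply the update rule trial by trial."""
    value = 0.0
    values = []
    for _ in range(n_trials):
        value = value + learning_rate * (reward - value)
        values.append(value)
    return values


def spec_closed_form(learning_rate, n_trials, reward=1.0):
    """Second implementation: value after n trials is reward * (1 - (1 - rate)^n)."""
    return [reward * (1.0 - (1.0 - learning_rate) ** n)
            for n in range(1, n_trials + 1)]


def replicates(results_a, results_b, tolerance=1e-9):
    """True if both result lists have equal length and agree within tolerance."""
    return len(results_a) == len(results_b) and all(
        math.isclose(a, b, abs_tol=tolerance)
        for a, b in zip(results_a, results_b)
    )


agree = replicates(spec_iterative(0.2, 50), spec_closed_form(0.2, 50))
print("implementations agree:", agree)
```

If two from-scratch implementations of the same spec agree, that is some evidence the spec is sufficient; if they disagree, either there is a bug or the spec leaves out something that matters.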

The reason this is so important is that models are only useful if they improve our understanding in some way or another. If the help they provide in doing that is not clear (recall the confounding for-loop driving the results), they aren't a good model.

So this is why (one of many reasons) many modellers tend to replicate others' models as a matter of course and why we (should) explore models very meticulously.

We replicate, literally write the code from scratch, models for pedagogical reasons (to learn how to do it) and to see what these models are doing — we need to see if the spec/paper is sufficient to replicate them.

We explore models carefully and systematically to see what they offer us in terms of broadening our understanding, such as explaining and/or predicting how systems (e.g., humans) behave.

If it is not understood that code released with a model essentially has to be ignored when checking whether the model replicates and whether the spec embodies the model, then that is a dangerous situation for modelling.

It opens up the potential that code for a model can be used as if the implementation/code is literally the model. But that's not true. The implementation can diverge from the spec/model, contain very very high-level bugs, confounds, etc. This undermines science in general.

And this is why initiatives like @ReScienceEds, which promotes replicating the code for published models and showcasing these attempts publicly, are important. It is part of the due scientific diligence of modellers to provide specs for models and to evaluate them by rewriting code.

A lot of modellers, like myself, have encountered so many cases where the published results of a model are not attainable. We write all our code for the model from scratch based on the published spec (the journal article) and yet we find that the results do not look the same.

Sometimes (I'll call this case A) the authors even send us their code and we dig in their code and discover the results are being driven by complex things that are not in the paper/spec!

Other times, and typically more often (case B), we never get hold of the original codebase AND the original spec/paper doesn't have enough information to replicate the results.

In case A, it's a relief in some ways as we can pinpoint what drives the results and why the model works. So we can take the implementation detail that drives the results and elevate it to the model/spec level. So it goes from an "unimportant" detail to an important model aspect.

In case B, it's just a nightmare as often we cannot find out what is driving the original published modelling results. So in some ways that's a bad state for science to be in, basically what looked like a step forward (a useful model) was just not.

So we cannot rely on the original codebase to claim that a model is reproducible or not, we have to rewrite the code. The original codebase is useful in cases where a spec isn't written but ideally authors/modellers should have one clearly stated in the publication...

because even with the original codebase often the effort required to fish out what drives the results can be prohibitive and impossible for another party to do.

The spec is needed most of all — the original code of course should be released too, but understood to be limited in terms of helping evaluate the modelling account.

By spec here I mean the materials released with the model, mainly the journal article. If the journal article is not well-written enough to be able to replicate (re-implement) the model, that's a bit of a bad sign.

I hope this has shed some light on how some modellers (of course I cannot speak for all) evaluate their models and do modelling, and how #openscience and #reproducibility fit into this picture.

Please feel free to ask any questions!

I'll finish this thread off by giving some extracts from some of my work on this, in case anybody wants to keep reading and/or if I've been unclear...

"We agree that models, like data, should be freely available according to the normal standards of science, but caution against confusing implementations with specifications." (Guest & Cooper, 2014)

doi.org/10.1016/j.cogs…

"In the best case, authors may provide their source code as a compressed archive and they may feel confident their research is reproducible. But this is not exactly true." (Rougier et al., 2017).

doi.org/10.7717/peerj-…

Here is an extract written by me and @NPRougier on the levels of analysis I mentioned for understanding/doing computational modelling work: theory, model, implementation.

More here: oliviaguest.com/doc/guest_roug…

PS: I dedicate this thread to @SusanLeemburg who said it would make a good thread. 🙃
