Discover and read the best of Twitter Threads about #reproducibility

Most recent (19)

I recently implemented some pairs trading strategies for a paper, and decided to share an implementation of the Gatev, Goetzmann & Rouwenhorst (2006) strategy in a short article on RPubs.…
#rstats #RPubs #DataScience #finance #pairstrading #reproducibility
In the RPubs post above, I provide the #R code to backtest the strategy, as well as some results replicating Gatev, Goetzmann & Rouwenhorst (2006) and Do and Faff (2010), and extending the sample to the end of 2020. In this thread, I show some of these results.
Pairs trading is a type of systematic trading strategy based on finding pairs of stocks or assets that have historically "moved together", and betting that divergences will eventually get corrected. It is a simple form of statistical arbitrage.
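To make the idea of stocks that "moved together" concrete, here is a minimal sketch of a distance-style pair selection and divergence rule. It is written in Python rather than the thread's #R code, it is not the RPubs implementation, and the tickers, prices, function names and the two-standard-deviation threshold `k` are illustrative assumptions.

```python
from itertools import combinations
from statistics import pstdev


def normalize(prices):
    """Rescale a price series so that it starts at 1.0."""
    first = prices[0]
    return [p / first for p in prices]


def ssd(a, b):
    """Sum of squared differences between two equally long series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_pairs(price_table, top_n=1):
    """Rank all pairs by distance between their normalized price paths."""
    norm = {name: normalize(series) for name, series in price_table.items()}
    scored = [(ssd(norm[a], norm[b]), a, b)
              for a, b in combinations(sorted(norm), 2)]
    scored.sort()
    return [(a, b) for _, a, b in scored[:top_n]]


def divergence_signal(norm_a, norm_b, spread_now, k=2.0):
    """Bet on convergence once the spread exceeds k historical std devs."""
    history = [x - y for x, y in zip(norm_a, norm_b)]
    threshold = k * pstdev(history)
    if spread_now > threshold:
        return "short A / long B"
    if spread_now < -threshold:
        return "long A / short B"
    return "no trade"


# Hypothetical formation-period prices, for illustration only.
prices = {
    "AAA": [10.0, 11.0, 12.0, 13.0],
    "BBB": [20.0, 22.0, 24.2, 26.0],
    "CCC": [5.0, 4.0, 6.0, 3.0],
}
pair = closest_pairs(prices)[0]
print(pair, divergence_signal(normalize(prices[pair[0]]),
                              normalize(prices[pair[1]]), 0.05))
```

A real backtest would also need trading costs, a rule for closing positions and a separate trading period; those are left out of this sketch.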
Read 23 tweets
Executable papers on CodaLab Worksheets are now linked from pages thanks to a collaboration with @paperswithcode! For example:…
By transitivity, the links are also available from @arxiv:…
Executable papers contain not just the code and data, but also the experiments that produced the results of a paper. Releasing code is great, but CodaLab goes one step further for full #reproducibility, providing the full certifiable provenance of an empirical result.
Read 5 tweets
I am very happy that @MKrzywinski and Naomi Altman gave us the opportunity to contribute an article on the #standardizationfallacy to their legendary #pointsofsignificance column in @naturemethods…
It all started 20 (actually 21) years ago with this 👇correspondence on behaviour and the standardization fallacy in @NatureGenet…
It was my response to this 👇famous paper in @ScienceMagazine by John Crabbe and colleagues showing that behavioural phenotypes of mouse mutants may turn out to be lab specific…
Read 16 tweets
Unpopular opinion. In Computer Science, we should start organizing a new kind of conference where #reproducibility and #openscience are first-class citizens. You submit claims (paper) together with the evidence (data, tools, ...). The PC can query the evidence via an AEC. 1/2
(PC = Program Committee, AEC = Artifact Evaluation Committee).

* Yes, as an author you will get fewer papers accepted, but the quality of accepted papers would be supreme.
* Yes, not all research lends itself to submitting "evidence", but you can always submit elsewhere.
Read 3 tweets
Hi everyone! I'm Louise Bowler, a Research Data Scientist from @turinginst's Research Engineering Group @turinghut23. I'm borrowing the account for the day to show you all a day in the life of a Research Data Scientist! 👩‍💻
I stumbled across this job whilst I was writing up my PhD and immediately went “Yes, that’s the job I want but didn’t know existed!” I wanted to stay close to research but not be tied to a single field, so the projects here are a great fit
I’ve always enjoyed switching fields – I started out as an undergraduate @ImperialPhysics, then explored the biological and medical sciences during the first year of my PhD @dtc_oxford 🌟🪐👩‍🔬🖥️🧬🧫💊
Read 16 tweets
In a new preprint, @sinabooeshaghi et al. present deep SMART-Seq, @10xGenomics and MERFISH #scRNAseq (37,925,526,323 reads, 344,256 cells) from the mouse primary motor cortex, demonstrating the benefits of cross-platform isoform-level analysis.… 1/15
We produce an isoform atlas and identify isoform markers for classes, subclasses and clusters of cells across all layers of the primary motor cortex. 2/15
Isoform-level results are facilitated by kallisto isoform-level quantification of the SMART-seq data. We show that such EM-based isoform quantification is essential not just for isoform but for gene-level results. #methodsmatter 3/15
Read 15 tweets
Your MDS Curator this week, @TiffanyTimbers, here.

This morning I would like to share with you some of the most influential resources that have shaped my #DataScience workflow:

1. @swcarpentry's Version Control with Git lesson:
3. "Good enough practices in scientific computing" by @gvwilson, @JennyBryan, Karen Cranston, Justin Kitzes, @lexnederbragt & @tracykteal…
Read 7 tweets
A "worrying analysis":

"18 [#deeplearning] algorithms ... presented at top-level research conferences ... Only 7 of them could be reproduced w/ reasonable effort ... 6 of them can often be outperformed w/ comparably simple heuristic methods."


[Updates worth tweeting]

There is much concern about #reproducibility issues and flawed scientific practices in the #ML community in particular & #academia in general.

Neither the issues nor the concerns are new.

Isn't it time to put an end to them?
There are several works that have exposed these and similar problems over the years.

👏👏 again to @Maurizio_fd et al. for sharing their paper and addressing #DL algorithms for recommender systems (1st tweet from this thread).

But there is more, unfortunately:
Read 18 tweets
Now out! New report examines Reproducibility and Replicability in Science, with recommendations for researchers, agencies, policy makers, journals, etc.
Terms like “reproducibility” and “replicability” are sometimes used as an umbrella word to encompass all related concerns. But often, researchers use each term to refer to a distinct concept. #ReproducibilityInScience
The committee defined #reproducibility as obtaining consistent computational results using the same input data, computational steps, methods, code, and conditions of analysis as an original study.
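As a rough illustration of that definition, the sketch below runs the same analysis twice on the same data with the same seed and compares a fingerprint of the results. The analysis (a bootstrap interval for a mean), the function names and the data are hypothetical, and the check only covers reruns in the same environment, which is a narrower test than the committee's full definition.

```python
import hashlib
import json
import random


def bootstrap_mean(data, seed, n_resamples=1000):
    """Toy analysis: mean with a percentile bootstrap interval."""
    rng = random.Random(seed)  # a fixed seed pins the random resampling
    n = len(data)
    means = []
    for _ in range(n_resamples):
        sample = [data[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    return {
        "mean": sum(data) / n,
        "ci_low": means[int(0.025 * n_resamples)],
        "ci_high": means[int(0.975 * n_resamples) - 1],
    }


def fingerprint(result):
    """SHA-256 of the result serialized with a stable key order."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical input data, for illustration only.
data = [2.1, 3.4, 1.9, 5.0, 4.2]
first = fingerprint(bootstrap_mean(data, seed=42))
second = fingerprint(bootstrap_mean(data, seed=42))
print("consistent rerun:", first == second)
```

Publishing such a fingerprint next to the code and data gives others a simple target to check their own rerun against.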
Read 9 tweets
Today I'd like to talk about issues with respect to #openscience and specifically sharing code.

But first an intro to what open science is...
I am sure you all know a bit about #openscience already but essentially it's a very broad community/movement that aims to make science more accessible, transparent, inclusive, etc.

@daniellecrobins and @rchampieux have this great umbrella infographic!

#Openscience boils down to making science more free and open in the same general ways that the related open source and free software movements/communities pushed for change by making the outputs of science more accessible to both the general public and to other scientists.
Read 58 tweets
Today at my first #SciData18 conference with @SpringerNature. Today's themes are:
mentoring open science
making data findable, accessible, interoperable and reusable throughout the research lifecycle
Data Generalist @becky_boyles

Scientists must store, integrate, analyse, compare + share data sets. Via @TheEconomist, data is the new oil

Or is it the new plastic?

Be careful how data is used as a resource. Closed v shared v open data. Not even 'open data' is truly open #SciData18
Data Generalist @becky_boyles

New model for data -
not sharing data via 'copying' (email, dropbox)
enhanced security where user is both producer and consumer
teams form outside silos
democratic tools for use by non-programmers
integrated data

Read 41 tweets
If you want to make code/data “available”, GitHub isn’t enough.

You must deposit at a DOI-issuing data repository. @figshare & @ZENODO_org are both free & awesome, and can be synced w/ a GitHub repo.

Why is GitHub not enough? 1/4
#OpenAccess #OpenData
GitHub is a place for things to be worked on, not for them to live forever.

- Links are fragile (username, repo name)
- Users can delete repos
- GitHub could make your code/data unavailable in the future.

DOI-issuing data repositories preserve your stuff for the future 2/4
Depositing on @KaggleDatasets isn’t good enough for #OpenAccess #OpenData either.

- No API for accessing files without an account
- Fragile URLs
- Kaggle Datasets is a commercial thing.

Do all three! GitHub repo, Kaggle Dataset and @figshare or @ZENODO_ORG 3/4
Read 4 tweets
Five hours in @Reagan_Airport and still here; twice rebooked due to thunderstorms—hope I make it to Boston tonight for tomorrow's IEEE #reproducibility workshop.
As the @IEEEorg steps into the #reproducibility discussion, I'm really hoping they'll pay attention to terminology—"Terminologies for Reproducible Research"
My assessment after reviewing literature from more than a dozen fields is that the predominant usage for #reproducibility is “same data+same methods=same results.”
Read 10 tweets
Fifth and final session on #ResearchIntegrity: Brandon Stell on #PubPeer @FEBSnews
Stell: Scientists are not the only people whose work relies on the accuracy of published work; it is also the basis for current and future research, public policy, etc. #ResearchMisconduct
Stell: cites the #Poldermans case and how a flawed publication that made its way into guidelines led to 8000 deaths
Read 13 tweets
Third #ResearchMisconduct presentation by Bernhard Rupp: The action is in the re(tr)action @FEBSnews #FEBS2018
Rupp breaks with convention and walks away from the podium #WanderingSpeaker
Rupp: valid concerns exist about incorrect and irreproducible research, but is there a "reproducibility crisis"? #ResearchMisconduct
Read 13 tweets
How many random seeds are needed to compare #DeepRL algorithms?

Our new tutorial to address this key issue of #reproducibility in #reinforcementlearning

#machinelearning #neuralnetworks
Algo1 and Algo2 are two famous #DeepRL algorithms, here tested on the Half-Cheetah #opengym benchmark.

Many papers in the literature compare using 4-5 random seeds, like on this graph which suggests that Algo1 is best.

Is this really the case?
However, more robust statistical tests show there are no differences.

For a very good reason: Algo1 and Algo2 are both the same @OpenAI baseline implementation of DDPG, same parameters!

This is what is called a "Type I error" in statistics.
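The effect can be imitated with a small simulation. In this sketch both "algorithms" draw their final scores from one and the same distribution, and an exact permutation test is used as a stand-in for the thread's "more robust statistical tests" (the tutorial's own choice of test is not shown here). The score distribution, seed counts and function names are assumptions made for illustration.

```python
import random
from itertools import combinations
from statistics import mean


def permutation_p_value(a, b):
    """Exact two-sided permutation test on the difference of means."""
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    observed = abs(mean(a) - mean(b))
    total = sum(pooled)
    extreme = 0
    splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # tolerance for float round-off
            extreme += 1
        splits += 1
    return extreme / splits


def false_positive_rate(n_seeds, n_trials, alpha, rng):
    """Share of trials that declare two identical algorithms different."""
    hits = 0
    for _ in range(n_trials):
        algo1 = [rng.gauss(1000.0, 300.0) for _ in range(n_seeds)]
        algo2 = [rng.gauss(1000.0, 300.0) for _ in range(n_seeds)]
        if permutation_p_value(algo1, algo2) < alpha:
            hits += 1
    return hits / n_trials


rng = random.Random(0)
# Hypothetical returns: both samples come from the same distribution.
algo1 = [rng.gauss(1000.0, 300.0) for _ in range(5)]
algo2 = [rng.gauss(1000.0, 300.0) for _ in range(5)]
print("mean gap:", mean(algo1) - mean(algo2))
print("p-value:", permutation_p_value(algo1, algo2))
print("false positive rate:", false_positive_rate(5, 200, 0.05, rng))
```

With only five seeds the gap between the two sample means can look sizeable on a plot even though the underlying algorithm is identical, which is why a significance test (and usually more seeds) is needed before declaring a winner.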
Read 11 tweets
