Discover and read the best of Twitter Threads about #reproducibility

Most recent (10)

Your MDS Curator this week, @TiffanyTimbers, here.

This morning I would like to share with you some of the most influential resources that have shaped my #DataScience workflow:

1. @swcarpentry's Version Control with Git lesson: swcarpentry.github.io/git-novice/
3. "Good enough practices in scientific computing" by @gvwilson, @JennyBryan, Karen Cranston, Justin Kitzes, @lexnederbragt & @tracykteal

journals.plos.org/ploscompbiol/a…
Read 7 tweets
A "worrying analysis":

"18 [#deeplearning] algorithms ... presented at top-level research conferences ... Only 7 of them could be reproduced w/ reasonable effort ... 6 of them can often be outperformed w/ comparably simple heuristic methods."

Paper:
lnkd.in/dTaGCTv

#AI
[Updates worth tweeting]

2/
There is much concern about #reproducibility issues and flawed scientific practices in the #ML community in particular & #academia in general.

Neither the issues nor the concerns are new.

Isn't it time to put an end to them?
3/
Several works have exposed these and similar problems over the years.

👏👏 again to @Maurizio_fd et al. for sharing their paper and addressing #DL algorithms for recommender systems (1st tweet from this thread).

But there is more, unfortunately:
Read 18 tweets
Now out! New report examines Reproducibility and Replicability in Science, with recommendations for researchers, agencies, policy makers, journals, etc.
@theNASEM ow.ly/BK5350u0Iz2
#ReproducibilityInScience
Terms like “reproducibility” and “replicability” are sometimes used as umbrella terms to encompass all related concerns. But often, researchers use each term to refer to a distinct concept. #ReproducibilityInScience
The committee defined #reproducibility as obtaining consistent computational results using the same input data, computational steps, methods, code, and conditions of analysis as an original study.
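
A minimal Python sketch of what that definition asks for, under loud assumptions: the `analysis` function, the data values, and the seed are all hypothetical and do not come from the report. The idea is simply to rerun the same code on the same input under the same conditions (here, a fixed random seed) and check that the result is unchanged. Comparing hashes is a stricter check than the "consistent results" the definition requires; it is used here only because it is easy to automate.

```python
import hashlib
import json
import random


def analysis(data, seed):
    """Hypothetical analysis step: a bootstrap estimate of the mean."""
    rng = random.Random(seed)  # fixed seed stands in for fixed "conditions of analysis"
    resample = [rng.choice(data) for _ in range(len(data))]
    return {"n": len(data), "bootstrap_mean": round(sum(resample) / len(resample), 10)}


def fingerprint(result):
    """Hash a JSON-serialisable result so that two runs can be compared exactly."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8]  # illustrative values only
original = fingerprint(analysis(data, seed=42))
rerun = fingerprint(analysis(data, seed=42))  # same data, same code, same seed
print("reproduced:", original == rerun)
```

Leaving the seed unset, or changing the input data, is the kind of difference in "conditions of analysis" that would break this check.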
Read 9 tweets
Today I'd like to talk about issues with respect to #openscience and specifically sharing code.

But first an intro to what open science is...
I am sure you all know a bit about #openscience already but essentially it's a very broad community/movement that aims to make science more accessible, transparent, inclusive, etc.

@daniellecrobins and @rchampieux have this great umbrella infographic!

#Openscience boils down to making science freer and more open, much as the related open source and free software movements/communities pushed for change, by making the outputs of science more accessible to both the general public and other scientists.
Read 58 tweets
Today at my first #SciData18 conference with @SpringerNature. Today's themes are:
mentoring open science
+
making data findable, accessible, interoperable and reusable throughout the research lifecycle
Data Generalist @becky_boyles

Scientists must store, integrate, analyse, compare + share data sets. Via @TheEconomist, data is the new oil

Or is it the new plastic?

Be careful how data is used as a resource. Closed v shared v open data. Not even 'open data' is truly open #SciData18
Data Generalist @becky_boyles

New model for data -
not sharing data via 'copying' (email, dropbox)
enhanced security where user is both producer and consumer
teams form outside silos
democratic tools for use by non-programmers
integrated data

#SciData18
Read 41 tweets
If you want to make code/data “available”, GitHub isn’t enough.

You must deposit at a DOI-issuing data repository. @figshare & @ZENODO_org are both free & awesome; can be synced w/ a GitHub repo

Why is GitHub not enough? 1/4
#OpenAccess #OpenData
GitHub is a place for things to be worked on, not for them to live forever.

- Links are fragile (username, repo name)
- Users can delete repos
- GitHub could make your code/data unavailable in the future.

DOI-issuing data repositories preserve your stuff for the future 2/4
Depositing on @KaggleDatasets isn’t good enough for #OpenAccess #OpenData either.

- No API for accessing files without an account
- Fragile URLs
- Kaggle Datasets is a commercial thing.

Do all three! GitHub repo, Kaggle Dataset and @figshare or @ZENODO_ORG 3/4
Read 4 tweets
Five hours in @Reagan_Airport and still here; twice rebooked due to thunderstorms—hope I make it to Boston tonight for tomorrow's IEEE #reproducibility workshop.
As the @IEEEorg steps into the #reproducibility discussion, I'm really hoping they'll pay attention to terminology—"Terminologies for Reproducible Research" arxiv.org/abs/1802.03311
My assessment after reviewing literature from more than a dozen fields is that the predominant usage for #reproducibility is “same data+same methods=same results.”
Read 10 tweets
Fifth and final session on #ResearchIntegrity Brandon Stell on #PubPeer pubpeer.com @FEBSnews
Stell: Scientists are not the only people whose work relies on accuracy of published work - also basis for current and future research, public policy, etc #ResearchMisconduct
Stell: cites the #Poldermans case and how a flawed publication that made its way into guidelines led to 8000 deaths
Read 13 tweets
Third #ResearchMisconduct presentation by Bernhard Rupp: The action is in the re(tr)action @FEBSnews #FEBS2018
Rupp breaks with convention and walks away from the podium #WanderingSpeaker
Rupp: valid concerns exist about incorrect and irreproducible research, but is there a "reproducibility crisis"? #ResearchMisconduct
Read 13 tweets
How many random seeds are needed to compare #DeepRL algorithms?

Our new tutorial to address this key issue of #reproducibility in #reinforcementlearning

PDF: arxiv.org/pdf/1806.08295…

Code: github.com/flowersteam/rl…

Blog: openlab-flowers.inria.fr/t/how-many-ran…

#machinelearning #neuralnetworks
Algo1 and Algo2 are two famous #DeepRL algorithms, here tested on the Half-Cheetah #opengym benchmark.

Many papers in the literature compare using 4-5 random seeds, like on this graph which suggests that Algo1 is best.

Is this really the case?
However, more robust statistical tests show there are no differences.

For a very good reason: Algo1 and Algo2 are both the same @openAI baseline implementation of DDPG, same parameters!

This is what is called a "Type I error" in statistics.
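
A hedged Python sketch of how such a Type I error shows up in numbers. This is not the tutorial's code (that is in the linked repository). The "returns" below are simulated from one normal distribution rather than taken from any real DDPG run, and the exact permutation test is a stand-in chosen because it needs only the standard library, not necessarily the test the tutorial recommends. Both groups play the role of the same algorithm run with different seeds, so every "significant" difference is a false positive.

```python
import itertools
import random
import statistics


def permutation_p_value(a, b):
    """Exact two-sided permutation test on the difference of means."""
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    total = sum(pooled)
    extreme = 0
    count = 0
    # Enumerate every way to relabel the pooled runs as "Algo1" vs "Algo2".
    for idx in itertools.combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        count += 1
    return extreme / count


def false_positive_rate(n_seeds=5, n_trials=300, alpha=0.05, rng_seed=0):
    """Fraction of comparisons flagged as different when both groups are identical."""
    rng = random.Random(rng_seed)
    hits = 0
    for _ in range(n_trials):
        # Hypothetical final returns: same distribution for both "algorithms".
        algo1 = [rng.gauss(0.0, 1.0) for _ in range(n_seeds)]
        algo2 = [rng.gauss(0.0, 1.0) for _ in range(n_seeds)]
        if permutation_p_value(algo1, algo2) < alpha:
            hits += 1
    return hits / n_trials


rate = false_positive_rate()
print(f"false positive rate with 5 seeds per group: {rate:.3f}")
```

With a test at alpha = 0.05, the printed rate should sit near or below 5%: a small but nonzero share of comparisons between identical algorithms still looks like a real difference, which is what a single plot with 4-5 seeds can hide.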
Read 11 tweets
