Catherine Wheller @catinthefield (41 tweets, 34 min read)
Today at my first #SciData18 conference with @SpringerNature. Today's themes are:
mentoring open science
+
making data findable, accessible, interoperable and reusable throughout the research lifecycle
Data Generalist @becky_boyles

Scientists must store, integrate, analyse, compare + share data sets. Via @TheEconomist, data is the new oil

Or is it the new plastic?

Careful how data is used as a resource. Closed v shared v open data. Not even 'open data' is truly open #SciData18
Data Generalist @becky_boyles

New model for data -
not sharing data via 'copying' (email, dropbox)
enhanced security where user is both producer and consumer
teams form outside silos
democratic tools for use by non-programmers
integrated data

#SciData18
Data Generalist @becky_boyles

Link to NIH Data Commons Pilot Phase Consortium
commonfund.nih.gov/commons

Making data Findable, Accessible, Interoperable, and Reusable (FAIR)

#SciData18
Data Steward Coordinator @martateperek @tudelft

Institution appoints #datastewards at all faculties. They understand researchers and their problems. They have a PhD - so understand the pain points. There to help and improve data culture.

#SciData18
Aside - I didn't understand data structure at my research institution. Training was given in large groups, not tailored, and I didn't reach out further. Probably missed opportunities. Would have been great to have proactive, individual data mentorship.

#SciData18 @martateperek
Great initiatives at institutions:

Cambridge Data Champions data.cam.ac.uk/intro-data-cha…

TU Delft Data Stewards tudelft.nl/en/library/cur… (they're hiring! academictransfer.com/en/50677/data-…)

@ResPlat at @unimelb do a great job too!

#SciData18 @martateperek
Claudia Wolff | Coastal Risks and Sea-Level Rise @WolffClaudi

Problem: missing data!

Old approach - manually digitising coastline through @googleearth - tedious!

New approach - citizen science! coastwards.org

#SciData18
Takeya Adachi | Japan Agency for Med. Res. + Dev.

Problem: ultra-rare disease & undiag. diseases overlooked

1. survey undiag. patients in Japan
2. local network
3. global data exchange (clinical + genetic)
4. patient + public involvement

amed.go.jp/en/program/IRU…

#SciData18
Andrew Tatem | WorldPop worldpop.org.uk @WorldPopProject

Problem: Bad maps

Approach: Disaggregated blocky maps to aggregated, sharable, interactive usable grids

Not just static anymore - can use changing data for policy

wp-winterschool.org - training!

#SciData18
Natalia Tejedor Garavito | WorldPop @NatuTejedor @WorldPopProject

Problem: Need to create maps that can be compared across countries

Approach: Collecting data, using data + combining datasets
Datasets: worldpop.org.uk
Code share: figshare.com

#SciData18
Sophie Adler | MELD Project @sophieadler

Problem: finding evidence of epilepsy characteristics in MRI

Approach: deep learning - train lesion classifier

open access journal
github.com/MELDProject
data repository

resulted in external replication and validation!

#SciData18
Jane Seymour | Nursing + Midwifery @sheffielduni

Problem: Gaining consent to use end-of-life care data

Data sharing + informed consent: do they really understand?
Layered consent model - cooperation between ethics, family etc
Data is being used!
data-archive.ac.uk

#SciData18
James Avery | Hyper acute stroke unit eit-team.github.io

Problem: sharing + accessing health data

Why share data? Find the solution to your exact data online! Don't say 'contact author for data' just put it up!

zenodo.org
nature.com/sdata/

#SciData18
Sarala Wimalaratne @EMBLEBI identifiers.org

Problem: Inconsistent identifiers for life sciences - too many urls!

Approach: Actionable compact identifiers!

prefix:identifier (e.g. taxon:9606)

stable
unique
resolvable
location independent

#SciData18
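
A minimal sketch of how a compact identifier of the form prefix:identifier could be split and turned into a resolvable link. The function names, the validation pattern and the resolver URL layout (`https://identifiers.org/<prefix>:<identifier>`) are assumptions for illustration, not taken from the talk; check identifiers.org for the registered prefixes and the exact resolution rules.

```python
import re
from typing import NamedTuple


class CompactIdentifier(NamedTuple):
    prefix: str     # namespace, e.g. "taxon"
    accession: str  # local identifier inside that namespace, e.g. "9606"


# Hypothetical validation pattern: a lowercase prefix, a colon,
# then a non-empty local identifier without whitespace.
_PATTERN = re.compile(r"^(?P<prefix>[a-z0-9._]+):(?P<accession>\S+)$")


def parse_compact_identifier(text: str) -> CompactIdentifier:
    """Split 'prefix:identifier' into its two parts, or raise ValueError."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a compact identifier: {text!r}")
    return CompactIdentifier(match["prefix"], match["accession"])


def resolver_url(cid: CompactIdentifier,
                 resolver: str = "https://identifiers.org") -> str:
    """Build a link that a resolver could redirect to the hosting database.

    The URL layout is an assumption; the point is that the link names the
    resolver, not the database, so it stays location independent.
    """
    return f"{resolver}/{cid.prefix}:{cid.accession}"


if __name__ == "__main__":
    cid = parse_compact_identifier("taxon:9606")
    print(cid.prefix, cid.accession)
    print(resolver_url(cid))
```

Keeping the prefix and the local identifier as separate fields is what makes the identifier stable: only the resolver needs to know where the record currently lives.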
Nick DeVito @NDevito1 @EBMDataLab

Problem: trial transparency - reporting info completely, accurately + timely

Approach: website that tracks breaches of trial reporting laws fdaaa.TrialsTracker.net
EU.TrialsTracker.net

'Hey, this trial report is overdue!'

#SciData18
Carsten Kettner | STRENDA-DB @CarKettner @BeilsteinInst

Problem: missing + imprecise information

Approach: define guidelines; check data for compliance with guidelines

Completeness; Compliance; Registration; Quality -> better data!

beilstein-strenda-db.org/strenda/

#SciData18
Andrej-Nikolai Spiess | github.com/anspiess

Problem: How to reproduce stat significance w/o raw data?

Approach: digitise graphs; re-analyse data; look for 'reversers'
What proportion of data points need to be removed to change significance?
Concerning amount...

#SciData18
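
The 'reversers' idea can be sketched as a leave-one-out check: drop each data point in turn and see whether the significance call flips. This is only an illustration under stated assumptions, not the speaker's implementation. It swaps in a Pearson correlation with a Fisher z normal approximation for the p-value (to stay within the Python standard library), and the function names, the alpha = 0.05 threshold and the toy data are all made up.

```python
import math
from statistics import NormalDist, mean


def pearson_r(xs, ys):
    """Pearson correlation; assumes neither xs nor ys is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(xs, ys):
    """Two-sided p-value for r != 0 via the Fisher z approximation.

    Needs more than 3 points. This is an approximation, not an exact t-test.
    """
    n = len(xs)
    r = pearson_r(xs, ys)
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def find_reversers(xs, ys, alpha=0.05):
    """Indices of points whose removal alone flips the significance call."""
    xs, ys = list(xs), list(ys)
    full_significant = approx_p_value(xs, ys) < alpha
    reversers = []
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        if (approx_p_value(rest_x, rest_y) < alpha) != full_significant:
            reversers.append(i)
    return reversers


if __name__ == "__main__":
    # Toy data (invented): one extreme point drives the whole correlation.
    xs = [1, 2, 3, 4, 5, 6, 20]
    ys = [2, 1, 2, 1, 2, 1, 20]
    flipped = find_reversers(xs, ys)
    print("reverser indices:", flipped)
    print("proportion of points:", len(flipped) / len(xs))
```

In practice the points would come from digitised graphs, so digitisation error adds further uncertainty on top of this check.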
Aliaksandr Yakutovich | @EPFL_en

Problem: Screen large number of materials, reproducibility of computational research; track provenance

Approach:
materialscloud.org
aiida.net

#SciData18
Helena Cousijn | @datacite @HelenaCousijn

Problem: Developing data metrics; standardise data reuse

Approach: infrastructure to count data reuse (citations, views, downloads); display data metrics

@MakeDataCount
projectcounter.org
cdlib.org/services/uc3/d…

#SciData18
Alasdair Rae @undertheraedar @sheffielduni

Why figshare.com?

Anyone can find + read your research! It's nice + good + the right thing to do!
You don't know who might be interested!

See AR's projects here:
ajrae.staff.shef.ac.uk/#resources
statsmapsnpix.com

#SciData18
Great way to finish the lightning talks:

By sharing your data you open yourself up to the good and the bad - but open data is a risk worth taking

#SciData18
The Wellcome Data Re-use Prizes! @apDinsmore

wellcome.ac.uk/news/new-wellc…

Data shared by researchers can be re-used by others to generate new insights and tools

Competition data:
1. AMR Surveillance data synapse.org/#!Synapse:syn1…
2. Malaria
synapse.org/#!Synapse:syn1…

#SciData18
Magdalena Skipper | Editor in Chief of @nature @Magda_Skipper

#stateofopendata survey - reviewing the development of the open data movement over the past 10 years

Credit, concerns, awareness, reuse, guidelines, motivations

stateofopendata.od4d.net

@stateofopendata

#SciData18
Magdalena Skipper | Editor in Chief of @nature @Magda_Skipper

Only 10% of authors cite datasets used in research papers properly

Nature Research journals mandate a data availability statement + recommend public repositories - very supportive of open data + training

#SciData18
Magdalena Skipper | Editor in Chief of @nature @Magda_Skipper

Data sharing policies e.g. bit.ly/datasharingpol…

Research data support
go.nature.com/ResearchDataSe…
1. Can you share your data?
2. Are your data ready to share?
3. Who is the owner of your data?

#SciData18
Magdalena Skipper | Editor in Chief of @nature @Magda_Skipper

Not just biomed data! Here's a resource for the Earth scientists (my people):

copdess.org/enabling-fair-… - develop standards in the Earth sciences to enable FAIR data on a large scale

#SciData18
Magdalena Skipper | Editor in Chief of @nature @Magda_Skipper

@CodeOceanHQ - allow reviewers access to the exact same environment that the authors used in the paper. A great leap into code reproducibility!

codeocean.com (wow!)

#reproducibility
#SciData18
John Burn-Murdoch | Data Viz @FinancialTimes @jburnmurdoch

Deep desire to make arguments - communication first, not visualisation first.

Focus on the story!

Effective Data Vis: Find the meaningful between the beautiful and clinical.

#SciData18
John Burn-Murdoch | Data Viz @FinancialTimes @jburnmurdoch

“You have to be like the worst tabloid newspaper in the front and the Academy of Science in the back.” - Hans Rosling

Examples of Hans Rosling data vis: visualisingdata.com/2017/02/thank-…
gapminder.org

#SciData18
John Burn-Murdoch | Data Viz @FinancialTimes @jburnmurdoch

Data vis paper!

vcg.seas.harvard.edu/files/pfister/… [pdf]

Effectiveness of data vis:

1. Do you recognise this graphic?
2. Image blurred - what do you remember the graphic telling you?

Text is a key part of graphics!

#SciData18
John Burn-Murdoch | Data Viz @FinancialTimes @jburnmurdoch

Active titles - write take home message in the title

Say it again! Use data and text to explain message

Exploit colour (remember colour blind design somersault1824.com/tips-for-desig…)

Labels - emphasis on narrative

#SciData18
John Burn-Murdoch | Data Viz @FinancialTimes @jburnmurdoch

Aside - I'm keen on communicating meaningful stories in data + this is a field I'd like to explore. If anyone would like 2 chat to a beginner about tools + skills 2 start with, pls let me know!

#SciData18
Panel discussion: Responsibility of Reproducibility

Kirstie Whitaker @kirstie_j

Sue Fletcher-Watson @SueReviews

Paola Quattroni @PaolaQuattroni

Natalia Tejedor-Garavito @NatuTejedor

Zaheer-Ud-Din Babar

Computer Battery dying, let's see how we go!

#SciData18
How do we encourage sharing of metadata?

Integrate data cleaning from the start!

If you can't share sensitive data, share metadata!

Incentives for ppl who do it correctly - though recognise huge time commitment. Diversify meaning of success in academia!

#SciData18
How do we encourage sharing of data that didn't support hypothesis?

Funders + institutes provide incentives for academics
Supervisors to let go of the ego + have trust in why result not supported
Commit to publish every protocol

#SciData18
How can we encourage reproducibility studies? Who funds them? Are we in a #reproducibilitycrisis?

Funders - harvest grant scheme for reproducing studies
Give students reproducibility studies rather than new projects -
Less emphasis on novelty, more on quality!

#SciData18
Who should be responsible for long-term archiving research data?

Institutional level + Government level (perhaps the same as institution, as unis funded by government)

#SciData18
What studies should be reproduced?

Not everything needs to be reproduced - some studies provoke a line of enquiry. Lots of qualitative studies fall here.

(Definitions: Reproducibility
1. original data + validating study
2. new, similar data + get same results)

#SciData18
Your study shouldn't be too good to be above checking - @kirstie_j

Debate! Yes, but doesn't say that it *should have* to be reproduced.

Who is responsible for reproducing work?

You should be able to have a career checking someone else's work - @kirstie_j

#SciData18
And Fin.

5% battery left.

Brilliant interactive conference! Diversity in topics, gender, geographic background, career level, data origin, opinions, universities...and more I'm sure.

Well done all involved!

#SciData18