Mentoring open science + making data findable, accessible, interoperable and reusable throughout the research lifecycle
Scientists must store, integrate, analyse, compare + share data sets. Via @TheEconomist, data is the new oil
Or is it the new plastic?
Be careful how data is used as a resource. Closed v shared v open data. Not even 'open data' is truly open #SciData18
New model for data -
no sharing data via 'copying' (email, Dropbox)
enhanced security where user is both producer and consumer
teams form outside silos
democratic tools for use by non-programmers
integrated data
#SciData18
Link to NIH Data Commons Pilot Phase Consortium
commonfund.nih.gov/commons
Making data Findable, Accessible, Interoperable, and Reusable (FAIR)
#SciData18
Institution appoints #datastewards in all faculties. They understand researchers and their problems. They have PhDs, so they understand the pain points. They're there to help and improve data culture.
#SciData18
#SciData18 @martateperek
Cambridge Data Champions data.cam.ac.uk/intro-data-cha…
TU Delft Data Stewards tudelft.nl/en/library/cur… (they're hiring! academictransfer.com/en/50677/data-…)
@ResPlat at @unimelb do a great job too!
#SciData18 @martateperek
Problem: missing data!
Old approach - manually digitising coastline through @googleearth - tedious!
New approach - citizen science! coastwards.org
#SciData18
Problem: ultra-rare disease & undiag. diseases overlooked
1. survey undiag. patients in Japan
2. local network
3. global data exchange (clinical + genetic)
4. patient + public involvement
amed.go.jp/en/program/IRU…
#SciData18
Problem: Bad maps
Approach: from disaggregated, blocky maps to aggregated, shareable, interactive, usable grids
Maps aren't just static anymore - changing data can be used for policy
wp-winterschool.org - training!
#SciData18
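Moving from blocky maps to aggregated grids can be pictured as binning point records into regular cells. A minimal sketch, assuming hypothetical (lon, lat) points and a 1° cell size; it is not WorldPop's actual gridding method.

```python
from collections import Counter
from math import floor


def aggregate_to_grid(points, cell_size_deg=1.0):
    """Count (lon, lat) points per regular grid cell.

    Each cell is keyed by the (lon, lat) of its south-west corner.
    """
    counts = Counter()
    for lon, lat in points:
        col = floor(lon / cell_size_deg)
        row = floor(lat / cell_size_deg)
        counts[(col * cell_size_deg, row * cell_size_deg)] += 1
    return dict(counts)


# Hypothetical points around London
print(aggregate_to_grid([(0.2, 51.5), (0.7, 51.1), (-0.1, 51.5)]))
```

Because the cell is a parameter, the same records can be re-gridded at a coarser or finer resolution as the data change.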
Problem: Need to create maps that can be compared across countries
Approach: Collecting data, using data + combining datasets
Datasets: worldpop.org.uk
Code share: figshare.com
#SciData18
Problem: finding evidence of epilepsy characteristics in MRI
Approach: deep learning - train a lesion classifier
open access journal
github.com/MELDProject
data repository
resulted in external replication and validation!
#SciData18
Problem: Gaining consent to use end-of-life care data
Data sharing + informed consent: do they really understand?
Layered consent model - cooperation btw ethics, family etc
Data is being used!
data-archive.ac.uk
#SciData18
Problem: sharing + accessing health data
Why share data? Find the solution to your exact data online! Don't say 'contact author for data' - just put it up!
zenodo.org
nature.com/sdata/
#SciData18
Problem: Inconsistent identifiers for life sciences - too many URLs!
Approach: Actionable compact identifiers!
prefix:identifier (e.g. taxon:9606)
stable
unique
resolvable
location independent
#SciData18
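A compact identifier separates *what* the thing is (prefix:identifier) from *where* it resolves. Below is a minimal sketch of parsing one and turning it into a resolver URL. The validation regex is a hypothetical simplification. Building `https://identifiers.org/prefix:id` assumes the resolver's URL pattern, and whether a particular prefix is registered depends on that resolver's registry.

```python
import re

# Hypothetical, simplified rule: lowercase prefix, then a non-empty local ID without whitespace.
CURIE_PATTERN = re.compile(r"^([a-z][a-z0-9._]*):(\S+)$")


def parse_compact_id(curie):
    """Split 'prefix:identifier' into (prefix, identifier) or raise ValueError."""
    match = CURIE_PATTERN.match(curie.strip())
    if match is None:
        raise ValueError(f"not a prefix:identifier string: {curie!r}")
    return match.group(1), match.group(2)


def resolver_url(curie, resolver="https://identifiers.org/"):
    """Build a resolvable URL; swapping the resolver leaves the identifier unchanged."""
    prefix, local_id = parse_compact_id(curie)
    return f"{resolver}{prefix}:{local_id}"


print(resolver_url("taxon:9606"))
```

Because the resolver is just a parameter, the identifier stays location independent.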
Problem: trial transparency - reporting info completely, accurately + timely
Approach: website that tracks breaches of trial reporting laws fdaaa.TrialsTracker.net
EU.TrialsTracker.net
'Hey, this trial report is overdue!'
#SciData18
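At its core, "this trial report is overdue" is a date comparison. A minimal sketch, assuming a simplified rule that results are due 365 days after the primary completion date. This roughly mirrors FDAAA's one-year window, but the real trackers apply more detailed eligibility rules, and the trial records and field names here are hypothetical.

```python
from datetime import date, timedelta


def is_overdue(primary_completion, results_posted, today, deadline_days=365):
    """True if results are still missing after the (assumed) reporting deadline."""
    if results_posted:
        return False
    return today > primary_completion + timedelta(days=deadline_days)


def overdue_trials(trials, today):
    """Return the IDs of hypothetical trial records that breach the deadline."""
    return [
        t["id"]
        for t in trials
        if is_overdue(t["completion"], t["has_results"], today)
    ]


trials = [
    {"id": "TRIAL-1", "completion": date(2017, 1, 1), "has_results": False},
    {"id": "TRIAL-2", "completion": date(2017, 1, 1), "has_results": True},
    {"id": "TRIAL-3", "completion": date(2018, 1, 1), "has_results": False},
]
print(overdue_trials(trials, today=date(2018, 6, 1)))
```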
Problem: missing + imprecise information
Approach: define guidelines; check data for compliance with guidelines
Completeness; Compliance; Registration; Quality -> better data!
beilstein-strenda-db.org/strenda/
#SciData18
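"Check data for compliance with guidelines" can start as simply as checking every record against a list of required fields. A minimal sketch: the field names below are a hypothetical checklist, much shorter than the real STRENDA guidelines.

```python
# Hypothetical minimal checklist; the actual STRENDA guidelines are far more detailed.
REQUIRED_FIELDS = {"enzyme_name", "temperature_c", "ph", "buffer", "substrate_concentration"}


def check_compliance(record, required=REQUIRED_FIELDS):
    """Return the sorted list of required fields that are missing or empty."""
    missing = []
    for field in sorted(required):
        value = record.get(field)
        if value is None or value == "":
            missing.append(field)
    return missing


record = {"enzyme_name": "example kinase", "temperature_c": 25, "buffer": "", "substrate_concentration": 0.5}
print(check_compliance(record))
```

An empty result means the record passes this (minimal) completeness check.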
Problem: How to reproduce statistical significance w/o raw data?
Approach: digitise graphs; re-analyse data; look for 'reversers'
What proportion of data points need to be removed to change significance?
Concerning amount...
#SciData18
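The "reversers" question can be phrased as "how many points must go before p crosses the threshold?". A minimal sketch using a greedy search and a large-sample two-sample z-test. The z-test is a simplification swapped in for whatever test the original analysis used, since small samples would call for a t-test. Greedy removal gives an upper bound, not a guaranteed minimum, and the group values are hypothetical.

```python
from statistics import NormalDist, mean, variance


def z_test_p(a, b):
    """Two-sided p-value for a difference in means (large-sample z approximation)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    diff = mean(a) - mean(b)
    if se == 0:
        return 1.0 if diff == 0 else 0.0
    return 2 * (1 - NormalDist().cdf(abs(diff / se)))


def points_to_lose_significance(a, b, alpha=0.05, min_size=2):
    """Greedily drop the single point that most raises p until p >= alpha.

    Returns the number of points removed, or None if significance survives
    until both groups have shrunk to min_size.
    """
    groups = [list(a), list(b)]
    removed = 0
    while z_test_p(*groups) < alpha:
        best = None  # (p-value, group index, point index)
        for g in (0, 1):
            if len(groups[g]) <= min_size:
                continue
            for i in range(len(groups[g])):
                trial = [list(x) for x in groups]
                del trial[g][i]
                p = z_test_p(*trial)
                if best is None or p > best[0]:
                    best = (p, g, i)
        if best is None:
            return None
        _, g, i = best
        del groups[g][i]
        removed += 1
    return removed


a = [5.1, 5.3, 5.0, 5.4, 5.2, 5.6]  # hypothetical digitised values
b = [4.8, 4.9, 5.0, 4.7, 5.1, 4.6]
print(points_to_lose_significance(a, b))
```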
Problem: screening large numbers of materials; reproducibility of computational research; tracking provenance
Approach:
materialscloud.org
aiida.net
#SciData18
Problem: Developing data metrics; standardise data reuse
Approach: infrastructure to count data reuse (citations, views, downloads); display data metrics
@MakeDataCount
projectcounter.org
cdlib.org/services/uc3/d…
#SciData18
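Counting data reuse boils down to aggregating usage events per dataset, separating raw totals from unique users. A hypothetical sketch: the event tuples and dataset IDs are illustrative, and this is not the COUNTER Code of Practice for Research Data, which defines its own rules for what counts.

```python
from collections import defaultdict


def summarise_usage(events):
    """Aggregate (dataset_id, event_type, user_id) tuples into per-dataset metrics.

    'total' counts every event; 'unique' counts distinct users per event type.
    """
    totals = defaultdict(lambda: defaultdict(int))
    users = defaultdict(lambda: defaultdict(set))
    for dataset_id, event_type, user_id in events:
        totals[dataset_id][event_type] += 1
        users[dataset_id][event_type].add(user_id)
    return {
        ds: {
            etype: {"total": totals[ds][etype], "unique": len(users[ds][etype])}
            for etype in totals[ds]
        }
        for ds in totals
    }


events = [
    ("dataset-A", "view", "u1"),
    ("dataset-A", "view", "u1"),
    ("dataset-A", "download", "u2"),
    ("dataset-B", "view", "u3"),
]
print(summarise_usage(events))
```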
Why figshare.com?
Anyone can find + read your research! It's nice + good + the right thing to do!
You don't know who might be interested!
See AR's projects here:
ajrae.staff.shef.ac.uk/#resources
statsmapsnpix.com
#SciData18
By sharing your data you open yourself up to the good and the bad - but open data is a risk worth taking
#SciData18
wellcome.ac.uk/news/new-wellc…
Data shared by researchers can be re-used by others to generate new insights and tools
Competition data:
1. AMR Surveillance data synapse.org/#!Synapse:syn1…
2. Malaria
synapse.org/#!Synapse:syn1…
#SciData18
#stateofopendata survey - reviewing the development of the open data movement over the past 10 years
Credit, concerns, awareness, reuse, guidelines, motivations
stateofopendata.od4d.net
@stateofopendata
#SciData18
Only 10% of authors cite datasets used in research papers properly
Nature Research journals mandate a data availability statement + recommend public repositories - very supportive of open data + training
#SciData18
Data sharing policies e.g. bit.ly/datasharingpol…
Research data support
go.nature.com/ResearchDataSe…
1. Can you share your data?
2. Are your data ready to share?
3. Who is the owner of your data?
#SciData18
Not just biomed data! Here's a resource for the Earth scientists (my people):
copdess.org/enabling-fair-… - develop standards in the Earth sciences to enable FAIR data on a large scale
#SciData18
@CodeOceanHQ - allow reviewers access to the exact same environment that the authors used in the paper. A great leap into code reproducibility!
codeocean.com (wow!)
#reproducibility
#SciData18
Deep desire to make arguments - communication first, not visualisation first.
Focus on the story!
Effective Data Vis: Find the meaningful between the beautiful and clinical.
#SciData18
“You have to be like the worst tabloid newspaper in the front and the Academy of Science in the back.” - Hans Rosling
Examples of Hans Rosling data vis: visualisingdata.com/2017/02/thank-…
gapminder.org
#SciData18
Data vis paper!
vcg.seas.harvard.edu/files/pfister/… [pdf]
Effectiveness of data vis:
1. Do you recognise this graphic?
2. Image blurred - what do you remember the graphic telling you?
Text is a key part of graphics!
#SciData18
Active titles - write take home message in the title
Say it again! Use data and text to explain message
Exploit colour (remember colour blind design somersault1824.com/tips-for-desig…)
Labels - emphasis on narrative
#SciData18
Aside - I'm keen on communicating meaningful stories in data + this is a field I'd like to explore. If anyone would like 2 chat to a beginner about tools + skills 2 start with, pls let me know!
#SciData18
Kirstie Whitaker @kirstie_j
Sue Fletcher-Watson @SueReviews
Paola Quattroni @PaolaQuattroni
Natalia Tejedor-Garavito @NatuTejedor
Zaheer-Ud-Din Babar
Computer battery dying, let's see how we go!
#SciData18
Integrate data cleaning from the start!
If you can't share sensitive data, share metadata!
Incentives for ppl who do it correctly - though recognise huge time commitment. Diversify meaning of success in academia!
#SciData18
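One way to share metadata without exposing sensitive values is to publish only the schema: record count, column names and the types observed. A minimal hypothetical sketch; the records are illustrative, and real metadata standards capture much more (provenance, licence, access conditions).

```python
def describe_dataset(rows):
    """Build a metadata summary (no raw values) from a list of dict records."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, set()).add(type(value).__name__)
    return {
        "n_records": len(rows),
        "columns": {name: sorted(types) for name, types in sorted(columns.items())},
    }


rows = [  # hypothetical sensitive records that stay private
    {"patient_id": "P1", "age": 54, "diagnosis": "X"},
    {"patient_id": "P2", "age": None, "diagnosis": "Y"},
]
print(describe_dataset(rows))
```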
Funders + institutes provide incentives for academics
Supervisors to let go of their ego + trust the reasons why a result isn't supported
Commit to publish every protocol
#SciData18
Funders - harvest grant scheme for reproducing studies
Give students reproducibility studies rather than new projects -
Less emphasis on novelty, more on quality!
#SciData18
Institutional level + Government level (perhaps the same as institution, as unis funded by government)
#SciData18
Not everything needs to be reproduced - some studies provoke a line of enquiry. Lots of qualitative studies fall here.
(Definitions: Reproducibility
1. original data + validating study
2. new, similar data + get same results)
#SciData18
Debate! Yes, but doesn't say that it *should have* to be reproduced.
Who is responsible for reproducing work?
You should be able to have a career checking someone else's work - @kirstie_j
#SciData18
5% battery left.
Brilliant interactive conference! Diversity in topics, gender, geographic background, career level, data origin, opinions, universities...and more I'm sure.
Well done all involved!
#SciData18