Discover and read the best of Twitter Threads about #datasets

Are you interested in getting started in #bioinformatics but not sure where to begin? Here are some tips to help you get started on your journey. A THREAD🧵🧵:
Start by learning a high-level #programming language, such as #Python or #R, and familiarizing yourself with data structures and #algorithms commonly used in #bioinformatics. The #BioPython and #Bioconductor libraries are great resources for this.
Next, learn about #genomic data formats and standards, such as #FASTA, #FASTQ, and #GFF. This will allow you to effectively manipulate and analyze large-scale #genomic #datasets. The #NCBI SRA and #EBI ENA databases are great places to find real-world data to work with.
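As a small illustration of what working with one of these formats can look like, here is a minimal sketch of a #FASTA reader in plain Python. The function names (`parse_fasta`, `gc_content`) and the two example records are hypothetical, made up for this sketch; they are not from the thread. For real work a library such as #BioPython is likely the better choice.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record id to its sequence, given FASTA-formatted lines."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line starts a new record; store the previous one first.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            header = line[1:].split()
            # The id is the first whitespace-delimited token after ">".
            current_id = header[0] if header else ""
            chunks = []
        elif current_id is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical example data, not taken from any real database.
example = """>seq1 made-up example record
ATGCGC
GGTA
>seq2
TTTT
""".splitlines()

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq), gc_content(seq))
```

The same loop works on a file opened with `open(path)`, since a file object also yields lines one at a time.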
Read 12 tweets
🧵#Bioinformatics applications, a THREAD🧵🧵:
One of the key areas of #bioinformatics is the application of #computational techniques to the analysis of large #datasets of biological information, such as #genomic, #proteomic, and #metabolomic data.
#Genomic data provides information about the entire genetic makeup of a #biological system, including the #sequences of all its genes and the #regulation of their #expression.
Read 8 tweets
Are you an #R user who works with data from @INE_Chile or other public services? In this thread we show you how to use the #calidad package, developed by professionals at our institution and now available on #CRAN. There are two ways to install the package: Image
...Inside the #package you will find some #datasets, which become available once the #calidad package is loaded into the environment with the following command: Image
...Now let's work through an example! To calculate the percentage of #poverty by sex and region, we first need to create a #dummy variable with the following code: Image
Read 6 tweets
Data Nerds!

Have you been looking for datasets to practise, work with or even explore? But, you do not know where to find them?

We've got you.

Datasets and where to find them, a thread:

Remember to share and save.
#data #datasets #womenindata #DataScience #DataAnalytics
1. Kaggle:
Type of Data: Miscellaneous

Type of Data: Miscellaneous

3. Quandl:
Type of Data: Economic and Financial
4. Socrata:
Type of Data: Government, Business and Education

5. FiveThirtyEight:
Type of Data: Miscellaneous

6. Google Dataset Search:
Type of Data: Miscellaneous
Read 9 tweets
Learning data science is fun, so why do we always use the same boring datasets? It's common to see projects using the iris, cars, or titanic data. Stand out! Check out these 9 datasets I created on #kaggle, perfect for a unique portfolio project. #datascience #datasets🧵👇
1. MrBeast Youtube Stats

Includes metadata for every MrBeast YouTube video, including title, description, view and comment counts, likes AND thumbnails. Updated daily so you can track this viral sensation’s video trends over time.

2. Workplace Injury Data

Dataset of over 200k OSHA reportable injuries spanning 5 years. Do some investigative data science to see which industries produce the most injuries and which companies keep their employees safe.

Read 10 tweets
The 2nd day of #PESW starts in a few hours with a #keynote speech on #multivariate #analysis by our guest prof. @j_camacho_p from @CanalUGR. #NetSec Session 2 on #Datasets and Quality will follow. Spoiler: #perqoda. See… and come to #Horoměřice in person!
@j_camacho_p is on the stage! #PESW day has started. Let's listen to the #networkmetrics.
Very interesting presentation - data analysis based on multivariate analysis can help to reveal issues in data(sets). Maybe even the "Clever Hans" effect? Btw. photo of "technology that never fails" (professionals are ready for presentation)
Read 5 tweets
🚨 #NewStudyAlert 🚨 #energytwitter

Today we release a new report: "Role of Electricity Produced by Advanced Nuclear Technologies in Decarbonizing the U.S. Energy System". Below lies a thread (🧵🪡)
We deployed WIS:dom-P w/ augmentation to include: #endogenous learning, blended adv. #nuclear tech., yearly investment periods, @NREL #electrification, pilot projects, new #weather & load #datasets.
Two pathways were analyzed. First, "nominal", was where advanced nuclear had a lower first-of-a-kind (FOAK) cost & no delays due to permitting, labor, supply chains. The second pathway, "constrained", had a higher FOAK and there were delays in permitting, labor and supply chains.
Read 16 tweets
The service uses an ML model trained on a large number of annotated product texts. The Google product taxonomy was taken as the base taxonomy and then customised.
The #platform can be used either as a dashboard or as an #API #json endpoint. Here is the URL for IAB classification that you can try out:…
Read 5 tweets
#Robots need a better sense of touch to become dexterous.
We work on fixing this with our new sensor: “Insight” -- it uses a tiny camera and deep learning to enable high-fidelity sensing all-around with normal and shear forces.
Out today:…
I am super proud of the rest of the team @huanbo_sun and Katherine J. Kuchenbecker.

We set out to create a high-fidelity 3D tactile sensor that is robust, cheap, and easy to make.

Here is a 4-min video explaining how it works:

more below
So here are a few details:

The mechanical design is pretty unique: we use a soft elastomer that encloses a rigid thin skeleton.
-> it can withstand strong forces
-> it is very sensitive
-> surface has high friction
#Haptics #Elastomer #Overmolding Image
Read 8 tweets
When beginners start with #DataScience, the biggest pain point is finding good datasets for projects.
Luckily, there are many good public #datasets available! 📈🤓

Here are places you can find them! ⬇️⬇️⬇️


#MachineLearning #datascience #AI

1️⃣ @fastdotai

Apart from providing amazing free courses, Fast AI has teamed up with AWS to provide free datasets for image classification, NLP, image localization and COCO projects.


@fastdotai 2️⃣ Awesome Public Datasets

This is a rich GitHub repo of carefully curated datasets from 35+ domains.


Read 8 tweets
1/9 Covid-19 has shown us the benefits of #datasharing, but lack of trust in how #data is shared, and difficulty in designing and sustaining #dataaccess initiatives, can be a barrier to sharing.

We’ve been exploring these challenges in our @InnovateUK funded R&D programme...
2/9 Lack of trust and trustworthiness could lead to less #datasharing. But how do we decide what organisations and #datasets are trustworthy?

We've been exploring systematic ways of examining trust and trustworthiness:…
3/9 ...and here’s a (beta) guidebook to help you be more trustworthy and trusted when collecting, managing, using and sharing #data. Please try it and let us know what you think:…
Read 9 tweets
Subsequent to #Surgisphere, all @TheLancet journals will now introduce additional peer-review requirements for papers based on large, real-world #datasets.…
@TheLancet journals now require all #research papers, irrespective of method, to include a data-sharing statement that details what #data will be shared, whether additional documents will be shared, when data will become available & by what access criteria data will be shared.
Read 10 tweets
In reply to @VijayKunadian: 1) we know there is an excess of deaths (when all causes are studied as a single group) (@d_spiegel);
2) there will also be an excess of #cardiovascular deaths (we will release a pre-print shortly for external review @mmamas1973). I think it is very likely people did not come to #hospital - we know that failure to treat MI results in premature death.
(However, limitations in arbitration of cause, ‘default’ to #COVID19 as a cause, & latent effects of missed #heartattack will produce mis-classification bias & an underestimation of the extent of the deaths from not seeking help)
Read 8 tweets
Two thoughts: (1) any system where information *for care* does not flow along #CarePathways is fundamentally broken. Once that DOES work properly, abstracting useful/necessary #statistics for planning & #response becomes far more possible/likely...
(2) As #COVID19 has shown, any '#MinimumCareDataset' MUST contain #operational as well as #health data - and must distinguish #health & #care too. Also, focusing just on '#datasets' is too many layers up the #stack; what about the #dictionary?
#Investment in #SocialCare as an #InformationSystem (which every complex system basically is) lags health & the NHS by 3 - 4 DECADES.

If that's not an opportunity for massive #innovation, #transformation and *avoiding past errors* then tell me what is. Allowing #BigTech et al...
Read 4 tweets
Daily Bookmarks to GAVNet 6/02/2020-2…

What COVID-19 Means For The Future Of Capitalism, Democracy And Sustainability…

#coronavirus #future #capitalism #democracy #Sustainability
U.S. COVID-19 Contact Tracing Programs Designed for Failure, Despite Bloomberg Money; Why Can't the U.S. Copy the Lessons of Hong Kong's Success? | naked capitalism…

#tracing #failure #contact #Bloomberg
Stretch and flow: Research sheds light on unusual properties of well-known materials…

#research #flow #materials #Stretch
Read 12 tweets
Today, @NITIAayog released a Discussion Paper titled "National Strategy for Artificial Intelligence". #AIforAll…
We provide our initial thoughts on the paper here. 1/n
We welcome this initiative by @NITIAayog, but a call for comments would have been a welcome addition. The paper takes important steps forward from the #AI Task Force report released earlier this year by the DIPP, Ministry of Commerce and Industry (…). 2/n
This paper attempts a more holistic look at a broader range of issues concerning AI including #regulation, #ethics, #fairness, #transparency and #accountability. However, a number of issues still remain with this paper. 3/n
Read 41 tweets
