Discover and read the best of Twitter Threads about #DataScience

Most recent (24)

Everyone has a different journey.

To get into the #Datascience role that I am in, I wanted to share the courses I took.

Some free, some not.

Here is how I mapped my journey 🧵
1/7 Some of the skills I realized that I needed:

- Math ✅
- Software Development ✅
- Courses on Data Analytics and Data Science ✅
2/7 Software: CS50 Harvard (Free)

This course was an eye opener.

I freelanced before as a web developer but felt I needed to sharpen my skills.

A great teacher makes a great course and learning experience.

edx.org/cs50
Read 8 tweets
🔴 NumPy Fundamentals: Indexing & Slicing
🟡 Jupyter Notebook 📒 Available ✅

A thread 🧵 👇

#DataScience #MachineLearning #100daysOfCode
💫 Basic Indexing ⬇️
💫 Slicing and Striding ⬇️
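The preview only names the three topics, so here is a minimal sketch of what basic indexing, slicing and striding look like in NumPy. The array `a` and the variable names are made up for illustration and do not come from the thread's notebook.

```python
import numpy as np

# Illustrative 3x4 array holding the values 0..11.
a = np.arange(12).reshape(3, 4)

# Basic indexing: one element, then one whole row.
element = a[1, 2]          # row 1, column 2
first_row = a[0]           # row 0

# Slicing: rows 0-1 and columns 1-2 (the stop index is excluded).
block = a[:2, 1:3]

# Striding: every second column, and the rows in reverse order.
every_other_col = a[:, ::2]
reversed_rows = a[::-1]

# Basic slices are views, so writing through one changes the original array.
block[0, 0] = 100
print(a[0, 1])             # the write through `block` is visible in `a`
```

If an independent copy is needed instead of a view, calling `.copy()` on the slice avoids the shared-memory surprise.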
Read 7 tweets
Ever wondered how to join big data in R?

A thread on using Spark in R 👇🧵

#rstats #spark #datascience
This is thread # 3 in a series exploring using Spark in R with the sparklyr package. You can find the others here:

# 1:


# 2:

1/17
Here's what you'll learn reading this thread:

1. How a regular left join works.

2. The sort-merge-join algorithm (the default way to join big data with Spark and sparklyr).

3. Broadcast joins.

4. Salted joins.

All directly in R. Let's go.

2/17
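The thread works in R with sparklyr. As a language-neutral illustration of item 2, here is a rough single-machine sketch in Python of the sort and merge phases of a sort-merge left join. The function name `sort_merge_left_join` and the sample rows are hypothetical, and the shuffle step that Spark runs across partitions before sorting is left out entirely.

```python
def sort_merge_left_join(left, right):
    """Left-join two lists of (key, value) pairs using a sort-merge pass.

    Returns (key, left_value, right_value) tuples; right_value is None
    when the key has no match on the right side.
    """
    # Sort phase: both inputs are ordered by the join key.
    left_sorted = sorted(left, key=lambda kv: kv[0])
    right_sorted = sorted(right, key=lambda kv: kv[0])

    out = []
    j = 0  # pointer into the right side; it only ever moves forward
    for key, left_value in left_sorted:
        # Merge phase: skip right rows whose key is smaller than this one.
        while j < len(right_sorted) and right_sorted[j][0] < key:
            j += 1
        # Emit one output row per matching right row. A separate cursor k
        # is used so that duplicate left keys can match the same rows again.
        k = j
        matched = False
        while k < len(right_sorted) and right_sorted[k][0] == key:
            out.append((key, left_value, right_sorted[k][1]))
            matched = True
            k += 1
        if not matched:
            out.append((key, left_value, None))  # left join keeps the row
    return out


# Hypothetical sample data: (customer_id, item) and (customer_id, name).
orders = [(2, "pen"), (1, "ink"), (3, "cap"), (2, "pad")]
customers = [(1, "Ada"), (2, "Bo")]
for row in sort_merge_left_join(orders, customers):
    print(row)
```

Because both sides are sorted first, the merge is a single forward pass over each input, which is what makes this approach workable for data that is processed one partition at a time.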
Read 18 tweets
Following the @_useRconf 2022 conference and right before the RStudio conference, here are, in my opinion, the main trends in the #R language 🌈:
➡️ MLOps 🚀
➡️ Data 🏎
➡️ Documentation ❤️
🧵👇🏼

#rstats #opensource #DataScience
MLOps - I was not aware of any #MLOps framework in R until the release of the vetiver package. The vetiver package by @juliasilge from RStudio provides an MLOps framework for both R and #Python.
The vetiver package supports MLOps applications for tidymodels, #XGBoost, mlr, and caret in R, and for #PyTorch and scikit-learn in Python.
Read 9 tweets
Introduction to Data Science in Yoruba.
Youtube:

#DataScience #yoruba
Read 4 tweets
Tools for quickly performing Exploratory Data Analysis

A Thread 🧡

Here's a curated list of YouTube videos that I've made on the topic.

#datascience #eda
1. AutoPlotter
2. Bamboolib
Read 16 tweets
Neural Network with Flax 🚀🚀🚀

#Flax is a Google #OpenSource Python library for neural network applications for JAX 🌈. While most folks are familiar with Google #TensorFlow and #Keras, 🧵 👇🏼

#DataScience #deeplearning #DL #python
While most folks are familiar with Google TensorFlow and Keras, Flax is less known, but it is mainly used by researchers and engineers at Google.

One of the core use cases of this library is for #NLP #Transformers and image recognition applications
Among the Flax applications, you can find:
✅ Neural network API (flax.linen): Dense, Conv, Norm, Attention, Pooling, Cell, Dropout
✅ Utilities and patterns: replicated training, serialization and checkpointing, metrics, prefetching on device
✅ Educational examples
Read 5 tweets
🔴 Pandas 🐼 Tutorial 3️⃣
🟡 Iterrows 🤜 🤛 Itertuples

Check this out 👇

#programming #MachineLearning #DataScience #pythonprogramming #CodeNewbie #pythoncode #100daysofcode #pythontricks #pythonprojects #100daysofcodechallenge #python #Pandas
Find Jupyter Notebook 📒 ⬇️
github.com/patchy631/twit…
Read 4 tweets
Ever wondered how to manipulate big data with R?

A thread on using Spark in R 👇🧵

#rstats #spark #datascience
Big data is any data which is too large to be loaded into memory all in one go. Ever tried to read a large dataset into R and everything fell over or slowed waaaaaay down? This is where Spark comes in.

1/18
Spark is an open source tool for processing big data. It splits data into partitions for processing to overcome RAM limitations and writes to disk when needed. It is complicated but Spark handles most of the difficult parts. And you can work with Spark directly from R.

2/18
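To make the partitioning idea concrete, here is a toy single-process sketch in Python. It is not how Spark is implemented (there is no cluster, no parallelism and no spilling to disk); it only shows how an aggregate can be computed one partition at a time so that the full dataset never has to sit in memory. The helper names `partitions` and `partitioned_mean` are invented for this example.

```python
from itertools import islice


def partitions(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def partitioned_mean(rows, size):
    """Compute a mean while holding only one partition in memory at a time."""
    total = 0
    count = 0
    for chunk in partitions(rows, size):
        # Each partition contributes a small partial result (sum and count);
        # the partial results are combined at the end.
        total += sum(chunk)
        count += len(chunk)
    return total / count if count else float("nan")


# `range` is lazy, so the million values are never materialised all at once.
print(partitioned_mean(range(1, 1_000_001), size=10_000))
```

The same shape (partial results per partition, combined afterwards) is what lets aggregations scale beyond available RAM.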
Read 19 tweets
Learn Data Science in 180 days 🤑📈 and start your data science career.

Bookmark this thread

A thread 🧵👇
First Month 🗓️
Day 1 to 15 - Learn Python for Data Science
Day 16 to 30 - Learn Statistics for Data Science
Second Month 🗓️
Day 31 to 45 - Explore Python Packages (NumPy, Pandas, Matplotlib, Seaborn, Scikit-Learn)
Day 46 to 60 - Implement EDA on real-world datasets.
Read 8 tweets
Microsoft has some free #datascience and #machinelearning courses on GitHub.
Free tutorials for machine learning, data science, and web development ✅📈🤑

A thread 🧵👇
Machine Learning for Beginners ->
aka.ms/ml-beginners
Web Dev for Beginners -> lnkd.in/gkpq9fVz
Read 5 tweets
Scholarship Alert, 🥳!!!

WiDA is excited to announce a partnership with Footright!

Footright will be providing 200 WiDA community members with full scholarships for their Data Science/ AI bootcamps (valued at N300,000)!

The bootcamp is scheduled to run for 3 months online.
What you can be sure to get from this bootcamp:
- Expert Training
- 48-hour class content
- Trainings in Statistics
- Trainings in Python for Data Science
- Trainings in Data Visualization
- Trainings in SQL
Interested candidates should hurry and apply. Do not miss out on this opportunity.

Apply here: bit.ly/WiDAxFootright

Application deadline: 13th June 2022.

#womanindata #womanintech #techopportunities #womanindataafrica #data #datascience #girl #girlintech #girlindatascience
Read 3 tweets
Best websites for learning data science! 😱

🧵⬇ [1/7]

#datascience
kaggle.com
@kaggle

🧵⬇ [3/7]

#datascience
Read 9 tweets
I teach a 15-credit course to 3rd yr #datascience ugrads @UCD. I've their full attention for ~10 wks. They work in pairs to produce a major project of their own design. Every year I'm amazed by how they grow during this time & the confidence they gain from what they achieve. 1\n
At the start of the module we've a 1-wk bootcamp where I work on a sample project from start to finish. Each year I pick a new topic & this year it was an analysis of #Wordle, the popular @nytime word puzzle. Here's a summary of the key findings... 2\n
towardsdatascience.com/big-data-in-li…
The study was based on an analysis of almost 70M Wordle games: >53M simulated games (using a simulator designed to simulate realistic, not optimal, human gameplay) & >15M real games shared on Twitter. The simulated gameplay matches real gameplay in several important respects. 3\n
Read 12 tweets
How I began my Data Analytics journey.

• I started with Microsoft Excel. Excel is more than a spreadsheet. It helped me to think outside the box due to its flexibility and also improved my problem-solving skills.

#DataScience #DataAnalytics #Excel
• I transitioned into Power BI with my skills in Excel. I didn't struggle to learn Power BI because of my strong Excel skills. It doesn't mean you need Excel to start Power BI but it helps.
• I picked up new skills in SQL. SQL is one of the key skills every analyst must possess. Very powerful and easy to learn.
Read 7 tweets
"Data don't lie". But working with data typically requires defining #research questions, hypotheses, and methodology, then interpreting results and building #dataviz, and each step can introduce subjectivity and #bias. Scientific rigor and objectivity are key in #DataScience. Some #Tips for #DataScientists 🧵
Don't dive straight into a dataset; domain knowledge is critical. Good #Science requires a theoretical understanding of a topic while #ignorance introduces bias. Sound domain knowledge enables you to ask the right questions and give relevant answers with #DataScience
Investigate the alternative hypothesis. Business questions put to #DataScientists are often directive, as there already is a hypothesis. Don't confirm this hypothesis without properly investigating the alternative.
Read 11 tweets
Today, @ONS & @DataSciCampus publish an article tracking the prices of the lowest-cost grocery items using web-scraping, inspired by @BootstrapCook & other campaigners who have rightly asked about the #costofliving experiences of the poorest in society. Here's what we found 🧵
First: a caveat. This is highly experimental research, as web-scraping supermarket data for 30 grocery items can go wrong in multiple ways! But even though this is experimental, we thought it was really important to talk about it and the results. 2/n
#datascience #econtwitter
Okay: results. Lowest cost items (eg value brands) have had significant price increases: 13 of 30 monitored had average lowest price increase faster than overall equiv. inflation measure. Big movers: pasta (up 50%), crisps (17%), bread (16%), minced beef (16%) & rice (15%). 3/n
Read 8 tweets
1/ "Software is eating the world. Machine learning is eating software. Transformers are eating machine learning."

Let's understand what these Transformers are all about

#DataScience #MachineLearning #DeepLearning #100DaysOfMLCode #Python #pythoncode #AI #DataAnalytics
2/ #Transformers architecture follows Encoder and Decoder structure.

The encoder receives input sequence and creates intermediate representation by applying embedding and attention mechanism.

#DataScience #MachineLearning #DeepLearning #100DaysOfMLCode #Python #pythoncode #AI
3/ Then, this intermediate representation or hidden state will pass through the decoder, and the decoder starts generating an output sequence.

#DataScience #MachineLearning #DeepLearning #100DaysOfMLCode #Python #pythoncode #AI #DataScientist #DataAnalytics #Statistics
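The attention mechanism mentioned in 2/ can be sketched in a few lines. What follows is a minimal pure-Python version of scaled dot-product self-attention, written for readability. It assumes the token vectors are used directly as queries, keys and values; real Transformers apply learned projection matrices and multiple heads first, which are omitted here. The function names and the toy `tokens` are illustrative only.

```python
import math


def softmax(xs):
    """Turn a list of scores into weights that are positive and sum to 1."""
    m = max(xs)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Each output vector is a weighted average of the value vectors.
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


# Three toy "embedded tokens"; self-attention uses them as Q, K and V.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
encoded = attention(tokens, tokens, tokens)
for row in encoded:
    print([round(x, 3) for x in row])
```

Each output row mixes information from all tokens, weighted by similarity, which is the "intermediate representation" the encoder hands on.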
Read 14 tweets
But what does a p-value mean in #MachineLearning? A thread

It tells you how likely it is that your data could have occurred under the null hypothesis.

1/n

#DataScience #DeepLearning #ComputerVision #100DaysOfMLCode #Python #DataScientist #Statistics #programming #Data #Math #Stat
2/n
What Is a Null Hypothesis?

A null hypothesis is a type of statistical hypothesis that proposes that no real effect or relationship exists in a set of given observations, so any apparent pattern is attributed to chance.

#DataScience #MachineLearning #100DaysOfMLCode #Python #stat #Statistics #Data #AI #Math #deeplearning
3/n
A P-value is the probability of obtaining an effect at least as extreme as the one in your sample data, assuming the truth of the null hypothesis

#DataScience #MachineLearning #100DaysOfMLCode #Python #DataScientist #Statistics #Data #DataAnalytics #AI #Math
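One way to see the definition in 3/n at work is a permutation test, sketched below in plain Python. This is one technique among many and is chosen here only because it follows the definition literally: shuffle the group labels (which is what the null hypothesis of "no difference" allows) and count how often the shuffled difference in means is at least as extreme as the observed one. The function name `permutation_p_value` and the sample numbers are hypothetical.

```python
import random


def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Under the null hypothesis the labels are exchangeable,
        # so any reshuffling of the pooled data is equally likely.
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(pa) / len(pa) - sum(pb) / len(pb))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical measurements for two groups.
group_a = [10, 11, 12, 13, 14]
group_b = [1, 2, 3, 4, 5]
print(permutation_p_value(group_a, group_b))
```

A small value here means that a difference this large rarely appears when labels are assigned at random. It does not give the probability that the null hypothesis is true.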
Read 11 tweets
1/ One way to test whether a time series is stationary is to perform an augmented Dickey-Fuller test - A Thread

#DataScience #MachineLearning #DeepLearning #100DaysOfMLCode #Python #pythoncode #AI #DataScientist #DataAnalytics #Statistics #programming #ArtificialIntelligence
2/ H0: The time series is non-stationary (it has a unit root). In other words, it has some time-dependent structure, so statistical properties such as its mean and variance are not constant over time.

HA: The time series is stationary.

#DataScience #MachineLearning #DeepLearning #100DaysOfMLCode #Python #pythoncode #AI #DataScientist
3/ If the p-value from the test is less than some significance level (e.g. α = 0.05), then we can reject the null hypothesis and conclude that the time series is stationary.

#DataScience #MachineLearning #DeepLearning #100DaysOfMLCode #Python #pythoncode #AI #DataScientist
Read 8 tweets
2/ It is important to standardize variables before running Cluster Analysis. This is because cluster analysis techniques depend on measuring the distance between the observations we're trying to cluster.

#DataScience #MachineLearning #DeepLearning
3/ If a variable is measured at a higher scale than the other variables, then whatever measure we use will be overly influenced by that variable.

#DataScience #MachineLearning #DeepLearning #100DaysOfMLCode #Python #pythoncode #AI #DataScientist #DataAnalytics #Statistics
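A small sketch of the effect described in 2/ and 3/, using only the Python standard library. The data (age in years, income in currency units) is invented, and z-score standardization is assumed as the scaling method; the thread does not say which one it uses. The helper names are illustrative.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column: subtract its mean, divide by its std deviation.

    Assumes no column is constant (a zero std deviation would divide by 0).
    """
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def nearest(rows, i):
    """Index of the row closest to rows[i] by Euclidean distance."""
    others = [j for j in range(len(rows)) if j != i]
    return min(others, key=lambda j: math.dist(rows[i], rows[j]))


# Hypothetical people: (age in years, income in currency units).
people = [[25, 30_000], [60, 30_500], [27, 33_000], [58, 60_000]]

# Raw distances are dominated by income, the larger-scale variable:
# person 0 looks closest to person 1 despite a 35-year age gap.
print(nearest(people, 0))

# After standardizing, both variables contribute comparably and
# person 0 is closest to person 2, who is similar in age and income.
print(nearest(standardize(people), 0))
```

Any distance-based clustering run on the raw rows would inherit the same bias toward income.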
Read 16 tweets
Did you know how TensorFlow can run on a single mobile device as well as on an entire data center? Read this thread

1/n

#TensorFlow #DataScience #DeepLearning #MachineLearning #ComputerVision #100DaysOfMLCode #Python #DataScientist #Statistics #programming #Data
2/n
Google has designed TensorFlow such that it is capable of dividing a large model graph whenever needed.

#TensorFlow #DataScience #DeepLearning #MachineLearning #ComputerVision #100DaysOfMLCode #Python #DataScientist #Statistics #programming #Data #Math #Stat #AI
3/n
It assigns special SEND and RECV nodes whenever a graph is divided between multiple devices (CPUs or GPUs).

#TensorFlow #DataScience #DeepLearning #MachineLearning #ComputerVision #100DaysOfMLCode #Python #DataScientist #Statistics #programming #Data #Math #Stat #AI
Read 9 tweets
2/16

"roc_auc_score" is defined as the area under the ROC curve, which is the curve having False Positive Rate on the x-axis and True Positive Rate on the y-axis at all classification thresholds.

#DataScience #MachineLearning #DeepLearning #100DaysOfMLCode #Python
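To connect the definition to numbers, here is a from-scratch sketch that builds the ROC points by sweeping the threshold over the scores and then integrates with the trapezoid rule. It is not scikit-learn's implementation of `roc_auc_score`; the helper names `roc_points` and `auc` are made up, and it assumes binary labels coded 0/1 with at least one of each class present.

```python
def roc_points(y_true, y_score):
    """Return (FPR, TPR) points as the threshold sweeps from high to low."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    # Highest scores first: lowering the threshold admits them in this order.
    pairs = sorted(zip(y_score, y_true), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Process tied scores together so ties produce a single point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve via the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(auc(roc_points(y_true, y_score)))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of positives above negatives.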
Read 16 tweets
