Discover and read the best of Twitter Threads about #datascience

Most recents (24)

Pandas is a fast, powerful, flexible and open source data analysis and manipulation tool.

A Mega thread 🧵 covering 10 amazing Pandas hacks and how to use them efficiently (with Code Implementation) 👇🏻
#Python #DataScientist #Programming #MachineLearning #100DaysofCode #DataScience
1/ Indexing data frames
Indexing means selecting all or particular rows and columns of data from a DataFrame. In pandas it can be done using two indexers:
.loc[] : label based
It accepts inputs such as a scalar label, a list of labels, or a slice object
.iloc[] : integer-position based
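The tweet's image is not available here, so the following is only a minimal sketch of the two indexers, assuming a small made-up DataFrame (the labels `a`, `b`, `c` and the column names are hypothetical):

```python
import pandas as pd

# Hypothetical example data; the labels and column names are made up.
df = pd.DataFrame(
    {"city": ["Oslo", "Lima", "Pune"], "temp_c": [4, 19, 31]},
    index=["a", "b", "c"],
)

# .loc selects by label: here a scalar label and a list of labels.
one_value = df.loc["b", "temp_c"]
some_rows = df.loc[["a", "c"], ["city"]]

# .iloc selects by integer position, counting from 0.
first_row = df.iloc[0]
last_value = df.iloc[2, 1]
```

Note that both indexers use square brackets rather than call parentheses.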
2/ Slicing data frames
In order to slice by labels you can use the .loc[] indexer of the DataFrame.
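As a rough illustration of label slicing (not the thread's own code, which was an image), the sketch below uses a hypothetical weekday-indexed table and contrasts it with position slicing:

```python
import pandas as pd

# Hypothetical data: daily sales indexed by weekday label.
sales = pd.DataFrame(
    {"units": [5, 8, 2, 9], "revenue": [50.0, 80.0, 20.0, 90.0]},
    index=["mon", "tue", "wed", "thu"],
)

# Label slice with .loc: both endpoints are included.
midweek = sales.loc["tue":"wed"]

# Rows and columns can be sliced by label at the same time.
early = sales.loc[:"tue", "units":"revenue"]

# Position slice with .iloc: the end position is excluded.
first_two = sales.iloc[0:2]
```

The inclusive end label is the main difference from ordinary Python slicing, which `.iloc` follows.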

Read 17 tweets
Projects Alert : 140 Python Projects with Source Code
You don't have to go to a university or pay hefty tuition to learn ML when you can learn for FREE.

An extensive list of 100+ most valuable GitHub repositories for ML, beginner to advanced.
theinsaneapp.com/2021/09/best-g…
#Python #DataScientist #Programming #Machinelearning #100DaysofCode
Read 5 tweets
10 Amazing Advanced Python Constructs that you can use to write efficient and clean Code
A Thread 🧵👇🏻
#Python #TensorFlow #DataScientist #Programming #Coding #100DaysofCode #DataScience #AI #MachineLearning
1/ DefaultDict
In Python, a dictionary is a container that holds key-value pairs. Keys must be unique, immutable (hashable) objects. If you try to read a key that doesn’t exist in the dictionary, including through an in-place update such as d[key] += 1, it raises a KeyError and stops your code (continued..)
2/ (Continued..) To tackle this issue, Python provides defaultdict, a dictionary-like class. If you try to access or modify a missing key, defaultdict automatically creates the key and generates a default value for it.
Indexing a defaultdict that has a default factory does not raise a KeyError (Continued..)
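A small sketch of the behaviour described above, using a made-up word list; it shows the plain-dict failure first and then the same loop with `defaultdict(int)`:

```python
from collections import defaultdict

words = ["data", "model", "data", "plot", "data"]

# With a plain dict, incrementing a missing key raises KeyError.
plain_counts = {}
try:
    plain_counts["data"] += 1
except KeyError:
    plain_failed = True
else:
    plain_failed = False

# defaultdict(int) calls int() to create a 0 for any missing key.
counts = defaultdict(int)
for word in words:
    counts[word] += 1

# Reading a missing key also inserts it with the default value.
missing = counts["unseen"]
```

One caveat: `counts.get("other")` does not trigger the default factory, and a defaultdict created without a factory still raises KeyError.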
Read 18 tweets
How to get started in #datascience?

👀🧵👇 See thread below
2/ 1. Craft your own personal learning plan
Earlier this year I made a video that details the steps you can take to craft your own personal learning plan for your data journey. Everyone's plan is different, make your own! Here's how...
3/ 2. Work on data projects using datasets that are interesting to you
When starting out, I found that working on datasets that interest you helps you engage in the process. Be persistent and work on the project to completion (end-to-end).
How? Data→Model→ Deployment
Read 10 tweets
Two things that have come naturally to me since childhood have been Math and Entrepreneurship. I was actually preparing to become a Math professor at my university. Now, I'm a data scientist at Microsoft. What happened? Thread 👇🏾 (1/6)

#DataScience
For most of my life, I was known as an excellent student. However, for the first time ever, due to a hiccup, I failed in my 3rd and 4th years of university while studying Actuarial Science. This had such a severe impact that I almost gave up on everything. I felt so defeated. 😞(2/6)
So when my professor suggested that I apply to do a Master's in Statistics, I couldn't be bothered. But I also thought to myself that the worst had already happened, so doing that Master's would just help me pass the time. While in the program, I got the idea to start a business. (3/6)
Read 6 tweets
💡 A comprehensive roadmap to data infrastructure: bvp.com/atlas/roadmap-…

#DataInfrastructure #Roadmap #DataScience
1. Growth in adoption of cloud software: As companies in all industries and sizes adopt various cloud-based software to run their businesses, they have had to deal with data sprawl across a number of different sources and systems.
2. Increase in the volume of accessible data: Movement to the cloud and the growth of software users around the world has also generated more data at an exponential rate.
Read 7 tweets
The big thing that I'd change here is the color palette.

This color palette is hard to interpret and, frankly, just looks a little ugly.

#datascience #DataVisualization

[1/11]
[2/11]

The fix here is pretty simple.

The data are sequential in nature. There's a low and a high.

When you have sequential data, you should almost always look at sequential color palettes.

[3/11]

More specifically:

For sequential data, your go-to palettes should almost always be perceptually uniform sequential palettes like viridis or magma.

Read 11 tweets
Here’s a cartoon illustration I drew a while back:
The #machinelearning learning curve

👀🧵👇 See thread below
2/ Starting the learning journey
The hardest part of learning data science is taking that first step to actually start the journey.
3/ Consistency and Accountability
After taking that first step, it may be challenging to maintain the consistency needed to push through with the learning process. And that’s where accountability steps in.
Read 9 tweets
If you want to create great data visualizations, you need to understand color palettes.

Here are a few quick tips:

[1/n]

#datascience #datavisualization #Python #rstats
[2/n]

For data that has a sequential ordering (i.e., low to high), you should use sequential color scales.

matplotlib.org/stable/tutoria…

#Python #matplotlib
[3/n]

Sequential color scales incrementally change saturation or lightness.

For example, this is a red sequential color palette: [Image: red sequential color palette]
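To make "incrementally change lightness" concrete, here is a minimal standard-library sketch that builds a red sequential palette by holding hue and saturation fixed and lowering lightness step by step. The function name and the lightness range (0.9 down to 0.3) are assumptions for illustration; this is not how matplotlib builds its colormaps.

```python
import colorsys

def red_sequential_palette(n_colors):
    """Return n_colors hex strings running from light red to dark red."""
    palette = []
    for i in range(n_colors):
        # Lightness falls linearly from 0.9 (light) to 0.3 (dark).
        lightness = 0.9 - 0.6 * i / max(n_colors - 1, 1)
        # Hue 0.0 is red in the HLS model; saturation is held fixed.
        r, g, b = colorsys.hls_to_rgb(0.0, lightness, 0.8)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = red_sequential_palette(5)
```

A linear ramp in HLS lightness is not perceptually uniform, so for real plots a library palette is usually the safer choice.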
Read 24 tweets
If you want to master data science in Python, you need to learn Pandas method chaining.



[thread: 1/14]

#data #datascience #Python
[2/14]

Pandas method chains enable you to combine several individual Pandas techniques in complex ways.
[3/14]

When most people do this, they do it with very long chains of techniques, *all on a single line*.

These are hard to read and hard to debug.

They get more challenging the longer they get.
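One common remedy, which may or may not be what the rest of the thread recommends, is to wrap the chain in parentheses and put each method on its own line. The sketch below assumes a made-up `orders` table with hypothetical column names:

```python
import pandas as pd

# Hypothetical data; column names are made up for illustration.
orders = pd.DataFrame({
    "region": ["north", "south", "north", "south", "north"],
    "units": [3, 10, 7, 1, 5],
    "price": [2.0, 1.5, 2.0, 1.5, 2.0],
})

# Wrapping the chain in parentheses lets each method sit on its own line,
# so a single step can be commented out or inspected while debugging.
revenue_by_region = (
    orders
    .query("units > 2")                                  # keep larger orders
    .assign(revenue=lambda d: d["units"] * d["price"])   # add a column
    .groupby("region", as_index=False)["revenue"]        # group by region
    .sum()                                               # total per region
    .sort_values("revenue", ascending=False)             # largest first
    .reset_index(drop=True)
)
```

The same chain written on one line would do the same work but is much harder to scan.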
Read 14 tweets
How to Add New Variables to a Python Dataframe

sharpsightlabs.com/blog/pandas-as…

[Thread: 1/9]

#data #datascience #Python
[2/9]

There are several ways to add a variable to a Python dataframe ...

But my preferred way is the Pandas "assign" method.
[3/9]

The Pandas assign method has fairly simple syntax.

You can use the technique to add a single new variable like this: [Image]
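Since the original screenshot is missing, here is a small sketch of `DataFrame.assign` on a hypothetical two-column table; the column names are assumptions for illustration:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"price": [10.0, 20.0], "qty": [3, 1]})

# assign returns a new DataFrame; the original is left unchanged.
with_total = df.assign(total=df["price"] * df["qty"])

# A callable receives the DataFrame being assigned to,
# which is convenient inside method chains.
with_flag = with_total.assign(is_large=lambda d: d["total"] > 25)
```

Because `assign` does not modify `df` in place, the result has to be stored or chained onward.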
Read 9 tweets
We have a new working paper up on @arxiv titled:

"Double Machine Learning and Bad Controls — A Cautionary Tale" (with @beyers_louw & @itamarcaspi)

Link: arxiv.org/abs/2108.11294
#CausalInference #Causality #DataScience #Econometrics #MachineLearning 1/
Double machine learning (DML) is getting more and more traction in econ. One important application – among several others – is in automatic model selection in high-dimensional settings. 2/
Suppose you're interested in estimating theta in the following regression:

Y = theta * D + X * beta + u

The covariate vector X possibly contains many variables, but only a few of them have non-zero coefficients. DML then allows you to automatically select relevant controls. 3/
Read 16 tweets
Climate change is making your morning coffee cup more expensive. Here's a brief story of how much more depends on your taste. /1 oec.world

#Brazil #Coffee #ClimateCrisis #ClimateEmergency #OECPro #ForeignTrade #EconomicComplexity #DataVisualization #DataScience
Brazil grows 28% of the world's Robusta crop (bitter, used for instant coffee and espresso), and 41% of Arabica, the beans favored by Starbucks & Dunkin'. Trade data shows how coffee is affected by climate change, and how our taste for coffee could evolve in a warmer future. /2 https://oec.world/en/profile/country/bra
Coffee is one of the most traded commodities ($30B in 2019). Two species, Arabica(70%) and Robusta(30%), account for virtually all production. Climate change studies suggest a 50% reduction in the area suitable for Arabica by 2050. Robusta grows in hotter temperatures. /3 https://oec.world/en/profile/bilateral-product/coffee/report
Read 12 tweets
Are you aware of Boston Dynamics' amazingly advanced robots?
Well @elonmusk just announced he's entering the humanoid robots game and we're all EXCITED! 🤩

Read along! (Thread)

(1/n)

#machinelearning #tesla #BostonDynamics #ElonMusk #robot #AI
Tesla, apart from being an automaker, is also widely known for its AI capabilities. Tesla's FSD (Full Self-Driving) capabilities are unmatched in the 🚗 industry!

(2/n)
Elon Musk on Thursday unveiled a humanoid robot called the Tesla Bot that runs on the same AI used by Tesla's fleet of autonomous vehicles. 🤩

(3/n)
Read 7 tweets
1/

Thread of the very best #YouTube channels and #Twitter accounts to follow for:

#AI/ #ML, #DeepLearning, #neural and all things #datascience

bit.ly/3g8pVDL

#AI #machinelearning @wiserin10 #datascience #bigdata #artificialintelligence
2/

@Analyticsindiam

Analytics India Magazine includes discussions on news, tips for the data ecosystem and a deep dive into #AI/#ML, #deeplearning and #neural networks

#YouTube subscriber count: 38k
3/

@Krishnaik06

Krish Naik is co-founder of iNeuron.ai and specialises in #machinelearning, #deeplearning, and computer vision. Krish’s #YouTube channel is a deep dive into all things #AI/#ML, perfect for beginners

YouTube subscriber count: 421k
Read 15 tweets
#DataScience Project 4

Customer Segmentation

- Use Machine Learning to create a model that performs Customer Segmentation

Libraries Used
- Numpy
- Pandas
- Matplotlib
- Seaborn
- Scikit learn

Models Trained
- KMeans Clustering
- Hierarchical Clustering
Code for this project can be found here 👇

[Please do consider giving an upvote if you find this notebook to be useful 😀]

kaggle.com/piyalbanik/seg…
1. Business Understanding

The goal of this project is to divide customers into groups based on common characteristics in order to maximize the value of each customer to the business.
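As a rough sketch of what segmentation with KMeans (one of the models listed above) can look like, the example below clusters six made-up customers with scikit-learn. The features, the values, and the choice of two clusters are all assumptions for illustration and are not taken from the linked notebook.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up customers: [annual_spend, visits_per_month].
customers = np.array([
    [200, 1], [220, 2], [250, 1],      # low spend, few visits
    [900, 8], [950, 9], [1000, 10],    # high spend, frequent visits
], dtype=float)

# n_clusters=2 is an assumption that suits this toy data.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
segments = model.fit_predict(customers)
```

On real data the features usually need scaling first, and the number of clusters is typically chosen with a diagnostic such as the elbow method.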
Read 13 tweets
Cheat sheet that summarizes #DataScience in 10 pages
(Links in the comments below 👇)
2/ Link to the cheatsheet by Maverick Lin
github.com/ml874/Data-Sci…
3/ Topics include:
- Overview of Data science
- Probability and Statistics
- Data cleaning
- Feature engineering
- Modeling
- Classical Machine learning
- Deep learning
- SQL
- Python data structures
Read 4 tweets
Want no-code AI or ML in your projects? I did an hour of research, and these are the 9 best tools & APIs to integrate into your projects.

Happy predicting 🤖

#lowcode #nocode #machinelearning #datascience

🧵👇
👉 @clarifai

✅ computer vision, natural language processing, and automated machine learning
✅ many ready-to-use models
👉 @levityai

✅ flexible ai model creation in a visual drag and drop editor
✅ computer vision, text processing
Read 11 tweets
7 Websites to Learn DSA for Free 🔥

Mega thread 🧵
#DataScience #DSA
1️⃣ Programiz
Learn to code with our beginner-friendly tutorials and examples.

programiz.com 🔗
2️⃣ CodeChef
CodeChef is a competitive programming community of programmers from across the globe.

codechef.com 🔗
Read 9 tweets
#JSM2021 panel led by @minebocek on upskilling for a statistician -- how to learn??
#JSM2021 @hglanz: There is no shortage of stuff to learn. First identify what you don't know; that comes from modern media (blogs, Twitter, podcasts), groups and communities (@RLadiesGlobal or local chapters), and professional organizations (@amstatnews).
#JSM2021 @hglanz: What do the job postings require these days? (This is how the content for the @CalPoly stat/data science program was developed.)
Read 64 tweets
#DataScience Project 3

Best Suburb to Open a Cafeteria in Melbourne 🇦🇺

- Create a Machine Learning model which suggests a location to open a Cafe.

Libraries Used
- Numpy
- Pandas
- Matplotlib
- Scikit Learn
- BeautifulSoup
- Geocoder
- Folium

Model Used:
- K Means Clustering
Please note: the main focus of this project was on data collection, visualization, and training a model. It did not involve data cleaning.

Code for this project 👇
github.com/Piyal-Banik/Me…
1. Business Understanding:

The main goal of this project is to collect and analyze data in order to select a location in Melbourne to open a Cafeteria. We want to help a business owner planning to open up a Cafe in a location by exploring better facilities around the Suburb.
Read 17 tweets
The US Pentagon believes its precognitive AI can predict events 'days in advance' using Machine Learning! 😮

Read along 👇

#BigData #Analytics #DataScience #AI #MachineLearning #Python #Coding #100DaysofCode #MLBS4Spoilers
US Northern Command recently completed a string of tests for Global Information Dominance Experiments (GIDE)

It is a combination of :

👉 AI
👉 Cloud Computing and
👉 Sensors

(2/n)
The machine learning-based system observes changes in raw, real-time data that hint at possible trouble. 🙌

This could lead to a major change in military and government operations in the US.

(3/n)
Read 7 tweets
The ABSOLUTE ESSENTIALS of Bias/Variance Analysis

🧵This thread will cover the following concepts:
a. Bayes Error
b. Bias vs Variance
c. Possible Solutions

(Explanation + Examples)

#MachineLearning #DataScience
📜Introduction
- After training an ML model, it is important to assess its performance before putting it into production.
- We start by measuring the model performance on the training set to evaluate how well the model fits the training data.
- Then we measure the model performance on the test set to evaluate the generalization error.

To measure the model performance on the training set, we need a reference value against which we can compare the model performance.
This reference value is called "Bayes Error". 👇
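The rest of the thread is not shown here, so the following is only a sketch of one common way to use the reference value: treat the gap between training error and Bayes error as avoidable bias, and the gap between test error and training error as variance. The function name and the error rates are hypothetical.

```python
def diagnose(bayes_error, train_error, test_error):
    """Split the test error into an avoidable-bias gap and a variance gap."""
    avoidable_bias = train_error - bayes_error   # gap to the reference value
    variance = test_error - train_error          # gap from train to test
    dominant = "bias" if avoidable_bias > variance else "variance"
    return avoidable_bias, variance, dominant

# Hypothetical error rates, chosen only for illustration.
bias_gap, variance_gap, dominant = diagnose(
    bayes_error=0.02, train_error=0.10, test_error=0.12)
```

With these numbers the larger gap is the bias one, which would point toward a bigger model or longer training rather than more data.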
Read 20 tweets
1/ #Pandas is the go-to library that you need for #datawrangling for your #datascience projects when coding in #Python.
👀🧵👇 See thread below
2/ Why Do We Need Pandas?
The Pandas library has a large set of features that let you perform tasks from the first intake of raw data, through cleaning and transformation, to the final curated form used for hypothesis testing and machine learning model building.
3/ Basics of Pandas - 1. Pandas Objects
Pandas allows us to work with tabular datasets. Its basic objects are of 3 types: Series, DataFrame and Index. The first 2 are data structures, while the last serves as a point of reference.
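A minimal sketch of the three object types, using made-up names and values; the class for the row labels is `pd.Index` in the pandas API:

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
ages = pd.Series([31, 45], index=["ana", "ben"], name="age")

# A DataFrame is a two-dimensional table whose columns are Series.
people = pd.DataFrame({"age": ages, "city": ["Lima", "Oslo"]})

# The Index holds the row labels that label-based lookups refer to.
row_labels = people.index
ben_city = people.loc["ben", "city"]
```

Here the DataFrame inherits its row labels from the Series it was built from.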
Read 10 tweets
