What a p-value means in #MachineLearning - A thread

It tells you how likely it is that your data could have occurred under the null hypothesis

What Is a Null Hypothesis?

A null hypothesis is a type of statistical hypothesis that proposes there is no effect or relationship in a set of given observations, so any observed difference is due to chance alone.

A P-value is the probability of obtaining an effect at least as extreme as the one in your sample data, assuming the truth of the null hypothesis
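A small worked example, assuming a coin-flip setting that is not part of the thread: the null hypothesis is that the coin is fair, and we observe 9 heads in 10 flips. The p-value adds up the probabilities (under that null) of every outcome at least as extreme as the observed one. Here "as extreme" means "no more probable than the observed count", which is one common two-sided convention; `binomial_p_value` is a hypothetical helper written for this sketch.

```python
from math import comb


def binomial_p_value(heads: int, flips: int) -> float:
    """Two-sided exact p-value for `heads` out of `flips` under H0: fair coin (p = 0.5)."""
    # Probability of every possible head count k, assuming the null hypothesis is true
    probs = [comb(flips, k) / 2**flips for k in range(flips + 1)]
    observed = probs[heads]
    # "At least as extreme" = every outcome that is no more likely than the observed one
    # (the tiny tolerance guards against floating-point ties)
    return sum(p for p in probs if p <= observed + 1e-12)


print(round(binomial_p_value(9, 10), 4))  # 0.0215
```

A p-value of about 0.021 says that data this lopsided would be rare if the coin were fair; it is not the probability that the coin is fair.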

A thread on AUC Score (Area under the ROC Curve) Interpretation in #DataScience #MachineLearning

"roc_auc_score" is defined as the area under the ROC curve, which is the curve having False Positive Rate on the x-axis and True Positive Rate on the y-axis at all classification thresholds.

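AUC ranges in value from 0 to 1. An AUC of 1 means the model ranks every positive above every negative, 0.5 is no better than random guessing, and values below 0.5 mean the ranking is inverted.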
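One way to see what the number means: AUC equals the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one, with ties counting as half. The sketch below computes it in pure Python with that pairwise formulation; it is an illustration, not how scikit-learn's `roc_auc_score` is implemented, though for binary labels the two should agree. The helper name `roc_auc` and the toy labels and scores are made up.

```python
def roc_auc(y_true, y_score):
    """AUC as P(score of random positive > score of random negative), ties = 0.5.

    This pairwise (Mann-Whitney) form equals the area under the ROC curve
    computed with the trapezoidal rule.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Three of the four positive/negative pairs are ordered correctly, hence 0.75.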

What is a p-value? - A thread

It tells you how likely it is that your data could have occurred under the null hypothesis.

What Is a Null Hypothesis?

A null hypothesis is a type of statistical hypothesis that proposes there is no effect or relationship in a set of given observations, so any observed difference is due to chance alone.

A P-value is the probability of obtaining an effect at least as extreme as the one in your sample data, assuming the truth of the null hypothesis

[Data Analysis] 🧵

Exploratory data analysis is a fundamental step in any analysis work. You don't have to be a data scientist proficient at modeling to be a useful asset to your client if you can do great EDA.

Here's a template of a basic yet powerful EDA workflow👇

EDA is incredibly useful. Proper modeling CANNOT happen without it.

The truth:

Stakeholders NEED it far more than modeling.

EDA empowers the analyst with knowledge about the data, which then shapes the #machinelearning pipeline

While #pandas and #matplotlib are key to good EDA in #python, the real difference is the QUESTIONS you ask of your dataset.

As in all things, these tools are just tools. The real weapon is the analyst. You are in control, not the dataset.
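The thread's template itself isn't reproduced here. As a hedged illustration of asking questions of a dataset with pandas, the sketch below bundles a few common first-pass questions (size, column types, missing values, duplicate rows, numeric summary) into a hypothetical helper, `first_look`; the toy data is made up.

```python
import pandas as pd


def first_look(df: pd.DataFrame) -> dict:
    """Hypothetical helper: answers a few first-pass EDA questions about df."""
    return {
        "shape": df.shape,                                    # how many rows and columns?
        "dtypes": df.dtypes.astype(str).to_dict(),            # what type is each column?
        "missing": {c: int(n) for c, n in df.isna().sum().items()},  # where are the gaps?
        "duplicates": int(df.duplicated().sum()),             # any fully repeated rows?
        "numeric_summary": df.describe(),                     # count, mean, spread, range
    }


# Made-up toy data, only to exercise the helper
df = pd.DataFrame({
    "city": ["Paris", "Lyon", "Paris", None],
    "sales": [120.0, 85.5, 120.0, 40.0],
})

report = first_look(df)
print(report["shape"])       # (4, 2)
print(report["missing"])     # {'city': 1, 'sales': 0}
print(report["duplicates"])  # 1
```

Each answer usually prompts the next question (why is a city missing? is the repeated row a real sale or a data-entry error?), which is where plots and domain knowledge come in.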

Today I realized one of the reasons I appreciate working for a system integrator. This time last year I was training in #Scala and #Spark. Then I was placed on a project using #python, #pandas, #pyspark and #sql. That ended 12/31/21 and I had a short break before joining...

a pilot project working with #terraform, #aws, #boto3 with #python, #informatica, #autosys and #shellScripting. Not to mention the increasing frequency of situations like "Here is some technology setup that you have never seen before, here are some rudimentary docs,...

Figure it out in 1 or 2 sprints, provide deliverable artifacts and documentation on the preexisting system AND the new artifacts. Thx, k. Bai!" Oh, and teammates you joined with, they all have a shit-ton of stuff, so don't expect replies to your msgs.

Cue: 1. panic, then...

Resources for getting started in #DataScience

Where to start? What to read? What to learn? I got you covered.

🧵 See thread below 👇

2/ Roadmap to #datascience

Here's my 4-step process for becoming a data scientist

1. Plan

2. Learn

3. Build

4. Explain

👉 Video

👉 Blog towardsdatascience.com/the-art-of-lea…

3/ Create your own #datascience learning curriculum

Everyone's interests and needs are different, so everyone's learning journey leads somewhere different too.

Create your own curriculum. Here's how:

👉 Video

👉 Blog towardsdatascience.com/how-to-create-…

Want a #Python #pandas data frame with all Apollo missions, indexed by date?

import pandas as pd
df = pd.read_html('https://t.co/1OfUmAGe6N')[2]
df['Date'] = pd.to_datetime(df['Date'].str.replace('(–.+)?,', '', regex=True))
df = df.set_index('Date')

The read_html call scrapes the Wikipedia page for the Apollo program, putting every HTML table into its own data frame. The missions are in the third table, i.e. index 2.

The str.replace call turns date ranges into single (launch) dates: it removes an optional en-dash range suffix (the end date) together with the comma that follows it.

The same line then passes the cleaned-up date strings to pd.to_datetime and assigns the resulting datetime series back to df['Date']. The last line makes Date the index.
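Because the live page can change, here is an offline sketch of just the date-cleaning step, run on sample strings assumed to match the "Month D–D, YYYY" style of the table's Date column (they are not copied from the table).

```python
import pandas as pd

# Sample strings assumed to match the style of the Date column
raw = pd.Series(["October 11–22, 1968", "December 21–27, 1968", "July 16–24, 1969"])

# Remove an optional "–<end day>" range together with the comma after it
cleaned = raw.str.replace('(–.+)?,', '', regex=True)
print(cleaned.tolist())  # ['October 11 1968', 'December 21 1968', 'July 16 1969']

# Parse the cleaned strings into launch dates
dates = pd.to_datetime(cleaned)
print(dates.dt.strftime('%Y-%m-%d').tolist())  # ['1968-10-11', '1968-12-21', '1969-07-16']
```

The greedy `.+` is fine here because each string has a single comma right after the range; a string with more commas after the dash would lose everything up to the last one.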

Quick Thread: 5 Cool Advanced Pandas Techniques for Data Scientists

Some super-useful magic commands for Jupyter notebook - A thread

Types of magic commands

Line magics - start with a single % character. The rest of the line is passed as the argument, without parentheses or quotes.

Cell magics - start with %% and operate on the whole cell, i.e. all the lines below their call.

%load - loads code from an external source (such as a file or a URL) into a cell in a Jupyter notebook

A thread - All the basic #Matrix #Algebra you will need in #MachineLearning #DeepLearning

A matrix A is a rectangular array of scalars, usually presented as m rows and n columns, with a_ij denoting the entry in row i and column j.
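Written out in LaTeX, the usual m × n layout is shown below; the symbol names follow the standard convention, since the thread's original image is missing.

```latex
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}
```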

A thread on AUC Score Interpretation

roc_auc_score is defined as the area under the ROC curve, having False Positive Rate on the x-axis and True Positive Rate on the y-axis at all classification thresholds

What is a p-value?

A P-value is the probability of obtaining an effect at least as extreme as the one in your sample data, assuming the truth of the null hypothesis

P values address only one question:

How likely are your data, assuming a true null hypothesis?

Pandas is a fast, powerful, flexible and open source data analysis and manipulation tool.

A mega thread 🧵 covering 10 amazing Pandas hacks and how to use them efficiently (with code implementation)👇🏻

1/ #Pandas is the go-to library that you need for #datawrangling for your #datascience projects when coding in #Python.

2/ Why Do We Need Pandas?

The Pandas library has a large set of features that lets you handle everything from the first intake of raw data, through cleaning and transformation, to the final curated form used for hypothesis testing and machine learning model building.

3/ Basics of Pandas - 1. Pandas Objects

Pandas lets us work with tabular datasets. Its basic objects come in 3 types: Series, DataFrame and Index. The first 2 are data structures, while the Index serves as a point of reference (the labels for rows and columns).
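A minimal sketch of the three object types, with made-up values: a Series, a DataFrame built from two columns, and the Index objects that label its rows and columns. The index class pandas exposes is `pd.Index`.

```python
import pandas as pd

# Series: a one-dimensional array of values with labels (made-up scores)
s = pd.Series([3, 5, 7], index=["a", "b", "c"], name="score")

# DataFrame: a table whose columns are Series sharing one row index
df = pd.DataFrame(
    {"score": [3, 5, 7], "passed": [False, True, True]},
    index=["a", "b", "c"],
)

# Index: the labels used to look up rows (df.index) and columns (df.columns)
print(type(df.index).__name__)  # Index
print(df.columns.tolist())      # ['score', 'passed']
print(df.loc["b", "score"])     # 5
```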

Want to become a Data Scientist? Here are some great resources that you should watch in order (statistics & linear algebra not included) 🧵

Part 1 of Microsoft's Intro to Python Series:

Gets you up and running in python and introduces you to the basics of setting up your development environment.

youtube.com/playlist?list=…

youtube.com/playlist?list=…

Continuation of the series above.

youtube.com/playlist?list=…

youtube.com/playlist?list=…

How to move towards Quantitative Trading:

1) Learn Python

2) Learn key trading libraries & Pandas

3) Data visualisation using seaborn & matplotlib

4) Statistics for Financial Markets

5) Backtest your strategies

6) Optimise & Automate

7) ML insights for trading

1) This is a good basic course on Udemy for learning the Python you need for trading

udemy.com/course/python-…

2) Learn about Pandas (crucial for data crunching & handling time series data)

learnpython.org/en/Pandas_Basi…

Also install TA-Lib (imported in Python as talib): an essential library for a trader. It has built-in functions for a wide range of technical indicators, which makes our life easy.

mrjbq7.github.io/ta-lib/doc_ind…
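As a hedged illustration of the kind of indicator TA-Lib computes, here is a simple moving average written in plain pandas with `Series.rolling`, on made-up closing prices. TA-Lib offers the same indicator as `talib.SMA`, which this sketch does not call.

```python
import pandas as pd

# Hypothetical closing prices, one per day
close = pd.Series(
    [10.0, 11.0, 12.0, 13.0, 14.0],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# 3-period simple moving average: mean of the current and previous two closes.
# The first two values are NaN because a full 3-value window isn't available yet.
sma_3 = close.rolling(window=3).mean()
print(sma_3.tolist())  # [nan, nan, 11.0, 12.0, 13.0]
```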

I used Matlab for image processing for years. Tried to switch to Python 10 years ago but too many tools were still missing. Tried again 5 years ago and haven't touched Matlab ever since! The combination of scikit-image + @ProjectJupyter was a real game-changer! A few more things:

On top of the great classic scientific stack (#numpy, #scipy, #pandas, #matplotlib) there's an entire ecosystem of new tools to handle all sorts of complex problems, e.g. #napari to visualize and annotate multi-dimensional data, and @dask_dev to handle very large images.

There are also complex ML tools for image denoising, like content-aware image restoration #CARE (github.com/csbdeep/csbdeep) or point-scanning super-resolution #PSSR (github.com/BPHO-Salk/PSSR), which are documented as Jupyter notebooks that really work "out of the box".