Rohan Paul

May 27 • 10 tweets • 17 min read

Outlier Detection with Alibi Detect Library - A Thread
1/n

#DataScience #DeepLearning #MachineLearning #ComputerVision #100DaysOfMLCode #Python #DataScientist #Statistics #programming #Data #Math #Stat #pythoncode #AI #ArtificialIntelligence

2/n

Alibi Detect is a Python library for detecting outliers, adversarial data, and drift. It covers tabular data, text, images, and time series, with detectors that can be used both online and offline. Both TensorFlow and PyTorch backends are supported.

#DataScience #DeepLearning

3/n

It supports a variety of outlier detection techniques, including Mahalanobis distance, Isolation Forest, and Seq2Seq.

#DataScience #DeepLearning #MachineLearning #ComputerVision #100DaysOfMLCode #Python #DataScientist #Statistics #programming #Data #Math #Stat #pythoncode

4/n

Mahalanobis Distance - Detects anomalies in tabular data. The algorithm computes an outlier score: the Mahalanobis distance of an observation from the center of the feature distribution.
#DataScience #DeepLearning #MachineLearning #ComputerVision #100DaysOfMLCode

5/n

Mahalanobis Distance - If this outlier score exceeds a user-specified threshold, the observation is marked as an outlier.

#DataScience #DeepLearning #MachineLearning #ComputerVision #100DaysOfMLCode #Python #DataScientist #Statistics #programming #Data #Math #Stat
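
The score-and-threshold idea can be sketched in plain NumPy. This is a minimal illustration on toy data, not Alibi Detect's own implementation; the function name `mahalanobis_scores`, the data, and the threshold value are all assumptions made for the example.

```python
import numpy as np

def mahalanobis_scores(X, mean, cov):
    """Mahalanobis distance of each row of X from the distribution center."""
    inv_cov = np.linalg.inv(cov)
    diff = X - mean
    # d(x) = sqrt((x - mean)^T inv_cov (x - mean)), computed row by row
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))

rng = np.random.default_rng(0)
# Toy "normal" data: two positively correlated features (illustrative values)
X_ref = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=500)
mean = X_ref.mean(axis=0)
cov = np.cov(X_ref, rowvar=False)

X_new = np.array([[0.1, 0.2],    # close to the center
                  [3.0, -3.0]])  # goes against the correlation
threshold = 3.0                  # user-specified; hypothetical value
scores = mahalanobis_scores(X_new, mean, cov)
is_outlier = scores > threshold  # one boolean flag per observation
```

Because the distance accounts for the covariance, the second point scores far higher than its plain Euclidean distance from the center would suggest.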

6/n

Mahalanobis Distance - The algorithm is online, which means it starts with no knowledge of the feature distribution and learns as requests arrive. As a result, you should expect the output to be poor at first and to improve over time.

#DataScience #DeepLearning #MachineLearning
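
To make the "online" part concrete, here is a rough sketch that keeps a running mean and covariance (a Welford-style update) and scores each observation before learning from it. The class `OnlineMahalanobis` and its parameters are hypothetical names for illustration; the library's own detector is more involved.

```python
import numpy as np

class OnlineMahalanobis:
    """Running mean/covariance estimate; each point is scored, then learned."""

    def __init__(self, n_features, eps=1e-6):
        self.n = 0
        self.mean = np.zeros(n_features)
        # Running sum of outer products of deviations from the mean
        self.m2 = np.zeros((n_features, n_features))
        self.eps = eps  # small ridge so the covariance stays invertible

    def score_and_update(self, x):
        x = np.asarray(x, dtype=float)
        if self.n < 2:
            score = 0.0  # not enough data yet to estimate a covariance
        else:
            cov = self.m2 / (self.n - 1) + self.eps * np.eye(len(x))
            diff = x - self.mean
            score = float(np.sqrt(diff @ np.linalg.solve(cov, diff)))
        # Welford-style update of the running statistics
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += np.outer(delta, x - self.mean)
        return score

rng = np.random.default_rng(1)
det = OnlineMahalanobis(n_features=2)
for x in rng.normal(size=(300, 2)):   # stream of "normal" requests
    det.score_and_update(x)
far_score = det.score_and_update([8.0, 8.0])  # clearly unusual request
```

The first couple of scores are just 0.0 because nothing has been learned yet, which is the "poor at first" behaviour described above.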

7/n

Variational Auto-Encoders - The VAE detector is first trained on a batch of unlabeled but normal (inlier) data. Because labeled data is often scarce, unsupervised or semi-supervised training is preferable.

#DataScience #DeepLearning #MachineLearning #ComputerVision #100DaysOfMLCode

8/n

Variational Auto-Encoders - The VAE detector tries to reconstruct the input it receives. If the input cannot be reconstructed well, the reconstruction error is high and the instance can be flagged as an outlier.

#DataScience #DeepLearning #MachineLearning

9/n

Variational Auto-Encoders - The reconstruction error is measured either as the mean squared error (MSE) between the input and the reconstructed instance, or as the probability that both the input and the reconstructed instance are generated by the same process.

#DataScience #AI
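
The reconstruct-then-score logic can be illustrated without a neural network. The sketch below swaps the VAE for a linear stand-in (projection onto one principal component) so it stays short and runnable; it is not a VAE and not the library's API. The data, the helper names, and the 99th-percentile threshold are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# "Normal" training data lying close to a 1-D line inside 3-D space
t = rng.normal(size=(500, 1))
direction = np.array([[1.0, 2.0, -1.0]])
X_train = t @ direction + 0.05 * rng.normal(size=(500, 3))

# Stand-in for a trained autoencoder: keep only the top principal component
mean = X_train.mean(axis=0)
_, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
components = vt[:1]  # 1-D "latent space"

def reconstruct(X):
    z = (X - mean) @ components.T   # encode
    return z @ components + mean    # decode

def mse_score(X):
    """Per-instance mean squared error between input and reconstruction."""
    return np.mean((X - reconstruct(X)) ** 2, axis=1)

# Threshold taken from the training scores (hypothetical choice)
threshold = np.percentile(mse_score(X_train), 99)

X_new = np.array([[1.0, 2.0, -1.0],   # on the learned line
                  [2.0, -2.0, 2.0]])  # far from it
is_outlier = mse_score(X_new) > threshold
```

A real VAE replaces the linear encode/decode steps with learned networks, but the flagging rule (score above threshold) has the same shape.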

10/n
Variational Auto-Encoders - Usage Syntax

#DataScience #DeepLearning #MachineLearning #ComputerVision #100DaysOfMLCode #Python #DataScientist #Statistics #programming #Data #Math #Stat #pythoncode #AI #ArtificialIntelligence


More from @rohanpaul_ai

Rohan Paul

@rohanpaul_ai

May 28

1/ "Software is eating the world. Machine learning is eating software. Transformers are eating machine learning."

Let's understand what these Transformers are all about

#DataScience #MachineLearning #DeepLearning #100DaysOfMLCode #Python #pythoncode #AI #DataAnalytics

2/ The #Transformers architecture follows an encoder-decoder structure.

The encoder receives the input sequence and creates an intermediate representation by applying embedding and attention mechanisms.

#DataScience #MachineLearning #DeepLearning #100DaysOfMLCode #Python #pythoncode #AI

3/ This intermediate representation (the hidden state) is then passed to the decoder, which generates the output sequence.

#DataScience #MachineLearning #DeepLearning #100DaysOfMLCode #Python #pythoncode #AI #DataScientist #DataAnalytics #Statistics

Read 14 tweets

Rohan Paul

@rohanpaul_ai

May 28

But what does a p-value mean in #MachineLearning? - A thread

It tells you how likely it is that data like yours could have occurred under the null hypothesis.

1/n

#DataScience #DeepLearning #ComputerVision #100DaysOfMLCode #Python #DataScientist #Statistics #programming #Data #Math #Stat

2/n
What Is a Null Hypothesis?

A null hypothesis is a statistical hypothesis proposing that there is no real effect or difference in a given set of observations, so that any pattern seen is due to chance.

#DataScience #MachineLearning #100DaysOfMLCode #Python #stat #Statistics #Data #AI #Math #deeplearning

3/n
A P-value is the probability of obtaining an effect at least as extreme as the one in your sample data, assuming the truth of the null hypothesis

#DataScience #MachineLearning #100DaysOfMLCode #Python #DataScientist #Statistics #Data #DataAnalytics #AI #Math
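
A small worked example of that definition, assuming a coin-flip setting that is not part of the thread: 15 heads in 20 flips, with H0 "the coin is fair". The function name is hypothetical; the calculation is an exact two-sided binomial test.

```python
from math import comb

def binomial_two_sided_p(k, n, p0=0.5):
    """Exact two-sided p-value for k successes in n trials under H0: p = p0.

    Adds up the probability of every outcome that is no more likely than
    the observed one, i.e. at least as extreme.
    """
    pmf = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Tiny tolerance guards against floating-point ties
    return sum(p for p in pmf if p <= observed * (1 + 1e-9))

p_value = binomial_two_sided_p(k=15, n=20)
reject_h0 = p_value < 0.05  # compare with the chosen significance level
```

Here the p-value is about 0.041, so at α = 0.05 the fair-coin hypothesis would be rejected, though only narrowly.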

Read 11 tweets

Rohan Paul

@rohanpaul_ai

May 28

1/ One way to test whether a time series is stationary is to perform an augmented Dickey-Fuller test - A Thread

#DataScience #MachineLearning #DeepLearning #100DaysOfMLCode #Python #pythoncode #AI #DataScientist #DataAnalytics #Statistics #programming #ArtificialIntelligence

2/ H0: The time series has a unit root, i.e. it is non-stationary. In other words, it has some time-dependent structure and does not have constant variance over time.

HA: The time series is stationary.

#DataScience #MachineLearning #DeepLearning #100DaysOfMLCode #Python #pythoncode #AI #DataScientist

3/ If the p-value from the test is less than some significance level (e.g. α = .05), then we can reject the null hypothesis and conclude that the time series is stationary.

#DataScience #MachineLearning #DeepLearning #100DaysOfMLCode #Python #pythoncode #AI #DataScientist
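
To show what the test statistic looks like, here is a simplified sketch: a plain Dickey-Fuller regression with a constant and no lagged differences (so not the full augmented test), compared against the approximate large-sample 5% critical value of -2.86 rather than a p-value. The function name and the simulated series are assumptions. In practice a library routine such as `adfuller` from `statsmodels.tsa.stattools` handles the lag terms and reports the p-value.

```python
import numpy as np

def dickey_fuller_stat(y):
    """t-statistic of gamma in  dy_t = alpha + gamma * y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])  # constant, lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov_beta[1, 1])

rng = np.random.default_rng(0)
noise = rng.normal(size=500)
white_noise = noise               # stationary series
random_walk = np.cumsum(noise)    # unit root, non-stationary series

CRITICAL_5PCT = -2.86  # approximate value for the constant, no-trend case
stat_wn = dickey_fuller_stat(white_noise)
stat_rw = dickey_fuller_stat(random_walk)
reject_wn = stat_wn < CRITICAL_5PCT  # rejecting H0 means "looks stationary"
```

A statistic far below the critical value (as for white noise) leads to rejecting H0; a random walk will usually stay above it, so H0 is not rejected.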

Read 8 tweets

Rohan Paul

@rohanpaul_ai

May 28

Kullback-Leibler (KL) Divergence - A Thread

It is a measure of how one probability distribution diverges from a second, reference probability distribution.

#DataScience #Statistics #DeepLearning #ComputerVision #100DaysOfMLCode #Python #programming #ArtificialIntelligence #Data
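
For discrete distributions the definition is D_KL(P || Q) = Σ p_i · log(p_i / q_i). A minimal sketch with two toy distributions (the values are made up for illustration):

```python
from math import log

def kl_divergence(p, q):
    """D_KL(P || Q) in nats for two discrete distributions on the same support."""
    # Terms with p_i == 0 contribute 0; q_i must be > 0 wherever p_i > 0
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]   # toy distribution P
q = [0.9, 0.1]   # toy reference distribution Q
d_pq = kl_divergence(p, q)   # divergence of P from Q
d_qp = kl_divergence(q, p)   # divergence of Q from P
```

The two directions give different values (roughly 0.51 vs 0.37 nats here), which is why KL divergence is not a true distance metric.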

#DataScience #Statistics #DeepLearning #ComputerVision #100DaysOfMLCode #Python #programming #ArtificialIntelligence #Data #DataAnalytics #pythoncode #AI #MachineLearning #NeuralNetworks

Read 6 tweets

Rohan Paul

@rohanpaul_ai

May 27

1/ When is it important to standardize variables in #DataScience #MachineLearning ? - A Thread

#DeepLearning #100DaysOfMLCode #Python #pythoncode #AI #DataScientist #DataAnalytics #Statistics #programming #ArtificialIntelligence #Data #Stats #Database #BigData #100DaysOfCode

2/ It is important to standardize variables before running cluster analysis. This is because cluster analysis techniques depend on measuring the distance between the observations we're trying to cluster.

#DataScience #MachineLearning #DeepLearning
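
A quick sketch of why this matters, using made-up (age, income) rows: on the raw scale income swamps age in the Euclidean distance, and after z-scoring the nearest neighbour changes.

```python
from math import dist
from statistics import mean, pstdev

# Toy data: (age in years, income in dollars); values are illustrative
rows = [(25, 50_000), (60, 51_000), (26, 58_000), (58, 90_000)]

def standardize(rows):
    """Z-score each column: subtract its mean, divide by its std deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(r, mus, sds)) for r in rows]

z = standardize(rows)

# Raw distances: income dominates, so row 0 looks closest to row 1
raw_01, raw_02 = dist(rows[0], rows[1]), dist(rows[0], rows[2])
# Standardized distances: both features are on a comparable scale
std_01, std_02 = dist(z[0], z[1]), dist(z[0], z[2])
```

On raw values the 25-year-old sits nearer the 60-year-old with a similar income; after standardizing, the 26-year-old is the closer match, which would change the clusters a distance-based method finds.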