Let's continue with the #TimeSeries forecasting 😀

We want to use an #ARIMA to forecast the #BTC price. But... how can we select its parameters?

#machinelearning #python #datascience
Let's first start with "p" and "q".

For this, we need to check how correlated the time series is with lagged versions of itself.

The original time series will be denoted as lag 0. A version lagged by one timestep will be referred to as lag 1. And so on...
The correlation of the original time series with lag 0 (no lag) will always be equal to 1, since they are the same time series.

The questions are...
• What is the correlation of the original time series with lag 1?
• What about lag 2?
• And 3, 4, 5...?
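
To make these questions concrete, here is a minimal Python sketch. It assumes a synthetic random walk as a stand-in for the #BTC price; the `prices` list and the helper names `pearson` and `lag_correlation` are illustrative and not part of the original thread. It measures the correlation of the series with copies of itself shifted by 0 to 5 timesteps.

```python
import random
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def lag_correlation(series, lag):
    """Correlation of the series with a copy of itself shifted by `lag` steps."""
    if lag == 0:
        return pearson(series, series)
    # Pair each value with the value observed `lag` steps earlier.
    return pearson(series[lag:], series[:-lag])


# Assumption: a seeded random walk stands in for real price data.
random.seed(0)
prices = [100.0]
for _ in range(499):
    prices.append(prices[-1] + random.gauss(0, 1))

lag_corrs = {lag: lag_correlation(prices, lag) for lag in range(6)}
for lag, corr in lag_corrs.items():
    print(f"lag {lag}: {corr:.3f}")
```

Lag 0 comes out as 1 by construction. For a slowly drifting series like this one, the first few lags tend to stay high as well, which is why the next step is to separate direct from indirect effects.
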
A timestep “t” will have some correlation with the previous one (t-1) just by being adjacent to it.

For example, with the #Bitcoin price, the price today will be influenced by the price it had yesterday. Also, the price yesterday will be affected by the price the day before.
There are two effects that we need to account for:

1️⃣ The indirect effect that timesteps t-1, t-2, t-3… have on timestep t, just by being adjacent to one another. This is what we have just explained with the #BTC price.
2️⃣ The direct effect that each of the previous timesteps has on timestep t.

If, for instance, there is a special event every three days (in the #BTC example), we would expect to see a direct correlation between the time series and its lag 3 version.
To measure these two effects we can use the ACF and PACF graphs:

· ACF (AutoCorrelation Function) shows the correlation between the time series and each of its lags. It includes both direct and indirect effects.

· PACF (Partial AutoCorrelation Function) shows only the direct correlation.
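
The difference between the two can be sketched in Python. The example below is an illustration under stated assumptions, not the author's code: it simulates a series where each value depends directly only on the previous one (an AR(1) process with a hypothetical coefficient of 0.7), then computes the ACF and PACF by hand. The PACF uses the Durbin-Levinson recursion to stay dependency-free; in practice a library such as statsmodels offers ready-made `acf` and `pacf` functions.

```python
import random


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    var = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / var
            for k in range(max_lag + 1)]


def pacf(series, max_lag):
    """Sample partial autocorrelation via the Durbin-Levinson recursion."""
    r = acf(series, max_lag)
    result = [1.0]
    prev = []  # coefficients of the order k-1 autoregression
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den  # direct effect of lag k
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result


# Assumption: simulated data where only lag 1 has a direct effect.
random.seed(42)
x = [0.0]
for _ in range(1999):
    x.append(0.7 * x[-1] + random.gauss(0, 1))

acf_vals = acf(x, 5)
pacf_vals = pacf(x, 5)
for lag in range(1, 6):
    print(f"lag {lag}: ACF={acf_vals[lag]:+.3f}  PACF={pacf_vals[lag]:+.3f}")
```

In a series like this, the ACF should fade gradually because lag 2, lag 3 and so on still reach the present through lag 1, while the PACF should drop close to 0 after lag 1 because no direct effect remains.
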
We use the so-called "lollipop" graphs to visualise them.

Each graph has several spikes or "lollipops". They indicate the correlation of each lag (on the x-axis) with the original non-lagged time series.
We will pay attention to the largest spike near lag 0 that is significantly different from 0 (that is, outside the blue-shaded area).
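
As a rough sketch of what "outside the shaded area" means, the Python snippet below flags the lags whose correlation exceeds an approximate 95% band of ±1.96/√N. This band is a common approximation and an assumption here; plotting libraries may compute the shaded area differently. The simulated series (noise plus 0.8 times the previous noise term) and all names are hypothetical, chosen so that only lag 1 carries real correlation.

```python
import random
from math import sqrt


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    var = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / var
            for k in range(max_lag + 1)]


# Assumption: simulated data whose only true correlation is at lag 1.
random.seed(1)
n_obs = 1000
noise = [random.gauss(0, 1) for _ in range(n_obs + 1)]
series = [noise[t] + 0.8 * noise[t - 1] for t in range(1, n_obs + 1)]

# Approximate 95% band around zero correlation.
band = 1.96 / sqrt(n_obs)
correlations = acf(series, 10)
significant_lags = [lag for lag in range(1, 11)
                    if abs(correlations[lag]) > band]

# Text version of a lollipop graph: one stem per lag.
for lag in range(1, 11):
    value = correlations[lag]
    stem = "-" * round(abs(value) * 40)
    flag = "  <- outside the band" if lag in significant_lags else ""
    print(f"lag {lag:2d} {value:+.3f} |{stem}o{flag}")
```

Lag 0 is skipped because its correlation is always 1. Since the band is a 95% band, an occasional lag can cross it by chance, so isolated small spikes far from lag 0 deserve some scepticism.
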

But that is enough for today. Tomorrow we will see how to select "p" and "q" for this example.

Follow me on @daansan_ml to find out 😉
If you don't know what those parameters are, check my previous tweet 👇

Thread by David Andrés 🤖📈🐍 (@daansan_ml)

