Understanding the Importance of Log Returns in Finance 💰

Why Log Returns and not a simple price difference?

Let's develop this a bit more.

In finance, one of the most crucial types of data is price information.

However, when it comes to Time Series analysis, using raw prices can be problematic.

Let's see why log returns are preferred over simple price differences and why they matter in financial modeling.
The Problem with Non-Stationary Data

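In Time Series analysis, non-stationary data (data whose statistical properties, such as the mean and variance, change over time) can introduce a lot of noise and make it difficult to identify underlying patterns.

👉 Raw prices usually behave this way, which is why they are generally not used directly in financial analyses.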
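
A toy illustration of this point, assuming a hypothetical price series that grows by exactly 1% per step (real prices are much noisier): the average price level drifts between the first and second half of the sample, while the log returns keep the same mean throughout.

```python
import math

# Hypothetical price series: starts at 100 and grows 1% per step
# (an assumption for illustration, not real market data).
prices = [100 * 1.01 ** t for t in range(101)]

def mean(values):
    return sum(values) / len(values)

# Log returns between consecutive prices.
log_returns = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

half = len(log_returns) // 2

# The mean price level differs between the two halves of the sample...
price_mean_first = mean(prices[:half])
price_mean_second = mean(prices[half:])

# ...while the mean log return is the same in both halves.
ret_mean_first = mean(log_returns[:half])
ret_mean_second = mean(log_returns[half:])

print(price_mean_first, price_mean_second)
print(ret_mean_first, ret_mean_second)
```

This only shows a drifting mean in a deterministic series; checking stationarity on real data would normally call for a formal statistical test.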
Absolute vs. Relative Change

You might think that using the difference between prices could solve this issue.

However, this approach only provides the absolute change and says nothing about the relative (percentage) change, which is often more insightful.
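
As a small sketch with made-up prices (the two assets and their values are assumptions for illustration), the same absolute move can mean very different things in relative terms:

```python
# Hypothetical prices for two assets (illustrative numbers, not real data).
cheap_before, cheap_after = 10.0, 11.0
pricey_before, pricey_after = 1000.0, 1001.0

# Absolute change: identical for both assets.
cheap_diff = cheap_after - cheap_before
pricey_diff = pricey_after - pricey_before

# Relative change: very different.
cheap_pct = cheap_diff / cheap_before
pricey_pct = pricey_diff / pricey_before

print(cheap_diff, pricey_diff)  # 1.0 1.0
print(cheap_pct, pricey_pct)    # 0.1 0.001
```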
Log Returns

This is where log returns come into the picture.

Log returns not only capture the relative change but also offer two additional advantages:

1️⃣ Time Additivity
2️⃣ Statistical Properties
1️⃣ Time Additivity

Log returns are time-additive: you can simply sum the log returns for two consecutive periods to get the log return for the combined period.

This property is incredibly useful for simplifying analyses and computations in time series modeling.
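
A minimal check of this property, using three hypothetical prices (the numbers are assumptions for illustration). Log returns add across periods, whereas simple returns have to be compounded:

```python
import math

# Hypothetical prices at three consecutive times (illustrative numbers).
p0, p1, p2 = 100.0, 110.0, 99.0

# Log returns for each period and for the combined two-period span.
r1 = math.log(p1 / p0)
r2 = math.log(p2 / p1)
r_total = math.log(p2 / p0)

# Simple returns for comparison.
s1 = p1 / p0 - 1
s2 = p2 / p1 - 1
s_total = p2 / p0 - 1

# Log returns add up across periods.
print(math.isclose(r1 + r2, r_total))                  # True
# Simple returns do not add up...
print(math.isclose(s1 + s2, s_total))                  # False
# ...they have to be compounded instead.
print(math.isclose((1 + s1) * (1 + s2) - 1, s_total))  # True
```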
2️⃣ Statistical Properties

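Log returns tend to be closer to normally distributed than simple returns, especially when price moves are large: a simple return can never fall below -100%, while a log return is unbounded in both directions.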

The assumption of normality is foundational to many financial theories and models.
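
One ingredient behind this can be sketched with made-up prices (an assumption for illustration): simple returns are asymmetric, whereas the log return of a move up and of the reverse move down cancel out. This illustrates symmetry only; it does not by itself establish normality.

```python
import math

# Hypothetical price path: the price doubles, then falls back to its start.
p0, p1, p2 = 50.0, 100.0, 50.0

# Simple returns: +100% on the way up, but only -50% on the way down.
simple_up = p1 / p0 - 1
simple_down = p2 / p1 - 1

# Log returns: equal in size, opposite in sign.
log_up = math.log(p1 / p0)
log_down = math.log(p2 / p1)

print(simple_up, simple_down)          # 1.0 -0.5
print(math.isclose(log_up, -log_down)) # True
```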
The formula for calculating log returns, using the natural logarithm ($\ln$), is:

$$R_t = \ln\left(\frac{P_t}{P_{t-1}}\right)$$

Where:
- $R_t$ is the log return at time $t$
- $P_t$ is the price at time $t$
- $P_{t-1}$ is the price at the previous time period
✦ The log return is the natural logarithm of the ratio of the price at time $t$ to the price at the previous time period $t-1$.
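
The formula can be sketched in plain Python as follows. The helper name `log_returns` and the sample closing prices are hypothetical, chosen only for illustration:

```python
import math

def log_returns(prices):
    """Return ln(P_t / P_{t-1}) for each consecutive pair of prices."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

# Hypothetical closing prices (illustrative numbers, not real data).
closes = [100.0, 102.0, 101.0, 105.0]
returns = log_returns(closes)
print(returns)

# Summing the log returns recovers the log return over the whole window.
print(math.isclose(sum(returns), math.log(closes[-1] / closes[0])))  # True
```

A series of N prices yields N-1 log returns, since the first price has no predecessor.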
Using log returns:

• simplifies mathematical modeling

• makes time series analysis more straightforward

• aligns well with the statistical assumptions made in financial theories
You can read more about this in my article:

mlpills.dev/time-series/xg…
You should also join our newsletter, DSBoost 🚀

Every week we share:
🔹 Interviews
🔹 Podcast notes
🔹 Learning resources
🔹 Interesting collections of content

Subscribe for free 👇👇
dsboost.dev

