Let's start with some theory

I've been working with ML in the audio domain. At first I couldn't understand much, but as I kept reading I managed to figure out some things.

Let me share some of the basic theory with you:

[10 minute read]

1/n🧵
Sound is a vibration that propagates as an acoustic wave.

It has some properties:
- Frequency
- Amplitude
- Speed
- Direction

For us, Frequency and Amplitude are the important features.

en.wikipedia.org/wiki/Sound#Sou…

2/n🧵
An important aspect is that sounds are a mixture of their component sinusoidal waves (waves that follow a sine curve) of different frequencies.

From the equation below:
- A is amplitude
- f is frequency
- t is time

The code replicates the formula and composes a third wave from two others.
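The equation itself was an image in the original tweet; for a simple sine wave it is presumably y(t) = A · sin(2πft). Here's a minimal sketch of what such code might look like, using NumPy and two arbitrary example frequencies (440 Hz and 880 Hz):

```python
import numpy as np

# y(t) = A * sin(2*pi*f*t)
sample_rate = 44100                    # samples per second
t = np.linspace(0, 1, sample_rate)     # 1 second of time steps

wave_a = 0.5 * np.sin(2 * np.pi * 440 * t)   # A = 0.5, f = 440 Hz
wave_b = 0.3 * np.sin(2 * np.pi * 880 * t)   # A = 0.3, f = 880 Hz

# Summing the two sinusoids composes a third, more complex wave
composed = wave_a + wave_b
```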

3/n🧵
Audio Signals are a representation of sound.

You get this data by taking samples of the air pressure over time (at a given sample rate).

When we say the sample rate is 44.1kHz, it means we take 44100 samples per second.

This results in a waveform.

en.wikipedia.org/wiki/Audio_sig…

4/n🧵
When you load a WAV file, you get this waveform, or, to be more precise, an array of N int16 numbers per channel (mono vs. stereo).

You can also see the sample rate that was used and the duration of the audio.
Many ML models work at a 16kHz sample rate.
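As a rough sketch of what loading looks like (assuming SciPy and a hypothetical file name; soundfile or librosa work similarly):

```python
from scipy.io import wavfile

# "dog_bark.wav" is a hypothetical example file
sample_rate, waveform = wavfile.read("dog_bark.wav")

print(sample_rate)       # e.g. 44100
print(waveform.dtype)    # int16 for a 16-bit PCM file
print(waveform.shape)    # (N,) for mono, (N, 2) for stereo
duration_s = waveform.shape[0] / sample_rate   # duration in seconds
```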

5/n🧵
One common trick for extracting features from or classifying audio is to convert the waveform into a spectrogram. The spectrogram can be treated as a 2D image, and CNN layers can extract features from it. From there, it works like an image classification model.

But, wait, what's a spectrogram?
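A minimal sketch of that idea with Keras (the input shape, layer sizes, and number of classes are arbitrary placeholders):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(124, 129, 1)),       # (time, frequency, channels) - hypothetical
    tf.keras.layers.Conv2D(32, 3, activation="relu"),  # CNN layers treat the spectrogram as an image
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(8, activation="softmax"),    # e.g. 8 audio classes
])
```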

6/n🧵
A spectrogram is a visual representation of the spectrum of frequencies over time

But how do you go from the time domain (waveform) to the frequency domain?

Here is where one of the most famous equations in math comes to the rescue -> the Fourier Transform

7/n🧵
The Fourier Transform converts a signal from the time domain to the frequency domain.

Given a complex waveform, it can extract all the frequencies and amplitudes that compose it.

The code below can help you visualize the FT of the previous waves.

en.wikipedia.org/wiki/Fourier_t…
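The original code was an image; a minimal NumPy/Matplotlib sketch of the same idea, reusing the two composed sine waves from earlier, might look like this:

```python
import numpy as np
import matplotlib.pyplot as plt

# Recreate the composed wave (440 Hz + 880 Hz) from the earlier sketch
sample_rate = 44100
t = np.linspace(0, 1, sample_rate)
composed = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 880 * t)

# rfft returns the spectrum of a real-valued signal
spectrum = np.abs(np.fft.rfft(composed)) / len(composed)    # magnitude per frequency bin
freqs = np.fft.rfftfreq(len(composed), d=1 / sample_rate)   # bin centres in Hz

plt.plot(freqs, spectrum)
plt.xlim(0, 1000)    # the two peaks should show up at 440 Hz and 880 Hz
plt.xlabel("Frequency (Hz)")
plt.ylabel("Magnitude")
plt.show()
```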

8/n🧵
Applying the FT to the whole waveform yields thousands of frequencies, and these frequencies vary over time as the audio changes (e.g. from silence, to a dog barking, back to silence...).

9/n🧵
To solve this, it's better to apply the transform sequentially to small parts (windows) of the signal.

Computing the Fourier Transform directly is also quite slow for practical use.

That's why, for sampled signals, we use its discrete version, the Discrete Fourier Transform (DFT), computed efficiently with the Fast Fourier Transform (FFT) algorithm.
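Applying the FFT over successive windows like this is usually called the Short-Time Fourier Transform (STFT). A toy sketch of the idea (window and hop sizes are arbitrary):

```python
import numpy as np

def windowed_fft(waveform, window_size=1024, hop=512):
    # Apply the FFT to successive, overlapping windows of the signal
    frames = []
    for start in range(0, len(waveform) - window_size, hop):
        window = waveform[start:start + window_size]
        frames.append(np.abs(np.fft.rfft(window)))   # magnitude spectrum of this window
    return np.array(frames)   # shape: (num_windows, num_frequency_bins)
```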

10/n🧵
<INTERMISSION>

Are you enjoying this thread? I hope so!

Don't forget to follow, comment and like, this helps me improve the content and keep the conversation going!

The rest takes less than 5 minutes!!! Let's go!

</INTERMISSION>

?/n🧵
Now we can create the spectrogram!

A spectrogram consists of:
- The y-axis: the frequency, in Hz
- The x-axis: time
- The color: the magnitude or amplitude (the brighter, the higher), usually in decibels (dB)

11/n🧵
TensorFlow has some methods to help you create the spectrogram from the waveform.

tensorflow.org/io/api_docs/py…
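The link above is truncated, but the idea can also be sketched with core TensorFlow's tf.signal.stft (the frame sizes below are arbitrary):

```python
import tensorflow as tf

def to_spectrogram(waveform, frame_length=255, frame_step=128):
    # waveform: 1-D float32 tensor of audio samples
    stft = tf.signal.stft(waveform, frame_length=frame_length, frame_step=frame_step)
    spectrogram = tf.abs(stft)            # keep the magnitude, drop the phase
    return spectrogram[..., tf.newaxis]   # add a channel axis so it looks like an image to a CNN
```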

12/n🧵
This is a very brief overview of how we go from a sound to a spectrogram.

This helps you understand some of the techniques used for audio analysis.

I used multiple resources, but this video is a very important one:

13/n🧵
If you made it this far, thanks!!!

It was a long thread, and I hope you learned something!
If you have questions, please leave them in the comments so I can improve the content for everyone!

14/14🧵
