Luiz GUStavo

1 Mar, 15 tweets, 7 min read

Let's start with some theory

I've been working with ML in the audio domain. At first I couldn't understand much, but as I kept reading I managed to figure some things out.

Let me share some of the basic theory with you:

[10-minute read]

1/n🧵

Sound is a vibration that propagates as an acoustic wave.

It has some properties:
- Frequency
- Amplitude
- Speed
- Direction

For us, Frequency and Amplitude are the important features.

en.wikipedia.org/wiki/Sound#Sou…

2/n🧵

An important aspect is that sounds are a mixture of component sinusoidal waves (waves that follow a sine curve) of different frequencies.

From the equation below:
- A is amplitude
- f is frequency
- t is time

The code replicates the formula and composes a third wave from two others.
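
A minimal sketch of this idea in Python with NumPy, assuming the usual sinusoid form y(t) = A * sin(2 * pi * f * t). The helper name `sine_wave`, the two frequencies (5 Hz and 20 Hz), the amplitudes, and the 1000-point time grid are illustrative choices, not values from the thread.

```python
import numpy as np

def sine_wave(amplitude, frequency, t):
    """Sinusoid y(t) = A * sin(2 * pi * f * t) (assumed standard form)."""
    return amplitude * np.sin(2 * np.pi * frequency * t)

# One second of time points (1000 points, an arbitrary choice)
t = np.linspace(0.0, 1.0, 1000, endpoint=False)

wave_a = sine_wave(1.0, 5.0, t)    # 5 Hz, amplitude 1.0
wave_b = sine_wave(0.5, 20.0, t)   # 20 Hz, amplitude 0.5

# The third wave is the point-by-point sum of the other two
mixed = wave_a + wave_b
```

Plotting `wave_a`, `wave_b`, and `mixed` against `t` should show how the two smooth curves add up to a more irregular one.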

3/n🧵

Audio Signals are a representation of sound.

You get this data by measuring air pressure at regular intervals over time. The number of samples taken per second is the sample rate.

When we say the sample rate is 44.1 kHz, it means we take 44,100 samples per second.

This results in a waveform.

en.wikipedia.org/wiki/Audio_sig…
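
A small sketch of what sampling means in numbers, assuming a synthetic 440 Hz tone stands in for the measured air pressure. The half-second duration and the 0.8 amplitude are hypothetical values chosen for illustration.

```python
import numpy as np

sample_rate = 44_100      # samples per second (44.1 kHz)
duration = 0.5            # seconds (hypothetical clip length)

num_samples = int(sample_rate * duration)

# Time stamp of each sample: 0, 1/44100, 2/44100, ...
t = np.arange(num_samples) / sample_rate

# A 440 Hz tone evaluated at those instants gives the waveform
waveform = 0.8 * np.sin(2 * np.pi * 440.0 * t)
```

Half a second at 44.1 kHz yields 22,050 samples, each 1/44100 of a second apart.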

4/n🧵

When you load a WAV file, you get this waveform, or to be more precise, an array of N numbers (typically int16 for 16-bit audio) for each channel: one channel for mono, two for stereo.

You can also see the sample rate that was used and, from it, the duration of the audio.
Many ML models expect audio at a 16 kHz sample rate.
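
A minimal sketch of loading a WAV file with Python's standard `wave` module and NumPy. Since no audio file accompanies the thread, the example first writes a hypothetical one-second, mono, 16-bit, 16 kHz tone to a temporary file (`tone.wav` is a made-up name) and then reads it back.

```python
import os
import tempfile
import wave

import numpy as np

sample_rate = 16_000                                 # 16 kHz
t = np.arange(sample_rate) / sample_rate             # one second of time stamps
# Scale a 440 Hz tone into the int16 range (little-endian, as WAV expects)
tone = (0.5 * np.sin(2 * np.pi * 440.0 * t) * 32767).astype("<i2")

# Write a mono, 16-bit WAV file so the example has something to load
path = os.path.join(tempfile.mkdtemp(), "tone.wav")
with wave.open(path, "wb") as wav_out:
    wav_out.setnchannels(1)            # mono
    wav_out.setsampwidth(2)            # 2 bytes per sample = int16
    wav_out.setframerate(sample_rate)
    wav_out.writeframes(tone.tobytes())

# Load it back: raw bytes -> int16 array with one column per channel
with wave.open(path, "rb") as wav_in:
    channels = wav_in.getnchannels()
    rate = wav_in.getframerate()
    frames = wav_in.getnframes()
    raw = wav_in.readframes(frames)

samples = np.frombuffer(raw, dtype="<i2").reshape(-1, channels)
duration_seconds = frames / rate       # number of frames divided by sample rate
```

A stereo file would give `channels == 2` and a two-column `samples` array. Libraries such as SciPy or TensorFlow offer their own loaders; this sketch sticks to the standard library to stay self-contained.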

5/n🧵

One common trick used to extract features or classify audio is to convert the waveform to a spectrogram. The spectrogram can be treated as a 2D image, so CNN layers can extract features from it. From there, the model works like an image classification model.

But, wait, what's a spectrogram?

6/n🧵

A spectrogram is a visual representation of the spectrum of frequencies over time

But how do you go from the time domain (waveform) to the frequency domain?

This is where one of the most famous math equations comes to the rescue: the Fourier transform.

7/n🧵

The Fourier transform (FT) converts a signal from the time domain to the frequency domain.

Given a complex waveform, it can extract the frequencies and amplitudes of the sinusoids that form it.

The code below can help you visualize the FT of the previous waves

en.wikipedia.org/wiki/Fourier_t…
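
A minimal sketch of that idea using NumPy's FFT routines on a mixture of two sinusoids. The 5 Hz and 20 Hz components, their amplitudes, and the 1000 Hz sample rate are illustrative assumptions, chosen so each component falls exactly on a frequency bin.

```python
import numpy as np

sample_rate = 1000                          # samples per second (illustrative)
t = np.arange(sample_rate) / sample_rate    # one second of samples

# Two sinusoids mixed together: 5 Hz (amplitude 1.0) + 20 Hz (amplitude 0.5)
mixed = 1.0 * np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)

spectrum = np.fft.rfft(mixed)                            # complex frequency bins
freqs = np.fft.rfftfreq(len(mixed), d=1 / sample_rate)   # bin frequencies in Hz

# Scale so a pure sinusoid of amplitude A appears as a peak of height A
amplitudes = 2 * np.abs(spectrum) / len(mixed)

# The two largest peaks recover the component frequencies
top_two = np.sort(freqs[np.argsort(amplitudes)[-2:]])
```

Plotting `amplitudes` against `freqs` should show two spikes, one per component, with everything else close to zero.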

8/n🧵

Applying the FT to a whole waveform yields thousands of frequencies, and these frequencies vary over time as the audio changes (e.g., from silence, to a dog barking, back to silence...).

9/n🧵

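To solve this, it's better to apply the FT sequentially to short parts (windows) of the waveform.

Computing the FT directly is also too slow for practical use on sampled audio.

That's why the Fast Fourier Transform (FFT) is used: an efficient algorithm for computing the discrete version of the FT (the DFT).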
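
A minimal sketch of the windowed approach using NumPy's FFT. The signal (silence, a 1 kHz tone, silence), the 8 kHz sample rate, the 400-sample window, the 200-sample step, and the Hann window are all assumptions chosen for illustration.

```python
import numpy as np

sample_rate = 8000                          # illustrative
t = np.arange(sample_rate) / sample_rate    # one second

# Silence, then a 1 kHz tone, then silence again
signal = np.sin(2 * np.pi * 1000 * t)
signal[: sample_rate // 4] = 0.0            # first quarter second is silent
signal[3 * sample_rate // 4 :] = 0.0        # last quarter second is silent

frame_length = 400    # 50 ms windows (hypothetical choice)
frame_step = 200      # 50% overlap between neighbouring windows
window = np.hanning(frame_length)   # tapers frame edges to reduce leakage

frames = []
for start in range(0, len(signal) - frame_length + 1, frame_step):
    frame = signal[start : start + frame_length] * window
    frames.append(np.abs(np.fft.rfft(frame)))   # magnitude per frequency bin

stft_magnitude = np.array(frames)   # shape: (num_frames, frame_length // 2 + 1)
energy_per_frame = stft_magnitude.sum(axis=1)
```

The frames that fall inside the silent parts carry no energy, while the frames in the middle peak at the bin for 1 kHz, which is the time-varying information a single FFT over the whole signal would blur together.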

10/n🧵

<INTERMISSION>

Are you enjoying this thread? I hope so!

Don't forget to follow, comment and like, this helps me improve the content and keep the conversation going!

The rest is less than 5 minutes!!! Let's go!

</INTERMISSION>

?/n🧵

Now we can create the spectrogram!

A Spectrogram consists of:
- The y-axis is the frequency in Hz
- The x-axis is the time
- The color represents the magnitude (amplitude) of each frequency at each moment, usually in decibels (dB): the brighter the color, the higher the magnitude
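
These three parts can be sketched with NumPy as three arrays: a frequency axis, a time axis, and a 2D grid of dB values. The test signal (a tone that jumps from 500 Hz to 2000 Hz), the window sizes, and the choice to express dB relative to the loudest bin are assumptions made for illustration.

```python
import numpy as np

sample_rate = 8000
t = np.arange(2 * sample_rate) / sample_rate
# A tone that jumps from 500 Hz to 2000 Hz after one second
signal = np.where(
    t < 1.0, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 2000 * t)
)

frame_length, frame_step = 256, 128
window = np.hanning(frame_length)
starts = np.arange(0, len(signal) - frame_length + 1, frame_step)

# Rows = frequency bins (y-axis), columns = time frames (x-axis)
magnitude = np.array(
    [np.abs(np.fft.rfft(signal[s : s + frame_length] * window)) for s in starts]
).T

freq_axis_hz = np.fft.rfftfreq(frame_length, d=1 / sample_rate)   # y-axis
time_axis_s = (starts + frame_length / 2) / sample_rate           # x-axis (frame centers)

# Decibels relative to the loudest bin; the small constant avoids log(0)
spectrogram_db = 20 * np.log10(magnitude / magnitude.max() + 1e-10)
```

Passing these arrays to a plotting call such as matplotlib's `pcolormesh(time_axis_s, freq_axis_hz, spectrogram_db)` should draw a bright horizontal line at 500 Hz that jumps to 2000 Hz halfway through.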

11/n🧵

TensorFlow has some methods to help you create the spectrogram from the waveform

tensorflow.org/io/api_docs/py…

12/n🧵

This is a very brief overview of how we go from a sound to a spectrogram.

This helps in understanding some of the techniques that are used for audio analysis

I used multiple resources but this video is a very important one:

13/n🧵

If you came this far, thanks!!!

It was a long thread and I hope you learned something!
If you have questions please leave them in the comments so I can improve the content for everyone!

14/14🧵
