When you load a wav file, you get this waveform or, to be more precise, an array of N int16 numbers for each channel (one channel for mono, two for stereo).
You can see the sample rate that was used and the duration of the audio.
Many ML models work with audio at a 16 kHz sample rate.
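Here is a minimal sketch of what "loading a wav" gives you, using only Python's standard `wave` and `array` modules. The 440 Hz test tone and the in-memory buffer are my own stand-ins for a real file, and the code assumes a little-endian machine (wav data is little-endian).

```python
import io
import math
import wave
from array import array

SAMPLE_RATE = 16_000  # Hz, the rate many speech models work with

# Build a 1-second, mono, 16-bit test tone in memory (stand-in for a real file).
tone = array("h", (
    int(0.5 * 32767 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
    for n in range(SAMPLE_RATE)
))
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav_out:
    wav_out.setnchannels(1)            # mono
    wav_out.setsampwidth(2)            # 2 bytes per sample = int16
    wav_out.setframerate(SAMPLE_RATE)
    wav_out.writeframes(tone.tobytes())

# Read it back the same way you would read a file on disk.
buffer.seek(0)
with wave.open(buffer, "rb") as wav_in:
    channels = wav_in.getnchannels()
    sample_rate = wav_in.getframerate()
    n_frames = wav_in.getnframes()
    # For stereo, left and right samples would be interleaved in this array.
    samples = array("h", wav_in.readframes(n_frames))

duration = n_frames / sample_rate      # seconds
print(channels, sample_rate, len(samples), duration)  # 1 16000 16000 1.0
```

For a file on disk, you would pass its path to `wave.open` instead of the buffer.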
5/n🧵
One common trick to extract features from audio, or to classify it, is to convert the waveform to a spectrogram. The spectrogram can be treated as a 2D image, so CNN layers can extract features from it. From there, the model works like an image classification model.
But, wait, what's a spectrogram?
6/n🧵
A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies over time.
But how do you go from the time domain (waveform) to the frequency domain?
This is where one of the most famous tools in math comes to the rescue -> the Fourier Transform
7/n🧵
The Fourier Transform (FT) converts a signal from the time domain to the frequency domain.
Given a complicated waveform, it can extract the frequencies and amplitudes that form it.
The code below can help you visualize the FT of the previous waves.
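As a text-only sketch (no plotting), here is a naive discrete Fourier transform in plain Python. The two-tone wave, the tiny sample rate, and the 0.1 threshold are my own illustrative choices, not the waves from the earlier images. It mixes two sines and then recovers their frequencies and amplitudes.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform: one complex coefficient per frequency bin."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]

SAMPLE_RATE = 200   # Hz, kept tiny so the naive O(N^2) loop stays fast
N = 200             # one second of samples, so bin k corresponds to k Hz

# A mixed wave: 10 Hz at amplitude 1.0 plus 40 Hz at amplitude 0.5.
wave_sum = [
    1.0 * math.sin(2 * math.pi * 10 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 40 * t / SAMPLE_RATE)
    for t in range(N)
]

spectrum = dft(wave_sum)

# Keep the first half of the bins (the second half mirrors it for real input)
# and rescale the magnitudes so they read as sine amplitudes.
amplitudes = [2 * abs(c) / N for c in spectrum[: N // 2]]
peaks = [(k * SAMPLE_RATE / N, round(a, 3)) for k, a in enumerate(amplitudes) if a > 0.1]
print(peaks)  # [(10.0, 1.0), (40.0, 0.5)]
```

In practice you would likely reach for a library routine such as the ones in `numpy.fft` and plot the amplitudes instead of printing them.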
Applying the FT to the whole waveform yields thousands of frequencies, but it cannot tell you when each one occurs, and these frequencies vary over time as the audio changes (e.g. from silence to a dog barking and back to silence).
9/n🧵
To solve this, it's better to apply the FT sequentially to short parts (windows) of the audio.
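A rough sketch of the windowing idea, again in plain Python. The silence/tone/silence signal and the window size are assumptions of mine, and I use non-overlapping windows for simplicity (real pipelines usually overlap the windows and taper them, for example with a Hann window).

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of the first half of a naive DFT of one frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2)
    ]

SAMPLE_RATE = 400   # Hz
WINDOW = 100        # samples per window (0.25 s)

# 0.5 s of silence, 0.5 s of a 40 Hz tone, 0.5 s of silence.
silence = [0.0] * 200
tone = [math.sin(2 * math.pi * 40 * t / SAMPLE_RATE) for t in range(200)]
signal = silence + tone + silence

dominant = []  # loudest frequency in each window, or None when the window is silent
for start in range(0, len(signal) - WINDOW + 1, WINDOW):
    mags = dft_magnitudes(signal[start:start + WINDOW])
    peak_bin = max(range(len(mags)), key=mags.__getitem__)
    is_silent = mags[peak_bin] < 1e-6
    dominant.append(None if is_silent else peak_bin * SAMPLE_RATE / WINDOW)

print(dominant)  # [None, None, 40.0, 40.0, None, None]
```

Each window gets its own spectrum, so you can see the tone appear and disappear over time.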
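Computing the FT directly is quite slow for practical use.
That's why, for sampled audio, we use its discrete version (the Discrete Fourier Transform, DFT) and compute it with a fast algorithm: the Fast Fourier Transform (FFT).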
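To make the speed-up concrete, here is a sketch comparing a direct DFT with a textbook recursive radix-2 FFT (Cooley-Tukey). This is one classic FFT variant that I picked for illustration, it only handles power-of-two lengths, and the test signal is made up. Both functions should produce the same numbers, the FFT just needs far fewer operations.

```python
import cmath
import math

def dft(x):
    """Direct DFT: O(N^2) multiplications."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT: O(N log N). Length must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # transform of the even-indexed samples
    odd = fft(x[1::2])    # transform of the odd-indexed samples
    half = n // 2
    twiddled = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(half)]
    return ([even[k] + twiddled[k] for k in range(half)]
            + [even[k] - twiddled[k] for k in range(half)])

# Made-up test signal with 64 samples (a power of two).
samples = [math.sin(2 * math.pi * 3 * t / 64) + 0.25 * math.cos(2 * math.pi * 7 * t / 64)
           for t in range(64)]

max_diff = max(abs(a - b) for a, b in zip(dft(samples), fft(samples)))
print(max_diff < 1e-9)  # True
```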
10/n🧵
Now we can create the spectrogram!
In a spectrogram:
- The y-axis is the frequency in Hz
- The x-axis is the time
- The color represents the magnitude or amplitude (the brighter, the higher), usually in decibels (dB)
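The decibel step can be sketched like this. The tiny magnitude grid, the choice of the loudest cell as the 0 dB reference, and the -80 dB floor are all assumptions of mine for illustration.

```python
import math

def to_decibels(magnitudes, floor_db=-80.0):
    """Convert a grid of magnitudes to dB relative to the loudest cell."""
    reference = max(max(row) for row in magnitudes) or 1.0  # avoid dividing by zero
    return [
        [max(20 * math.log10(m / reference), floor_db) if m > 0 else floor_db
         for m in row]
        for row in magnitudes
    ]

# Hypothetical grid: rows are frequency bins (y-axis), columns are time steps (x-axis).
magnitudes = [
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 10.0, 10.0, 0.0],
    [0.0, 0.1, 0.1, 0.0],
]
spectrogram_db = to_decibels(magnitudes)

for row in spectrogram_db:
    print([round(value, 1) for value in row])
```

The loudest cells land at 0 dB, quieter ones go negative, and silence is clipped to the floor so the color scale stays readable.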
11/n🧵
TensorFlow has some methods to help you create the spectrogram from the waveform.
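A minimal sketch of one way to do it, using `tf.signal.stft`. The synthetic 440 Hz tone stands in for a decoded wav, and the `frame_length` and `frame_step` values are just example settings, not necessarily the ones used for the images in this thread.

```python
import math

import tensorflow as tf

SAMPLE_RATE = 16_000

# One second of a 440 Hz tone as float32 in [-1, 1] (stand-in for a decoded wav).
t = tf.range(SAMPLE_RATE, dtype=tf.float32) / SAMPLE_RATE
waveform = tf.sin(2.0 * math.pi * 440.0 * t)

# Short-time Fourier transform: an FFT over sliding, overlapping windows.
stft = tf.signal.stft(waveform, frame_length=255, frame_step=128)

# Keep the magnitude and drop the phase; the layout is (time frames, frequency bins).
spectrogram = tf.abs(stft)
print(spectrogram.shape)  # (124, 129)
```

Note that the result has time on the first axis, so you would transpose it before plotting with frequency on the y-axis.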
This was a very brief overview of how we go from a sound to a spectrogram.
It helps you understand some of the techniques used for audio analysis.
I used multiple resources but this video is a very important one:
13/n🧵
If you came this far, thanks!!!
It was a long thread and I hope you learned something!
If you have questions please leave them in the comments so I can improve the content for everyone!
14/14🧵