Music Feature Extraction in Python
Source: Music Extraction in Python
Packages
We’ll be using librosa for analyzing and extracting features of an audio signal.
For playing audio we will use pyAudio so that we can play music on jupyter directly.
import librosa
audio_path = 'audio-path'
x , sr = librosa.load(audio_path)
print(type(x), type(sr))
x = time series
sr = sample rate of x; consider overwriting to sr=44100
Playing audio
import IPython.display as ipd
ipd.Audio(audio_path)
Display methods
- wave plot: loudness of audio at given time
- spectrogram: show different frequencies along with amplitude
- colormap
- stft (short term fourier transform): determines the amplitude of various frequencies at a given time
Feature Extraction
- Zero Crossing Rate: rate at which the signal changes from positive to negative or back
x, sr = librosa.load(audio_path)
#Plot the signal:
plt.figure(figsize=(14, 5))
librosa.display.waveplot(x, sr=sr)
- Zooming in:
n0 = 9000 n1 = 9100 plt.figure(figsize=(14, 5)) plt.plot(x[n0:n1]) plt.grid()
- Spectral centroid: indicates where centre of mass for sound is located
-
Spectral rolloff: is frequency under which a specified percentage of total spectral energy lies
- MFCC: mel frequency cepstral coefficients of a signal are a small set of features which concisely describe the overall shape of a spectral envelope
mfccs = librosa.feature.mfcc(x, sr=sr) print(mfccs.shape) #Displaying the MFCCs: librosa.display.specshow(mfccs, sr=sr, x_axis='time')
Dummy guide to MFCC
Summary: MFCC features represent phonemes (distinct units of sound) as the shape of the vocal tract (which is responsible for sound generation) is manifest in them.
Source: The Dummy’s Guide to MFCC
Term explanation
Cespstral is ‘spectral’ with ‘spec’ inverted. Cepstrum is the information of rate of cahnge in spectral bands.
Pitch is one of the characteristics of a speech signal and is measured as the frequency of the signal.
Mel scale is scale that relates the perceived frequency of a tone to the actual frequency because humans hear differently. They are better at identifying small changes in speech at lower frequencies.
Getting the Cepstral coefficients
MFCC Tutorial
Source: MFCC Tutorial
Implementations
Implemented MFCCs in python, available here. Use the ‘Download ZIP’ button on the right hand side of the page to get the code.
There is a good MATLAB implementation of MFCCs over here.
References
-
Davis, S. Mermelstein, P. (1980) Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences. In IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 28 No. 4, pp. 357-366
-
X. Huang, A. Acero, and H. Hon. Spoken Language Processing: A guide to theory, algorithm, and system development. Prentice Hall, 2001.