Music Feature Extraction in Python

Source: Music Extraction in Python

Packages

We’ll be using librosa for analyzing and extracting features of an audio signal.

For playing audio we will use IPython.display.Audio, which lets us play audio directly in a Jupyter notebook.

import librosa

audio_path = 'audio-path'  # placeholder: replace with the path to an audio file
x, sr = librosa.load(audio_path)
print(type(x), type(sr))

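x = audio time series, returned as a NumPy array

sr = sample rate of x in Hz. librosa resamples to 22050 Hz by default; pass sr=44100 to librosa.load to resample to 44.1 kHz instead, or sr=None to keep the file's native rate.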
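To make the relationship between x and sr concrete, here is a minimal sketch that builds a synthetic time series instead of loading a file. The 440 Hz tone and the helper name make_tone are illustrative assumptions, not part of librosa. The duration in seconds is the number of samples divided by the sample rate.

```python
import math

def make_tone(freq_hz, duration_s, sr):
    """Return a sine tone as a list of float samples (hypothetical helper)."""
    n_samples = int(duration_s * sr)
    return [math.sin(2 * math.pi * freq_hz * n / sr) for n in range(n_samples)]

sr = 44100                      # samples per second
x = make_tone(440.0, 0.5, sr)   # half a second of a 440 Hz tone

duration = len(x) / sr          # seconds = number of samples / sample rate
print(len(x), duration)
```

The same arithmetic applies to the array returned by librosa.load: len(x) / sr gives the clip length in seconds.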

Playing audio

import IPython.display as ipd
ipd.Audio(audio_path)

Display methods

Feature Extraction

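import matplotlib.pyplot as plt
import librosa.display

x, sr = librosa.load(audio_path)
# Plot the waveform (waveshow replaces waveplot, which was removed in librosa 0.10)
plt.figure(figsize=(14, 5))
librosa.display.waveshow(x, sr=sr)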

Summary: MFCC features represent phonemes (distinct units of sound) because they capture the shape of the vocal tract, which is responsible for sound generation.

Source: The Dummy’s Guide to MFCC

Term explanation

Cepstral is ‘spectral’ with the ‘spec’ reversed. The cepstrum captures the rate of change across spectral bands.
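One common way to make this concrete is the real cepstrum: the inverse Fourier transform of the log magnitude spectrum. The sketch below is a toy illustration under stated assumptions. The function names are hypothetical, the transform is a naive O(N^2) DFT written for readability (practical code would use an FFT), and the test signal is an impulse plus a quieter echo. The echo adds a periodic ripple to the spectrum, and the cepstrum turns that ripple into a peak at the echo delay.

```python
import cmath
import math

def dft(signal):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft(spectrum):
    """Naive O(N^2) inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

def real_cepstrum(signal):
    """Inverse DFT of the log magnitude spectrum (assumes no zero-magnitude bins)."""
    log_mag = [math.log(abs(c)) for c in dft(signal)]
    return [c.real for c in idft(log_mag)]

# Toy signal: an impulse plus an echo at half amplitude, delayed by 8 samples
n, delay = 64, 8
x = [0.0] * n
x[0] = 1.0
x[delay] = 0.5

cep = real_cepstrum(x)
# Search the first half of the cepstrum, skipping index 0 (overall log level)
peak = max(range(1, n // 2), key=lambda q: cep[q])
print(peak)  # index of the strongest cepstral peak, which equals the echo delay (8)
```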

Fourier transform

Pitch is one of the characteristics of a speech signal and is measured as the frequency of the signal.
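As a rough illustration of how the Fourier transform exposes frequency, the sketch below finds the strongest frequency bin of a pure tone. The function name dominant_frequency, the 800 Hz sample rate, and the 100 Hz test tone are assumptions chosen so the tone falls exactly on a DFT bin. Picking the strongest bin only matches the pitch for simple signals like this one; real speech needs a more robust pitch tracker.

```python
import cmath
import math

def dominant_frequency(signal, sr):
    """Return the frequency (Hz) of the strongest DFT bin below Nyquist."""
    n = len(signal)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):
        # DFT coefficient for bin k, which corresponds to k * sr / n Hz
        coeff = sum(
            signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sr / n

sr = 800   # samples per second (kept small so the naive DFT stays fast)
n = 200    # bin spacing is sr / n = 4 Hz
tone = [math.sin(2 * math.pi * 100 * t / sr) for t in range(n)]

print(dominant_frequency(tone, sr))  # 100.0
```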

The Mel scale relates the perceived frequency of a tone to its actual measured frequency, because human hearing is not linear in frequency. Humans are better at identifying small changes in speech at lower frequencies than at higher ones.
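A widely used formula for the conversion is m = 2595 * log10(1 + f / 700). Several variants of the Mel formula exist, so treat the constants below as one common choice rather than the definitive one. The function names are illustrative. The loop shows that the same 100 Hz step covers far more mels at low frequencies than at high ones.

```python
import math

def hz_to_mel(freq_hz):
    """Convert frequency in Hz to mels (2595 * log10(1 + f / 700) variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# Compare a 100 Hz step at the low end with the same step at the high end
low_step = hz_to_mel(200) - hz_to_mel(100)
high_step = hz_to_mel(4100) - hz_to_mel(4000)

for f in (100, 200, 4000, 4100):
    print(f, round(hz_to_mel(f), 1))
print(low_step > high_step)  # True
```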

Getting the Cepstral coefficients

Diagram overview

MFCC Tutorial

Source: MFCC Tutorial

Implementations

The tutorial includes a Python implementation of MFCCs, available here. Use the ‘Download ZIP’ button on the right-hand side of the page to get the code.

There is a good MATLAB implementation of MFCCs over here.

References