bbc-vamp-plugins  1.0
SpeechMusicSegmenter Class Reference

Calculates boundaries between speech and music.

#include <SpeechMusicSegmenter.h>


Protected Attributes

vector< double > m_zcr
int m_nframes
int resolution
double margin
double change_threshold
double decision_threshold
double min_music_length
 

Detailed Description

Calculates boundaries between speech and music.

Outputs

Segmentation
Impulses at the boundary points.
Detection function
Function used to find boundaries.

Parameters

Resolution
The number of frames in the window within which candidate changes are searched for (default = 256)
Change threshold
The minimum difference in skewness at which a candidate change is marked (default = 0.0781)
Decision threshold
The mean-skewness threshold used to classify a segment as speech or music (default = 0.2734)
Margin
The margin around the mean zero-crossing rate (ZCR) within which ZCR samples are ignored when computing the ZCR skewness (default = 14)
Minimum music segment length
Music segments shorter than this length are dismissed (default = 0)
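The Margin parameter is easier to picture in code. The following is a minimal sketch, not the plugin's actual implementation, of one plausible way to compute a per-frame zero-crossing count and a margin-based skewness over a window of frames. It assumes that the ZCR is the number of sign changes in each frame, and that skewness is measured as (below - above) / (below + above), where "above" and "below" count the frames whose ZCR lies more than `margin` above or below the window mean; frames inside the margin are ignored. Under this assumed definition the value is positive when most frames sit below the mean with a sparse tail of high-ZCR frames, the pattern usually associated with speech. The names `zeroCrossings` and `marginSkewness` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: number of sign changes within one frame of samples.
// Zero is treated as non-negative.
int zeroCrossings(const std::vector<float>& frame) {
    int count = 0;
    for (std::size_t i = 1; i < frame.size(); ++i) {
        // A crossing occurs when consecutive samples lie on opposite sides of zero.
        if ((frame[i - 1] >= 0.0f) != (frame[i] >= 0.0f)) {
            ++count;
        }
    }
    return count;
}

// Hypothetical skewness measure over zcr[start, start + length).
// Frames whose ZCR lies within +/- margin of the window mean are ignored;
// the result (below - above) / (below + above) lies in [-1, 1].
double marginSkewness(const std::vector<double>& zcr, std::size_t start,
                      std::size_t length, double margin) {
    if (length == 0 || start + length > zcr.size()) return 0.0;

    double mean = 0.0;
    for (std::size_t i = start; i < start + length; ++i) mean += zcr[i];
    mean /= static_cast<double>(length);

    int above = 0;
    int below = 0;
    for (std::size_t i = start; i < start + length; ++i) {
        if (zcr[i] > mean + margin) {
            ++above;
        } else if (zcr[i] < mean - margin) {
            ++below;
        }
    }
    if (above + below == 0) return 0.0;  // every frame fell inside the margin
    return static_cast<double>(below - above) / (above + below);
}
```

For example, with ZCR values {10, 10, 10, 10, 50} and a margin of 5, the window mean is 18, four frames fall below 13 and one lies above 23, giving (4 - 1) / 5 = 0.6.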

Description

This Vamp plugin is heavily inspired by the approach described in [1].

The algorithm works as follows:

  1. Measure the skewness of the distribution of the zero-crossing rate (ZCR) across the audio file;
  2. Find points at which this skewness changes sharply (by more than the change threshold);
  3. For each candidate change point found, classify the corresponding segment as follows:
    • Mean skewness > decision threshold: speech
    • Mean skewness < decision threshold: music
  4. If a segment has the same type as the previous one, merge the two.
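Steps 2 to 4 can be sketched as follows. This is a hedged illustration under its own assumptions, not the plugin's code: it takes one skewness value per frame as input (the result of step 1), treats `resolution` as the size of consecutive, non-overlapping blocks whose mean skewness is compared to find candidate changes, and labels a segment as speech only when its mean skewness strictly exceeds the decision threshold (the description leaves the equality case open). The names `Segment`, `SegmentType`, `meanOf` and `segmentBySkewness` are hypothetical, and the minimum music segment length is not modelled.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

enum class SegmentType { Speech, Music };

struct Segment {
    std::size_t start;  // first frame index (inclusive)
    std::size_t end;    // last frame index (exclusive)
    SegmentType type;
};

// Hypothetical helper: mean of skew[start, end), or 0 for an empty range.
double meanOf(const std::vector<double>& skew, std::size_t start, std::size_t end) {
    double sum = 0.0;
    for (std::size_t i = start; i < end; ++i) sum += skew[i];
    return end > start ? sum / static_cast<double>(end - start) : 0.0;
}

// Steps 2-4 under the stated assumptions; `skew` holds one value per frame.
std::vector<Segment> segmentBySkewness(const std::vector<double>& skew,
                                       std::size_t resolution,
                                       double changeThreshold,
                                       double decisionThreshold) {
    std::vector<Segment> segments;
    if (skew.empty() || resolution == 0) return segments;

    // Step 2: compare the mean skewness of consecutive blocks of `resolution`
    // frames and mark a candidate boundary where it jumps by more than
    // changeThreshold.
    std::vector<std::size_t> boundaries = {0};
    for (std::size_t b = resolution; b < skew.size(); b += resolution) {
        double prev = meanOf(skew, b - resolution, b);
        double next = meanOf(skew, b, std::min(b + resolution, skew.size()));
        if (std::abs(next - prev) > changeThreshold) boundaries.push_back(b);
    }
    boundaries.push_back(skew.size());

    // Steps 3 and 4: classify each candidate segment by its mean skewness and
    // merge it into the previous segment when both have the same type.
    for (std::size_t k = 0; k + 1 < boundaries.size(); ++k) {
        std::size_t start = boundaries[k];
        std::size_t end = boundaries[k + 1];
        SegmentType type = meanOf(skew, start, end) > decisionThreshold
                               ? SegmentType::Speech
                               : SegmentType::Music;
        if (!segments.empty() && segments.back().type == type) {
            segments.back().end = end;  // same type: extend the previous segment
        } else {
            segments.push_back({start, end, type});
        }
    }
    return segments;
}
```

Because classification and merging happen in a single pass, a run of candidate segments with the same label collapses into one output segment, so only the boundaries between speech and music remain.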

This is an early prototype and is not very accurate. It is relatively fast, taking around 1 s to process a 20-minute file.

References

[1] J. Saunders, "Real-time discrimination of broadcast speech/music," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 2, pp. 993-999, 7-10 May 1996.

