bbc-vamp-plugins  1.0
Rhythm Class Reference

Calculates rhythmic features of a signal, including onsets and tempo.

#include <Rhythm.h>


Protected Member Functions

void calculateBandFreqs ()
 
float halfHanning (float n)
 
float canny (float n)
 
float findRemainder (vector< int > peaks, int thisPeak)
 
float findTempo (vector< int > peaks)
 
float findMeanPeak (vector< float > signal, vector< int > peaks, int shift)
 
void findCorrelationPeaks (vector< float > autocor_in, float percentile_in, int windowLength_in, int shift_in, vector< int > &peaks_out, vector< int > &valleys_out)
 
void autocorrelation (vector< float > signal_in, int startShift_in, int endShift_in, vector< float > &autocor_out)
 
void findOnsetPeaks (vector< float > onset_in, int windowLength_in, vector< int > &peaks_out)
 
void movingAverage (vector< float > signal_in, int windowLength_in, float threshold_in, vector< float > &average_out, vector< float > &difference_out)
 
void normalise (vector< float > signal_in, vector< float > &normalised_out)
 
void halfHannConvolve (vector< vector< float > > &envelope_out)
 
void cannyConvolve (vector< vector< float > > envelope_in, vector< float > &onset_out)
 

Protected Attributes

int numBands
 
float * bandHighFreq
 
int halfHannLength
 
float * halfHannWindow
 
int cannyLength
 
float cannyShape
 
float * cannyWindow
 
vector< vector< float > > intensity
 
float threshold
 
int average_window
 
int peak_window
 
int max_bpm
 
int min_bpm
 

Detailed Description

Calculates rhythmic features of a signal, including onsets and tempo.

Outputs

Onset Curve
The filtered and half-wave rectified intensity of the signal, used to detect onsets.
Average
The moving average of the onset curve, plus the threshold - used for selecting where the peaks of the onset curve are.
Difference
The difference between the onset curve and its moving average. Used as the input for peak-picking.
Onset
The detected note onsets.
Average onset frequency
The mean number of onsets per minute.
Rhythm strength
The mean value of the peaks in the onset curve.
Autocorrelation
The autocorrelation of the difference curve.
Mean Correlation Peak
The mean value of the peaks in the autocorrelation.
Peak-Valley Ratio
The mean peak-valley ratio of the autocorrelation.
Tempo
The estimated tempo in beats per minute.

Parameters

Sub-bands
Number of sub-bands to divide the signal into before applying the half-hanning window. A higher number increases accuracy at the cost of processing time. (default = 7)
Threshold
Amount added to the moving average of the onset curve. A higher number produces fewer onsets. (default = 1.0)
Moving average window length
Length of moving average window. A higher number produces a smoother curve. (default = 200)
Onset peak window length
Length of window used to select peaks in the difference curve. (default = 6)
Minimum BPM
Minimum tempo calculated using the autocorrelation. (default = 12)
Maximum BPM
Maximum tempo calculated using the autocorrelation. (default = 300)

Description

The rhythm features are based on the features described in [1] (section 3C), combined with some techniques from [2].

Firstly the spectrum is divided into \(n\) sub-bands with the following frequency ranges.

\[ \left(0,\frac{F_s}{2^n}\right) , \left(\frac{F_s}{2^n}, \frac{F_s}{2^{n-1}}\right) , \ldots , \left(\frac{F_s}{2^2}, \frac{F_s}{2^1}\right) \]
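
As a minimal sketch of the band layout above, the upper edge of each sub-band can be computed as \(F_s / 2^{n-k}\) for \(k = 0 \ldots n-1\). The function name `bandUpperFreqs` is hypothetical and is not taken from the plugin source; it only illustrates the formula.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not the plugin's own code): returns the upper
// frequency of each of the n sub-bands, lowest band first.
std::vector<float> bandUpperFreqs(float sampleRate, int numBands)
{
    assert(numBands > 0);
    std::vector<float> upper(numBands);
    for (int k = 0; k < numBands; ++k) {
        // Band k ends at Fs / 2^(n - k): Fs/2^n for k = 0, Fs/2 for k = n - 1.
        upper[k] = sampleRate / std::pow(2.0f, static_cast<float>(numBands - k));
    }
    return upper;
}
```

With a sample rate of 44100 Hz and the default of 7 sub-bands, the lowest band ends at about 344.5 Hz and the highest at 22050 Hz.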

For each sub-band, the magnitudes of the FFT bins are summed, producing \(n\) signals. Each of the signals is convolved with a half-hanning window, where \(L\) is set as 12.

\[ H(w) = 0.5 + 0.5\cos\left(2\pi \cdot \frac{w}{2L-1} \right) \hspace{20px} w\in[0, L-1] \]
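
The window coefficients follow directly from \(H(w)\). The sketch below evaluates the formula for \(w = 0 \ldots L-1\); the function name `halfHanningWindow` is an assumption made for illustration and may differ from how the plugin fills `halfHannWindow`.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: evaluates H(w) = 0.5 + 0.5*cos(2*pi*w / (2L - 1))
// for w = 0 .. L-1.
std::vector<float> halfHanningWindow(int length)
{
    assert(length > 0);
    const double pi = 3.14159265358979323846;
    std::vector<float> window(length);
    for (int w = 0; w < length; ++w) {
        const double phase = 2.0 * pi * w / (2.0 * length - 1.0);
        window[w] = static_cast<float>(0.5 + 0.5 * std::cos(phase));
    }
    return window;
}
```

The window starts at 1 for \(w = 0\) and decays towards 0 at \(w = L-1\), so each sample is smeared forward in time with decreasing weight.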

Subsequently, each of the signals is convolved with a peak-enhancing canny window, where \(L\) is set as 12 and \(\sigma\) is set as 4.

\[ C(w) = \frac{w}{\sigma^2}e^{-\frac{w^2}{2\sigma^2}} \hspace{20px} w\in[-L,L] \]
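
A minimal sketch of the canny window, evaluating \(C(w)\) for \(w = -L \ldots L\), is shown below. The function name `cannyWindow` and its parameter names are hypothetical and only mirror the symbols in the formula.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: evaluates C(w) = (w / sigma^2) * exp(-w^2 / (2 sigma^2))
// for w = -L .. L, giving 2L + 1 coefficients.
std::vector<float> cannyWindow(int halfLength, float sigma)
{
    assert(halfLength >= 0 && sigma > 0.0f);
    std::vector<float> window(2 * halfLength + 1);
    const double s2 = static_cast<double>(sigma) * sigma;
    for (int w = -halfLength; w <= halfLength; ++w) {
        window[w + halfLength] =
            static_cast<float>((w / s2) * std::exp(-(w * w) / (2.0 * s2)));
    }
    return window;
}
```

The window is antisymmetric about \(w = 0\) and reaches its extremes at \(w = \pm\sigma\), which is why it acts like a smoothed differentiator and emphasises rising edges.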

The \(n\) signals are summed and half-wave rectified to produce the onset curve.

The moving average \(A\) of the onset curve \(O\) is produced from the mean value of a rectangular window of length \((2L+1)\), plus a threshold \(t\). The threshold and moving average window length parameters control \(t\) and \(L\) respectively.

\[ A(x) = \displaystyle\sum\limits_{y=-L}^{L} \frac{O(x+y)}{2L+1} + t \]

The difference signal is created by subtracting the moving average from the onset curve and applying half-wave rectification.

An onset is detected when a sample is the maximum within a given window of length \((2L+1)\), where \(L\) is set by the parameter onset peak window length.
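
The difference signal and the peak-picking rule can be illustrated with the sketch below. The names `halfWaveDifference` and `pickPeaks` are hypothetical. Two details are assumptions rather than documented behaviour: zero-valued samples are never reported as onsets, and a sample that ties with a neighbour still counts as a maximum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: difference = max(onset - average, 0).
std::vector<float> halfWaveDifference(const std::vector<float>& onset,
                                      const std::vector<float>& average)
{
    assert(onset.size() == average.size());
    std::vector<float> diff(onset.size());
    for (std::size_t i = 0; i < onset.size(); ++i) {
        diff[i] = std::max(onset[i] - average[i], 0.0f);
    }
    return diff;
}

// Hypothetical helper: index x is a peak when signal[x] is positive and no
// sample within [x - L, x + L] is larger (window clipped at the signal ends).
std::vector<int> pickPeaks(const std::vector<float>& signal, int halfWidth)
{
    assert(halfWidth >= 0);
    const int n = static_cast<int>(signal.size());
    std::vector<int> peaks;
    for (int x = 0; x < n; ++x) {
        if (signal[x] <= 0.0f) {
            continue;  // assumption: rectified-away samples are not onsets
        }
        const int lo = std::max(0, x - halfWidth);
        const int hi = std::min(n - 1, x + halfWidth);
        bool isMax = true;
        for (int i = lo; i <= hi && isMax; ++i) {
            if (signal[i] > signal[x]) {
                isMax = false;
            }
        }
        if (isMax) {
            peaks.push_back(x);
        }
    }
    return peaks;
}
```

A wider window suppresses weaker peaks that lie close to a stronger one, so increasing the onset peak window length tends to reduce the number of detected onsets.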

The average onset frequency is the total number of onsets divided by the length of the track in minutes.

The rhythm strength is the mean value of the peaks of the onset curve (pre-averaging).

The autocorrelation is the autocorrelation of the difference signal between delays of \(\frac{60}{T_{max}}\cdot\frac{F_s}{s}\) frames and \(\frac{60}{T_{min}}\cdot\frac{F_s}{s}\) frames, where \(T_{min}\) and \(T_{max}\) are the min/max tempo in BPM and \(s\) is the step size in number of frames.
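
To make the delay range concrete, the sketch below converts a tempo in BPM to a lag in frames and evaluates an autocorrelation over a range of lags. The names `lagForTempo` and `autocorrelate` are hypothetical. Rounding the lag to the nearest frame and using an unnormalised sum of products are both assumptions, since the text does not specify either.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: lag in frames for a tempo, (60 / T) * (Fs / s).
// Assumption: fractional lags are rounded to the nearest frame.
int lagForTempo(float bpm, float sampleRate, int stepSize)
{
    assert(bpm > 0.0f && stepSize > 0);
    return static_cast<int>(std::lround((60.0 / bpm) * (sampleRate / stepSize)));
}

// Hypothetical helper: unnormalised autocorrelation for lags
// startLag .. endLag inclusive (the normalisation is an assumption).
std::vector<float> autocorrelate(const std::vector<float>& signal,
                                 int startLag, int endLag)
{
    assert(0 <= startLag && startLag <= endLag);
    const int n = static_cast<int>(signal.size());
    std::vector<float> result;
    for (int lag = startLag; lag <= endLag; ++lag) {
        float sum = 0.0f;
        for (int i = 0; i + lag < n; ++i) {
            sum += signal[i] * signal[i + lag];
        }
        result.push_back(sum);
    }
    return result;
}
```

For example, with \(F_s = 44100\) and an assumed step size of 441 samples (100 frames per second), the default tempo limits of 300 and 12 BPM correspond to lags of 20 and 500 frames.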

The peaks of the autocorrelation, \(P_i\), are defined as those values which are above a certain threshold, defined as the 95% confidence interval, and which are the maximum within a 7-sample window. The mean correlation peak is the mean value of the selected peaks, and the peak-valley ratio is the ratio between the mean correlation peak and the mean value of the valleys. A valley is defined as the minimum value between two peaks.

The tempo is defined as the greatest common divisor of the detected peaks. It is found by selecting the peak \(P_k\) that minimises the function below:

\[ T = \underset{P_k}{\operatorname{argmin}} \displaystyle\sum\limits_{i=1}^{N} \left|\frac{P_i}{P_k}-\text{round}\left(\frac{P_i}{P_k}\right)\right|\]
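
The minimisation can be illustrated with a brute-force search over the candidate peaks. This is a sketch under stated assumptions, not the plugin's `findTempo` implementation: the name `bestCommonDivisorLag` is hypothetical, peak positions are assumed to be positive lags, and ties are resolved in favour of the first candidate.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helper: returns the peak lag P_k that minimises
// sum_i | P_i / P_k - round(P_i / P_k) |.
// Assumptions: all lags are positive; the first minimum wins on ties.
int bestCommonDivisorLag(const std::vector<int>& peakLags)
{
    assert(!peakLags.empty());
    int best = peakLags.front();
    double bestCost = std::numeric_limits<double>::infinity();
    for (int pk : peakLags) {
        assert(pk > 0);
        double cost = 0.0;
        for (int p : peakLags) {
            const double ratio = static_cast<double>(p) / pk;
            cost += std::fabs(ratio - std::round(ratio));
        }
        if (cost < bestCost) {
            bestCost = cost;
            best = pk;
        }
    }
    return best;
}
```

For peaks at lags 50, 100, 150 and 201, the candidate 50 gives a cost of about 0.02 while every other candidate costs about 1, so 50 is selected as the lag that best divides the others.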

References

[1] Lu, L., Liu, D., & Zhang, H.-J. (2006). Automatic Mood Detection and Tracking of Music Audio Signals. IEEE Transactions on Audio, Speech and Language Processing (Vol. 14, pp. 5-18).

[2] Dixon, S. (2006). Onset Detection Revisited. International Conference on Digital Audio Effects (DAFx) (pp. 133-137).

Member Function Documentation

◆ calculateBandFreqs()

void Rhythm::calculateBandFreqs ( )
protected

BBC Vamp plugin collection

Copyright (c) 2011-2013 British Broadcasting Corporation

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Member Data Documentation

◆ average_window

int Rhythm::average_window
protected

Length of moving average window

◆ bandHighFreq

float* Rhythm::bandHighFreq
protected

Upper frequency of each sub-band

◆ cannyLength

int Rhythm::cannyLength
protected

Length of canny window

◆ cannyShape

float Rhythm::cannyShape
protected

Shape of canny window

◆ cannyWindow

float* Rhythm::cannyWindow
protected

Co-efficients of canny window

◆ halfHannLength

int Rhythm::halfHannLength
protected

Length of half-hanning window

◆ halfHannWindow

float* Rhythm::halfHannWindow
protected

Co-efficients of half-hanning window

◆ intensity

vector<vector<float> > Rhythm::intensity
protected

Intensity value for each block

◆ max_bpm

int Rhythm::max_bpm
protected

Maximum BPM detected in autocorrelation

◆ min_bpm

int Rhythm::min_bpm
protected

Minimum BPM detected in autocorrelation

◆ numBands

int Rhythm::numBands
protected

Number of sub-bands

◆ peak_window

int Rhythm::peak_window
protected

Length of peak-picking window

◆ threshold

float Rhythm::threshold
protected

Threshold value added to moving average

