Yaafe core features
Yaafe core audio features.
Available features
AmplitudeModulation
class yaafefeatures.AmplitudeModulation
Tremolo and grain description, according to [SE2005] and [AE2001].
AmplitudeModulation uses Envelope to describe tremolo and grain. The analyzed frequency ranges are:
- Tremolo: 4 - 8 Hz
- Grain: 10 - 40 Hz
For each of these ranges, it computes:
- Frequency of maximum energy in the range
- Difference between the energy at this frequency and the mean energy over all frequencies
- Difference between the energy at this frequency and the mean energy in the range
- Product of the first two values
[AE2001] A. Eronen, Automatic musical instrument recognition. Master's Thesis, Tampere University of Technology, 2001.
Parameters:
- EnDecim (default=200): Decimation factor used to compute the envelope
- blockSize (default=32768): Output frame size
- stepSize (default=16384): Step between consecutive frames
Declaration example:
AmplitudeModulation EnDecim=200 blockSize=32768 stepSize=16384
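Every declaration example on this page has the same layout: a feature name followed by space-separated key=value parameters. As a minimal illustration of that layout only, the string can be split as shown here. The helper name parse_declaration is hypothetical and is not part of Yaafe.

```python
def parse_declaration(text):
    """Split 'Name key=value ...' into (name, {key: value}).

    Hypothetical helper for illustration; values are kept as strings.
    """
    name, *pairs = text.split()
    params = dict(pair.split("=", 1) for pair in pairs)
    return name, params


name, params = parse_declaration(
    "AmplitudeModulation EnDecim=200 blockSize=32768 stepSize=16384"
)
print(name, params)
```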
AutoCorrelation
class yaafefeatures.AutoCorrelation
Compute the autocorrelation coefficients of each frame.
Parameters:
- ACNbCoeffs (default=49): Number of autocorrelation coefficients to keep
- blockSize (default=1024): Output frame size
- stepSize (default=512): Step between consecutive frames
Declaration example:
AutoCorrelation ACNbCoeffs=49 blockSize=1024 stepSize=512
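A rough sketch of per-frame autocorrelation is given here. It assumes unnormalized coefficients starting at lag 0; Yaafe's exact normalization and first lag are not stated above, so treat both as assumptions.

```python
def autocorrelation(frame, nb_coeffs):
    """Return the first nb_coeffs autocorrelation lags of one frame.

    Assumption: unnormalized sums, starting at lag 0.
    """
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(nb_coeffs)
    ]


ac = autocorrelation([1.0, 2.0, 3.0, 4.0], 3)
print(ac)
```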
ComplexDomainOnsetDetection
class yaafefeatures.ComplexDomainOnsetDetection
Compute onset detection using a complex domain spectral flux method [CD2003].
[CD2003] C. Duxbury et al., Complex domain onset detection for musical signals, Proc. of the 6th Int. Conference on Digital Audio Effects (DAFx-03), London, UK, September 8-11, 2003.
Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT: Hanning|Hamming|None
- blockSize (default=1024): Output frame size
- stepSize (default=512): Step between consecutive frames
Declaration example:
ComplexDomainOnsetDetection FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
Energy
Envelope
class yaafefeatures.Envelope
Extract the amplitude envelope using a Hilbert transform, low-pass filtering and decimation.
Parameters:
- EnDecim (default=200): Decimation factor used to compute the envelope
- blockSize (default=32768): Output frame size
- stepSize (default=16384): Step between consecutive frames
Declaration example:
Envelope EnDecim=200 blockSize=32768 stepSize=16384
EnvelopeShapeStatistics
class yaafefeatures.EnvelopeShapeStatistics
Centroid, spread, skewness and kurtosis of each frame's amplitude envelope. For more details about moments, see Shape Statistics.
Parameters:
- EnDecim (default=200): Decimation factor used to compute the envelope
- blockSize (default=32768): Output frame size
- stepSize (default=16384): Step between consecutive frames
Declaration example:
EnvelopeShapeStatistics EnDecim=200 blockSize=32768 stepSize=16384
Frames
class yaafefeatures.Frames
Segment the input signal into frames.
The first frame is padded with zeros on its left half so that it is centered on time 0 s; consecutive frames are then equally spaced. Consequently, frame i (starting from 0) is centered on sample i * stepSize.
Parameters:
- blockSize (default=1024): Output frame size
- stepSize (default=512): Step between consecutive frames
Declaration example:
Frames blockSize=1024 stepSize=512
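The centering rule can be sketched in a few lines. This is an illustration under assumptions, not Yaafe's implementation: the function name is hypothetical, and the way the end of the signal is handled (zero-padding on the right, one frame per step) is a guess.

```python
def frames(signal, block_size=1024, step_size=512):
    """Yield frames such that frame i is centered on sample i * step_size."""
    half = block_size // 2
    # Left half of the first frame is zeros; right padding is an assumption.
    padded = [0.0] * half + list(signal) + [0.0] * block_size
    nb_frames = len(signal) // step_size + 1  # assumption about the tail
    for i in range(nb_frames):
        start = i * step_size
        yield padded[start:start + block_size]


out = list(frames([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
                  block_size=4, step_size=2))
print(out[0], out[1])
```

In this example the middle of frame 1 (index block_size // 2) holds the value 3.0, which is sample 1 * step_size = 2 of the input.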
LPC
class yaafefeatures.LPC
Compute the Linear Predictor Coefficients (LPC) of a signal frame, using autocorrelation and the Levinson-Durbin algorithm (see [JM1975]).
[JM1975] J. Makhoul, Linear Prediction: A Tutorial Review, Proc. IEEE, Vol. 63, pp. 561-580, 1975.
Parameters:
- LPCNbCoeffs (default=2): Number of Linear Predictor Coefficients to compute
- blockSize (default=1024): Output frame size
- stepSize (default=512): Step between consecutive frames
Declaration example:
LPC LPCNbCoeffs=2 blockSize=1024 stepSize=512
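For readers unfamiliar with the Levinson-Durbin recursion, a textbook version is sketched here. The sign convention (a[0] = 1, prediction is minus the weighted sum of past samples) and the function name are assumptions; Yaafe's output convention may differ, and a zero-energy frame is not handled.

```python
def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations recursively.

    r holds autocorrelation lags r[0..order]. Returns (a, err) with a[0] == 1,
    where x[n] is predicted as -sum(a[k] * x[n - k] for k in 1..order) and
    err is the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        reflection = -acc / err
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + reflection * prev[m - k]
        a[m] = reflection
        err *= 1.0 - reflection * reflection
    return a, err


# Lags of a first-order process: the second coefficient should vanish.
coeffs, error = levinson_durbin([1.0, 0.5, 0.25], 2)
print(coeffs, error)
```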
LSF
class yaafefeatures.LSF
Compute the Line Spectral Frequency (LSF) coefficients of a signal frame. The algorithm was adapted from [TB2006] and [SH1976].
[TB2006] Tom Backstrom, Carlo Magi, Properties of line spectrum pair polynomials–A review, Signal Processing, Volume 86, Issue 11, Special Section: Distributed Source Coding, November 2006, Pages 3286-3298, ISSN 0165-1684, DOI: 10.1016/j.sigpro.2006.01.010.
[SH1976] Schussler, H., A stability theorem for discrete systems, IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 24, no. 1, pp. 87-89, Feb 1976.
Parameters:
- LSFDisplacement (default=1): LSF displacement parameter: 1 for classical LSF, 0 for Schussler polynomials, >1 is a generalization
- LSFNbCoeffs (default=10): Number of Line Spectral Frequencies to compute
- blockSize (default=1024): Output frame size
- stepSize (default=512): Step between consecutive frames
Declaration example:
LSF LSFDisplacement=1 LSFNbCoeffs=10 blockSize=1024 stepSize=512
Loudness
class yaafefeatures.Loudness
The loudness coefficients are the energy in each Bark band, normalized by the overall sum. See [GP2004] and [MG1997] for more details.
[MG1997] Moore, Glasberg, et al., A Model for the Prediction of Thresholds, Loudness, and Partial Loudness, J. Audio Eng. Soc. 45: 224-240, 1997.
Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT: Hanning|Hamming|None
- LMode (default=Relative): “Specific” computes loudness without normalization, “Relative” normalizes each band so that they sum to 1, “Total” returns only the sum of loudness over all bands.
- blockSize (default=1024): Output frame size
- stepSize (default=512): Step between consecutive frames
Declaration example:
Loudness FFTLength=0 FFTWindow=Hanning LMode=Relative blockSize=1024 stepSize=512
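The three LMode values differ only in how per-band loudness values are post-processed. The sketch below illustrates that step on already-computed band values; the function name is hypothetical, and how Yaafe derives the band values themselves is not covered.

```python
def apply_lmode(band_loudness, mode="Relative"):
    """Post-process per-band loudness values according to LMode."""
    total = sum(band_loudness)
    if mode == "Specific":
        return list(band_loudness)  # no normalization
    if mode == "Relative":
        if total == 0:
            return list(band_loudness)  # assumption: leave silent frames as-is
        return [value / total for value in band_loudness]
    if mode == "Total":
        return [total]
    raise ValueError("unknown LMode: %r" % mode)


print(apply_lmode([1.0, 3.0], "Relative"))
print(apply_lmode([1.0, 3.0], "Total"))
```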
MFCC
class yaafefeatures.MFCC
Compute the Mel-frequency cepstral coefficients [DM1980].
The mel filter bank is built as 40 log-spaced filters according to a mel scale. Each filter is triangular. The MFCCs are then computed using a DCT-II.
[DM1980] S.B. Davis and P. Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech and Signal Processing, 28:357-366, 1980.
Parameters:
- CepsIgnoreFirstCoeff (default=1): 0 keeps the first cepstral coefficient, 1 ignores it
- CepsNbCoeffs (default=13): Number of cepstral coefficients to keep
- FFTWindow (default=Hanning): Weighting window to apply before the FFT: Hanning|Hamming|None
- MelMaxFreq (default=6854.0): Maximum frequency of the mel filter bank
- MelMinFreq (default=130.0): Minimum frequency of the mel filter bank
- MelNbFilters (default=40): Number of mel filters
- blockSize (default=1024): Output frame size
- stepSize (default=512): Step between consecutive frames
Declaration example:
MFCC CepsIgnoreFirstCoeff=1 CepsNbCoeffs=13 FFTWindow=Hanning MelMaxFreq=6854.0 MelMinFreq=130.0 MelNbFilters=40 blockSize=1024 stepSize=512
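To give a feel for how the MelMinFreq, MelMaxFreq and MelNbFilters parameters interact, the sketch below places filter edges at equal spacing on a mel scale. It assumes the widely used mapping mel = 1127 * ln(1 + f / 700); whether Yaafe uses exactly this formula is an assumption, and the function names are hypothetical.

```python
import math


def hz_to_mel(freq_hz):
    """One common mel mapping (assumed here, not confirmed for Yaafe)."""
    return 1127.0 * math.log(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


def mel_band_edges(min_freq=130.0, max_freq=6854.0, nb_filters=40):
    """Return nb_filters + 2 edge frequencies equally spaced in mel.

    Triangular filter k would span edges[k] .. edges[k + 2].
    """
    low, high = hz_to_mel(min_freq), hz_to_mel(max_freq)
    step = (high - low) / (nb_filters + 1)
    return [mel_to_hz(low + i * step) for i in range(nb_filters + 2)]


edges = mel_band_edges()
print(len(edges), round(edges[0], 1), round(edges[-1], 1))
```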
MagnitudeSpectrum
class yaafefeatures.MagnitudeSpectrum
Compute the magnitude spectrum of each frame, optionally applying an analysis window (Hanning or Hamming).
Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT: Hanning|Hamming|None
- blockSize (default=1024): Output frame size
- stepSize (default=512): Step between consecutive frames
Declaration example:
MagnitudeSpectrum FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
MelSpectrum
class yaafefeatures.MelSpectrum
Compute the Mel-frequency spectrum [DM1980].
The mel filter bank is built as 40 log-spaced filters according to a mel scale. Each filter is triangular.
Parameters:
- FFTWindow (default=Hanning): Weighting window to apply before the FFT: Hanning|Hamming|None
- MelMaxFreq (default=6854.0): Maximum frequency of the mel filter bank
- MelMinFreq (default=130.0): Minimum frequency of the mel filter bank
- MelNbFilters (default=40): Number of mel filters
- blockSize (default=1024): Output frame size
- stepSize (default=512): Step between consecutive frames
Declaration example:
MelSpectrum FFTWindow=Hanning MelMaxFreq=6854.0 MelMinFreq=130.0 MelNbFilters=40 blockSize=1024 stepSize=512
OBSI
class yaafefeatures.OBSI
Compute the Octave Band Signal Intensity (OBSI) using a triangular octave filter bank ([SE2005]).
[SE2005] S. Essid, Classification automatique des signaux audio-frequences: reconnaissance des instruments de musique. PhD, UPMC, 2005.
Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT: Hanning|Hamming|None
- OBSIMinFreq (default=27.5): Minimum frequency of the OBSI filter bank
- blockSize (default=1024): Output frame size
- stepSize (default=512): Step between consecutive frames
Declaration example:
OBSI FFTLength=0 FFTWindow=Hanning OBSIMinFreq=27.5 blockSize=1024 stepSize=512
OBSIR
class yaafefeatures.OBSIR
Compute the log of the OBSI ratio between consecutive octaves.
Parameters:
- DiffNbCoeffs (default=0): Maximum number of coefficients to keep. 0 keeps N-1 values (with N the input feature size)
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT: Hanning|Hamming|None
- OBSIMinFreq (default=27.5): Minimum frequency of the OBSI filter bank
- blockSize (default=1024): Output frame size
- stepSize (default=512): Step between consecutive frames
Declaration example:
OBSIR DiffNbCoeffs=0 FFTLength=0 FFTWindow=Hanning OBSIMinFreq=27.5 blockSize=1024 stepSize=512
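The relation between OBSI, OBSIR and DiffNbCoeffs can be sketched as below. The direction of the ratio (upper octave over lower octave), the natural logarithm, and the function name are assumptions; the input intensities are taken to be strictly positive.

```python
import math


def obsir(obsi, diff_nb_coeffs=0):
    """Log-ratio between consecutive octave-band intensities.

    With N input values this yields N - 1 ratios; a non-zero
    diff_nb_coeffs keeps at most that many of them.
    """
    ratios = [math.log(obsi[k + 1] / obsi[k]) for k in range(len(obsi) - 1)]
    return ratios if diff_nb_coeffs == 0 else ratios[:diff_nb_coeffs]


print(obsir([1.0, 2.0, 4.0, 8.0]))
print(obsir([1.0, 2.0, 4.0, 8.0], diff_nb_coeffs=2))
```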
PerceptualSharpness
class yaafefeatures.PerceptualSharpness
Compute the sharpness of Loudness coefficients, according to [GP2004].
Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT: Hanning|Hamming|None
- blockSize (default=1024): Output frame size
- stepSize (default=512): Step between consecutive frames
Declaration example:
PerceptualSharpness FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
PerceptualSpread
class yaafefeatures.PerceptualSpread
Compute the spread of Loudness coefficients, according to [GP2004].
Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT: Hanning|Hamming|None
- blockSize (default=1024): Output frame size
- stepSize (default=512): Step between consecutive frames
Declaration example:
PerceptualSpread FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
SpectralCrestFactorPerBand
class yaafefeatures.SpectralCrestFactorPerBand
Compute the spectral crest factor per log-spaced band of 1/4 octave.
Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT: Hanning|Hamming|None
- blockSize (default=1024): Output frame size
- stepSize (default=512): Step between consecutive frames
Declaration example:
SpectralCrestFactorPerBand FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
SpectralDecrease
class yaafefeatures.SpectralDecrease
Compute the spectral decrease according to [GP2004].
Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT: Hanning|Hamming|None
- blockSize (default=1024): Output frame size
- stepSize (default=512): Step between consecutive frames
Declaration example:
SpectralDecrease FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
SpectralFlatness
class yaafefeatures.SpectralFlatness
Compute the global spectral flatness as the ratio between the geometric mean and the arithmetic mean of the spectrum.
Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT: Hanning|Hamming|None
- blockSize (default=1024): Output frame size
- stepSize (default=512): Step between consecutive frames
Declaration example:
SpectralFlatness FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
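The geometric-to-arithmetic mean ratio can be sketched as below. Whether Yaafe applies it to magnitudes or to power values, and how it treats zero bins, is not stated above; the choices in this sketch are assumptions.

```python
import math


def spectral_flatness(magnitudes):
    """Geometric mean divided by arithmetic mean of the spectrum values."""
    n = len(magnitudes)
    arithmetic = sum(magnitudes) / n
    if arithmetic == 0 or min(magnitudes) <= 0:
        return 0.0  # assumption: any zero bin gives zero flatness
    geometric = math.exp(sum(math.log(m) for m in magnitudes) / n)
    return geometric / arithmetic


print(spectral_flatness([2.0, 2.0, 2.0, 2.0]))  # flat spectrum, close to 1
print(spectral_flatness([1.0, 4.0]))            # peaky spectrum, below 1
```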
SpectralFlatnessPerBand
class yaafefeatures.SpectralFlatnessPerBand
Compute the spectral flatness per log-spaced band of 1/4 octave, as proposed in the MPEG-7 standard.
Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT: Hanning|Hamming|None
- blockSize (default=1024): Output frame size
- stepSize (default=512): Step between consecutive frames
Declaration example:
SpectralFlatnessPerBand FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
SpectralFlux
class yaafefeatures.SpectralFlux
Compute the flux of the spectrum between consecutive frames.
Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT: Hanning|Hamming|None
- FluxSupport (default=All): Support of the flux computation. If ‘All’, all bins are used (default); if ‘Increase’, only bins whose value is increasing are used.
- blockSize (default=1024): Output frame size
- stepSize (default=512): Step between consecutive frames
Declaration example:
SpectralFlux FFTLength=0 FFTWindow=Hanning FluxSupport=All blockSize=1024 stepSize=512
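The effect of FluxSupport can be illustrated with a simple flux measure. The sketch assumes flux is the unnormalized sum of squared bin differences; Yaafe's exact formula and normalization are not given above, and the function name is hypothetical.

```python
def spectral_flux(prev_spectrum, spectrum, support="All"):
    """Sum of squared differences between two consecutive spectra.

    support="Increase" keeps only bins whose value grew.
    """
    diffs = [cur - prev for prev, cur in zip(prev_spectrum, spectrum)]
    if support == "Increase":
        diffs = [d for d in diffs if d > 0]
    return sum(d * d for d in diffs)


previous, current = [1.0, 2.0, 3.0], [2.0, 1.0, 3.0]
print(spectral_flux(previous, current, "All"))
print(spectral_flux(previous, current, "Increase"))
```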
SpectralRolloff
class yaafefeatures.SpectralRolloff
Spectral roll-off is the frequency below which 99% of the energy is contained (see [SS1997]).
[SS1997] E. Scheirer, M. Slaney. Construction and evaluation of a robust multifeature speech/music discriminator. IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1331-1334, 1997.
Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT: Hanning|Hamming|None
- blockSize (default=1024): Output frame size
- stepSize (default=512): Step between consecutive frames
Declaration example:
SpectralRolloff FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
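A minimal sketch of the roll-off search follows. It assumes the input holds the N/2 + 1 non-negative-frequency magnitude bins of an N-point FFT (at least two bins) and that energy means squared magnitude; the function name and the sample_rate argument are illustrative.

```python
def spectral_rolloff(magnitudes, sample_rate, ratio=0.99):
    """Frequency (Hz) below which `ratio` of the spectral energy lies."""
    energies = [m * m for m in magnitudes]
    threshold = ratio * sum(energies)
    bin_width = sample_rate / (2.0 * (len(magnitudes) - 1))
    cumulative = 0.0
    for k, energy in enumerate(energies):
        cumulative += energy
        if cumulative >= threshold:
            return k * bin_width
    return (len(magnitudes) - 1) * bin_width


# All energy sits in bin 1 of a 5-bin spectrum sampled at 8 kHz.
print(spectral_rolloff([0.0, 1.0, 0.0, 0.0, 0.0], 8000.0))
```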
SpectralShapeStatistics
class yaafefeatures.SpectralShapeStatistics
Compute shape statistics of the MagnitudeSpectrum (see [GR2004]). The shape statistics are the centroid, spread, skewness and kurtosis.
[GR2004] O. Gillet, G. Richard, Automatic transcription of drum loops. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Montreal, Canada, 2004.
Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT: Hanning|Hamming|None
- blockSize (default=1024): Output frame size
- stepSize (default=512): Step between consecutive frames
Declaration example:
SpectralShapeStatistics FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
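The four shape statistics can be sketched with the usual moment-based definitions, treating the input values as weights over bin indices. This follows standard statistics rather than a confirmed Yaafe formula: the use of bin indices instead of frequencies, and spread as a standard deviation, are assumptions. A degenerate input (zero sum or zero spread) is not handled.

```python
import math


def shape_statistics(values):
    """Return (centroid, spread, skewness, kurtosis) of a weight sequence."""
    total = sum(values)
    centroid = sum(k * v for k, v in enumerate(values)) / total

    def central_moment(power):
        return sum(((k - centroid) ** power) * v
                   for k, v in enumerate(values)) / total

    spread = math.sqrt(central_moment(2))
    skewness = central_moment(3) / spread ** 3
    kurtosis = central_moment(4) / spread ** 4
    return centroid, spread, skewness, kurtosis


print(shape_statistics([1.0, 2.0, 1.0]))
```

The same function applies unchanged to a time-domain frame or an amplitude envelope, which is how the temporal and envelope variants relate to the spectral one.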
SpectralSlope
class yaafefeatures.SpectralSlope
The spectral slope is computed by linear regression of the spectral amplitude (see [GP2004]).
Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT: Hanning|Hamming|None
- blockSize (default=1024): Output frame size
- stepSize (default=512): Step between consecutive frames
Declaration example:
SpectralSlope FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
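A least-squares slope of amplitude against bin index can be sketched as follows. Regressing against bin index rather than frequency, and leaving the result unnormalized, are assumptions; at least two bins are required.

```python
def spectral_slope(magnitudes):
    """Least-squares slope of amplitude versus bin index."""
    n = len(magnitudes)
    mean_k = (n - 1) / 2.0
    mean_a = sum(magnitudes) / n
    numerator = sum((k - mean_k) * (a - mean_a)
                    for k, a in enumerate(magnitudes))
    denominator = sum((k - mean_k) ** 2 for k in range(n))
    return numerator / denominator


print(spectral_slope([1.0, 3.0, 5.0, 7.0]))  # amplitudes rise by 2 per bin
```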
SpectralVariation
class yaafefeatures.SpectralVariation
The spectral variation is the normalized correlation of the spectrum between consecutive frames (see [GP2004]).
[GP2004] Geoffroy Peeters, A large set of audio features for sound description (similarity and classification) in the CUIDADO project, 2004.
Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT: Hanning|Hamming|None
- blockSize (default=1024): Output frame size
- stepSize (default=512): Step between consecutive frames
Declaration example:
SpectralVariation FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
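The normalized correlation between two consecutive spectra can be sketched as below. Some definitions report one minus this value so that identical frames give 0; which form Yaafe outputs is not stated above, so this sketch returns the plain correlation and the function name is hypothetical.

```python
import math


def normalized_correlation(prev_spectrum, spectrum):
    """Dot product of two spectra divided by the product of their norms."""
    dot = sum(p * c for p, c in zip(prev_spectrum, spectrum))
    norm = (math.sqrt(sum(p * p for p in prev_spectrum))
            * math.sqrt(sum(c * c for c in spectrum)))
    return dot / norm if norm > 0 else 0.0


print(normalized_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # same shape
print(normalized_correlation([1.0, 0.0], [0.0, 1.0]))            # no overlap
```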
TemporalShapeStatistics
class yaafefeatures.TemporalShapeStatistics
Compute shape statistics of signal frames.
Parameters:
- blockSize (default=1024): Output frame size
- stepSize (default=512): Step between consecutive frames
Declaration example:
TemporalShapeStatistics blockSize=1024 stepSize=512
Available feature transforms
AutoCorrelationPeaksIntegrator
class yaafefeatures.AutoCorrelationPeaksIntegrator
Feature transform that computes the peaks of the autocorrelation function and outputs the peaks and their amplitudes.
Parameters:
- ACPInterPeakMinDist (default=5): Minimal distance between consecutive autocorrelation peaks, expressed in lags
- ACPNbPeaks (default=3): Number of autocorrelation peaks to keep
- ACPNorm (default=No): One of No|BPM|Hz. Expresses the output in lags, BPM or Hz, respectively
- NbFrames (default=60): Number of frames to integrate together
- StepNbFrames (default=30): Number of frames to skip between two integrations
Declaration example:
AutoCorrelationPeaksIntegrator ACPInterPeakMinDist=5 ACPNbPeaks=3 ACPNorm=No NbFrames=60 StepNbFrames=30
Cepstrum
class yaafefeatures.Cepstrum
Feature transform that computes the cepstrum coefficients of input feature frames, using a DCT-II.
Parameters:
- CepsIgnoreFirstCoeff (default=1): 0 keeps the first cepstral coefficient, 1 ignores it
- CepsNbCoeffs (default=13): Number of cepstral coefficients to keep
Declaration example:
Cepstrum CepsIgnoreFirstCoeff=1 CepsNbCoeffs=13
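How the two parameters select DCT-II outputs can be sketched as below. The DCT-II here is unscaled and applied directly to the input frame; any scaling or logarithm that Yaafe applies is not shown, and the function name is hypothetical.

```python
import math


def cepstrum(frame, nb_coeffs=13, ignore_first=1):
    """Unscaled DCT-II of a frame, keeping nb_coeffs coefficients.

    ignore_first=1 skips coefficient 0; ignore_first=0 keeps it.
    """
    n = len(frame)
    first = 1 if ignore_first else 0
    return [
        sum(x * math.cos(math.pi * k * (i + 0.5) / n)
            for i, x in enumerate(frame))
        for k in range(first, first + nb_coeffs)
    ]


# A constant frame has all of its energy in coefficient 0.
print(cepstrum([1.0, 1.0, 1.0, 1.0], nb_coeffs=2, ignore_first=0))
```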
Derivate
class yaafefeatures.Derivate
Compute the temporal derivative of the input feature. The derivative is approximated by an orthogonal polynomial fit over a finite-length window (see [RR1993] p. 117).
[RR1993] L.R. Rabiner, Fundamentals of Speech Processing. Prentice Hall Signal Processing Series. PTR Prentice-Hall, 1993.
Parameters:
- DO1Len (default=4): Horizon used to compute the order 1 derivative
- DO2Len (default=1): Horizon used to compute the order 2 derivative. Unused if DOrder=1.
- DOrder (default=1): Order of the derivative to compute
Declaration example:
Derivate DO1Len=4 DO2Len=1 DOrder=1
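For an order 1 derivative, a first-order polynomial fit over a symmetric window reduces to the regression formula common in speech processing, sketched below for a scalar feature. The edge handling (repeating the first and last values) and the function name are assumptions, and the correspondence between horizon and DO1Len is a guess.

```python
def derivative(values, horizon=4):
    """Regression-based derivative over a window of 2 * horizon + 1 frames."""
    n = len(values)
    denominator = 2.0 * sum(k * k for k in range(1, horizon + 1))
    result = []
    for t in range(n):
        acc = 0.0
        for k in range(1, horizon + 1):
            ahead = values[min(t + k, n - 1)]  # repeat last value at the edge
            behind = values[max(t - k, 0)]     # repeat first value at the edge
            acc += k * (ahead - behind)
        result.append(acc / denominator)
    return result


ramp = [float(i) for i in range(10)]
print(derivative(ramp, horizon=2)[4])  # slope of a unit ramp away from edges
```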
HistogramIntegrator
class yaafefeatures.HistogramIntegrator
Feature transform that computes a histogram of the input values.
Parameters:
- HInf (default=0): Minimal value to take into consideration
- HNbBins (default=10): Number of histogram bins
- HSup (default=1): Maximal value to take into consideration
- HWeighted (default=0): Set to 1 if the input values are weighted. If 1, the input is treated as a list of (value, weight) pairs.
- NbFrames (default=60): Number of frames to integrate together
- StepNbFrames (default=30): Number of frames to skip between two integrations
Declaration example:
HistogramIntegrator HInf=0 HNbBins=10 HSup=1 HWeighted=0 NbFrames=60 StepNbFrames=30
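The roles of HInf, HSup and HNbBins can be sketched for the unweighted case. Ignoring out-of-range values and placing a value equal to HSup in the last bin are assumptions, as is the function name.

```python
def histogram(values, h_inf=0.0, h_sup=1.0, nb_bins=10):
    """Count values falling into nb_bins equal-width bins on [h_inf, h_sup]."""
    counts = [0] * nb_bins
    width = (h_sup - h_inf) / nb_bins
    for value in values:
        if h_inf <= value <= h_sup:  # assumption: out-of-range values dropped
            index = min(int((value - h_inf) / width), nb_bins - 1)
            counts[index] += 1
    return counts


counts = histogram([0.05, 0.15, 0.95, 1.0, 2.0])
print(counts)
```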
SlopeIntegrator
class yaafefeatures.SlopeIntegrator
Feature transform that computes the slope of the input feature over the given number of frames.
Parameters:
- NbFrames (default=60): Number of frames to integrate together
- StepNbFrames (default=30): Number of frames to skip between two integrations
Declaration example:
SlopeIntegrator NbFrames=60 StepNbFrames=30
StatisticalIntegrator
class yaafefeatures.StatisticalIntegrator
Feature transform that computes the temporal mean and standard deviation of the input feature over the given number of frames.
Parameters:
- NbFrames (default=60): Number of frames to integrate together
- SICompute (default=MeanStddev): If ‘MeanStddev’, compute the mean and standard deviation; if ‘Mean’, compute only the mean; if ‘Stddev’, compute only the standard deviation.
- StepNbFrames (default=30): Number of frames to skip between two integrations
Declaration example:
StatisticalIntegrator NbFrames=60 SICompute=MeanStddev StepNbFrames=30
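The interplay of NbFrames and StepNbFrames can be sketched for a scalar feature as a sliding window. The population form of the standard deviation and the dropping of an incomplete final window are assumptions, and the function name is hypothetical.

```python
import math


def integrate(frames, nb_frames=60, step_nb_frames=30):
    """Return (mean, stddev) for each window of nb_frames values."""
    results = []
    for start in range(0, len(frames) - nb_frames + 1, step_nb_frames):
        window = frames[start:start + nb_frames]
        mean = sum(window) / nb_frames
        variance = sum((x - mean) ** 2 for x in window) / nb_frames
        results.append((mean, math.sqrt(variance)))
    return results


stats = integrate([1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                  nb_frames=4, step_nb_frames=2)
print(stats)
```

With the defaults (NbFrames=60, StepNbFrames=30), consecutive windows overlap by half.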
