gnes.preprocessor.audio.vggish_example_helper.mel_features module

gnes.preprocessor.audio.vggish_example_helper.mel_features.frame(data, window_length, hop_length)
Convert array into a sequence of successive, possibly overlapping frames.
An n-dimensional array of shape (num_samples, …) is converted into an (n+1)-D array of shape (num_frames, window_length, …), where each frame starts hop_length points after the preceding one.
This is accomplished using stride_tricks, so the original data is not copied. However, there is no zero-padding, so any incomplete frames at the end are not included.
 Args:
 data: np.array of dimension N >= 1.
 window_length: Number of samples in each frame.
 hop_length: Advance (in samples) between each window.
 Returns:
 (N+1)D np.array with as many rows as there are complete frames that can be extracted.
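
The framing described above can be sketched with NumPy's stride tricks. This is an illustrative sketch rather than the module's own code: the name frame_sketch is hypothetical, and the frame count formula is an assumption that follows from "no zero-padding".

```python
import numpy as np

def frame_sketch(data, window_length, hop_length):
    """Hypothetical re-creation of the framing behavior described above."""
    num_samples = data.shape[0]
    # Only complete frames are kept; a trailing partial frame is dropped.
    num_frames = max(0, 1 + (num_samples - window_length) // hop_length)
    shape = (num_frames, window_length) + data.shape[1:]
    # Stepping one frame advances hop_length samples along axis 0.
    strides = (data.strides[0] * hop_length,) + data.strides
    # as_strided returns a view on the same buffer, so nothing is copied.
    return np.lib.stride_tricks.as_strided(data, shape=shape, strides=strides)

x = np.arange(10)
frames = frame_sketch(x, window_length=4, hop_length=3)
# frames has shape (3, 4): rows start at samples 0, 3 and 6.
```

Because the result is a view that overlaps itself, it is safest to treat it as read-only.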

gnes.preprocessor.audio.vggish_example_helper.mel_features.hertz_to_mel(frequencies_hertz)
Convert frequencies to mel scale using HTK formula.
 Args:
 frequencies_hertz: Scalar or np.array of frequencies in hertz.
 Returns:
 Object of same size as frequencies_hertz containing corresponding values on the mel scale.
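
For reference, the HTK formula is commonly written as mel = 1127 * ln(1 + f / 700). The sketch below uses those standard constants; that this module uses exactly these values is an assumption, and hertz_to_mel_sketch is a hypothetical name.

```python
import numpy as np

def hertz_to_mel_sketch(frequencies_hertz):
    """HTK-style mel conversion with the commonly quoted constants (assumed)."""
    # Works for scalars and arrays; the output has the same shape as the input.
    return 1127.0 * np.log(1.0 + np.asarray(frequencies_hertz) / 700.0)

mels = hertz_to_mel_sketch(np.array([0.0, 700.0, 1000.0]))
# 0 Hz maps to 0 mel, and 1000 Hz lands close to 1000 mel.
```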

gnes.preprocessor.audio.vggish_example_helper.mel_features.log_mel_spectrogram(data, audio_sample_rate=8000, log_offset=0.0, window_length_secs=0.025, hop_length_secs=0.01, **kwargs)
Convert waveform to a log magnitude mel-frequency spectrogram.
 Args:
 data: 1D np.array of waveform data.
 audio_sample_rate: The sampling rate of data.
 log_offset: Add this to values when taking log to avoid Infs.
 window_length_secs: Duration of each window to analyze.
 hop_length_secs: Advance between successive analysis windows.
 **kwargs: Additional arguments to pass to spectrogram_to_mel_matrix.
 Returns:
 2D np.array of (num_frames, num_mel_bins) consisting of log mel filterbank magnitudes for successive frames.
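
The arithmetic below shows how the seconds-based arguments might translate into sample counts for the default values. Rounding the FFT length up to the next power of two is an assumption, not something stated here, although it is consistent with the default num_spectrogram_bins=129 of spectrogram_to_mel_matrix. The last lines illustrate why a non-zero log_offset keeps the log finite.

```python
import math
import numpy as np

audio_sample_rate = 8000
window_length_secs = 0.025
hop_length_secs = 0.01

# Convert durations in seconds to whole numbers of samples.
window_length_samples = int(round(audio_sample_rate * window_length_secs))  # 200
hop_length_samples = int(round(audio_sample_rate * hop_length_secs))        # 80

# Assumed: FFT length is the next power of two at or above the window length.
fft_length = 2 ** math.ceil(math.log2(window_length_samples))               # 256
num_spectrogram_bins = fft_length // 2 + 1                                  # 129

# A small positive offset avoids log(0) = -inf for silent mel bins.
log_offset = 0.01  # illustrative value; the default shown above is 0.0
log_values = np.log(np.array([0.0, 1.0]) + log_offset)
```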

gnes.preprocessor.audio.vggish_example_helper.mel_features.periodic_hann(window_length)
Calculate a “periodic” Hann window.
The classic Hann window is defined as a raised cosine that starts and ends on zero, and where every value appears twice, except the middle point for an odd-length window. Matlab calls this a “symmetric” window and np.hanning() returns it. However, for Fourier analysis, this actually represents just over one cycle of a period N-1 cosine, and thus is not compactly expressed on a length-N Fourier basis. Instead, it’s better to use a raised cosine that ends just before the final zero value, i.e. a complete cycle of a period-N cosine. Matlab calls this a “periodic” window. This routine calculates it.
 Args:
 window_length: The number of points in the returned window.
 Returns:
 A 1D np.array containing the periodic hann window.
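
A minimal sketch of the difference between the two windows, using only NumPy. The name periodic_hann_sketch is hypothetical; the formula is the standard raised cosine with period N.

```python
import numpy as np

def periodic_hann_sketch(window_length):
    """Raised cosine covering exactly one cycle of a period-N cosine."""
    n = np.arange(window_length)
    return 0.5 - 0.5 * np.cos(2.0 * np.pi * n / window_length)

N = 8
periodic = periodic_hann_sketch(N)
symmetric = np.hanning(N)  # starts and ends on zero

# The periodic window equals a symmetric window of length N + 1
# with its final zero dropped.
matches = np.allclose(periodic, np.hanning(N + 1)[:N])
```

The periodic window starts on zero but its last sample is non-zero, whereas np.hanning(N) is zero at both ends.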

gnes.preprocessor.audio.vggish_example_helper.mel_features.spectrogram_to_mel_matrix(num_mel_bins=20, num_spectrogram_bins=129, audio_sample_rate=8000, lower_edge_hertz=125.0, upper_edge_hertz=3800.0)
Return a matrix that can post-multiply spectrogram rows to make mel.
Returns a np.array matrix A that can be used to post-multiply a matrix S of spectrogram values (STFT magnitudes) arranged as frames x bins to generate a “mel spectrogram” M of frames x num_mel_bins. M = S A.
The classic HTK algorithm exploits the complementarity of adjacent mel bands to multiply each FFT bin by only one mel weight, then add it, with positive and negative signs, to the two adjacent mel bands to which that bin contributes. Here, by expressing this operation as a matrix multiply, we go from num_fft multiplies per frame (plus around 2*num_fft adds) to around num_fft^2 multiplies and adds. However, because these are all presumably accomplished in a single call to np.dot(), it’s not clear which approach is faster in Python. The matrix multiplication has the attraction of being more general and flexible, and much easier to read.
 Args:
 num_mel_bins: How many bands in the resulting mel spectrum. This is the number of columns in the output matrix.
 num_spectrogram_bins: How many bins there are in the source spectrogram data, which is understood to be fft_size/2 + 1, i.e. the spectrogram only contains the non-redundant FFT bins.
 audio_sample_rate: Samples per second of the audio at the input to the spectrogram. We need this to figure out the actual frequencies for each spectrogram bin, which dictates how they are mapped into mel.
 lower_edge_hertz: Lower bound on the frequencies to be included in the mel spectrum. This corresponds to the lower edge of the lowest triangular band.
 upper_edge_hertz: The desired top edge of the highest frequency band.
 Returns:
 An np.array with shape (num_spectrogram_bins, num_mel_bins).
 Raises:
 ValueError: if frequency edges are incorrectly ordered or out of range.
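
The following sketch builds a matrix of triangular mel weights and applies it as M = S A. It is one plausible construction under stated assumptions, not the module's implementation: band edges are assumed to be equally spaced on the mel scale, the HTK constants 1127 and 700 are assumed, the exact ValueError conditions are guessed from the description, and all function names are hypothetical.

```python
import numpy as np

def _to_mel(hz):
    # HTK-style conversion with the commonly quoted constants (assumed).
    return 1127.0 * np.log(1.0 + np.asarray(hz) / 700.0)

def mel_matrix_sketch(num_mel_bins=20, num_spectrogram_bins=129,
                      audio_sample_rate=8000,
                      lower_edge_hertz=125.0, upper_edge_hertz=3800.0):
    nyquist = audio_sample_rate / 2.0
    if not 0.0 <= lower_edge_hertz < upper_edge_hertz <= nyquist:
        raise ValueError("frequency edges incorrectly ordered or out of range")
    # Mel value of the center frequency of every spectrogram bin.
    bin_mel = _to_mel(np.linspace(0.0, nyquist, num_spectrogram_bins))
    # num_mel_bins triangles need num_mel_bins + 2 edge points.
    edges = np.linspace(_to_mel(lower_edge_hertz), _to_mel(upper_edge_hertz),
                        num_mel_bins + 2)
    weights = np.zeros((num_spectrogram_bins, num_mel_bins))
    for i in range(num_mel_bins):
        lower, center, upper = edges[i:i + 3]
        rising = (bin_mel - lower) / (center - lower)
        falling = (upper - bin_mel) / (upper - center)
        # The triangle is the lower of the two slopes, clipped at zero.
        weights[:, i] = np.maximum(0.0, np.minimum(rising, falling))
    return weights

A = mel_matrix_sketch()
S = np.ones((5, 129))   # stand-in for 5 frames of STFT magnitudes
M = S @ A               # mel spectrogram, shape (5, 20)
```

Expressing the filterbank as a single matrix means the whole batch of frames is converted with one matrix product.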

gnes.preprocessor.audio.vggish_example_helper.mel_features.stft_magnitude(signal, fft_length, hop_length=None, window_length=None)
Calculate the short-time Fourier transform magnitude.
 Args:
 signal: 1D np.array of the input time-domain signal.
 fft_length: Size of the FFT to apply.
 hop_length: Advance (in samples) between each frame passed to FFT.
 window_length: Length of each block of samples to pass to FFT.
 Returns:
 2D np.array where each row contains the magnitudes of the fft_length/2+1 unique values of the FFT for the corresponding frame of input samples.
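
As a rough illustration of the computation described above, the sketch below frames a signal, applies a periodic Hann window, and takes the magnitude of a real FFT. Whether the module windows the frames in exactly this way is an assumption, and stft_magnitude_sketch is a hypothetical name.

```python
import numpy as np

def stft_magnitude_sketch(signal, fft_length, hop_length, window_length):
    """Illustrative STFT magnitude: frame, window, real FFT, absolute value."""
    num_frames = 1 + (len(signal) - window_length) // hop_length
    # Index matrix of shape (num_frames, window_length); this copies data,
    # unlike a stride-based view, but keeps the sketch simple.
    idx = (np.arange(window_length)[None, :]
           + hop_length * np.arange(num_frames)[:, None])
    frames = signal[idx]
    n = np.arange(window_length)
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / window_length)
    # rfft zero-pads each frame to fft_length and keeps fft_length/2 + 1 bins.
    return np.abs(np.fft.rfft(frames * window, fft_length))

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2.0 * np.pi * 1000.0 * t)  # 1 kHz test tone, 1 second long

mags = stft_magnitude_sketch(tone, fft_length=256, hop_length=80,
                             window_length=200)
peak_bin = int(np.argmax(mags[0]))  # bin spacing is 8000 / 256 = 31.25 Hz
```

With these settings each row has 129 values, and the 1 kHz tone peaks at bin 1000 / 31.25 = 32.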