class, num_bands=64, sample_rate=16000, log_offset=0.01, example_window_seconds=0.96, example_hop_seconds=0.96, stft_window_length_seconds=0.025, stft_hop_length_seconds=0.01, mel_min_hz=125, mel_max_hz=7500, *args, **kwargs)

Bases: gnes.preprocessor.base.BaseAudioPreprocessor

Return type: None
train(*args, **kwargs)

Train the model. Subclasses need to override this method.

waveform_to_examples(data, sample_rate)

Converts audio waveform into an array of examples for VGGish.

data: np.array of either one dimension (mono) or two dimensions (multi-channel, with the outer dimension representing channels). Each sample is generally expected to lie in the range [-1.0, +1.0], although this is not required.

sample_rate: Sample rate of data.

Returns: 3-D np.array of shape [num_examples, num_frames, num_bands] representing a sequence of examples. Each example contains a patch of log mel spectrogram covering num_frames frames of audio and num_bands mel frequency bands, where the frame length is vggish_params.STFT_HOP_LENGTH_SECONDS.
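
The output shape can be estimated from the constructor defaults listed above. The following is a minimal sketch, not the gnes implementation: `expected_example_shape` is a hypothetical helper, and it assumes the audio is already at `sample_rate` and that incomplete trailing windows are dropped at both framing stages, as in the reference VGGish input pipeline.

```python
def expected_example_shape(
    num_samples,
    sample_rate=16000,
    num_bands=64,
    stft_window_length_seconds=0.025,
    stft_hop_length_seconds=0.01,
    example_window_seconds=0.96,
    example_hop_seconds=0.96,
):
    """Estimate [num_examples, num_frames, num_bands] for a mono waveform.

    Hypothetical helper for illustration only. Assumes incomplete
    trailing windows are discarded rather than zero-padded.
    """
    # STFT framing, measured in audio samples.
    stft_window = int(round(sample_rate * stft_window_length_seconds))
    stft_hop = int(round(sample_rate * stft_hop_length_seconds))
    num_stft_frames = max(0, 1 + (num_samples - stft_window) // stft_hop)

    # Example framing, measured in spectrogram frames.
    features_per_second = 1.0 / stft_hop_length_seconds
    example_window = int(round(example_window_seconds * features_per_second))
    example_hop = int(round(example_hop_seconds * features_per_second))
    num_examples = max(0, 1 + (num_stft_frames - example_window) // example_hop)

    return (num_examples, example_window, num_bands)


# Five seconds of 16 kHz audio.
print(expected_example_shape(5 * 16000))
```

Under these assumptions each example spans 96 spectrogram frames (0.96 s at a 0.01 s hop), and a clip shorter than roughly one example window produces an empty first dimension.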