class …(num_frames: int = 96, num_bands: int = 64, sample_rate: int = 16000, log_offset: float = 0.01, example_window_seconds: float = 0.96, example_hop_seconds: float = 0.96, stft_window_length_seconds: float = 0.025, stft_hop_length_seconds: float = 0.01, mel_min_hz: int = 125, mel_max_hz: int = 7500, *args, **kwargs)[source]

Bases: gnes.preprocessor.base.BaseAudioPreprocessor
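The constructor defaults above are internally consistent: each example spans example_window_seconds of audio, and with one log-mel frame every stft_hop_length_seconds, that works out to the num_frames default. A minimal sketch of that arithmetic (variable names mirror the parameters above; this is illustrative, not library code):

```python
# Each 0.96 s example, sampled at one STFT frame per 10 ms hop,
# contains 0.96 / 0.01 = 96 frames -- the num_frames default above.
example_window_seconds = 0.96
stft_hop_length_seconds = 0.01

frames_per_example = round(example_window_seconds / stft_hop_length_seconds)
print(frames_per_example)  # 96
```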

apply(doc: gnes_pb2.Document) → None[source]
train(*args, **kwargs)

Train the model; needs to be overridden by subclasses.

waveform_to_examples(data, sample_rate)[source]

Converts an audio waveform into an array of examples for VGGish.

Parameters:

    data – np.array of either one dimension (mono) or two dimensions (multi-channel, with the outer dimension representing channels). Each sample is generally expected to lie in the range [-1.0, +1.0], although this is not required.

    sample_rate – sample rate of data.

Returns:

    A 3-D np.array of shape [num_examples, num_frames, num_bands] representing a sequence of examples, each of which contains a patch of the log mel spectrogram, covering num_frames frames of audio and num_bands mel frequency bands, where the frame length is vggish_params.STFT_HOP_LENGTH_SECONDS.
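To make the shape contract above concrete, here is a simplified, numpy-only sketch of the waveform-to-examples pipeline: frame the signal into 25 ms windows with a 10 ms hop, take STFT magnitudes, reduce to 64 bands, apply the log offset, and group 96 frames per example. The band reduction here is a crude bin-averaging stand-in for a real mel filterbank, and the helper names are illustrative, not the library's implementation:

```python
import numpy as np

SAMPLE_RATE = 16000
STFT_WINDOW = int(0.025 * SAMPLE_RATE)   # 400 samples per analysis window
STFT_HOP = int(0.010 * SAMPLE_RATE)      # 160 samples between frame starts
NUM_BANDS = 64                           # frequency bands per frame
NUM_FRAMES = 96                          # frames per 0.96 s example
LOG_OFFSET = 0.01                        # added before the log to avoid log(0)

def frame(x, window, hop):
    """Slice a 1-D signal into overlapping fixed-length frames."""
    n = 1 + (len(x) - window) // hop
    return np.stack([x[i * hop : i * hop + window] for i in range(n)])

def waveform_to_examples_sketch(data, sample_rate=SAMPLE_RATE):
    # Mix multi-channel audio down to mono (outer dimension = channels).
    if data.ndim > 1:
        data = data.mean(axis=0)
    # Windowed STFT magnitude spectrogram: [num_stft_frames, num_fft_bins].
    frames = frame(data, STFT_WINDOW, STFT_HOP) * np.hanning(STFT_WINDOW)
    spec = np.abs(np.fft.rfft(frames, axis=1))
    # Crude stand-in for a mel filterbank: average FFT bins into 64 bands.
    edges = np.linspace(0, spec.shape[1], NUM_BANDS + 1, dtype=int)
    bands = np.stack([spec[:, a:b].mean(axis=1)
                      for a, b in zip(edges[:-1], edges[1:])], axis=1)
    log_mel = np.log(bands + LOG_OFFSET)
    # Group log-mel frames into non-overlapping 96-frame examples.
    num_examples = log_mel.shape[0] // NUM_FRAMES
    return log_mel[: num_examples * NUM_FRAMES].reshape(
        num_examples, NUM_FRAMES, NUM_BANDS)

wav = np.random.uniform(-1.0, 1.0, SAMPLE_RATE * 2)  # 2 s of mono noise
print(waveform_to_examples_sketch(wav).shape)  # → (2, 96, 64)
```

Two seconds of 16 kHz audio yields 198 STFT frames, which fill two complete 96-frame examples; the trailing frames that do not fill an example are dropped, matching the [num_examples, num_frames, num_bands] return shape documented above.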