gnes.encoder.audio.vggish_cores.vggish_postprocess module

Post-process embeddings from VGGish.

class gnes.encoder.audio.vggish_cores.vggish_postprocess.Postprocessor(pca_params_npz_path)

Bases: object

Post-processes VGGish embeddings.

The initial release of AudioSet included a 128-D VGGish embedding for each AudioSet segment. These released embeddings were produced by applying a PCA transformation (which also includes a whitening transform) and 8-bit quantization to the raw embedding output of VGGish. This keeps them compatible with the YouTube-8M project, which provides visual embeddings in the same format for a large set of YouTube videos. This class implements the same PCA (with whitening) and quantization transformations.
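
The transformation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the module's actual code: the function name `pca_whiten_quantize`, the argument names, and the clipping range `[-2.0, 2.0]` are assumptions, and the whitening scale is assumed to be already folded into the PCA matrix.

```python
import numpy as np

# Assumed clipping range; the documentation above does not state the values used.
CLIP_MIN, CLIP_MAX = -2.0, 2.0


def pca_whiten_quantize(embeddings, pca_matrix, pca_means):
    """Hypothetical stand-in for the PCA + 8-bit quantization step.

    embeddings: [batch_size, embedding_size] raw embeddings.
    pca_matrix: [embedding_size, embedding_size] projection matrix
        (assumed to include the whitening scale).
    pca_means:  [embedding_size] per-dimension means.
    """
    embeddings = np.asarray(embeddings, dtype=np.float64)
    # Center each embedding, then project it onto the PCA basis.
    centered = embeddings - pca_means
    projected = centered @ pca_matrix.T
    # Clip to a fixed range so the values can be mapped onto 8 bits.
    clipped = np.clip(projected, CLIP_MIN, CLIP_MAX)
    # Map [CLIP_MIN, CLIP_MAX] linearly onto [0, 255]; the cast truncates.
    scaled = (clipped - CLIP_MIN) * (255.0 / (CLIP_MAX - CLIP_MIN))
    return scaled.astype(np.uint8)


# Toy input: identity projection and zero means, so only the
# clipping and quantization are visible in the result.
raw = np.array([[-2.0, 0.0, 2.0, 5.0]])
quantized = pca_whiten_quantize(raw, np.eye(4), np.zeros(4))
print(quantized.shape, quantized.dtype)
```

With an identity projection the output has the same shape as the input and type uint8, and any value above the assumed upper bound saturates at 255.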

Constructs a postprocessor.

Args:
    pca_params_npz_path: Path to a NumPy-format .npz file that
        contains the PCA parameters used in postprocessing.
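
For readers unfamiliar with the .npz format, the following sketch shows how a parameter file of this kind could be written and read back with NumPy. The key names `pca_eigen_vectors` and `pca_means`, the helper functions, and the 128-dimensional shapes are assumptions made for illustration; check the parameter file you actually use for the names it contains.

```python
import os
import tempfile

import numpy as np

# Hypothetical key names; the real parameter file may use different ones.
EIGEN_VECTORS_KEY = "pca_eigen_vectors"
MEANS_KEY = "pca_means"


def save_pca_params(path, pca_matrix, pca_means):
    """Write the PCA matrix and means into a single .npz archive."""
    np.savez(path, **{EIGEN_VECTORS_KEY: pca_matrix, MEANS_KEY: pca_means})


def load_pca_params(path):
    """Read the PCA matrix and means back from a .npz archive."""
    with np.load(path) as params:
        return params[EIGEN_VECTORS_KEY], params[MEANS_KEY]


# Round-trip placeholder parameters through a temporary file.
with tempfile.TemporaryDirectory() as tmp_dir:
    npz_path = os.path.join(tmp_dir, "pca_params.npz")
    save_pca_params(npz_path, np.eye(128), np.zeros(128))
    matrix, means = load_pca_params(npz_path)

print(matrix.shape, means.shape)
```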

postprocess(embeddings_batch)

Applies postprocessing to a batch of embeddings.

Args:
    embeddings_batch: A NumPy array of shape [batch_size, embedding_size]
        containing output from the embedding layer of VGGish.
Returns:
    A NumPy array of the same shape as the input but of type uint8,
    containing the PCA-transformed and quantized version of the input.