Post-process embeddings from VGGish.


Bases: object

Post-processes VGGish embeddings.

The initial release of AudioSet included a 128-D VGGish embedding for each AudioSet segment. These released embeddings were produced by applying a PCA transformation (which also includes a whitening transform) and 8-bit quantization to the raw embedding output of VGGish. This keeps them compatible with the YouTube-8M project, which provides visual embeddings in the same format for a large set of YouTube videos. This class implements the same PCA (with whitening) and quantization transformations.

Constructs a postprocessor.

pca_params_npz_path: Path to a NumPy-format .npz file that
contains the PCA parameters used in postprocessing.
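
As a rough illustration of what such a parameter file might look like, the sketch below writes a .npz archive with NumPy and reads it back. The key names `pca_eigen_vectors` and `pca_means`, the array shapes, and the helper functions are assumptions made for this example; this documentation does not state the layout of the file.

```python
import os
import tempfile

import numpy as np

EMBEDDING_SIZE = 128  # the 128-D embedding size mentioned in the class description


def save_example_pca_params(path):
    """Writes placeholder PCA parameters (hypothetical helper, not part of VGGish)."""
    np.savez(
        path,
        # Assumed key names and shapes; an identity matrix and zero means
        # make the PCA step a no-op, which is enough for a demonstration.
        pca_eigen_vectors=np.eye(EMBEDDING_SIZE, dtype=np.float32),
        pca_means=np.zeros((EMBEDDING_SIZE, 1), dtype=np.float32),
    )


def load_pca_params(path):
    """Reads the two arrays back from the .npz archive."""
    with np.load(path) as params:
        return params["pca_eigen_vectors"], params["pca_means"]


npz_path = os.path.join(tempfile.mkdtemp(), "example_pca_params.npz")
save_example_pca_params(npz_path)
pca_matrix, pca_means = load_pca_params(npz_path)
print(pca_matrix.shape, pca_means.shape)
```

A real parameter file would hold the fitted PCA matrix and means rather than these placeholders.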

Applies postprocessing to a batch of embeddings.

embeddings_batch: A NumPy array of shape [batch_size, embedding_size]
containing output from the embedding layer of VGGish.

Returns a NumPy array of the same shape as the input but of type uint8, containing the PCA-transformed and quantized version of the input.
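
The transformation described above (PCA with whitening, then 8-bit quantization) can be sketched in plain NumPy. This is a minimal sketch and not necessarily the class's actual code: the function name `postprocess_sketch`, the clipping range of [-2.0, 2.0], and the parameter shapes are assumptions chosen for illustration.

```python
import numpy as np

# Assumed clipping range applied before quantization (not stated in this documentation).
QUANTIZE_MIN_VAL = -2.0
QUANTIZE_MAX_VAL = 2.0


def postprocess_sketch(embeddings_batch, pca_matrix, pca_means):
    """Applies PCA (with whitening folded into pca_matrix) and 8-bit quantization.

    embeddings_batch: array of shape [batch_size, embedding_size].
    pca_matrix: array of shape [embedding_size, embedding_size] (assumed).
    pca_means: array of shape [embedding_size, 1] (assumed).
    """
    if embeddings_batch.ndim != 2:
        raise ValueError("Expected a 2-D batch, got shape %r" % (embeddings_batch.shape,))
    # Subtract the means, project with the PCA matrix, and restore [batch, dim] layout.
    pca_applied = np.dot(pca_matrix, embeddings_batch.T - pca_means).T
    # Clip to the fixed range so every value maps into 8 bits.
    clipped = np.clip(pca_applied, QUANTIZE_MIN_VAL, QUANTIZE_MAX_VAL)
    # Map [min, max] linearly onto [0.0, 255.0], then truncate to uint8.
    scaled = (clipped - QUANTIZE_MIN_VAL) * (255.0 / (QUANTIZE_MAX_VAL - QUANTIZE_MIN_VAL))
    return scaled.astype(np.uint8)


# Tiny 3-D example with an identity "PCA" so the quantization step is easy to follow.
batch = np.array([[-3.0, 0.0, 2.0]])
quantized = postprocess_sketch(batch, np.eye(3), np.zeros((3, 1)))
print(quantized.dtype, quantized.shape)
```

With an identity matrix and zero means, only the clipping and quantization change the values, so out-of-range inputs saturate at 0 or 255.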