StandardScaler Base

Data scaler class that implements standardization, which transforms data to have zero mean and unit variance.
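Concretely, standardization subtracts the mean and divides by the standard deviation: z = (x - mean) / std. The sketch below is plain NumPy, not spiegelib code. It applies this formula using statistics computed over every element of a small array, which mirrors the default axis=None behavior described for fit() below.

```python
import numpy as np

# Toy array standing in for extracted features
x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Statistics over the whole (flattened) array
mean = x.mean()
std = x.std()

# Standardize: the result has zero mean and unit variance
z = (x - mean) / std
print(z.mean(), z.var())
```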

The scaler must be fit (‘trained’) before it can be used to scale new data.
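Fitting comes first because the mean and standard deviation are learned once from a training set and then reused, unchanged, on any new data. Below is a minimal NumPy sketch of that pattern. The random arrays are hypothetical stand-ins for real features, and this is not spiegelib's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Training" features used to fit the scaling parameters
train = rng.normal(loc=10.0, scale=3.0, size=(500, 13))
# Features seen later, e.g. from one new audio example
new = rng.normal(loc=10.0, scale=3.0, size=(1, 13))

# Fit step: compute and store the statistics once
mean, std = train.mean(), train.std()

# Transform step: apply the stored statistics to both arrays
train_scaled = (train - mean) / std
new_scaled = (new - mean) / std

# The training data is exactly standardized; the new data is only
# approximately so, because it was scaled with the training statistics
print(train_scaled.mean(), train_scaled.std())
print(new_scaled.mean())
```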

Examples

Feature extraction objects load with a StandardScaler by default and are designed to work with the DatasetGenerator.

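```python
import spiegelib as spgl

# Load a VST synthesizer (this plugin path is macOS-specific)
synth = spgl.synth.SynthVST("/Library/Audio/Plug-Ins/VST/Dexed.vst")
spectral_features = spgl.features.SpectralSummarized()

# With scale=True, the feature extractor's scaler is fit to the generated features
generator = spgl.DatasetGenerator(synth, spectral_features, scale=True)
generator.generate(1000)

# Write the fitted scaler to disk
generator.save_scaler('./data_scaler.pkl')
```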
The fitted scaler is stored in the spectral_features object and is used to scale future data. Calling save_scaler() also writes it to disk so it can be reloaded for future feature extraction; see load_scaler().
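The .pkl extension suggests the scaler is serialized with Python's pickle module. Assuming that, the save/reload round trip can be sketched with the standard library alone. The dict below is a hypothetical stand-in for a fitted scaler, and spiegelib's actual save_scaler() and load_scaler() may store it differently.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical stand-in for a fitted scaler: just the learned statistics
fitted = {"mean": np.float64(3.5), "std": np.float64(1.7)}

path = os.path.join(tempfile.mkdtemp(), "data_scaler.pkl")

# Save the fitted parameters to disk (assumes pickle-based storage)
with open(path, "wb") as f:
    pickle.dump(fitted, f)

# Later, e.g. in another session: reload and reuse the same parameters
with open(path, "rb") as f:
    reloaded = pickle.load(f)

x = np.array([1.0, 2.0, 3.0])
scaled = (x - reloaded["mean"]) / reloaded["std"]
print(scaled)
```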

Using StandardScaler independently

```python
import spiegelib as spgl
import numpy as np

# Generate a dataset without scaling
synth = spgl.synth.SynthVST("/Library/Audio/Plug-Ins/VST/Dexed.vst",
                            render_length_secs=1.0)

mfcc = spgl.features.MFCC(num_mfccs=13)
generator = spgl.DatasetGenerator(synth, mfcc)
generator.generate(1000, file_prefix="train_")

# DatasetGenerator automatically saves the extracted features as an .npy
# file in the current working directory
dataset = np.load('./train_features.npy')

# Fit the scaler on the training features, then scale them
scaler = spgl.features.StandardScaler()
scaler.fit(dataset)
scaled_data = scaler.transform(dataset)

# Now we can add the scaler to the MFCC feature extractor and use it to
# scale any future feature extractions
mfcc.set_scaler(scaler)
random_audio = synth.get_random_example()
scaled_mfccs = mfcc(random_audio, scale=True)
```

Scaling along certain dimensions

The axis passed to fit() determines which dimensions the mean and standard deviation are computed over, and therefore which parts of the data are scaled independently. The MFCC dataset generated in the previous example has shape (1000, 13, 88), where the axes correspond to (batch, MFCCs, time slices).

```python
import numpy as np
import spiegelib as spgl

# Same dataset as before and a new scaler object
dataset = np.load('./train_features.npy')
scaler = spgl.features.StandardScaler()

# The three fit() calls below are alternatives; choose one.

# Flatten the entire dataset and compute a single mean and
# standard deviation over the flattened array
scaler.fit(dataset)

# The batch is on the first axis, so this computes the mean and
# standard deviation independently for each MFCC and time slice
scaler.fit(dataset, axis=0)

# Reduce over the batch and time axes, so each MFCC band is scaled independently
scaler.fit(dataset, axis=(0, 2))
```

To control the scaling axis when using the DatasetGenerator as in the first example, set a custom axis when constructing the feature extraction object; see the scale_axis argument of the FeaturesBase constructor.
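To see what each of the three fit() choices computes, the same reductions can be reproduced with plain NumPy on a random array of the dataset's shape. This sketch uses keepdims=True so the statistics broadcast back against the data. Whether spiegelib keeps dimensions the same way internally is an assumption not covered here.

```python
import numpy as np

# Random stand-in with the MFCC dataset's shape: (batch, mfccs, time slices)
data = np.random.default_rng(0).normal(size=(1000, 13, 88))

shapes = {}
for axis in (None, 0, (0, 2)):
    mean = data.mean(axis=axis, keepdims=True)
    std = data.std(axis=axis, keepdims=True)
    scaled = (data - mean) / std
    shapes[axis] = mean.shape
    # axis=None   -> (1, 1, 1): one mean/std for the whole dataset
    # axis=0      -> (1, 13, 88): one per MFCC and time slice
    # axis=(0, 2) -> (1, 13, 1): one per MFCC band
    print(axis, mean.shape)
```

After the loop, scaled holds the per-band result from axis=(0, 2), so each of its 13 MFCC bands has zero mean and unit standard deviation.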

class spiegelib.features.StandardScaler

Bases: spiegelib.features.data_scaler_base.DataScalerBase

fit(data, axis=None)

Compute the mean and standard deviation used for later scaling.

Parameters

data (np.ndarray) – data on which to compute the mean and standard deviation for future scaling
axis (int, tuple, optional) – axis or axes along which to compute the scaling parameters. Defaults to None, which flattens the array first.

transform(data)

Scale data using the mean and standard deviation computed by fit().

Parameters: data (np.ndarray) – data to scale
Returns: scaled data
Return type: np.ndarray
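For reference, the documented fit(data, axis=None) / transform(data) contract can be sketched in a few lines of NumPy. MiniStandardScaler is a hypothetical class written for illustration, not spiegelib's source. In particular, the keepdims handling, the assumption that transform() receives data with the same number of dimensions as fit(), and the error raised before fitting are all assumptions.

```python
import numpy as np


class MiniStandardScaler:
    """Hypothetical fit/transform standard scaler (illustration only)."""

    def __init__(self):
        self.mean = None
        self.std = None

    def fit(self, data, axis=None):
        # axis=None: a single scalar mean/std over the flattened array.
        # int or tuple: reduce over those axes only, keeping them as
        # size-1 dimensions so the statistics broadcast against data
        # with the same number of dimensions.
        keep = axis is not None
        self.mean = np.mean(data, axis=axis, keepdims=keep)
        self.std = np.std(data, axis=axis, keepdims=keep)

    def transform(self, data):
        if self.mean is None:
            raise RuntimeError("fit() must be called before transform()")
        return (data - self.mean) / self.std


# Usage: scale each slice along axis 1 independently
features = np.arange(24, dtype=float).reshape(2, 3, 4)
scaler = MiniStandardScaler()
scaler.fit(features, axis=(0, 2))
scaled = scaler.transform(features)
print(scaled.mean(axis=(0, 2)), scaled.std(axis=(0, 2)))
```

A constant feature would produce a zero standard deviation and a division by zero in this sketch; a production scaler needs a guard for that case.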