StandardScaler Base

Data scaler class that implements standardization. It transforms data to have a mean of zero and a variance of one.

The scaler must be ‘trained’ (fit) on a dataset before it can be used to scale new data.
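As a rough illustration of what standardization does, the following sketch uses plain NumPy rather than the StandardScaler class itself. The array names and the random data are assumptions made for this example. Statistics are computed once on a training array and then reused to scale new data, which is why fitting has to happen first.

```python
import numpy as np

# Hypothetical data standing in for extracted features
rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=3.0, size=(1000, 4))
new = rng.normal(loc=5.0, scale=3.0, size=(10, 4))

# "Fit": compute the statistics on the training data only
mean = train.mean()
std = train.std()

# "Transform": subtract the mean and divide by the standard deviation
scaled_train = (train - mean) / std
scaled_new = (new - mean) / std  # reuses the training statistics
```

Here scaled_train has a mean of zero and a variance of one up to floating point error. scaled_new is only approximately standardized, because it is scaled with statistics from the training data rather than its own.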

Examples

  1. Feature extraction objects load with a StandardScaler by default and are designed to work with the DatasetGenerator

    import spiegelib as spgl
    
    synth = spgl.synth.SynthVST("/Library/Audio/Plug-Ins/VST/Dexed.vst")
    spectral_features = spgl.features.SpectralSummarized()
    generator = spgl.DatasetGenerator(synth, spectral_features, scale=True)
    generator.generate(1000)
    generator.save_scaler('./data_scaler.pkl')
    

    The fitted scaler is stored in the spectral_features object and is used to scale future data. The scaler is also saved to disk and can be reloaded for future feature extraction. See load_scaler().

  2. Using StandardScaler independently

    import spiegelib as spgl
    import numpy as np
    
    # Generate a dataset without scaling
    synth = spgl.synth.SynthVST("/Library/Audio/Plug-Ins/VST/Dexed.vst",
                                render_length_secs=1.0)
    
    mfcc = spgl.features.MFCC(num_mfccs=13)
    generator = spgl.DatasetGenerator(synth, mfcc)
    generator.generate(1000, file_prefix="train_")
    
    
    # DatasetGenerator automatically saves the extracted features as a .npy
    # file in the current working directory
    dataset = np.load('./train_features.npy')
    
    scaler = spgl.features.StandardScaler()
    scaler.fit(dataset)
    scaled_data = scaler.transform(dataset)
    
    
    # Now we can add the scaler to the MFCC feature extractor and use it to
    # scale any future feature extractions
    mfcc.set_scaler(scaler)
    random_audio = synth.get_random_example()
    scaled_mfccs = mfcc(random_audio, scale=True)
    
  3. Scaling along certain dimensions

    The dimensions that scaling is applied along depend on the axis passed to fit(). The MFCC dataset generated in the previous example has the shape (1000, 13, 88), where the axes correspond to (batches, mfccs, time slices).

    # Same dataset as before and new scaler object
    dataset = np.load('./train_features.npy')
    scaler = spgl.features.StandardScaler()
    
    # This flattens the entire dataset and calculates
    # the mean and variance on the flattened array
    scaler.fit(dataset)
    
    # Since the batch is on the first axis, this will calculate
    # the mean and variance independently for each MFCC and time slice
    scaler.fit(dataset, axis=0)
    
    # This calculates the mean and variance over the batch and time axes,
    # so each MFCC band is scaled independently
    scaler.fit(dataset, axis=(0,2))
    

    To control the scale axis when using the DatasetGenerator from the first example, a custom axis can be set for a feature extraction object in the constructor. See the scale_axis argument in the FeaturesBase constructor.
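To make the effect of the three axis choices above more concrete, here is a small NumPy sketch. It does not call StandardScaler. It only shows the shapes that NumPy's own mean() produces for each axis argument, using random data with the same shape as the MFCC dataset. The keepdims=True step at the end is an assumption of this sketch, used so the statistics broadcast against the dataset; it is not a statement about how the library handles this internally.

```python
import numpy as np

# Random stand-in for the (batches, mfccs, time slices) dataset
rng = np.random.default_rng(0)
dataset = rng.normal(size=(1000, 13, 88))

flat_mean = dataset.mean(axis=None)        # one value for the whole array
per_cell_mean = dataset.mean(axis=0)       # one value per MFCC and time slice
per_band_mean = dataset.mean(axis=(0, 2))  # one value per MFCC band

print(np.shape(flat_mean), per_cell_mean.shape, per_band_mean.shape)
# prints: () (13, 88) (13,)

# Scaling each MFCC band independently; keepdims=True keeps the
# statistics at shape (1, 13, 1) so they broadcast against the dataset
band_mean = dataset.mean(axis=(0, 2), keepdims=True)
band_std = dataset.std(axis=(0, 2), keepdims=True)
scaled = (dataset - band_mean) / band_std
```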

class spiegelib.features.StandardScaler

Bases: spiegelib.features.data_scaler_base.DataScalerBase

fit(data, axis=None)

Compute the mean and standard deviation used for later scaling

Parameters
  • data (np.ndarray) – data to calculate mean and standard deviation on for future scaling

  • axis (int, tuple, optional) – axis or axes to use for calculating scaling parameters. Defaults to None, which will flatten the array first.

transform(data)

Scale data

Parameters
  • data (np.ndarray) – data to scale

Returns

scaled data

Return type

np.ndarray
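The fit/transform contract described above can be sketched with a small stand-in class. SketchStandardScaler is a hypothetical name invented for this example and is not spiegelib's implementation. In particular, the use of keepdims=True and the RuntimeError raised when transform() is called before fit() are assumptions of this sketch.

```python
import numpy as np


class SketchStandardScaler:
    """Hypothetical stand-in, not spiegelib's StandardScaler."""

    def __init__(self):
        self.mean = None
        self.std = None

    def fit(self, data, axis=None):
        # Assumption of this sketch: keepdims=True keeps the statistics
        # broadcastable against arrays with the same rank as `data`
        self.mean = data.mean(axis=axis, keepdims=True)
        self.std = data.std(axis=axis, keepdims=True)

    def transform(self, data):
        # Scaling is only defined once statistics have been computed
        if self.mean is None or self.std is None:
            raise RuntimeError("fit() must be called before transform()")
        return (data - self.mean) / self.std


# Hypothetical data shaped like (batches, mfccs, time slices)
rng = np.random.default_rng(0)
train = rng.normal(loc=2.0, scale=4.0, size=(200, 13, 88))
new_batch = rng.normal(loc=2.0, scale=4.0, size=(5, 13, 88))

scaler = SketchStandardScaler()
scaler.fit(train, axis=(0, 2))            # statistics per MFCC band
scaled_train = scaler.transform(train)
scaled_new = scaler.transform(new_batch)  # reuses the fitted statistics
```

Note that this sketch does not guard against a standard deviation of zero, which would cause a division by zero for constant features.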