deepextractor.data.preprocessing

Module Contents

class deepextractor.data.preprocessing.ChannelStandardScaler

Per-channel standard scaler for (N, C, T) time-series data.

Fits one mean and one standard deviation per channel, computed across all samples and time points. The mean_ and scale_ attributes (shape (C,)) are compatible with the HDF5Dataset input_scaler interface.

Compatible with joblib.dump / pickle for serialisation.
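
As an illustration of what "one mean and std per channel" means for (N, C, T) data, the following is a minimal NumPy sketch. It does not use the class itself; the variable names and the choice of the population standard deviation (ddof=0) are assumptions, not confirmed details of ChannelStandardScaler.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(8, 3, 100))  # (N, C, T)

# Reduce over samples (axis 0) and time (axis 2); the channel axis remains.
mean = X.mean(axis=(0, 2))   # shape (C,)
scale = X.std(axis=(0, 2))   # shape (C,); population std (ddof=0) is assumed

# Reshape to (1, C, 1) so the statistics broadcast against (N, C, T).
Z = (X - mean[None, :, None]) / scale[None, :, None]
```

After this step each channel of Z has approximately zero mean and unit standard deviation when measured across all samples and time points.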

mean_ = None
scale_ = None
n_channels_ = None
fit(X: numpy.ndarray) -> ChannelStandardScaler

Fit on array X of shape (N, C, T).

For large datasets, use fit_from_hdf5 instead to avoid loading everything into memory.

transform(X: numpy.ndarray) -> numpy.ndarray
inverse_transform(X: numpy.ndarray) -> numpy.ndarray
fit_transform(X: numpy.ndarray) -> numpy.ndarray
fit_from_hdf5(hdf5_path: str, key: str, chunk_size: int = 2048) -> ChannelStandardScaler
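
The three methods above are not described further here. Assuming they follow the usual standard-scaler convention (subtract the mean and divide by the scale, then the reverse), the round trip can be sketched with plain NumPy as below. The helper functions are hypothetical stand-ins written for this example, not the class's own code.

```python
import numpy as np

def transform(X, mean, scale):
    # Hypothetical stand-in: standardise each channel of an (N, C, T) array.
    return (X - mean[None, :, None]) / scale[None, :, None]

def inverse_transform(Z, mean, scale):
    # Hypothetical stand-in: undo the standardisation.
    return Z * scale[None, :, None] + mean[None, :, None]

X = np.arange(24, dtype=np.float64).reshape(2, 3, 4)  # (N=2, C=3, T=4)
mean = X.mean(axis=(0, 2))
scale = X.std(axis=(0, 2))

# Applying the inverse after the forward transform should recover the input.
X_back = inverse_transform(transform(X, mean, scale), mean, scale)
```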

Fit on a dataset too large to load at once.

Computes per-channel mean and variance in two chunked passes over the HDF5 dataset: the first pass accumulates the mean, and the second accumulates the variance around that mean. Memory usage is O(chunk_size * C * T) rather than O(N * C * T).

Parameters:
  • hdf5_path – Path to the HDF5 file.

  • key – Dataset key with shape (N, C, T).

  • chunk_size – Number of samples to process at a time.
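
The two-pass scheme described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the library's code: the function name two_pass_channel_stats is hypothetical, the population variance (ddof=0) is assumed, and a NumPy array stands in for the HDF5 dataset so the example runs without a file.

```python
import numpy as np

def two_pass_channel_stats(dataset, chunk_size=2048):
    """Per-channel mean and std of an (N, C, T) array-like, read in chunks."""
    n, c, t = dataset.shape
    count = n * t  # number of values contributing to each channel

    # Pass 1: accumulate per-channel sums to obtain the mean.
    total = np.zeros(c, dtype=np.float64)
    for start in range(0, n, chunk_size):
        chunk = np.asarray(dataset[start:start + chunk_size], dtype=np.float64)
        total += chunk.sum(axis=(0, 2))
    mean = total / count

    # Pass 2: accumulate squared deviations from the now-known mean.
    sq_dev = np.zeros(c, dtype=np.float64)
    for start in range(0, n, chunk_size):
        chunk = np.asarray(dataset[start:start + chunk_size], dtype=np.float64)
        sq_dev += ((chunk - mean[None, :, None]) ** 2).sum(axis=(0, 2))
    var = sq_dev / count  # population variance (ddof=0), an assumption

    return mean, np.sqrt(var)

rng = np.random.default_rng(1)
data = rng.normal(size=(10, 2, 16))  # stand-in for the HDF5 dataset
mean, scale = two_pass_channel_stats(data, chunk_size=3)
```

Only one chunk of at most chunk_size samples is held in memory at a time. An h5py dataset, for example h5py.File(hdf5_path, "r")[key], exposes the same .shape attribute and slice indexing, so it could be passed in place of the NumPy array.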