deepextractor.data.preprocessing

Module Contents

class deepextractor.data.preprocessing.ChannelStandardScaler

Per-channel standard scaler for (N, C, T) time-series data.

Fits one mean and one standard deviation per channel, computed across all samples and time points. The mean_ and scale_ attributes (each of shape (C,)) are compatible with the HDF5Dataset input_scaler interface.

Compatible with joblib.dump / pickle for serialisation.
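
As a rough illustration of what per-channel scaling of (N, C, T) data involves, the following NumPy sketch reduces over the sample and time axes and broadcasts the resulting (C,) statistics back over the array. It does not use the class itself and is not taken from its source. It assumes the population standard deviation (ddof=0) and does no special handling of constant channels, either of which may differ from what ChannelStandardScaler does.

```python
import numpy as np

# Synthetic (N, C, T) data: 8 samples, 3 channels, 100 time points,
# with a different offset per channel.
rng = np.random.default_rng(0)
X = rng.normal(loc=[[1.0], [5.0], [-2.0]], scale=2.0, size=(8, 3, 100))

# One mean and one standard deviation per channel: reduce over the
# sample axis (0) and the time axis (2), leaving shape (C,).
mean = X.mean(axis=(0, 2))
scale = X.std(axis=(0, 2))  # assumption: population std (ddof=0)

# Forward transform: broadcast the (C,) statistics over N and T.
X_scaled = (X - mean[None, :, None]) / scale[None, :, None]

# Inverse transform: undo the scaling to recover the original values.
X_restored = X_scaled * scale[None, :, None] + mean[None, :, None]
```

After scaling, each channel has zero mean and unit standard deviation over all samples and time points, and the inverse step reproduces X up to floating-point error.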

mean_ = None

scale_ = None

n_channels_ = None

fit(X: numpy.ndarray) → ChannelStandardScaler

Fit on array X of shape (N, C, T).

For large datasets use fit_from_hdf5 instead to avoid loading everything into memory.

transform(X: numpy.ndarray) → numpy.ndarray

inverse_transform(X: numpy.ndarray) → numpy.ndarray

fit_transform(X: numpy.ndarray) → numpy.ndarray

fit_from_hdf5(hdf5_path: str, key: str, chunk_size: int = 2048) → ChannelStandardScaler

Fit on a dataset too large to load at once.

Computes per-channel mean and variance in two online passes over the HDF5 dataset: the first pass computes the mean, the second the variance. Memory usage is O(chunk_size * C * T) rather than O(N * C * T).

Parameters:

hdf5_path – Path to the HDF5 file.
key – Dataset key with shape (N, C, T).
chunk_size – Number of samples to process at a time.
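
The two-pass scheme described above can be sketched in plain NumPy as follows. The helper name two_pass_channel_stats is hypothetical and is not part of deepextractor. The sketch assumes the dataset object exposes .shape and supports slicing along the first axis, and it uses the population variance (ddof=0), which may not match the library's choice.

```python
import numpy as np


def two_pass_channel_stats(dataset, chunk_size=2048):
    """Hypothetical helper: per-channel mean and std of an (N, C, T) array-like.

    Only chunk_size samples are held in memory at any time.
    """
    n_samples, n_channels, n_times = dataset.shape
    count = n_samples * n_times  # number of values contributing to each channel

    # Pass 1: accumulate per-channel sums to obtain the mean.
    total = np.zeros(n_channels, dtype=np.float64)
    for start in range(0, n_samples, chunk_size):
        chunk = np.asarray(dataset[start:start + chunk_size], dtype=np.float64)
        total += chunk.sum(axis=(0, 2))
    mean = total / count

    # Pass 2: accumulate squared deviations from the mean to obtain the variance.
    sq_dev = np.zeros(n_channels, dtype=np.float64)
    for start in range(0, n_samples, chunk_size):
        chunk = np.asarray(dataset[start:start + chunk_size], dtype=np.float64)
        sq_dev += ((chunk - mean[None, :, None]) ** 2).sum(axis=(0, 2))
    std = np.sqrt(sq_dev / count)  # assumption: population std (ddof=0)

    return mean, std


# Small in-memory stand-in for an on-disk dataset.
rng = np.random.default_rng(0)
data = rng.normal(size=(10, 3, 50))
mean, std = two_pass_channel_stats(data, chunk_size=4)
```

Because the helper relies only on .shape and first-axis slicing, an h5py dataset opened with h5py.File(hdf5_path, "r")[key] could presumably be passed in place of the in-memory array. Subtracting the mean before squaring in the second pass avoids the cancellation error that a single-pass sum-of-squares formula can suffer when the mean is large relative to the spread.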