deepextractor.data.datasets

Module Contents

class deepextractor.data.datasets.TimeSeriesDataset(input_npy, target_npy, transform=None)

Bases: torch.utils.data.Dataset

An abstract class representing a Dataset. This description is inherited from torch.utils.data.Dataset.

All datasets that represent a map from keys to data samples should subclass it. All subclasses should override __getitem__() to support fetching a data sample for a given key. Subclasses may optionally override __len__(), which many Sampler implementations and the default options of DataLoader expect to return the size of the dataset. Subclasses may also implement __getitems__() to speed up batched sample loading; this method accepts a list of sample indices for a batch and returns a list of samples.

Note

DataLoader by default constructs an index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided.
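The map-style protocol described above can be illustrated with a minimal sketch. The class name PairedArrayDataset, its positional transform(x, y) signature, and the toy arrays are hypothetical and are not taken from deepextractor; real code would subclass torch.utils.data.Dataset, which is omitted here only to keep the sketch free of dependencies beyond NumPy.

```python
import numpy as np


class PairedArrayDataset:
    """Hypothetical map-style dataset over paired input/target arrays."""

    def __init__(self, inputs, targets, transform=None):
        if len(inputs) != len(targets):
            raise ValueError("inputs and targets must have the same length")
        self.inputs = inputs
        self.targets = targets
        self.transform = transform

    def __len__(self):
        # Samplers and DataLoader defaults rely on this to know the dataset size.
        return len(self.inputs)

    def __getitem__(self, index):
        # Fetch one (input, target) pair for an integer key.
        x, y = self.inputs[index], self.targets[index]
        if self.transform is not None:
            x, y = self.transform(x, y)  # assumed signature, for illustration only
        return x, y

    def __getitems__(self, indices):
        # Optional batched fetch: list of indices in, list of samples out.
        return [self[i] for i in indices]


inputs = np.arange(12, dtype=np.float32).reshape(4, 3)
targets = inputs * 2.0
dataset = PairedArrayDataset(inputs, targets)

x, y = dataset[1]
batch = dataset.__getitems__([0, 2])
print(len(dataset), x, y, len(batch))
```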

inputs
targets
transform = None

class deepextractor.data.datasets.SpectrogramDataset(input_npy, target_npy, transform=None)

Bases: torch.utils.data.Dataset

An abstract class representing a Dataset. This description is inherited from torch.utils.data.Dataset.

All datasets that represent a map from keys to data samples should subclass it. All subclasses should override __getitem__() to support fetching a data sample for a given key. Subclasses may optionally override __len__(), which many Sampler implementations and the default options of DataLoader expect to return the size of the dataset. Subclasses may also implement __getitems__() to speed up batched sample loading; this method accepts a list of sample indices for a batch and returns a list of samples.

Note

DataLoader by default constructs an index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided.

input_path
target_path
input_shape
target_shape
input_channels_needed
target_channels_needed
transform = None

class deepextractor.data.datasets.HDF5Dataset(hdf5_path, input_key, background_key, signal_key, input_scaler=None, target_signal_only=False, transform=None)

Bases: torch.utils.data.Dataset

HDF5-backed dataset for time-domain two-detector signal/glitch separation.

Lazily opens the HDF5 file in each worker process. Use shuffle=False in the DataLoader: the data is pre-shuffled at generation time, and random HDF5 seeks are expensive.
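The lazy-open pattern can be sketched without HDF5 at all. In the example below, the built-in open() stands in for the HDF5 file handle, and the class name LazyFileDataset and its byte-per-sample layout are hypothetical; how HDF5Dataset implements this internally is not shown in this reference. The point is only that no handle exists until the first read, and that a pickled copy (as handed to a worker process) starts without one and reopens the file itself.

```python
import os
import pickle
import tempfile


class LazyFileDataset:
    """Hypothetical dataset that opens its backing file on first access."""

    def __init__(self, path):
        self.path = path
        self._handle = None  # nothing is opened at construction time

    def _file(self):
        # Open once per process, on first use.
        if self._handle is None:
            self._handle = open(self.path, "rb")
        return self._handle

    def __getitem__(self, index):
        f = self._file()
        f.seek(index)
        return f.read(1)

    def __getstate__(self):
        # Drop the open handle when pickled so each copy reopens the file.
        state = self.__dict__.copy()
        state["_handle"] = None
        return state

    def close(self):
        if self._handle is not None:
            self._handle.close()
            self._handle = None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    with open(path, "wb") as f:
        f.write(b"abcdef")

    ds = LazyFileDataset(path)
    opened_before = ds._handle is not None   # no handle yet
    first = ds[2]                            # first read opens the file

    clone = pickle.loads(pickle.dumps(ds))   # simulates handing a copy to a worker
    clone_opened_before = clone._handle is not None
    clone_first = clone[0]

    ds.close()
    clone.close()
```

Sequential reads suit this design because the reader moves forward through the file instead of seeking to random offsets, which is the reason given above for keeping shuffle=False.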

Parameters:
  • hdf5_path – Path to the HDF5 file.

  • input_key – Dataset key for the 2-channel (H1+L1) strain inputs.

  • background_key – Dataset key for the background (noise) targets.

  • signal_key – Dataset key for the signal targets.

  • input_scaler – Optional sklearn-compatible scaler (must expose mean_ and scale_ attributes, shaped (n_channels,)). Applied to inputs only; targets are assumed to be whitened already.

  • target_signal_only – If True, return only the signal targets (2-channel). If False (default), concatenate [background, signal] → 4-channel target.

  • transform – Optional callable with signature transform(input_ts=…, target_ts=…) → dict with same keys.
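The input_scaler, target_signal_only, and transform parameters can be illustrated with a small NumPy sketch. It assumes arrays laid out as (n_channels, n_samples), the usual standardisation (x - mean_) / scale_, and concatenation along the channel axis; these are assumptions made for illustration and are not confirmed by this reference. The helper names scale_inputs, build_target, and identity_transform are hypothetical, and a SimpleNamespace stands in for an sklearn-compatible scaler.

```python
from types import SimpleNamespace

import numpy as np


def scale_inputs(x, scaler):
    """Standardise x of shape (n_channels, n_samples) channel by channel."""
    if scaler is None:
        return x
    # mean_ and scale_ have shape (n_channels,); add an axis so they broadcast
    # across the sample dimension.
    mean = np.asarray(scaler.mean_)[:, None]
    scale = np.asarray(scaler.scale_)[:, None]
    return (x - mean) / scale


def build_target(background, signal, target_signal_only=False):
    """Return the 2-channel signal, or [background, signal] as 4 channels."""
    if target_signal_only:
        return signal
    return np.concatenate([background, signal], axis=0)


def identity_transform(input_ts, target_ts):
    """A transform takes keyword arguments and returns a dict with the same keys."""
    return {"input_ts": input_ts, "target_ts": target_ts}


rng = np.random.default_rng(0)
strain = rng.normal(size=(2, 8))       # toy stand-in for the H1+L1 inputs
background = rng.normal(size=(2, 8))
signal = rng.normal(size=(2, 8))

# Stand-in for a fitted scaler exposing mean_ and scale_.
scaler = SimpleNamespace(mean_=strain.mean(axis=1), scale_=strain.std(axis=1))

scaled = scale_inputs(strain, scaler)
target = build_target(background, signal)
signal_only = build_target(background, signal, target_signal_only=True)
sample = identity_transform(input_ts=scaled, target_ts=target)
print(sample["input_ts"].shape, sample["target_ts"].shape)  # (2, 8) (4, 8)
```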

hdf5_path
input_key
background_key
signal_key
input_scaler = None
target_signal_only = False
transform = None