deepextractor.data.datasets
===========================

.. py:module:: deepextractor.data.datasets

Module Contents
---------------

.. py:class:: TimeSeriesDataset(input_npy, target_npy, transform=None)

   Bases: :py:obj:`torch.utils.data.Dataset`

   An abstract class representing a :class:`Dataset`.

   All datasets that represent a map from keys to data samples should subclass
   it. All subclasses should overwrite :meth:`__getitem__`, supporting fetching a
   data sample for a given key. Subclasses could also optionally overwrite
   :meth:`__len__`, which is expected to return the size of the dataset by many
   :class:`~torch.utils.data.Sampler` implementations and the default options
   of :class:`~torch.utils.data.DataLoader`. Subclasses could also
   optionally implement :meth:`__getitems__` to speed up loading of batched
   samples. This method accepts a list of sample indices for a batch and returns
   a list of samples.

   .. note::
      :class:`~torch.utils.data.DataLoader` by default constructs an index
      sampler that yields integral indices. To make it work with a map-style
      dataset with non-integral indices/keys, a custom sampler must be provided.

   .. py:attribute:: inputs

   .. py:attribute:: targets

   .. py:attribute:: transform
      :value: None


.. py:class:: SpectrogramDataset(input_npy, target_npy, transform=None)

   Bases: :py:obj:`torch.utils.data.Dataset`

   An abstract class representing a :class:`Dataset`.

   All datasets that represent a map from keys to data samples should subclass
   it. All subclasses should overwrite :meth:`__getitem__`, supporting fetching a
   data sample for a given key. Subclasses could also optionally overwrite
   :meth:`__len__`, which is expected to return the size of the dataset by many
   :class:`~torch.utils.data.Sampler` implementations and the default options
   of :class:`~torch.utils.data.DataLoader`. Subclasses could also
   optionally implement :meth:`__getitems__` to speed up loading of batched
   samples. This method accepts a list of sample indices for a batch and returns
   a list of samples.

   .. note::
      :class:`~torch.utils.data.DataLoader` by default constructs an index
      sampler that yields integral indices. To make it work with a map-style
      dataset with non-integral indices/keys, a custom sampler must be provided.

   .. py:attribute:: input_path

   .. py:attribute:: target_path

   .. py:attribute:: input_shape

   .. py:attribute:: target_shape

   .. py:attribute:: input_channels_needed

   .. py:attribute:: target_channels_needed

   .. py:attribute:: transform
      :value: None


.. py:class:: HDF5Dataset(hdf5_path, input_key, background_key, signal_key, input_scaler=None, target_signal_only=False, transform=None)

   Bases: :py:obj:`torch.utils.data.Dataset`

   HDF5-backed dataset for time-domain two-detector signal/glitch separation.

   The HDF5 file is opened lazily, once per worker process. Use
   ``shuffle=False`` in the ``DataLoader``: the data is pre-shuffled at
   generation time, and random HDF5 seeks are expensive.

   :param hdf5_path: Path to the HDF5 file.
   :param input_key: Dataset key for the 2-channel (H1+L1) strain inputs.
   :param background_key: Dataset key for the background (noise) targets.
   :param signal_key: Dataset key for the signal targets.
   :param input_scaler: Optional sklearn-compatible scaler (must expose
                        ``mean_`` and ``scale_`` attributes, each shaped
                        ``(n_channels,)``). Applied to inputs only; targets are
                        assumed to be whitened already.
   :param target_signal_only: If ``True``, return only the signal targets
                              (2-channel). If ``False`` (default), concatenate
                              ``[background, signal]`` into a 4-channel target.
   :param transform: Optional callable with signature
                     ``transform(input_ts=..., target_ts=...)`` that returns a
                     dict with the same keys.

   .. py:attribute:: hdf5_path

   .. py:attribute:: input_key

   .. py:attribute:: background_key

   .. py:attribute:: signal_key

   .. py:attribute:: input_scaler
      :value: None

   .. py:attribute:: target_signal_only
      :value: False

   .. py:attribute:: transform
      :value: None
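
The ``input_scaler``, ``target_signal_only`` and ``transform`` parameters of ``HDF5Dataset`` can be illustrated with a minimal NumPy sketch. This is not the library's implementation: ``apply_scaler``, ``build_sample`` and ``to_float32`` are hypothetical helpers, the arrays are assumed to be channel-first with shape ``(n_channels, n_samples)``, and the scaler is a stand-in object that only exposes ``mean_`` and ``scale_``. HDF5 access and the lazy per-worker file handle are deliberately left out.

```python
from types import SimpleNamespace

import numpy as np


def apply_scaler(inputs, scaler):
    """Standardize each channel of a (n_channels, n_samples) array (assumed layout)."""
    mean = np.asarray(scaler.mean_)[:, None]    # (n_channels, 1), broadcasts over time
    scale = np.asarray(scaler.scale_)[:, None]
    return (inputs - mean) / scale


def build_sample(inputs, background, signal, input_scaler=None,
                 target_signal_only=False, transform=None):
    """Hypothetical helper: assemble one (input, target) pair as the parameters describe."""
    if input_scaler is not None:
        inputs = apply_scaler(inputs, input_scaler)   # inputs only; targets stay as-is
    if target_signal_only:
        target = signal                               # 2-channel target
    else:
        target = np.concatenate([background, signal], axis=0)  # 4-channel target
    if transform is not None:
        out = transform(input_ts=inputs, target_ts=target)
        inputs, target = out["input_ts"], out["target_ts"]
    return inputs, target


def to_float32(input_ts, target_ts):
    """Example transform: keyword arguments in, dict with the same keys out."""
    return {"input_ts": input_ts.astype(np.float32),
            "target_ts": target_ts.astype(np.float32)}


# Synthetic stand-ins for one H1+L1 sample; real data would come from the HDF5 file.
rng = np.random.default_rng(0)
strain = rng.normal(size=(2, 1024))
background = rng.normal(size=(2, 1024))
signal = np.zeros((2, 1024))
scaler = SimpleNamespace(mean_=strain.mean(axis=1), scale_=strain.std(axis=1))

x, y = build_sample(strain, background, signal,
                    input_scaler=scaler, transform=to_float32)
x_sig, y_sig = build_sample(strain, background, signal, target_signal_only=True)
```

With these assumptions, ``y`` stacks background and signal along the channel axis, while ``y_sig`` carries the signal channels alone. A fitted ``sklearn.preprocessing.StandardScaler`` would satisfy the same ``mean_``/``scale_`` contract as the stand-in object used here.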