Dataset

class pyeem.datasets.Dataset(data_dir, raman_instrument, absorbance_instrument, eem_instrument, scan_sets_subdir='raw_sample_sets', metadata_filename='metadata.csv', hdf_filename='root.hdf5', calibration_sources=None, progress_bar=False, **kwargs)

Bases: object

An EEM dataset which keeps track of measurement data and metadata.

Parameters
  • data_dir (str) – The path for the directory which contains the raw data and metadata.

  • raman_instrument (str, optional) – The type of instrument used to collect Raman scans. Defaults to None.

  • absorbance_instrument (str, optional) – The type of instrument used to collect absorbance scans. Defaults to None.

  • eem_instrument (str, optional) – The type of instrument used to collect EEM scans. Defaults to None.

  • scan_sets_subdir (str, optional) – The path of the subdirectory containing the sample sets. Defaults to “raw_sample_sets”.

  • metadata_filename (str, optional) – The filename of the metadata file which keeps track of all the sample sets. Defaults to “metadata.csv”.

  • hdf_filename (str, optional) – The filename of the HDF5 file. Defaults to “root.hdf5”.

  • calibration_sources (dict of {str: str}, optional) – A dictionary of calibration sources measured in the dataset. Each source must be specified with its units. Defaults to None.

  • progress_bar (bool, optional) – Whether to display a progress bar while the dataset is loading. Defaults to False.
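
As an illustration of how the path-related parameters fit together, the sketch below derives the locations implied by the default values. It assumes the sample-set subdirectory, the metadata file, and the HDF5 file all sit directly under data_dir, which this page does not state explicitly. expected_layout and build_dataset are hypothetical helpers, not part of pyeem, and the instrument names are left to the caller because the accepted values are not listed here.

```python
from pathlib import Path


def expected_layout(data_dir, scan_sets_subdir="raw_sample_sets",
                    metadata_filename="metadata.csv",
                    hdf_filename="root.hdf5"):
    """Hypothetical helper: paths implied by the documented defaults.

    Assumption: all three entries live directly under data_dir.
    """
    root = Path(data_dir)
    return {
        "scan_sets": root / scan_sets_subdir,
        "metadata": root / metadata_filename,
        "hdf": root / hdf_filename,
    }


def build_dataset(data_dir, raman_instrument, absorbance_instrument,
                  eem_instrument, **kwargs):
    """Hypothetical wrapper around the constructor documented above."""
    # Imported lazily so the layout helper works without pyeem installed.
    from pyeem.datasets import Dataset

    return Dataset(
        data_dir=data_dir,
        raman_instrument=raman_instrument,
        absorbance_instrument=absorbance_instrument,
        eem_instrument=eem_instrument,
        **kwargs,
    )


# "my_data" is a placeholder directory name.
layout = expected_layout("my_data")
print(layout["metadata"])
```

Checking that these paths exist before calling build_dataset can make a missing metadata file easier to diagnose.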

Attributes Summary

data_dir

metadata_path

scan_sets_subdir

Methods Summary

load_metadata()

Loads the metadata file which keeps track of all the sample sets.

load_sample_sets()

Loads all sample sets tracked in the metadata from disk and writes them to the HDF5 file.

metadata_summary_info()

Summarizes the dataset using the information stored in the metadata.

Attributes Documentation

data_dir

metadata_path

scan_sets_subdir

Methods Documentation

load_metadata()

Loads the metadata file which keeps track of all the sample sets.

Returns

The metadata which tracks samples and their associated info.

Return type

pandas.DataFrame

load_sample_sets()

Loads all sample sets tracked in the metadata from disk and writes them to the HDF5 file.

metadata_summary_info()

Summarizes the dataset using the information stored in the metadata.

Returns

The summary table.

Return type

pandas.DataFrame
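
The three methods can be combined into a short loading routine. The following is a minimal sketch, assuming they may be called in this order on an already constructed Dataset and that the return value of load_sample_sets() can be ignored (none is documented here). load_and_summarize is a hypothetical helper, not part of pyeem.

```python
def load_and_summarize(dataset):
    """Hypothetical helper: run the documented methods in sequence.

    `dataset` is expected to be a pyeem.datasets.Dataset instance.
    """
    # Read the metadata file that tracks the sample sets (pandas.DataFrame).
    metadata = dataset.load_metadata()
    # Load the tracked sample sets from disk and write them to the HDF5 file.
    dataset.load_sample_sets()
    # Build the summary table from the metadata (pandas.DataFrame).
    summary = dataset.metadata_summary_info()
    return metadata, summary
```

Both returned objects are documented as pandas.DataFrame, so they can be inspected with the usual DataFrame tools such as head().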