Welcome to pyglotaran’s documentation!

Introduction

Pyglotaran is a Python library for the global analysis of time-resolved spectroscopy data. It is designed to provide a state-of-the-art modeling toolbox to researchers in a user-friendly manner.

Its features are:

  • user-friendly modeling with a custom YAML (*.yml) based modeling language

  • parameter optimization using variable projection and non-negative least-squares algorithms

  • easily extensible modeling framework

  • battle-hardened models and algorithms for fluorescence dynamics

  • built upon and fully integrated into the standard Python science stack (NumPy, SciPy, Jupyter)

A Note To Glotaran Users

Although closely related and developed in the same lab, pyglotaran is not a replacement for Glotaran - A GUI For TIMP. Pyglotaran only aims to provide the modeling and optimization framework and algorithms. It is of course possible to develop a new GUI which leverages the power of pyglotaran (contributions welcome).

The current ‘user interface’ for pyglotaran is the Jupyter Notebook. It is designed to seamlessly integrate into this environment and to be compatible with all major visualization and data analysis tools in the scientific Python ecosystem.

If you are a non-technical user, you should give these tools a try; there are numerous tutorials on how to use them. You don’t really need to learn to program: if you can use e.g. Matlab or Mathematica, you can use Jupyter and Python.

Installation

Prerequisites

  • Python 3.8 or later

Windows

The easiest way of getting Python (and some basic tools to work with it) on Windows is to use Anaconda.

You will need a terminal for the installation. One is provided by Anaconda, called the Anaconda Prompt; you can find it in the start menu.

Note

If you use a Windows Shell like cmd.exe or PowerShell, you might have to prefix ‘$PATH_TO_ANACONDA/’ to all commands (e.g. C:/Anaconda/pip.exe instead of pip)

Stable release

Warning

pyglotaran is in early development, so for the moment stable releases are sparse and outdated. We try to keep the master branch stable, so please install from source for now.

This is the preferred method to install pyglotaran, as it will always install the most recent stable release.

To install pyglotaran, run this command in your terminal:

$ pip install pyglotaran

If you don’t have pip installed, the Python installation guide can walk you through the process.

If you want to install it via conda, you can run the following command:

$ conda install -c conda-forge pyglotaran

From sources

First you have to install or update some dependencies.

Within a terminal:

$ pip install -U numpy scipy Cython

Alternatively, for Anaconda users:

$ conda install numpy scipy Cython

Afterwards you can simply use pip to install it directly from GitHub:

$ pip install git+https://github.com/glotaran/pyglotaran.git

For updating pyglotaran, just re-run the command above.

If you prefer to manually download the source files, you can find them on GitHub. Alternatively, you can clone them with git (preferred):

$ git clone https://github.com/glotaran/pyglotaran.git

Within a terminal, navigate to the directory where you unpacked or cloned the code and enter:

$ pip install -e .

For updating, simply download and unpack the newest version (or run $ git pull in the pyglotaran directory if you used git) and re-run the command above.
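
To verify the installation, you can print the installed version (the version number lives in glotaran/__init__.py, so it is exposed as glotaran.__version__):

$ python -c "import glotaran; print(glotaran.__version__)"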


Quickstart/Cheat-Sheet

Since this documentation is written in a Jupyter notebook, we first import a little IPython helper function to display files with syntax highlighting.

[1]:
from glotaran.utils.ipython import display_file

To start using pyglotaran in your project, you have to import it first. In addition we need to import some extra components for later use.

[2]:
from glotaran.analysis.optimize import optimize
from glotaran.io import load_model
from glotaran.io import load_parameters
from glotaran.io import save_dataset
from glotaran.io.prepare_dataset import prepare_time_trace_dataset
from glotaran.project.scheme import Scheme

Let us get some example data to analyze:

[3]:
from glotaran.examples.sequential import dataset

dataset
[3]:
<xarray.Dataset>
Dimensions:   (spectral: 72, time: 2100)
Coordinates:
  * time      (time) float64 -1.0 -0.99 -0.98 -0.97 ... 19.96 19.97 19.98 19.99
  * spectral  (spectral) float64 600.0 601.4 602.8 604.2 ... 696.6 698.0 699.4
Data variables:
    data      (time, spectral) float64 0.01272 -0.003198 ... 1.718 1.542

Like all data in pyglotaran, the dataset is an xarray.Dataset. You can find more information about the xarray library on the xarray homepage.
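
This also means all of xarray’s regular machinery is available. For orientation, a few standard xarray accessors (nothing pyglotaran-specific):

dataset.data.values                          # the raw (time x spectral) numpy array
dataset.coords["time"].values                # the time axis as a numpy array
dataset.data.sel(time=0, method="nearest")   # the spectrum closest to time zero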

The loaded dataset contains data simulated from a sequential kinetic model.

Plotting raw data

Now let’s plot some time traces.

[4]:
plot_data = dataset.data.sel(spectral=[620, 630, 650], method="nearest")
plot_data.plot.line(x="time", aspect=2, size=5);
[Plot: time traces at 620, 630 and 650 nm]

We can also plot spectra at different times.

[5]:
plot_data = dataset.data.sel(time=[1, 10, 20], method="nearest")
plot_data.plot.line(x="spectral", aspect=2, size=5);
[Plot: spectra at times 1, 10 and 20]

Preparing data

To get an idea of how to model your data, you should inspect its singular value decomposition. Pyglotaran has a function to calculate it (among other things).

[6]:
dataset = prepare_time_trace_dataset(dataset)
dataset
[6]:
<xarray.Dataset>
Dimensions:                      (left_singular_value_index: 72, right_singular_value_index: 72, singular_value_index: 72, spectral: 72, time: 2100)
Coordinates:
  * time                         (time) float64 -1.0 -0.99 -0.98 ... 19.98 19.99
  * spectral                     (spectral) float64 600.0 601.4 ... 698.0 699.4
Dimensions without coordinates: left_singular_value_index, right_singular_value_index, singular_value_index
Data variables:
    data                         (time, spectral) float64 0.01272 ... 1.542
    data_left_singular_vectors   (time, left_singular_value_index) float64 -2...
    data_singular_values         (singular_value_index) float64 4.62e+03 ... ...
    data_right_singular_vectors  (right_singular_value_index, spectral) float64 ...

First, take a look at the first 10 singular values:

[7]:
plot_data = dataset.data_singular_values.sel(singular_value_index=range(0, 10))
plot_data.plot(yscale="log", marker="o", linewidth=0, aspect=2, size=5);
[Plot: the first 10 singular values on a log scale]
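
If you want to cross-check these values, the singular values can also be computed directly with numpy; this is just a sanity check, not part of the pyglotaran API:

import numpy as np

# Compute only the singular values of the (time x spectral) data matrix;
# these should match dataset.data_singular_values up to numerical precision.
singular_values = np.linalg.svd(dataset.data.values, compute_uv=False)
print(singular_values[:10])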

Working with models

To analyze our data, we need to create a model.

Create a file called model.yaml in your working directory and fill it with the following:

[8]:
display_file("model.yaml", syntax="yaml")
[8]:
type: kinetic-spectrum

initial_concentration:
 input:
   compartments: [s1, s2, s3]
   parameters: [input.1, input.0, input.0]

k_matrix:
 k1:
   matrix:
     (s2, s1): kinetic.1
     (s3, s2): kinetic.2
     (s3, s3): kinetic.3

megacomplex:
 m1:
   k_matrix: [k1]

irf:
 irf1:
   type: gaussian
   center: irf.center
   width: irf.width

dataset:
 dataset1:
   initial_concentration: input
   megacomplex: [m1]
   irf: irf1

Now you can load the model file.

[9]:
model = load_model("model.yaml")

You can check your model for problems with model.validate().

[10]:
model.validate()
[10]:
'Your model is valid.'

Working with parameters

Now define some starting parameters. Create a file called parameters.yaml with the following content.

[11]:
display_file("parameters.yaml", syntax="yaml")
[11]:
input:
 - ['1', 1, {'vary': False, 'non-negative': False}]
 - ['0', 0, {'vary': False, 'non-negative': False}]

kinetic: [
    0.5,
    0.3,
    0.1,
]

irf:
 - ['center', 0.3]
 - ['width', 0.1]
[12]:
parameters = load_parameters("parameters.yaml")
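
The returned object is a ParameterGroup, so individual parameters can be looked up by their label path. A minimal sketch, assuming the ParameterGroup.get accessor of the 0.4 API:

# Look up a single parameter by its label path; .value holds the starting value
print(parameters.get("kinetic.1").value)  # 0.5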

You can also use model.validate to check for missing parameters.

[13]:
model.validate(parameters=parameters)
[13]:
'Your model is valid.'

Since not all problems in the model can be detected automatically, it is wise to visually inspect the model. For this purpose, you can just print it.

[14]:
model
[14]:

Model

Type: kinetic-spectrum

Initial Concentration
  • input:
      • Label: input
      • Compartments: [‘s1’, ‘s2’, ‘s3’]
      • Parameters: [input.1, input.0, input.0]
      • Exclude From Normalize: []

K Matrix
  • k1:
      • Label: k1
      • Matrix:
          • (‘s2’, ‘s1’): kinetic.1
          • (‘s3’, ‘s2’): kinetic.2
          • (‘s3’, ‘s3’): kinetic.3

Irf
  • irf1 (gaussian):
      • Label: irf1
      • Type: gaussian
      • Center: irf.center
      • Width: irf.width
      • Normalize: True
      • Backsweep: False

Dataset
  • dataset1:
      • Label: dataset1
      • Megacomplex: [‘m1’]
      • Initial Concentration: input
      • Irf: irf1

Megacomplex
  • m1 (None):
      • Label: m1
      • K Matrix: [‘k1’]

In the same way, you should inspect your parameters.

[15]:
parameters
[15]:
  • input:

    Label    Value    StdErr    Min     Max    Vary     Non-Negative    Expr
    1        1        0         -inf    inf    False    False           None
    0        0        0         -inf    inf    False    False           None

  • irf:

    Label    Value    StdErr    Min     Max    Vary     Non-Negative    Expr
    center   0.3      0         -inf    inf    True     False           None
    width    0.1      0         -inf    inf    True     False           None

  • kinetic:

    Label    Value    StdErr    Min     Max    Vary     Non-Negative    Expr
    1        0.5      0         -inf    inf    True     False           None
    2        0.3      0         -inf    inf    True     False           None
    3        0.1      0         -inf    inf    True     False           None

Optimizing data

Now we have everything together to optimize our parameters.

[16]:
scheme = Scheme(model, parameters, {"dataset1": dataset})
result = optimize(scheme)
result
   Iteration     Total nfev        Cost      Cost reduction    Step norm     Optimality
       0              1         7.5559e+00                                    1.58e+01
       1              2         7.5557e+00      2.17e-04       8.28e-05       6.62e-02
       2              3         7.5557e+00      3.99e-11       4.37e-09       5.12e-06
Both `ftol` and `xtol` termination conditions are satisfied.
Function evaluations 3, initial cost 7.5559e+00, final cost 7.5557e+00, first-order optimality 5.12e-06.
[16]:

Optimization Result

    Number of residual evaluations    3
    Number of variables               5
    Number of datapoints              151200
    Degrees of freedom                151195
    Chi Square                        1.51e+01
    Reduced Chi Square                9.99e-05
    Root Mean Square Error (RMSE)     1.00e-02

Model

Type: kinetic-spectrum

Initial Concentration
  • input:
      • Label: input
      • Compartments: [‘s1’, ‘s2’, ‘s3’]
      • Parameters: [input.1: 1.00000e+00 (fixed), input.0: 0.00000e+00 (fixed), input.0: 0.00000e+00 (fixed)]
      • Exclude From Normalize: []

K Matrix
  • k1:
      • Label: k1
      • Matrix:
          • (‘s2’, ‘s1’): kinetic.1: 5.00082e-01 (StdErr: 7e-05, initial: 5.00000e-01)
          • (‘s3’, ‘s2’): kinetic.2: 2.99990e-01 (StdErr: 4e-05, initial: 3.00000e-01)
          • (‘s3’, ‘s3’): kinetic.3: 9.99963e-02 (StdErr: 5e-06, initial: 1.00000e-01)

Irf
  • irf1 (gaussian):
      • Label: irf1
      • Type: gaussian
      • Center: irf.center: 3.00002e-01 (StdErr: 5e-06, initial: 3.00000e-01)
      • Width: irf.width: 1.00006e-01 (StdErr: 7e-06, initial: 1.00000e-01)
      • Normalize: True
      • Backsweep: False

Dataset
  • dataset1:
      • Label: dataset1
      • Megacomplex: [‘m1’]
      • Initial Concentration: input
      • Irf: irf1

Megacomplex
  • m1 (None):
      • Label: m1
      • K Matrix: [‘k1’]
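
Note that the Scheme accepts further options that control the optimization. A hedged sketch (the argument names below are assumptions based on the 0.4 API; check help(Scheme) for your version):

scheme = Scheme(
    model,
    parameters,
    {"dataset1": dataset},
    non_negative_least_squares=True,          # constrain the linear parameters to be non-negative
    maximum_number_function_evaluations=100,  # cap the number of residual evaluations
)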

[17]:
result.optimized_parameters
[17]:
  • input:

    Label    Value        StdErr         Min     Max    Vary     Non-Negative    Expr
    1        1            0              -inf    inf    False    False           None
    0        0            0              -inf    inf    False    False           None

  • irf:

    Label    Value        StdErr         Min     Max    Vary     Non-Negative    Expr
    center   0.300002     5.00976e-06    -inf    inf    True     False           None
    width    0.100006     6.70229e-06    -inf    inf    True     False           None

  • kinetic:

    Label    Value        StdErr         Min     Max    Vary     Non-Negative    Expr
    1        0.500082     7.2547e-05     -inf    inf    True     False           None
    2        0.29999      4.18992e-05    -inf    inf    True     False           None
    3        0.0999963    4.77853e-06    -inf    inf    True     False           None

You can get the resulting data for your dataset via result.data.

[18]:
result_dataset = result.data["dataset1"]
result_dataset
[18]:
<xarray.Dataset>
Dimensions:                                   (clp_label: 3, component: 3, from_species: 3, left_singular_value_index: 72, right_singular_value_index: 72, singular_value_index: 72, species: 3, spectral: 72, time: 2100, to_species: 3)
Coordinates:
  * time                                      (time) float64 -1.0 ... 19.99
  * spectral                                  (spectral) float64 600.0 ... 699.4
  * clp_label                                 (clp_label) <U2 's1' 's2' 's3'
  * species                                   (species) <U2 's1' 's2' 's3'
    rate                                      (component) float64 -0.5001 ......
    lifetime                                  (component) float64 -2.0 ... -10.0
  * to_species                                (to_species) <U2 's1' 's2' 's3'
  * from_species                              (from_species) <U2 's1' 's2' 's3'
Dimensions without coordinates: component, left_singular_value_index, right_singular_value_index, singular_value_index
Data variables: (12/23)
    data                                      (time, spectral) float64 0.0127...
    data_left_singular_vectors                (time, left_singular_value_index) float64 ...
    data_singular_values                      (singular_value_index) float64 ...
    data_right_singular_vectors               (spectral, right_singular_value_index) float64 ...
    matrix                                    (time, clp_label) float64 6.153...
    clp                                       (spectral, clp_label) float64 1...
    ...                                        ...
    decay_associated_spectra                  (spectral, component) float64 2...
    a_matrix                                  (component, species) float64 1....
    k_matrix                                  (to_species, from_species) float64 ...
    k_matrix_reduced                          (to_species, from_species) float64 ...
    irf_center                                float64 0.3
    irf_width                                 float64 0.1
Attributes:
    root_mean_square_error:           0.009997166315912881
    weighted_root_mean_square_error:  0.009997166315912881

Visualize the Result

The resulting data can be visualized the same way as the dataset. To judge the quality of the fit, you should look at the first left and right singular vectors of the residual.

[19]:
residual_left = result_dataset.residual_left_singular_vectors.sel(left_singular_value_index=0)
residual_right = result_dataset.residual_right_singular_vectors.sel(right_singular_value_index=0)
residual_left.plot.line(x="time", aspect=2, size=5)
residual_right.plot.line(x="spectral", aspect=2, size=5);
[Plot: first left singular vector of the residual over time]
[Plot: first right singular vector of the residual over the spectral axis]

Finally, you can save your result.

[20]:
save_dataset(result_dataset, "dataset1.nc")
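
The saved file can later be read back with load_dataset; since the format is inferred from the file extension, the NetCDF plugin is picked automatically for .nc files:

from glotaran.io import load_dataset

result_dataset = load_dataset("dataset1.nc")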

Changelog

0.4.0 (2021-06-25)

✨ Features

  • Add basic spectral model (#672)

  • Add Channel/Wavelength dependent shift parameter to irf. (#673)

  • Refactored Problem class into GroupedProblem and UngroupedProblem (#681)

  • Plugin system was rewritten (#600, #665)

  • Deprecation framework (#631)

  • Better notebook integration (#689)

🩹 Bug fixes

  • Fix excessive memory usage in _create_svd (#576)

  • Fix several issues with KineticImage model (#612)

  • Fix exception in sdt reader index calculation (#647)

  • Avoid crash in result markdown printing when optimization fails (#630)

  • ParameterNotFoundException doesn’t prepend ‘.’ if path is empty (#688)

  • Ensure Parameter.label is str or None (#678)

  • Properly scale StdError of estimated parameters with RMSE (#704)

  • More robust covariance_matrix calculation (#706)

  • ParameterGroup.markdown() independent parametergroups of order (#592)

🔌 Plugins

  • ProjectIo ‘folder’/’legacy’ plugin to save results (#620)

  • Model ‘spectral-model’ (#672)

📚 Documentation

  • User documentation is written in notebooks (#568)

  • Documentation on how to write a DataIo plugin (#600)

🗑️ Deprecations (due in 0.6.0)

  • glotaran.ParameterGroup -> glotaran.parameter.ParameterGroup

  • glotaran.read_model_from_yaml -> glotaran.io.load_model(..., format_name="yaml_str")

  • glotaran.read_model_from_yaml_file -> glotaran.io.load_model(..., format_name="yaml")

  • glotaran.read_parameters_from_csv_file -> glotaran.io.load_parameters(..., format_name="csv")

  • glotaran.read_parameters_from_yaml -> glotaran.io.load_parameters(..., format_name="yaml_str")

  • glotaran.read_parameters_from_yaml_file -> glotaran.io.load_parameters(..., format_name="yaml")

  • glotaran.io.read_data_file -> glotaran.io.load_dataset

  • result.save -> glotaran.io.save_result(result, ..., format_name="legacy")

  • result.get_dataset("<dataset_name>") -> result.data["<dataset_name>"]

  • glotaran.analysis.result -> glotaran.project.result

  • glotaran.analysis.scheme -> glotaran.project.scheme

  • model.simulate -> glotaran.analysis.simulation.simulate(model, ...)

0.3.3 (2021-03-18)

  • Force recalculation of SVD attributes in scheme._prepare_data (#597)

  • Remove unneeded check in spectral_penalties._get_area (#598)

  • Added python 3.9 support (#450)

0.3.2 (2021-02-28)

  • Re-release of version 0.3.1 due to packaging issue

0.3.1 (2021-02-28)

  • Added compatibility for numpy 1.20 and raised minimum required numpy version to 1.20 (#555)

  • Fixed excessive memory consumption in result creation due to full SVD computation (#574)

  • Added feature parameter history (#557)

  • Moved setup logic to setup.cfg (#560)

0.3.0 (2021-02-11)

  • Significant code refactor with small API changes to parameter relation specification (see docs)

  • Replaced lmfit with scipy.optimize

0.2.0 (2020-12-02)

  • Large refactor with significant improvements but also small API changes (see docs)

  • Removed doas plugin

0.1.0 (2020-07-14)

  • Package was renamed to pyglotaran on PyPI

0.0.8 (2018-08-07)

  • Changed nan_policy to omit

0.0.7 (2018-08-07)

  • Added support for multiple shapes per compartment.

0.0.6 (2018-08-07)

  • First release on PyPI, support for Windows installs added.

  • Pre-Alpha Development

Authors

Development Lead

Contributors

Special Thanks

  • Stefan Schuetz

  • Sergey P. Laptenok

Supervision

Original publications

  1. Joris J. Snellenburg, Sergey Laptenok, Ralf Seger, Katharine M. Mullen, Ivo H. M. van Stokkum. “Glotaran: A Java-Based Graphical User Interface for the R Package TIMP”. Journal of Statistical Software (2012), Volume 49, Number 3, Pages: 1–22. URL https://dx.doi.org/10.18637/jss.v049.i03

  2. Katharine M. Mullen, Ivo H. M. van Stokkum. “TIMP: An R Package for Modeling Multi-way Spectroscopic Measurements”. Journal of Statistical Software (2007), Volume 18, Number 3, Pages 1-46, ISSN 1548-7660. URL https://dx.doi.org/10.18637/jss.v018.i03

  3. Ivo H. M. van Stokkum, Delmar S. Larsen, Rienk van Grondelle, “Global and target analysis of time-resolved spectra”. Biochimica et Biophysica Acta (BBA) - Bioenergetics (2004), Volume 1657, Issues 2–3, Pages 82-104, ISSN 0005-2728. URL https://doi.org/10.1016/j.bbabio.2004.04.011

Overview

Data IO

Plotting

Modelling

Parameter

Optimizing

API Documentation

The API Documentation for pyglotaran is automatically created from its docstrings.

glotaran

Glotaran package __init__.py

Plugins

To be as flexible as possible, pyglotaran uses a plugin system to handle new Models, DataIo and ProjectIo. Those plugins can be defined by pyglotaran itself, the user, or a 3rd party plugin package.

Builtin plugins

Models

  • KineticSpectrumModel

  • KineticImageModel

Data Io

Plugins reading and writing data to and from xarray.Dataset or xarray.DataArray.

  • AsciiDataIo

  • NetCDFDataIo

  • SdtDataIo

Project Io

Plugins reading and writing Model, Scheme, ParameterGroup or Result.

  • YmlProjectIo

  • CsvProjectIo

  • FolderProjectIo

Reproducibility and plugins

With a plugin ecosystem there is always the possibility that multiple plugins try to register under the same format/name. This is why plugins are registered at least twice: once under the name the developer intended, and secondly under their full name (full import path). This makes it possible to ensure that a specific plugin is used by specifying it manually, so if someone wants to rerun your analysis, the results will be reproducible even if they have conflicting plugins installed. You can get all information about the installed plugins by calling the corresponding *_plugin_table function with both options (plugin_names and full_names) set to true. To pin a plugin, use the corresponding set_*_plugin function with the intended name (format_name/model_name) and the full name (full_plugin_name) of the plugin to use.

If you wanted to ensure that the pyglotaran builtin plugin is used for sdt files you could add the following lines to the beginning of your analysis code.

from glotaran.io import set_data_plugin
set_data_plugin("sdt", "glotaran.builtin.io.sdt.sdt_file_reader.SdtDataIo_sdt")
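
To see what is installed, including the full import paths you would pass to set_data_plugin, call the table function with both options enabled:

from glotaran.io import data_io_plugin_table

# List data io plugins with both their registered names and their full import paths
data_io_plugin_table(plugin_names=True, full_names=True)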

Models

The functions for model plugins are located in glotaran.model and called model_plugin_table and set_model_plugin.

Data Io

The functions for data io plugins are located in glotaran.io and called data_io_plugin_table and set_data_plugin.

Project Io

The functions for project io plugins are located in glotaran.io and called project_io_plugin_table and set_project_plugin.

3rd party plugins

Plugins not part of pyglotaran itself.

  • Not yet, why not be the first? Tell us about your plugin and we will feature it here.

Contributing

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.

You can contribute in many ways:

Types of Contributions

Report Bugs

Report bugs at https://github.com/glotaran/pyglotaran/issues.

If you are reporting a bug, please include:

  • Your operating system name and version.

  • Any details about your local setup that might be helpful in troubleshooting.

  • Detailed steps to reproduce the bug.

Fix Bugs

Look through the GitHub issues for bugs. Anything tagged with “bug” and “help wanted” is open to whoever wants to implement it.

Implement Features

Look through the GitHub issues for features. Anything tagged with “enhancement” and “help wanted” is open to whoever wants to implement it.

Write Documentation

pyglotaran could always use more documentation, whether as part of the official pyglotaran docs, in docstrings, or even on the web in blog posts, articles, and such. If you are writing docstrings please use the NumPyDoc style to write them.

Submit Feedback

The best way to send feedback is to file an issue at https://github.com/glotaran/pyglotaran/issues.

If you are proposing a feature:

  • Explain in detail how it would work.

  • Keep the scope as narrow as possible, to make it easier to implement.

  • Remember that this is a volunteer-driven project, and that contributions are welcome :)

Get Started!

Ready to contribute? Here’s how to set up pyglotaran for local development.

  1. Fork the pyglotaran repo on GitHub.

  2. Clone your fork locally:

    $ git clone https://github.com/<your_name_here>/pyglotaran.git
    
  3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:

    $ mkvirtualenv pyglotaran
    (pyglotaran)$ cd pyglotaran
    (pyglotaran)$ python -m pip install -r requirements_dev.txt
    (pyglotaran)$ pip install -e . --process-dependency-links
    
  4. Install the pre-commit hooks, to automatically format and check your code:

    $ pre-commit install
    
  5. Create a branch for local development:

    $ git checkout -b name-of-your-bugfix-or-feature
    

    Now you can make your changes locally.

  6. When you’re done making changes, check that your changes pass flake8 and the tests, including testing other Python versions with tox:

    $ pre-commit run -a
    $ py.test
    

    Or to run all at once:

    $ tox
    
  7. Commit your changes and push your branch to GitHub:

    $ git add .
    $ git commit -m "Your detailed description of your changes."
    $ git push origin name-of-your-bugfix-or-feature
    
  8. Submit a pull request through the GitHub website.

Note

By default pull requests will use the template located at .github/PULL_REQUEST_TEMPLATE.md. But we also provide custom tailored templates located inside of .github/PULL_REQUEST_TEMPLATE. Sadly the GitHub Web Interface doesn’t provide an easy way to select them as it does for issue templates (see this comment for more details).

To use them, add the following query parameters to the URL when creating the pull request and hit enter:

  • ✨ Feature PR: ?expand=1&template=feature_PR.md

  • 🩹 Bug Fix PR: ?expand=1&template=bug_fix_PR.md

  • 📚 Documentation PR: ?expand=1&template=docs_PR.md

Pull Request Guidelines

Before you submit a pull request, check that it meets these guidelines:

  1. The pull request should include tests.

  2. If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring.

  3. The pull request should work for Python 3.8 and 3.9. Check your GitHub Actions at https://github.com/<your_name_here>/pyglotaran/actions and make sure that the tests pass for all supported Python versions.

Docstrings

We use numpy style docstrings, which can also be autogenerated from function/method signatures by extensions for your editor.

Some extensions for popular editors are:

Note

If your pull request improves the docstring coverage (check pre-commit run -a interrogate), please raise the value of the interrogate setting fail-under in pyproject.toml. That way the next person will improve the docstring coverage as well and everyone can enjoy a better documentation.

Warning

As soon as all our docstrings are in proper shape we will enforce that it stays that way. If you want to check if your docstrings are fine you can use pydocstyle and darglint.

Tips

To run a subset of tests:

$ py.test tests.test_pyglotaran

Deprecations

Only maintainers are allowed to decide about deprecations, thus you should first open an issue and check back with them if they are ok with deprecating something.

To make deprecations as robust as possible and give users all needed information to adjust their code, we provide helper functions inside the module glotaran.deprecation.

The functions you most likely want to use are

  • deprecate() for functions, methods and classes

  • warn_deprecated() for call arguments

  • deprecate_module_attribute() for module attributes

  • deprecate_submodule() for modules

Those functions not only make it easier to deprecate something, but they also check that deprecations are removed when they are due and that at least the imports in the warning work. Thus all deprecations need to be tested.

Tests for deprecations should be placed in glotaran/deprecation/modules/test, which also provides the test helper functions deprecation_warning_on_call_test_helper and changed_import_test_warn. Since the tests for deprecations are mainly for maintainability and not for testing functionality (those tests should be in the appropriate place), deprecation_warning_on_call_test_helper will by default just test that a DeprecationWarning was raised and ignore all raised Exceptions. An exception to this rule is when adding back removed functionality (which shouldn't happen in the first place, but might). Deprecations themselves should be implemented in a file under glotaran/deprecation/modules, with filenames following the relative import path from the glotaran root, but with "_" instead of ".".

E.g. glotaran.analysis.scheme would map to analysis_scheme.py

The only exceptions to this rule are the root __init__.py which is named glotaran_root.py and testing changed imports which should be placed in test_changed_imports.py.

Deprecating a Function, method or class

Deprecating a function, method or class is as easy as adding the deprecate decorator to it. Other decorators (e.g. @staticmethod or @classmethod) should be placed above deprecate in order to work.

glotaran/some_module.py
from glotaran.deprecation import deprecate

@deprecate(
    deprecated_qual_name_usage="glotaran.some_module.function_to_deprecate(filename)",
    new_qual_name_usage='glotaran.some_module.new_function(filename, format_name="legacy")',
    to_be_removed_in_version="0.6.0",
)
def function_to_deprecate(*args, **kwargs):
    ...
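
A quick way to see the decorator in action is to call the deprecated function and capture the warning. This is a minimal sketch; it simply assumes the decorator emits a DeprecationWarning (or a subclass) as described above:

import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    function_to_deprecate("some_file.txt")

# One warning pointing the user to glotaran.some_module.new_function
print(caught[0].category, caught[0].message)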

Deprecating a call argument

When deprecating a call argument you should use warn_deprecated and set the argument to deprecate to a default value (e.g. "deprecated") to check against. Note that for this use case we need to set check_qual_names=(False, False), which deactivates the import testing. This might not always be possible, e.g. if the argument is positional only, so it might make more sense to deprecate the whole callable; just discuss what to do with our trusted maintainers.

glotaran/some_module.py
from glotaran.deprecation import warn_deprecated

def function_to_deprecate(args1, new_arg="new_default_behavior", deprecated_arg="deprecated", **kwargs):
    if deprecated_arg != "deprecated":
        warn_deprecated(
            deprecated_qual_name_usage="deprecated_arg",
            new_qual_name_usage='new_arg="legacy"',
            to_be_removed_in_version="0.6.0",
            check_qual_names=(False, False)
        )
        new_arg = "legacy"
    ...

Deprecating a module attribute

Sometimes it might be necessary to remove an attribute (function, class, or constant) from a module to prevent circular imports or just to streamline the API. In those cases you would use deprecate_module_attribute inside a module __getattr__ function definition. This will import the attribute from the new location and return it when an import or use is requested.

glotaran/old_package/__init__.py
def __getattr__(attribute_name: str):
    from glotaran.deprecation import deprecate_module_attribute

    if attribute_name == "deprecated_attribute":
        return deprecate_module_attribute(
            deprecated_qual_name="glotaran.old_package.deprecated_attribute",
            new_qual_name="glotaran.new_package.new_attribute_name",
            to_be_removed_in_version="0.6.0",
        )

    raise AttributeError(f"module {__name__} has no attribute {attribute_name}")
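
With this __getattr__ in place, importing from the old location still works but warns (the module names are the hypothetical ones from the example above):

# Emits a deprecation warning pointing to glotaran.new_package.new_attribute_name
from glotaran.old_package import deprecated_attribute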

Deprecating a submodule

For a better logical structure, it might be necessary to move modules to a different location in the project. In those cases you would use deprecate_submodule, which imports the module from the new location, adds it to sys.modules, and adds it as an attribute to the parent package.

glotaran/old_package/__init__.py
from glotaran.deprecation import deprecate_submodule

module_name = deprecate_submodule(
    deprecated_module_name="glotaran.old_package.module_name",
    new_module_name="glotaran.new_package.new_module_name",
    to_be_removed_in_version="0.6.0",
)

Deploying

A reminder for the maintainers on how to deploy. Make sure all your changes are committed (including an entry in HISTORY.rst); the version number only needs to be changed in glotaran/__init__.py.

Then make a new release on GitHub and give the tag a proper name, e.g. 0.3.0, since it might be included in a citation.

GitHub Actions will then deploy to PyPI if the tests pass.

Plugin development

If you don’t find a plugin that fits your needs, you can always write your own. This section explains how, and what you need to know.

In time we will also provide a cookiecutter template to kickstart your new plugin for publishing as a package on PyPI.


How to Write your own Io plugin

There are all kinds of different data formats, so it is quite likely that your experimental setup uses a format which isn’t yet supported by a glotaran plugin, and you want to write your own DataIo plugin to support it.

Since json is a very common format (admittedly not for data, but in general) and Python has builtin support for it, we will use it as an example.

First let’s have a look which DataIo plugins are already installed and which functions they support.

[1]:
from glotaran.io import data_io_plugin_table
[2]:
data_io_plugin_table()
[2]:

Format name    load_dataset    save_dataset
ascii          *               *
nc             *               *
sdt            *               /

Looks like there isn’t a json plugin installed yet, but maybe someone else already wrote one, so have a look at the 3rd party plugins list in the user documentation (https://pyglotaran.readthedocs.io/en/latest/user_documentation/using_plugins.html) before you start writing your own plugin.

For the sake of the example, we will write our json plugin even if there already exists one by the time you read this.

First you need to import all needed libraries and functions.

  • from __future__ import annotations: needed to write Python 3.10 typing syntax (|), even with a lower Python version

  • json,xarray: Needed for reading and writing itself

  • DataIoInterface: needed to subclass from, this way you get the proper type and especially signature checking

  • register_data_io: registers the DataIo plugin under the given format_names

[3]:
from __future__ import annotations

import json

import xarray as xr

from glotaran.io.interface import DataIoInterface
from glotaran.plugin_system.data_io_registration import register_data_io

DataIoInterface has two methods we could implement, load_dataset and save_dataset, which are used by the identically named functions in glotaran.io.

We will implement both so our example is complete. The quickest way to get started is to copy over the code from DataIoInterface, which already has the right signatures and some boilerplate docstrings for the method arguments.

If the default arguments aren’t enough for your plugin and you need your methods to have additional options, you can just add those. Note the * between file_name and my_extra_option; this tells Python that my_extra_option is a keyword-only argument, and mypy (https://github.com/python/mypy) won’t raise an [override] type error for changing the signature of the method. To help others who might use your plugin, and your future self, it is good practice to document what each parameter does in the method’s docstring, which will be accessed by the help function.

Finally add the @register_data_io decorator with the format_names you want to register the plugin under, in our case json and my_json.

Pro tip: You don’t need to implement the whole functionality inside of the method itself …

[4]:
@register_data_io(["json", "my_json"])
class JsonDataIo(DataIoInterface):
    """My new shiny glotaran plugin for json data io"""

    def load_dataset(
        self, file_name: str, *, my_extra_option: str | None = None
    ) -> xr.Dataset | xr.DataArray:
        """Read json data to xarray.Dataset


        Parameters
        ----------
        file_name : str
            File containing the data.
        my_extra_option: str
            This argument is only for demonstration
        """
        if my_extra_option is not None:
            print(f"Using my extra option loading json: {my_extra_option}")

        with open(file_name) as json_file:
            data_dict = json.load(json_file)
        return xr.Dataset.from_dict(data_dict)

    def save_dataset(
        self, dataset: xr.Dataset | xr.DataArray, file_name: str, *, my_extra_option=None
    ):
        """Write xarray.Dataset to a json file

        Parameters
        ----------
        dataset : xr.Dataset
            Dataset to be saved to file.
        file_name : str
            File to write the result data to.
        my_extra_option: str
            This argument is only for demonstration
        """
        if my_extra_option is not None:
            print(f"Using my extra option for writing json: {my_extra_option}")

        data_dict = dataset.to_dict()
        with open(file_name, "w") as json_file:
            json.dump(data_dict, json_file)

Let’s verify that our new plugin was registered successfully under the format_names json and my_json.

[5]:
data_io_plugin_table()
[5]:

Format name    load_dataset    save_dataset
ascii          *               *
json           *               *
my_json        *               *
nc             *               *
sdt            *               /

Now let’s use the example data from the quickstart to test the reading and writing capabilities of our plugin.

[6]:
from glotaran.examples.sequential import dataset
from glotaran.io import load_dataset
from glotaran.io import save_dataset
[7]:
dataset
[7]:
<xarray.Dataset>
Dimensions:   (spectral: 72, time: 2100)
Coordinates:
  * time      (time) float64 -1.0 -0.99 -0.98 -0.97 ... 19.96 19.97 19.98 19.99
  * spectral  (spectral) float64 600.0 601.4 602.8 604.2 ... 696.6 698.0 699.4
Data variables:
    data      (time, spectral) float64 0.003305 0.002727 ... 1.713 1.53

To get a feeling for our data, let’s plot some traces.

[8]:
plot_data = dataset.data.sel(spectral=[620, 630, 650], method="nearest")
plot_data.plot.line(x="time", aspect=2, size=5)
[8]:
[<matplotlib.lines.Line2D at 0x7f7012fa36d0>,
 <matplotlib.lines.Line2D at 0x7f7012fa3700>,
 <matplotlib.lines.Line2D at 0x7f7012fa3760>]
[Plot: time traces of the example dataset at 620, 630 and 650 nm]

Since we want to see a difference between our saved and loaded data, we divide the amplitudes by 2 for no reason.

[9]:
dataset["data"] = dataset.data / 2

Now that we changed the data, let’s write them to a file.

But in which order were the arguments again? And are there any additional options?

Good thing we documented our new plugin, so we can just lookup the help.

[10]:
from glotaran.io import show_data_io_method_help

show_data_io_method_help("json", "save_dataset")
Help on method save_dataset in module __main__:

save_dataset(dataset: 'xr.Dataset | xr.DataArray', file_name: 'str', *, my_extra_option=None) method of __main__.JsonDataIo instance
    Write xarray.Dataset to a json file

    Parameters
    ----------
    dataset : xr.Dataset
        Dataset to be saved to file.
    file_name : str
        File to write the result data to.
    my_extra_option: str
        This argument is only for demonstration

Note that the function save_dataset has additional arguments:

  • format_name: overwrites the inferred plugin selection

  • allow_overwrite: Allows to overwrite existing files (USE WITH CAUTION!!!)

[11]:
help(save_dataset)
Help on function save_dataset in module glotaran.plugin_system.data_io_registration:

save_dataset(dataset: 'xr.Dataset | xr.DataArray', file_name: 'str | PathLike[str]', format_name: 'str' = None, *, allow_overwrite: 'bool' = False, **kwargs: 'Any') -> 'None'
    Save data from :xarraydoc:`Dataset` or :xarraydoc:`DataArray` to a file.

    Parameters
    ----------
    dataset : xr.Dataset | xr.DataArray
        Data to be written to file.
    file_name : str | PathLike[str]
        File to write the data to.
    format_name : str
        Format the file should be in, if not provided it will be inferred from the file extension.
    allow_overwrite : bool
        Whether or not to allow overwriting existing files, by default False
    **kwargs : Any
        Additional keyword arguments passes to the ``write_dataset`` implementation
        of the data io plugin. If you aren't sure about those use ``get_datawriter``
        to get the implementation with the proper help and autocomplete.

Since this is just an example and we don’t overwrite important data, we will use allow_overwrite=True. It also makes writing this documentation easier, since we don’t have to manually delete the test file each time the cell runs.

[12]:
save_dataset(
    dataset, "half_intensity.json", allow_overwrite=True, my_extra_option="just as an example"
)
Using my extra option for writing json: just as an example

Now let’s test our data loading functionality.

[13]:
reloaded_data = load_dataset("half_intensity.json", my_extra_option="just as an example")
reloaded_data
Using my extra option loading json: just as an example
[13]:
<xarray.Dataset>
Dimensions:   (spectral: 72, time: 2100)
Coordinates:
  * time      (time) float64 -1.0 -0.99 -0.98 -0.97 ... 19.96 19.97 19.98 19.99
  * spectral  (spectral) float64 600.0 601.4 602.8 604.2 ... 696.6 698.0 699.4
Data variables:
    data      (time, spectral) float64 0.001653 0.001363 ... 0.8567 0.7648
[14]:
reloaded_plot_data = reloaded_data.data.sel(spectral=[620, 630, 650], method="nearest")
reloaded_plot_data.plot.line(x="time", aspect=2, size=5)
[14]:
[<matplotlib.lines.Line2D at 0x7f70129b4400>,
 <matplotlib.lines.Line2D at 0x7f70129b4430>,
 <matplotlib.lines.Line2D at 0x7f70129b4550>]
[Plot: time traces of the reloaded dataset at 620, 630 and 650 nm]

This looks like the plot above, but with half the amplitude, so writing and reading our data worked as we hoped it would.

Writing a ProjectIo plugin works analogously:

  • Register function:
      • DataIo: glotaran.plugin_system.data_io_registration.register_data_io
      • ProjectIo: glotaran.plugin_system.project_io_registration.register_project_io

  • Baseclass:
      • DataIo: glotaran.io.interface.DataIoInterface
      • ProjectIo: glotaran.io.interface.ProjectIoInterface

  • Possible methods:
      • DataIo: load_dataset, save_dataset
      • ProjectIo: load_model, save_model, load_parameters, save_parameters, load_scheme, save_scheme, load_result, save_result

Of course you don’t have to implement all methods (sometimes that doesn’t even make sense), but only the ones you need.
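
As a minimal sketch of what that looks like for a ProjectIo plugin which only saves parameters (the format name here is made up, and the import paths follow the table above, so treat them as assumptions):

from glotaran.io.interface import ProjectIoInterface
from glotaran.plugin_system.project_io_registration import register_project_io


@register_project_io(["params_txt"])  # hypothetical format name
class TxtParameterIo(ProjectIoInterface):
    """Sketch of a ProjectIo plugin implementing only save_parameters."""

    def save_parameters(self, parameters, file_name: str):
        # Serialize the ParameterGroup however your format requires
        with open(file_name, "w") as f:
            f.write(str(parameters))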

Last but not least:

Chances are that if you need a plugin, someone else does too, so it would be awesome if you published it open source; that way the wheel isn’t reinvented over and over again.
