How to Write your own Io plugin
There are all kinds of different data formats, so it is quite likely that your experimental setup uses a format which isn’t yet supported by a glotaran plugin and you want to write your own DataIo plugin to support this format.
Since json is a very common format (admittedly not for data, but in general) and Python has built-in support for it, we will use it as an example.
First, let’s have a look at which DataIo plugins are already installed and which functions they support.
[1]:
from glotaran.io import data_io_plugin_table
[2]:
data_io_plugin_table()
[2]:
Format name | load_dataset | save_dataset |
---|---|---|
 | * | * |
 | * | * |
 | * | / |
Looks like there isn’t a json plugin installed yet, but maybe someone else has already written one, so have a look at the `3rd party plugins list in the user documentation <https://pyglotaran.readthedocs.io/en/latest/user_documentation/using_plugins.html>`__ before you start writing your own plugin.
For the sake of the example, we will write our json plugin even if one already exists by the time you read this.
First you need to import all needed libraries and functions.
- from __future__ import annotations: needed to write Python 3.10 typing syntax (|), even with a lower Python version
- json, xarray: needed for the reading and writing itself
- DataIoInterface: needed to subclass from, this way you get the proper type and especially signature checking
- register_data_io: registers the DataIo plugin under the given format_names
[3]:
from __future__ import annotations
import json
import xarray as xr
from glotaran.io.interface import DataIoInterface
from glotaran.plugin_system.data_io_registration import register_data_io
DataIoInterface has two methods we could implement, load_dataset and save_dataset, which are used by the identically named functions in glotaran.io.
We will implement both for our example to be complete. The quickest way to get started is to copy over the code from DataIoInterface, which already has the right signatures and some boilerplate docstrings for the method arguments.
If the default arguments aren’t enough for your plugin and you need your methods to have additional options, you can just add those. Note the * between file_name and my_extra_option: this tells Python that my_extra_option is a keyword-only argument, and `mypy <https://github.com/python/mypy>`__ won’t raise an [override] type error for changing the signature of the method. To help others who might use your plugin, and your future self, it is good practice to document what each parameter does in the method’s docstring, which will be accessed by the help function.
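The effect of the bare * can be tried out without glotaran. The following is a minimal sketch with made-up classes (BaseIo and MyIo are hypothetical stand-ins, not part of glotaran): the subclass adds an optional keyword-only argument, calls in the base-class style keep working, and passing the extra option positionally is rejected.

```python
from typing import Optional


class BaseIo:
    """Hypothetical stand-in for an interface class with a fixed signature."""

    def load(self, file_name: str) -> str:
        raise NotImplementedError


class MyIo(BaseIo):
    """Subclass that adds an optional, keyword-only argument."""

    def load(self, file_name: str, *, my_extra_option: Optional[str] = None) -> str:
        suffix = "" if my_extra_option is None else f" ({my_extra_option})"
        return f"loaded {file_name}{suffix}"


io = MyIo()
plain = io.load("data.json")  # base-class style call still works
with_option = io.load("data.json", my_extra_option="demo")  # passed by name

try:
    io.load("data.json", "demo")  # positional use of the extra option
    positional_error = None
except TypeError as error:
    positional_error = error  # Python rejects the second positional argument
```

Because the new argument has a default and can only be passed by name, every call that was valid for the base class stays valid for the subclass.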
Finally, add the @register_data_io decorator with the format_names you want to register the plugin to, in our case json and my_json.
Pro tip: You don’t need to implement the whole functionality inside of the method itself,
[4]:
@register_data_io(["json", "my_json"])
class JsonDataIo(DataIoInterface):
    """My new shiny glotaran plugin for json data io"""

    def load_dataset(
        self, file_name: str, *, my_extra_option: str | None = None
    ) -> xr.Dataset | xr.DataArray:
        """Read json data to xarray.Dataset

        Parameters
        ----------
        file_name : str
            File containing the data.
        my_extra_option: str
            This argument is only for demonstration
        """
        if my_extra_option is not None:
            print(f"Using my extra option loading json: {my_extra_option}")

        # Parse the json file into a plain dict and let xarray rebuild the dataset
        with open(file_name) as json_file:
            data_dict = json.load(json_file)
        return xr.Dataset.from_dict(data_dict)

    def save_dataset(
        self, dataset: xr.Dataset | xr.DataArray, file_name: str, *, my_extra_option=None
    ):
        """Write xarray.Dataset to a json file

        Parameters
        ----------
        dataset : xr.Dataset
            Dataset to be saved to file.
        file_name : str
            File to write the result data to.
        my_extra_option: str
            This argument is only for demonstration
        """
        if my_extra_option is not None:
            print(f"Using my extra option for writing json: {my_extra_option}")

        # Convert the dataset to a plain dict and serialize it as json
        data_dict = dataset.to_dict()
        with open(file_name, "w") as json_file:
            json.dump(data_dict, json_file)
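The json part of the plugin can be tried in isolation with the standard library only. The sketch below uses a hand-written dictionary in roughly the layout that to_dict produces (this layout is an assumption for illustration, real output contains more entries) and writes it to a temporary file and reads it back. It also shows one detail worth knowing: json has no tuple type, so tuples come back as lists.

```python
import json
import os
import tempfile

# Hand-written dict, roughly shaped like xarray's ``to_dict`` output
# (an assumption for illustration, not generated by xarray here).
data_dict = {
    "coords": {"time": {"dims": ("time",), "attrs": {}, "data": [0.0, 0.1, 0.2]}},
    "attrs": {},
    "dims": {"time": 3},
    "data_vars": {"data": {"dims": ("time",), "attrs": {}, "data": [1.0, 0.5, 0.25]}},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    file_name = os.path.join(tmp_dir, "example.json")
    # Same write/read pattern as in save_dataset and load_dataset
    with open(file_name, "w") as json_file:
        json.dump(data_dict, json_file)
    with open(file_name) as json_file:
        reloaded = json.load(json_file)

# The numbers survive the round trip unchanged ...
values_survive = reloaded["data_vars"]["data"]["data"] == [1.0, 0.5, 0.25]
# ... but the tuple used for ``dims`` is read back as a list.
dims_type = type(reloaded["data_vars"]["data"]["dims"]).__name__
```

Anything json cannot represent (for example a function object stored in the attributes) would make json.dump raise a TypeError, so a more robust plugin might want to filter or convert such values before writing.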
Let’s verify that our new plugin was registered successfully under the format_names json and my_json.
[5]:
data_io_plugin_table()
[5]:
Format name | load_dataset | save_dataset |
---|---|---|
 | * | * |
 | * | * |
 | * | * |
 | * | * |
 | * | / |
Now let’s use the example data from the quickstart to test the reading and writing capabilities of our plugin.
[6]:
from glotaran.io import load_dataset
from glotaran.io import save_dataset
from glotaran.testing.simulated_data.sequential_spectral_decay import DATASET as dataset
[7]:
dataset
[7]:
<xarray.Dataset> Size: 1MB
Dimensions:   (time: 2100, spectral: 72)
Coordinates:
  * time      (time) float64 17kB -1.0 -0.99 -0.98 -0.97 ... 19.97 19.98 19.99
  * spectral  (spectral) float64 576B 600.0 601.4 602.8 ... 696.6 698.0 699.4
Data variables:
    data      (time, spectral) float64 1MB 0.005208 0.008865 ... 2.548 2.31
Attributes:
    source_path:  dataset_1.nc
To get a feeling for our data, let’s plot some traces.
[8]:
plot_data = dataset.data.sel(spectral=[620, 630, 650], method="nearest")
plot_data.plot.line(x="time", aspect=2, size=5)
[8]:
[<matplotlib.lines.Line2D at 0x7f3c30d73d60>,
<matplotlib.lines.Line2D at 0x7f3c30d73d90>,
<matplotlib.lines.Line2D at 0x7f3c30d73e80>]
Since we want to see a difference between the original data and the data we save and load again, we divide the amplitudes by 2 (the factor itself is arbitrary).
[9]:
dataset["data"] = dataset.data / 2
Now that we changed the data, let’s write it to a file.
But in which order were the arguments again? And are there any additional options?
Good thing we documented our new plugin, so we can just look up the help.
[10]:
from glotaran.io import show_data_io_method_help
show_data_io_method_help("json", "save_dataset")
Help on method save_dataset in module __main__:
save_dataset(dataset: 'xr.Dataset | xr.DataArray', file_name: 'str', *, my_extra_option=None) method of __main__.JsonDataIo instance
Write xarray.Dataset to a json file
Parameters
----------
dataset : xr.Dataset
Dataset to be saved to file.
file_name : str
File to write the result data to.
my_extra_option: str
This argument is only for demonstration
Note that the function save_dataset has additional arguments:
- format_name: overwrites the inferred plugin selection
- allow_overwrite: allows overwriting existing files (USE WITH CAUTION!!!)
[11]:
help(save_dataset)
Help on function save_dataset in module glotaran.plugin_system.data_io_registration:
save_dataset(dataset: 'xr.Dataset | xr.DataArray', file_name: 'StrOrPath', format_name: 'str | None' = None, *, data_filters: 'list[str] | None' = None, allow_overwrite: 'bool' = False, update_source_path: 'bool' = True, **kwargs: 'Any') -> 'None'
Save data from :xarraydoc:`Dataset` or :xarraydoc:`DataArray` to a file.
Parameters
----------
dataset : xr.Dataset | xr.DataArray
Data to be written to file.
file_name : StrOrPath
File to write the data to.
format_name : str
Format the file should be in, if not provided it will be inferred from the file extension.
data_filters : list[str] | None
Optional list of items in the dataset to be saved.
allow_overwrite : bool
Whether or not to allow overwriting existing files, by default False
update_source_path: bool
Whether or not to update the ``source_path`` attribute to ``file_name`` when saving.
by default True
**kwargs : Any
Additional keyword arguments passes to the ``write_dataset`` implementation
of the data io plugin. If you aren't sure about those use ``get_datasaver``
to get the implementation with the proper help and autocomplete.
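To make the two extra arguments more tangible, here is a rough sketch of the behavior the help text describes: the format is taken from the file extension unless format_name is given, and an existing file is only replaced when allow_overwrite is set. The helpers infer_format and check_overwrite are hypothetical and only illustrate the idea; they are not glotaran functions, and the exception glotaran actually raises may differ.

```python
import tempfile
from pathlib import Path


def infer_format(file_name, format_name=None):
    """Return ``format_name`` if given, else the file extension without the dot."""
    if format_name is not None:
        return format_name
    return Path(file_name).suffix.lstrip(".").lower()


def check_overwrite(file_name, allow_overwrite=False):
    """Raise if the target exists and overwriting was not explicitly allowed."""
    if Path(file_name).exists() and not allow_overwrite:
        raise FileExistsError(f"{file_name} exists; pass allow_overwrite=True to replace it")


inferred = infer_format("half_intensity.json")  # taken from the extension
overridden = infer_format("half_intensity.txt", format_name="my_json")  # explicit wins

with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "half_intensity.json"
    target.write_text("{}")  # pretend a previous run already wrote the file
    try:
        check_overwrite(target)
        blocked = False
    except FileExistsError:
        blocked = True  # default behavior protects the existing file
    check_overwrite(target, allow_overwrite=True)  # passes silently
```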
Since this is just an example and we don’t overwrite important data, we will use allow_overwrite=True. It also makes writing this documentation easier, since the test file doesn’t have to be deleted manually each time the cell is run.
[12]:
save_dataset(
    dataset, "half_intensity.json", allow_overwrite=True, my_extra_option="just as an example"
)
Using my extra option for writing json: just as an example
Now let’s test our data loading functionality.
[13]:
reloaded_data = load_dataset("half_intensity.json", my_extra_option="just as an example")
reloaded_data
Using my extra option loading json: just as an example
[13]:
<xarray.Dataset> Size: 1MB
Dimensions:   (time: 2100, spectral: 72)
Coordinates:
  * time      (time) float64 17kB -1.0 -0.99 -0.98 -0.97 ... 19.97 19.98 19.99
  * spectral  (spectral) float64 576B 600.0 601.4 602.8 ... 696.6 698.0 699.4
Data variables:
    data      (time, spectral) float64 1MB 0.002604 0.004432 ... 1.274 1.155
Attributes:
    source_path:  half_intensity.json
    loader:       <function load_dataset at 0x7f3c44199c60>
[14]:
reloaded_plot_data = reloaded_data.data.sel(spectral=[620, 630, 650], method="nearest")
reloaded_plot_data.plot.line(x="time", aspect=2, size=5)
[14]:
[<matplotlib.lines.Line2D at 0x7f3c3077a020>,
<matplotlib.lines.Line2D at 0x7f3c3077a050>,
<matplotlib.lines.Line2D at 0x7f3c3077a140>]
This looks like the above plot, but with half the amplitudes, so writing and reading our data worked as we hoped it would.
Writing a ProjectIo plugin works analogously:
 | | |
---|---|---|
Register function | | |
Baseclass | | |
Possible methods | | |
Of course you don’t have to implement all methods (sometimes that doesn’t even make sense), but only the ones you need.
Last but not least:
Chances are that if you need a plugin, someone else does too, so it would be awesome if you published it as open source, so the wheel isn’t reinvented over and over again.