Quickstart/Cheat-Sheet
Since this documentation is written in a jupyter-notebook we will import a little ipython helper function to display file with syntax highlighting.
[1]:
from glotaran.utils.ipython import display_file
To start using pyglotaran
in your project, you have to import it first. In addition we need to import some extra components for later use.
[2]:
from glotaran.io import load_model
from glotaran.io import load_parameters
from glotaran.io import save_dataset
from glotaran.io.prepare_dataset import prepare_time_trace_dataset
from glotaran.optimization.optimize import optimize
from glotaran.project.scheme import Scheme
Let us get some example data to analyze:
[3]:
from glotaran.testing.simulated_data.sequential_spectral_decay import DATASET as dataset
dataset
[3]:
<xarray.Dataset> Dimensions: (time: 2100, spectral: 72) Coordinates: * time (time) float64 -1.0 -0.99 -0.98 -0.97 ... 19.96 19.97 19.98 19.99 * spectral (spectral) float64 600.0 601.4 602.8 604.2 ... 696.6 698.0 699.4 Data variables: data (time, spectral) float64 -0.0007207 -0.01058 ... 2.548 2.304 Attributes: source_path: dataset_1.nc
Like all data in pyglotaran
, the dataset is a xarray.Dataset. You can find more information about the xarray
library the xarray hompage.
The loaded dataset is a simulated sequential model.
Plotting raw data
Now we lets plot some time traces.
[4]:
plot_data = dataset.data.sel(spectral=[620, 630, 650], method="nearest")
plot_data.plot.line(x="time", aspect=2, size=5);
We can also plot spectra at different times.
[5]:
plot_data = dataset.data.sel(time=[1, 10, 20], method="nearest")
plot_data.plot.line(x="spectral", aspect=2, size=5);
Preparing data
To get an idea about how to model your data, you should inspect the singular value decomposition. Pyglotaran has a function to calculate it (among other things).
[6]:
dataset = prepare_time_trace_dataset(dataset)
dataset
[6]:
<xarray.Dataset> Dimensions: (time: 2100, spectral: 72, left_singular_value_index: 72, singular_value_index: 72, right_singular_value_index: 72) Coordinates: * time (time) float64 -1.0 -0.99 -0.98 ... 19.98 19.99 * spectral (spectral) float64 600.0 601.4 ... 698.0 699.4 Dimensions without coordinates: left_singular_value_index, singular_value_index, right_singular_value_index Data variables: data (time, spectral) float64 -0.0007207 ... 2.304 data_left_singular_vectors (time, left_singular_value_index) float64 1.... data_singular_values (singular_value_index) float64 6.577e+03 ...... data_right_singular_vectors (right_singular_value_index, spectral) float64 ... Attributes: source_path: dataset_1.nc
First, take a look at the first 10 singular values:
[7]:
plot_data = dataset.data_singular_values.sel(singular_value_index=range(0, 10))
plot_data.plot(yscale="log", marker="o", linewidth=0, aspect=2, size=5);
Working with models
To analyze our data, we need to create a model.
Create a file called model.yaml
in your working directory and fill it with the following:
[8]:
display_file("model.yaml", syntax="yaml")
[8]:
default_megacomplex: decay
initial_concentration:
input:
compartments: [s1, s2, s3]
parameters: [input.1, input.0, input.0]
k_matrix:
k1:
matrix:
(s2, s1): kinetic.1
(s3, s2): kinetic.2
(s3, s3): kinetic.3
megacomplex:
m1:
k_matrix: [k1]
irf:
irf1:
type: gaussian
center: irf.center
width: irf.width
dataset:
dataset1:
initial_concentration: input
megacomplex: [m1]
irf: irf1
Now you can load the model file.
[9]:
model = load_model("model.yaml")
You can check your model for problems with model.validate
.
[10]:
model.validate()
[10]:
Your model is valid.
Working with parameters
Now define some starting parameters. Create a file called parameters.yaml
with the following content.
[11]:
display_file("parameters.yaml", syntax="yaml")
[11]:
input:
- ['1', 1, {'vary': False, 'non-negative': False}]
- ['0', 0, {'vary': False, 'non-negative': False}]
kinetic: [
0.5,
0.3,
0.1,
]
irf:
- ['center', 0.3]
- ['width', 0.1]
[12]:
parameters = load_parameters("parameters.yaml")
You can model.validate
also to check for missing parameters.
[13]:
model.validate(parameters=parameters)
[13]:
Your model is valid.
Since not all problems in the model can be detected automatically it is wise to visually inspect the model. For this purpose, you can just print the model.
[14]:
model
[14]:
Model
Megacomplex Types: decay
Dataset Groups
default:
Label: default
residual_function: variable_projection
link_clp: None
K Matrix
k1:
Label: k1
Matrix:
(‘s2’, ‘s1’): kinetic.1(nan)
(‘s3’, ‘s2’): kinetic.2(nan)
(‘s3’, ‘s3’): kinetic.3(nan)
Initial Concentration
input:
Label: input
Compartments:
s1
s2
s3
Parameters:
input.1(nan)
input.0(nan)
input.0(nan)
Exclude From Normalize:
Irf
irf1 (gaussian):
Label: irf1
Type: gaussian
Center: irf.center(nan)
Width: irf.width(nan)
Normalize: True
Backsweep: False
Megacomplex
m1 (None):
Label: m1
Dimension: time
K Matrix:
k1
Dataset
dataset1:
Label: dataset1
Group: default
Megacomplex:
m1
Initial Concentration: input
Irf: irf1
The same way you should inspect your parameters.
[15]:
parameters
[15]:
input:
Label
Value
Standard Error
t-value
Minimum
Maximum
Vary
Non-Negative
Expression
1
1.000e+00
nan
nan
-inf
inf
False
False
None
0
0.000e+00
nan
nan
-inf
inf
False
False
None
irf:
Label
Value
Standard Error
t-value
Minimum
Maximum
Vary
Non-Negative
Expression
center
3.000e-01
nan
nan
-inf
inf
True
False
None
width
1.000e-01
nan
nan
-inf
inf
True
False
None
kinetic:
Label
Value
Standard Error
t-value
Minimum
Maximum
Vary
Non-Negative
Expression
1
5.000e-01
nan
nan
-inf
inf
True
False
None
2
3.000e-01
nan
nan
-inf
inf
True
False
None
3
1.000e-01
nan
nan
-inf
inf
True
False
None
Optimizing data
Now we have everything together to optimize our parameters. First we import optimize.
[16]:
scheme = Scheme(model, parameters, {"dataset1": dataset})
result = optimize(scheme)
result
Iteration Total nfev Cost Cost reduction Step norm Optimality
0 1 7.5684e+00 1.95e+02
1 2 7.5680e+00 4.56e-04 6.99e-05 1.10e-01
2 3 7.5680e+00 4.81e-11 4.78e-09 8.58e-06
Both `ftol` and `xtol` termination conditions are satisfied.
Function evaluations 3, initial cost 7.5684e+00, final cost 7.5680e+00, first-order optimality 8.58e-06.
[16]:
Optimization Result |
|
---|---|
Number of residual evaluation |
3 |
Number of parameters |
5 |
Number of datapoints |
151200 |
Degrees of freedom |
151195 |
Chi Square |
1.51e+01 |
Reduced Chi Square |
1.00e-04 |
Root Mean Square Error (RMSE) |
1.00e-02 |
Model
Megacomplex Types: decay
Dataset Groups
default:
Label: default
residual_function: variable_projection
link_clp: None
K Matrix
k1:
Label: k1
Matrix:
(‘s2’, ‘s1’): kinetic.1(5.00e-01±6.79e-05, t-value: 7363, initial: 5.00e-01)
(‘s3’, ‘s2’): kinetic.2(3.00e-01±3.94e-05, t-value: 7619, initial: 3.00e-01)
(‘s3’, ‘s3’): kinetic.3(1.00e-01±4.23e-06, t-value: 23655, initial: 1.00e-01)
Initial Concentration
input:
Label: input
Compartments:
s1
s2
s3
Parameters:
input.1(1.00e+00, fixed)
input.0(0.00e+00, fixed)
input.0(0.00e+00, fixed)
Exclude From Normalize:
Irf
irf1 (gaussian):
Label: irf1
Type: gaussian
Center: irf.center(3.00e-01±5.04e-06, t-value: 59524, initial: 3.00e-01)
Width: irf.width(1.00e-01±6.72e-06, t-value: 14871, initial: 1.00e-01)
Normalize: True
Backsweep: False
Megacomplex
m1 (None):
Label: m1
Dimension: time
K Matrix:
k1
Dataset
dataset1:
Label: dataset1
Group: default
Megacomplex:
m1
Initial Concentration: input
Irf: irf1
[17]:
result.optimized_parameters
[17]:
input:
Label
Value
Standard Error
t-value
Minimum
Maximum
Vary
Non-Negative
Expression
1
1.000e+00
nan
nan
-inf
inf
False
False
None
0
0.000e+00
nan
nan
-inf
inf
False
False
None
irf:
Label
Value
Standard Error
t-value
Minimum
Maximum
Vary
Non-Negative
Expression
center
3.000e-01
5.040e-06
59524
-inf
inf
True
False
None
width
1.000e-01
6.725e-06
14871
-inf
inf
True
False
None
kinetic:
Label
Value
Standard Error
t-value
Minimum
Maximum
Vary
Non-Negative
Expression
1
4.999e-01
6.790e-05
7363
-inf
inf
True
False
None
2
3.000e-01
3.938e-05
7619
-inf
inf
True
False
None
3
1.000e-01
4.227e-06
23655
-inf
inf
True
False
None
You can get the resulting data for your dataset with result.get_dataset
.
[18]:
result_dataset = result.data["dataset1"]
result_dataset
[18]:
<xarray.Dataset> Dimensions: (clp_label: 3, time: 2100, spectral: 72, left_singular_value_index: 72, singular_value_index: 72, right_singular_value_index: 72, species: 3, component_m1: 3, species_m1: 3, to_species_m1: 3, from_species_m1: 3) Coordinates: * clp_label (clp_label) object 's1' 's2' 's3' * time (time) float64 -1.0 ... 19.99 * spectral (spectral) float64 600.0 ... 699.4 * species (species) <U2 's1' 's2' 's3' * component_m1 (component_m1) int64 1 2 3 rate_m1 (component_m1) float64 0.4999 .... lifetime_m1 (component_m1) float64 2.0 ... ... * species_m1 (species_m1) <U2 's1' 's2' 's3' initial_concentration_m1 (species_m1) float64 1.0 0.0 0.0 * to_species_m1 (to_species_m1) <U2 's1' 's2' 's3' * from_species_m1 (from_species_m1) <U2 's1' ... ... Dimensions without coordinates: left_singular_value_index, singular_value_index, right_singular_value_index Data variables: (12/25) data (time, spectral) float64 -0.000... data_left_singular_vectors (time, left_singular_value_index) float64 ... data_singular_values (singular_value_index) float64 ... data_right_singular_vectors (spectral, right_singular_value_index) float64 ... matrix (time, clp_label) float64 6.131... clp (spectral, clp_label) float64 1... ... ... irf_center float64 0.3 irf_width float64 0.1 decay_associated_spectra_m1 (spectral, component_m1) float64 ... a_matrix_m1 (component_m1, species_m1) float64 ... k_matrix_m1 (to_species_m1, from_species_m1) float64 ... k_matrix_reduced_m1 (to_species_m1, from_species_m1) float64 ... Attributes: source_path: dataset_1.nc global_dimension: spectral model_dimension: time root_mean_square_error: 0.010005283396422237 weighted_root_mean_square_error: 0.010005283396422237 dataset_scale: 1
Visualize the Result
The resulting data can be visualized the same way as the dataset. To judge the quality of the fit, you should look at first left and right singular vectors of the residual.
[19]:
residual_left = result_dataset.residual_left_singular_vectors.sel(left_singular_value_index=0)
residual_right = result_dataset.residual_right_singular_vectors.sel(right_singular_value_index=0)
residual_left.plot.line(x="time", aspect=2, size=5)
residual_right.plot.line(x="spectral", aspect=2, size=5);
Finally, you can save your result.
[20]:
save_dataset(result_dataset, "dataset1.nc")