Commit aacf5f7a authored by BorjaEst's avatar BorjaEst

Merge branch '24-generate-documentation' into dev

Conflicts:
	o3skim/sources.py
	o3skim/utils.py
	sources_example.yaml
	tests/mockup_data.py
	tests/sources_base.yaml
	tests/sources_err.yaml
parents 4a4ce680 c0cb1972
# 📝 Table of Contents
- [About](#about)
- [Getting Started](#getting_started)
- [Build using docker](#build)
- [Run using udocker](#deployment)
- [Documentation](#doc)
- [Authors](#authors)
- [Acknowledgments](#acknowledgement)
- [TODO](https://git.scc.kit.edu/synergy.o3as/o3skim/-/issues)
# About <a name = "about"></a>
This project provides the tools to preprocess, standardize and reduce ozone data for later transfer and plotting.
# Getting Started <a name = "getting_started"></a>
See [deployment](#deployment) for notes on how to deploy the project on a live system.
## Prerequisites
To run the project as container, you need the following systems and container technologies:
> Note udocker cannot be used to build containers, only to run them.
# Built using docker <a name = "build"></a>
Download the repository at the __Build machine__ using git.
```sh
$ git clone git@git.scc.kit.edu:synergy.o3as/o3skim.git
$ docker build --tag o3skim .
Successfully built 69587025a70a
Successfully tagged o3skim:latest
```
If the build process succeeded, you can list the image in the docker image list:
```sh
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
o3skim              latest              69587025a70a        xx seconds ago      557MB
...
```
## Running the tests <a name = "tests"></a>
To run tests, you need to install the tool in your system without docker.
As a first step, ensure you have the following dependencies:
- [python 3.8](https://www.python.org/downloads/release/python-385/)
- [pip 20.0.2](https://pypi.org/)
- [gcc](https://gcc.gnu.org/)
- [g++](https://gcc.gnu.org/)
After downloading and checking the dependencies, install with pip:
```sh
$ pip install -e .
```
Tests should run using
[tox](https://tox.readthedocs.io/en/latest/).
To install it with pip use:
```sh
$ pip install tox
```
To start testing simply run:
```sh
$ tox
...
py37: commands succeeded
py38: commands succeeded
```
# Run using udocker <a name = "deployment"></a>
To deploy the application using __udocker__ at the __Runtime machine__ you need:
- Input path with data to skim, to be mounted on `/app/data` inside the container.
- Output path for skimmed results, to be mounted on `/app/output` inside the container.
- Configuration file with a data structure description at the input path in [YAML](https://yaml.org/) format. This configuration file has to be mounted on `/app/sources.yaml` inside the container. See [sources_example.yaml](/sources_example.yaml) for a configuration example.
Once the requirements are met, pull the image from the image registry.
For example, to pull it from the synergy-imk official registry use:
```sh
$ udocker pull synergyimk/o3skim
...
$ udocker run --user=application o3skim --help
```
# Documentation <a name = "doc"></a>
- [TODO](https://git.scc.kit.edu/synergy.o3as/o3skim/-/issues)
# Authors <a name = "authors"></a>
- [@V.Kozlov](https://git.scc.kit.edu/eo9869) - TBD
- [@T.Kerzenmacher](https://git.scc.kit.edu/px5501) - TBD
- [@B.Esteban](https://git.scc.kit.edu/zr5094) - TBD
# Acknowledgements <a name = "acknowledgement"></a>
-
### Documentation builds
_build
_static
_templates
Build
===================
Download the code from the o3skim_ repository at the **Build machine**.
For example, using git_:
.. code-block:: bash

   $ git clone git@git.scc.kit.edu:synergy.o3as/o3skim.git
   Cloning into 'o3skim'...
   ...
.. _o3skim: https://git.scc.kit.edu/synergy.o3as/o3skim
.. _git: https://git-scm.com/
Build the container image at the **Build machine**.
For example, using docker_:
.. code-block:: bash

   $ docker build --tag o3skim .
   ...
   Successfully built 69587025a70a
   Successfully tagged o3skim:latest
.. _docker: https://docs.docker.com/engine/reference/commandline/build
If the build process succeeded, you should see the image name in the container images list:
.. code-block:: bash

   $ docker images
   REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
   o3skim              latest              69587025a70a        xx seconds ago      557MB
   ...
To use your newly generated image on the **Runtime machine**, the easiest way is to
push it to a dockerhub repository. For example, with docker_:
.. code-block:: bash

   $ docker push <repository>/o3skim:<tag>
   The push refers to repository [docker.io/........./o3skim]
   ...
   7e84795fccac: Preparing
   7e84795fccac: Layer already exists
   ffaeb20d9e23: Layer already exists
   4cdd6a90e552: Layer already exists
   3e0762bebc71: Layer already exists
   1e441fe06d90: Layer already exists
   98ff2784e9f5: Layer already exists
   2b99e2403063: Layer already exists
   d0f104dc0a1f: Layer already exists
   ...: digest: sha256:...................... size: 2004
If you do not have internet access from the **Build machine** or the **Runtime machine**,
it is also possible to use `docker save`_ to export your images.
.. _`docker save`: https://docs.docker.com/engine/reference/commandline/save/
Command Line Interface
=======================
Finally, run the container. Note the described `data`, `output` and
`sources.yaml` volumes have to be provided.

Usage:
.. code-block:: bash

   usage: main [-h] [-f SOURCES_FILE] [-s {year,decade}]
               [-v {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
To run the application using **udocker** at the **Runtime machine**
you need to provide the following volumes to the container:
- --volume, mount `/app/data`: Input path with data to skim.
- --volume, mount `/app/output`: Output path for skimmed results.
- --volume, mount `/app/sources.yaml`: Configuration file with a data structure
description at the input path in YAML_ format.
See :doc:`/user_guide/source-file` for a configuration example.
.. _YAML: https://yaml.org/
Also, in the specific case of udocker_, it is needed to specify that the
user `application` should run inside the container:
.. _udocker: https://indigo-dc.gitbook.io/udocker
For example, to run the container using udocker_ use the following:
.. code-block:: bash

   $ udocker run \
For the main function description and command help you can call:
.. code-block:: bash

   $ udocker run --user=application o3skim --help
   ...
As optional arguments, it is possible to indicate:
- -h, --help: show this help message and exit
- -f, --sources_file SOURCES_FILE: Custom sources YAML configuration. (default: ./sources.yaml)
- -s, --split_by {year,decade}: Period time to split output (default: None)
- -v, --verbosity {DEBUG,INFO,WARNING,ERROR,CRITICAL}: Sets the logging level (default: ERROR)
Note that SOURCES_FILE is usually only modified for development purposes, as any
file from the host can be mounted using the container directive `--volume`.
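The options listed above can be mirrored with a minimal ``argparse`` sketch. This is an illustration only, not the actual o3skim entry point; the defaults and choices are taken from the help text shown above:

```python
import argparse

def build_parser():
    """Build a parser mirroring the documented o3skim CLI options (sketch)."""
    parser = argparse.ArgumentParser(prog="main")
    parser.add_argument("-f", "--sources_file", default="./sources.yaml",
                        help="Custom sources YAML configuration")
    parser.add_argument("-s", "--split_by", choices=["year", "decade"],
                        default=None, help="Period time to split output")
    parser.add_argument("-v", "--verbosity", default="ERROR",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
                        help="Sets the logging level")
    return parser

# Parsing with no arguments yields the documented defaults
print(build_parser().parse_args([]))
```

Calling the parser with, for example, `["-s", "year"]` would set `split_by` to `"year"` while leaving the other defaults untouched.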
Deployment
==================================
To deploy the application using **udocker** at the **Runtime machine**
you need the o3skim container image.
The easiest way to deploy in your **Runtime machine** is by pulling the image
from a remote registry. You can use the official registry at synergyimk_, or use
the instructions at :doc:`build` to create your image and upload it to your own registry.
.. _YAML: https://yaml.org/
.. _synergyimk: https://hub.docker.com/r/synergyimk
Once you decide which registry to download from, pull the image from that registry.
For example, to pull it from the synergy-imk official registry use:
.. code-block:: bash

   $ udocker pull synergyimk/o3skim
   ...
   Downloading layer: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
   Downloading layer: sha256:......
   ...
Note it is also possible to use `udocker load`_ to import images generated by
`docker save`_.
.. _`udocker load`: https://indigo-dc.gitbook.io/udocker/user_manual#1-4-basic-flow
.. _`docker save`: https://docs.docker.com/engine/reference/commandline/save
Once the image is downloaded or imported, create the local container.
For example, if it was downloaded from synergyimk registry you can use:
.. code-block:: bash

   $ udocker create --name=o3skim synergyimk/o3skim
   fa42a912-b0d4-3bfb-987f-1c243863802d
Check the containers
available at the **Runtime machine**:
.. code-block:: bash

   $ udocker ps
   CONTAINER ID                         P M NAMES              IMAGE
   ...
   fa42a912-b0d4-3bfb-987f-1c243863802d . W ['o3skim']         synergyimk/o3skim:latest
Now you are ready to start using the container as `o3skim`. Read how to use the
:doc:`cli` as first steps to skim your data.
Prerequisites
==================================
To run the project as container, you need the following systems and container technologies:
- **Build machine** with docker_ in case you want/need to build your own image.
- **Runtime machine** with udocker_ and access to the data to skim.
In case you do not want to create your own image, the latest images are uploaded to
dockerhub at synergyimk_.
.. rubric:: Note udocker_ cannot be used to build containers, only to run them.
.. _docker: https://docs.docker.com/engine/install/
.. _udocker: https://indigo-dc.gitbook.io/udocker/installation_manual
.. _synergyimk: https://hub.docker.com/r/synergyimk
=========================================
o3skim
=========================================
.. rubric:: Data skimming for ozone assessment.
**o3skim** is an open source project and Python package that provides the
tools to pre-process, standardize and reduce ozone data from netCDF_ models to
simplify and speed up ozone data transfer and plot.
.. _netCDF: http://www.unidata.ucar.edu/software/netcdf
Documentation
-------------
**Getting started**
* Which :doc:`getting_started/prerequisites` you need to start.
* How to :doc:`getting_started/build` your o3skim container.
* How to use the o3skim :doc:`getting_started/cli`
.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: Getting started

   getting_started/prerequisites
   getting_started/build
   getting_started/deployment
   getting_started/cli
**User guide**
* Create your :doc:`user_guide/source-file` to define what to skim.
* Learn how to skim by reading the :doc:`user_guide/o3skim`.
.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: User guide:

   user_guide/source-file
   user_guide/o3skim
**Developer guide**
* How to do a :doc:`dev_guide/local-install`
* How to run and create new :doc:`dev_guide/tests`
.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: Developer guide:

   dev_guide/local-install
   dev_guide/tests
See also
--------
Authors
-------
- `@V.Kozlov <https://git.scc.kit.edu/eo9869>`_ - TBD
- `@T.Kerzenmacher <https://git.scc.kit.edu/px5501>`_ - TBD
- `@B.Esteban <https://git.scc.kit.edu/zr5094>`_ - TBD
Acknowledgements
o3skim package
==============
.. automodule:: o3skim
   :members:
   :undoc-members:
   :show-inheritance:
o3skim.sources module
---------------------
.. automodule:: o3skim.sources
   :members:
   :undoc-members:
   :show-inheritance:
o3skim.utils module
-------------------
.. automodule:: o3skim.utils
   :members:
   :undoc-members:
   :show-inheritance:
Source file
==================================
This is an example configuration file for sources to be skimmed.
Note the following **metadata_1** for keys and values
(always following YAML standards):
- **CUSTOMIZABLE_KEY**: Indicates the key can be any value
- **FIXED_KEY**: The key must match the example string
- **CORRECT_VALUE**: The value must be in line with the source data
Note the following **metadata_2** for keys and values
(always following YAML standards):
- **MANDATORY**: The key/value is mandatory to exist inside the section
- **OPTIONAL**: The key/value is optional to exist inside the section
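These conventions can be illustrated with a minimal check. The helper below is hypothetical, not part of o3skim; it only shows which fixed keys a variable section must carry according to the rules above:

```python
# Hypothetical validator for the key conventions described above.
# The mandatory fixed keys of a variable section per this document.
MANDATORY_FIXED_KEYS = {"name", "paths", "coordinates"}

def missing_mandatory_keys(section):
    """Return the mandatory fixed keys missing from a variable section."""
    return sorted(MANDATORY_FIXED_KEYS - section.keys())

ok = {"name": "toz", "paths": "Ccmi/mon/toz/*.nc",
      "coordinates": {"time": "time", "lat": "lat", "lon": "lon"}}
bad = {"name": "toz"}

print(missing_mandatory_keys(ok))   # []
print(missing_mandatory_keys(bad))  # ['coordinates', 'paths']
```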
For example: at the 3rd level, the variables are specified. The configuration
specification for variables is "[**FIXED_KEY** -- **OPTIONAL**]"; if the variable
key *tco3_zm* is specified, the application searches for tco3 data
in the source. When it is not specified, the variable data are not searched, so the
output folder [x-]_[y-] does not contain tco3 dataset files.
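The folder-naming behaviour described above can be sketched in Python. This is a hypothetical illustration, not the actual o3skim code; the configuration is shown as an already-parsed dict mirroring the YAML examples below:

```python
# Hypothetical sketch: which output folders and variables a parsed
# sources configuration would produce. Not the actual o3skim code.
config = {
    "CCMI-1": {                       # x1 (customizable key)
        "IPSL": {                     # y1 (customizable key)
            "tco3_zm": {"name": "toz", "paths": "Ccmi/mon/toz/*.nc"},
            # vmro3_zm omitted: it is optional, so no vmro3 output appears
        }
    }
}

def expected_outputs(config):
    """Map each '[x]_[y]' output folder to the variables it will contain."""
    outputs = {}
    for x, models in config.items():
        for y, variables in models.items():
            outputs[f"{x}_{y}"] = sorted(variables)
    return outputs

print(expected_outputs(config))  # {'CCMI-1_IPSL': ['tco3_zm']}
```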
First example - CCMI-1
---------------------------
In this example, the data source has only one model, therefore it is
expected to have only one output folder, named "CCMI-1_IPSL".
This model has 2 variables (*tco3_zm* and *vmro3_zm*) whose datasets are
located in different directories. Therefore, the key *paths* is different
for both of them. The output expected at "CCMI-1_IPSL" is
2 types of files:
- tco3_zm_[YEAR]-[YEAR].nc: With tco3 skimmed data
- vmro3_zm_[YEAR]-[YEAR].nc: With vmro3 skimmed data
Where [YEAR] is optional text in the output, depending on how the `--split_by`
argument is configured at the :doc:`/getting_started/cli` call.
.. code-block:: yaml
   # This is the preceding -x1- string at the output folder: '[x1]_[y-]'
   # [CUSTOMIZABLE_KEY -- MANDATORY]
   CCMI-1:
     # This is the preceding -y1- string at the output folder: '[x1]_[y1]'
     # [CUSTOMIZABLE_KEY -- MANDATORY]
     IPSL:
       # Represents the information related to tco3 data
       # [FIXED_KEY -- OPTIONAL]
       tco3_zm:
         # Variable name for the tco3 array inside the dataset
         # [FIXED_KEY -- MANDATORY]: [CORRECT_VALUE -- MANDATORY]
         name: toz
         # Regular expression describing how to load the netCDF files
         # [FIXED_KEY -- MANDATORY]: [CORRECT_VALUE -- MANDATORY]
         paths: Ccmi/mon/toz/*.nc
         # Coordinates description for tco3 data.
         # [FIXED_KEY -- MANDATORY]:
         coordinates:
           # [FIXED_KEY -- MANDATORY]: [CORRECT_VALUE -- MANDATORY]
           time: time
           # [FIXED_KEY -- MANDATORY]: [CORRECT_VALUE -- MANDATORY]
           lat: lat
           # [FIXED_KEY -- MANDATORY]: [CORRECT_VALUE -- MANDATORY]
           lon: lon
       # Represents the information related to vmro3 data
       # [FIXED_KEY -- OPTIONAL]
       vmro3_zm:
         # Variable name for the vmro3 array inside the dataset
         # [FIXED_KEY -- MANDATORY]: [CORRECT_VALUE -- MANDATORY]
         name: vmro3
         # Regular expression describing how to load the netCDF files
         # [FIXED_KEY -- MANDATORY]: [CORRECT_VALUE -- MANDATORY]
         paths: Ccmi/mon/vmro3
         # Coordinates description for vmro3 data.
         # [FIXED_KEY -- MANDATORY]:
         coordinates:
           # [FIXED_KEY -- MANDATORY]: [CORRECT_VALUE -- MANDATORY]
           time: time
           # [FIXED_KEY -- MANDATORY]: [CORRECT_VALUE -- MANDATORY]
           plev: plev
           # [FIXED_KEY -- MANDATORY]: [CORRECT_VALUE -- MANDATORY]
           lat: lat
           # [FIXED_KEY -- MANDATORY]: [CORRECT_VALUE -- MANDATORY]
           lon: lon
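How the `--split_by` argument could shape the output file names can be sketched as follows. This is a hypothetical helper, not the actual o3skim code, assuming yearly data over a known range of years:

```python
def output_filenames(variable, years, split_by=None):
    """Sketch: derive skimmed output file names for a variable.

    split_by=None     -> one file covering the whole period
    split_by='year'   -> one file per year
    split_by='decade' -> one file per decade
    """
    if split_by is None:
        return [f"{variable}_{min(years)}-{max(years)}.nc"]
    if split_by == "year":
        return [f"{variable}_{y}.nc" for y in years]
    if split_by == "decade":
        decades = sorted({y - y % 10 for y in years})
        return [f"{variable}_{d}-{d + 9}.nc" for d in decades]
    raise ValueError(split_by)

print(output_filenames("tco3_zm", range(2000, 2003), split_by="year"))
```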
Second example - ECMWF
-----------------------------------
In this example, the data source has two models, therefore it is
expected to have two output folders ["ECMWF_ERA-5", "ECMWF_ERA-i"].
The model ERA-5 only has tco3 data; there is no vmro3 data.
Therefore, only one type of file is expected at "ECMWF_ERA-5":

- tco3_zm_[YEAR].nc: With tco3 skimmed data

The model ERA-i indeed has 2 variables (*tco3_zm* and *vmro3_zm*), but in
this case they are located inside the same dataset files, therefore the
key *paths* should be the same for both variables. The output expected at
"ECMWF_ERA-i" is 2 types of files:

- tco3_zm_[YEAR].nc: With tco3 skimmed data
- vmro3_zm_[YEAR].nc: With vmro3 skimmed data
.. code-block:: yaml
   # This is the preceding -x2- string at the output folder: '[x2]_[y-]'
   # [CUSTOMIZABLE_KEY -- MANDATORY]
   ECMWF:
     # This is the preceding -y1- string at the output folder: '[x2]_[y1]'
     # [CUSTOMIZABLE_KEY -- MANDATORY]
     ERA-5:
       # Represents the information related to tco3 data
       # [FIXED_KEY -- OPTIONAL]
       tco3_zm:
         # Variable name for the tco3 array inside the dataset
         # [FIXED_KEY -- MANDATORY]: [CORRECT_VALUE -- MANDATORY]
         name: tco3
         # Regular expression describing how to load the netCDF files
         # [FIXED_KEY -- MANDATORY]: [CORRECT_VALUE -- MANDATORY]
         paths: Ecmwf/Era5
         # Coordinates description for tco3 data.
         # [FIXED_KEY -- MANDATORY]:
         coordinates:
           # [FIXED_KEY -- MANDATORY]: [CORRECT_VALUE -- MANDATORY]
           lat: latitude
           # [FIXED_KEY -- MANDATORY]: [CORRECT_VALUE -- MANDATORY]
           lon: longitude
           # [FIXED_KEY -- MANDATORY]: [CORRECT_VALUE -- MANDATORY]
           time: time
     # This is the preceding -y2- string at the output folder: '[x2]_[y2]'
     # [CUSTOMIZABLE_KEY -- MANDATORY]
     ERA-i:
       # Represents the information related to tco3 data
       # [FIXED_KEY -- OPTIONAL]
       tco3_zm:
         # Variable name for the tco3 array inside the dataset
         # [FIXED_KEY -- MANDATORY]: [CORRECT_VALUE -- MANDATORY]
         name: toz
         # Regular expression describing how to load the netCDF files
         # [FIXED_KEY -- MANDATORY]: [CORRECT_VALUE -- MANDATORY]
         paths: Ecmwf/Erai
         # Coordinates description for tco3 data.
         # [FIXED_KEY -- MANDATORY]:
         coordinates:
           # [FIXED_KEY -- MANDATORY]: [CORRECT_VALUE -- MANDATORY]
           time: time
           # [FIXED_KEY -- MANDATORY]: [CORRECT_VALUE -- MANDATORY]
           lat: latitude
           # [FIXED_KEY -- MANDATORY]: [CORRECT_VALUE -- MANDATORY]
           lon: longitude
       # Represents the information related to vmro3 data
       # [FIXED_KEY -- OPTIONAL]
       vmro3_zm:
         # Variable name for the vmro3 array inside the dataset
         # [FIXED_KEY -- MANDATORY]: [CORRECT_VALUE -- MANDATORY]
         name: vmro3
         # Regular expression describing how to load the netCDF files
         # [FIXED_KEY -- MANDATORY]: [CORRECT_VALUE -- MANDATORY]
         paths: Ecmwf/Erai