# Legacy documentation (copied from TFSoftware wiki)

`csback` is a software suite written in **Python** (http://www.python.org/) that manages backups and periodically tests their integrity.

## Aims

* Reuse existing, well-tested tools and programs instead of reimplementing the same functionality in another language.

* Follow the Unix philosophy: write one tool for each task.

* The software should run under both Python 2 and Python 3 to avoid compatibility problems. It has been tested successfully under Python 2.7 and Python 3.x.

## Notes

Notes, ideas and thoughts on the implementation of the `csback` software suite.

* Overview:

  1. `csback` uses a central node to generate or update checksumfiles. This node must have access (so far via NFS) to all data that is to be backed up. This approach avoids heavy network traffic and intensive computation on low-resource systems.

  2. A typical backup run proceeds as follows:

     a. Files are copied using `rsync`.

     b. Checksumfiles are generated from the checksums of the files in the `rsync` source directory. The checksumfile itself is written to the target directory.

     c. To check the files' integrity, `csbackchk` calculates the checksums of the files in the target directory and compares its results with the content of the checksumfile. The results are written to so-called `checksumfile.result` files.

     d. At least `csbackgen.py` and `csbackchk.py` provide a logging mechanism that sends log messages to the local system logger on TCP port 3333.

  3. In addition to the backup procedure described above, `csback` can:

     a. only execute a copy procedure using `rsync`.

     b. only generate/update a checksumfile with `csbackgen.py`.

     c. only test the files' integrity using `csbackchk.py`.

     d. generate/update a checksumfile and test the files' integrity afterwards.

     e. check the logfile periodically and report the current status via email or through its small Nagios interface.

  4. `csback` can exclude files and directories from copy, generation and test procedures by:

     a. matching regular expressions.

     b. matching access, change and modification times, using the same syntax as the Unix `find` utility (generation procedures only).

  5. Existing checksumfiles are only read by `csbackgen.py` and `csbackchk.py`; if a directory contains new files, additional lines are appended to its checksumfile.

  6. Each subdirectory contains its own checksumfile, so a subdirectory can be moved to another location and the integrity of its files checked afterwards without problems.

  7. `csback` ships with a tool called `csback2cron.py`, which generates a crontab from the `csback` configuration file `csbackrc`.

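The regular-expression exclusion from item 4a can be pictured with a minimal sketch. The function name `filter_paths` and the sample patterns are hypothetical, and the sketch assumes that patterns are searched anywhere in the relative path; the notes do not say how `csback` actually applies them.

```python
import re


def filter_paths(paths, exclude_patterns):
    """Return the paths that match none of the exclusion patterns."""
    compiled = [re.compile(pattern) for pattern in exclude_patterns]
    return [p for p in paths if not any(rx.search(p) for rx in compiled)]


# Hypothetical sample data: skip editor backup files and everything below tmp/.
paths = ["data/run1.dat", "data/run1.dat~", "tmp/cache.bin", "data/notes.txt"]
kept = filter_paths(paths, [r"~$", r"^tmp/"])
print(kept)
```

Compiling the patterns once keeps the per-file cost low when a large directory tree is scanned.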
* [source:tfsoftware/trunk/src/python/csback/csback2cron.py]

  Tool that converts a `csback` configuration file into a crontab file, which must then be installed using the `crontab` command.

  1. The configuration file syntax is similar to that of Windows `*.ini` files.

  2. Four section types are available: `copy`, `backup`, `test` and `mail`. For `copy`, `backup` and `test` sections the keys must be declared.

     a. `copy`: Specifies a copy procedure using `rsync`.

     b. `backup`: Provides the full backup procedure: first copy the files using `rsync` (optional), then generate checksumfiles with `csbackgen.py`, and finally check the integrity of the copied files using `csbackchk.py` (optional).

     c. `test`: Only performs an integrity check of files as defined within a `test` section.

     d. `mail`: Specifies the mail settings for `csbackntfy.py`.

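To illustrate the idea of turning an ini-style file into crontab lines, here is a rough sketch that handles only a `copy` section. The key names `schedule`, `src`, `dst` and `to` are assumptions made up for this example; the real `csbackrc` keys are not listed on this page. The sketch uses the Python 3 `configparser` module and plain `rsync -a`.

```python
import configparser

# Hypothetical csbackrc fragment; the key names are assumptions.
SAMPLE_RC = """\
[copy]
schedule = 0 2 * * *
src = /data/project/
dst = /backup/project/

[mail]
to = admin@example.org
"""


def copy_crontab_lines(rc_text):
    """Translate every `copy` section into one crontab line that calls rsync."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(rc_text)
    lines = []
    for name in parser.sections():
        if name != "copy":
            continue  # backup, test and mail sections are left out of this sketch
        section = parser[name]
        command = "rsync -a {} {}".format(section["src"], section["dst"])
        lines.append("{} {}".format(section["schedule"], command))
    return lines


for line in copy_crontab_lines(SAMPLE_RC):
    print(line)
```

The printed lines could be written to a file and installed with `crontab FILE`, which matches the workflow described above.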
* [source:tfsoftware/trunk/src/python/csback/csbackgen.py]

  1. Generates a checksumfile, or updates it if it already exists. Lines are only ever appended, so existing lines are never modified.

  2. If a specified directory contains subdirectories, each subdirectory gets its own checksumfile.

  3. So far, only hash functions of the SHA-2 family (SHA-224, SHA-256, SHA-384, SHA-512) are allowed.

  4. A checksumfile can also be generated from arbitrary source files located in a different directory.

  If logging is enabled, log messages are sent to the operating system's system logger on TCP port 3333.

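The append-only behaviour can be sketched as follows. This is not the code of `csbackgen.py`: the file name `checksums.txt` and both function names are hypothetical, only a single directory is handled, and each line carries just the first three columns (checksum, path, algorithm) of the format described further down.

```python
import hashlib
import os

ALLOWED = ("sha224", "sha256", "sha384", "sha512")


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks with one of the allowed SHA-2 functions."""
    if algorithm not in ALLOWED:
        raise ValueError("unsupported algorithm: %s" % algorithm)
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def update_checksumfile(directory, checksumfile="checksums.txt", algorithm="sha256"):
    """Append lines for files not listed yet; existing lines stay untouched."""
    cs_path = os.path.join(directory, checksumfile)
    known = set()
    if os.path.exists(cs_path):
        with open(cs_path) as handle:
            for line in handle:
                fields = line.split()
                if len(fields) >= 2:
                    known.add(fields[1])
    added = []
    with open(cs_path, "a") as handle:  # append mode never rewrites old lines
        for name in sorted(os.listdir(directory)):
            full = os.path.join(directory, name)
            if name == checksumfile or name in known or not os.path.isfile(full):
                continue
            handle.write("%s %s %s\n" % (file_digest(full, algorithm), name, algorithm))
            added.append(name)
    return added
```

Calling `update_checksumfile` twice on an unchanged directory adds nothing the second time. File names containing whitespace would need a different separator than the one assumed here.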
* [source:tfsoftware/trunk/src/python/csback/csbackchk.py]

  Checks the integrity of a file that is registered in a `csback` checksumfile.

  The program takes either one or two arguments. With two arguments, `csbackchk` reads the checksumfile of the directory passed as the second argument (`PATH`) and compares its checksums with the checksums of the files located in the directory passed as the first argument (`SOURCEPATH`). With only one argument, `csbackchk.py` performs an integrity check of the files located in `PATH`. (In both cases a checksumfile **must** be located in `PATH`.)

  If logging is enabled, log messages are sent to the operating system's system logger on TCP port 3333.

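The one-argument and two-argument modes can be sketched like this. The function `verify_directory` and the file name `checksums.txt` are hypothetical, and the sketch assumes whitespace-separated lines whose first three fields are checksum, path and algorithm. It reads each file into memory at once, which a real tool would probably avoid.

```python
import hashlib
import os

ALLOWED = ("sha224", "sha256", "sha384", "sha512")


def verify_directory(path, checksumfile="checksums.txt", sourcepath=None):
    """Compare the checksumfile in `path` with files in `sourcepath` (or `path`)."""
    base = path if sourcepath is None else sourcepath
    results = []
    with open(os.path.join(path, checksumfile)) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            expected, name, algorithm = fields[:3]
            if algorithm not in ALLOWED:
                results.append((name, False))
                continue
            try:
                with open(os.path.join(base, name), "rb") as data:
                    actual = hashlib.new(algorithm, data.read()).hexdigest()
            except (IOError, OSError):
                actual = None  # a missing or unreadable file counts as failed
            results.append((name, actual == expected))
    return results
```

`verify_directory(PATH)` corresponds to the one-argument call, and `verify_directory(PATH, sourcepath=SOURCEPATH)` to the two-argument call.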
* [source:tfsoftware/trunk/src/python/csback/csbackntfy.py]

  Checks the `csback` logfile and reports the current status to the admin via email or through its small Nagios interface.

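A Nagios check reports its result through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The following sketch maps log lines to such a code. It assumes that logfile lines have the same layout as the `checksumfile.result` example shown in the Todo section, with the level name as the fourth whitespace-separated field; the function name `nagios_status` is hypothetical.

```python
# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def nagios_status(log_lines):
    """Derive a Nagios exit code from the log levels found in the lines."""
    levels = set()
    for line in log_lines:
        fields = line.split()
        if len(fields) >= 4:
            levels.add(fields[3])  # assumed position of the level name
    if not levels:
        return UNKNOWN  # nothing parseable, so no statement is possible
    if levels & {"ERROR", "CRITICAL"}:
        return CRITICAL
    if "WARNING" in levels:
        return WARNING
    return OK
```

A wrapper script would print a one-line summary and then call `sys.exit(nagios_status(lines))`.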
* [source:tfsoftware/trunk/src/python/csback/csbackobs.py]

  Small observer script that reports the exit status of external programs to the `csback` logfile in case of failure. So far it is only used for `rsync`.

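The observer idea, run a program and log only when it fails, fits in a few lines. This is a sketch with a hypothetical function name `observe`, not the implementation of `csbackobs.py`; the logger is passed in so that any handler can be attached.

```python
import logging
import subprocess


def observe(command, logger):
    """Run `command` (a list of arguments) and log a non-zero exit status."""
    status = subprocess.call(command)
    if status != 0:
        logger.error("Command %r failed with exit status %d", command, status)
    return status
```

A call such as `observe(["rsync", "-a", src, dst], logging.getLogger("csback"))` would leave the log untouched on success and add one error line on failure.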
* [source:tfsoftware/trunk/src/python/csback/csfile.py]

  Module that provides a class to read, write and process a checksum file.

  Tools that use this module:

  1. `csbackgen.py`

  2. `csbackchk.py`

  Additionally, this module contains the functionality to exclude files based on their access, change or modification times.

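The time-based exclusion follows the syntax of `find`, where `+n` means more than `n` days ago, `-n` means less than `n` days ago and a bare `n` means exactly `n` days, with fractional days dropped. A minimal sketch of that comparison, using a hypothetical function name `matches_age`:

```python
import time

SECONDS_PER_DAY = 86400


def matches_age(timestamp, spec, now=None):
    """Evaluate a find-style day specification ('+n', '-n' or 'n')."""
    now = time.time() if now is None else now
    # Like find, ignore the fractional part of the age in days.
    age_days = int((now - timestamp) // SECONDS_PER_DAY)
    days = int(spec.lstrip("+-"))
    if spec.startswith("+"):
        return age_days > days
    if spec.startswith("-"):
        return age_days < days
    return age_days == days
```

The timestamp would come from `os.stat(path).st_atime`, `st_ctime` or `st_mtime`, depending on whether access, change or modification time is tested.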
* [source:tfsoftware/trunk/src/python/csback/csbacklog.py]

  Module that handles the `csback` logging mechanisms. It also provides a `ResultLogger` class, which `csbackchk.py` uses to generate `checksumfile.result` files.

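Both logging paths mentioned in these notes can be approximated with the standard `logging` package. The sketch below is an assumption about how such a setup might look, not the code of `csbacklog.py`: `make_result_logger` writes lines shaped like the `checksumfile.result` example in the Todo section, and `make_syslog_handler` opens a TCP connection to a system logger on port 3333. Both function names are hypothetical.

```python
import logging
import logging.handlers
import socket


def make_result_logger(result_path, name="csback.result"):
    """Return a logger writing 'date time,ms host[pid] LEVEL message' lines."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(result_path)
    host = socket.gethostname().replace("%", "%%")  # keep the format string valid
    fmt = "%(asctime)s " + host + "[%(process)d] %(levelname)s %(message)s"
    handler.setFormatter(logging.Formatter(fmt))
    logger.addHandler(handler)
    return logger


def make_syslog_handler(host="localhost", port=3333):
    """Return a handler that sends records to a system logger over TCP."""
    # With SOCK_STREAM the handler connects when it is created, so a
    # listener must already be running on the given port.
    return logging.handlers.SysLogHandler(
        address=(host, port), socktype=socket.SOCK_STREAM
    )
```

The default `asctime` format of `logging.Formatter` already produces the `2012-01-29 21:26:23,683` style timestamp, so no custom date format is needed.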
## Checksum file format

| checksum | path | sha-2 algorithm | creation date of file | creation location checksum | creation date of checksum |
| --- | --- | --- | --- | --- | --- |
| 65135b861519486f3fc171b8efcf7078d123b0d8b3cdcb305227facf8eae7659 | data/datafile | sha256 | 2007/15/09-00:22:10 | pinatubo | 2007/19/09-00:21:17 |

A typical checksum file line therefore looks as follows:

    65135b861519486f3fc171b8efcf7078d123b0d8b3cdcb305227facf8eae7659 data/datafile sha256 2007/15/09-00:22:10 pinatubo 2007/19/09-00:21:17

* checksum: Checksum of the file, created with functions of the Python hashlib module (http://docs.python.org/library/hashlib.html).

  Note: Only hash functions of the SHA-2 family (http://en.wikipedia.org/wiki/SHA-2) are allowed, because of the security flaws found in SHA-1 in 2005.

* path: Path of the data file to be checked.

* sha-2 algorithm: String identifying the hash function. Allowed are **sha224, sha256, sha384, sha512**.

* creation date of file: Creation date of the data file. Date format is: **yyyy/MM/dd-hh:mm:ss**.

* creation location checksum: Hostname of the computer that calculated the checksum.

* creation date of checksum: Date on which the checksum of the data file was calculated. Date format is: **yyyy/MM/dd-hh:mm:ss**.

See also ticket:158. The way the check results are saved has changed.

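Reading such a line back is straightforward if the six fields are separated by whitespace, which is an assumption based on the sample line. The names `ChecksumEntry` and `parse_line` are hypothetical. The two dates are kept as plain strings, because the sample shows 15 and 19 in the middle field and the exact field order should be confirmed before parsing them.

```python
from collections import namedtuple

ChecksumEntry = namedtuple(
    "ChecksumEntry", "checksum path algorithm file_date host checksum_date"
)

# Length of the hexadecimal digest for each allowed SHA-2 function.
DIGEST_HEX_LENGTH = {"sha224": 56, "sha256": 64, "sha384": 96, "sha512": 128}


def parse_line(line):
    """Split one checksum file line into its six fields and sanity-check it."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError("expected 6 fields, got %d" % len(fields))
    entry = ChecksumEntry(*fields)
    expected = DIGEST_HEX_LENGTH.get(entry.algorithm)
    if expected is None:
        raise ValueError("unsupported algorithm: %s" % entry.algorithm)
    if len(entry.checksum) != expected:
        raise ValueError("checksum length does not fit %s" % entry.algorithm)
    return entry


sample = (
    "65135b861519486f3fc171b8efcf7078d123b0d8b3cdcb305227facf8eae7659 "
    "data/datafile sha256 2007/15/09-00:22:10 pinatubo 2007/19/09-00:21:17"
)
entry = parse_line(sample)
print(entry.path, entry.algorithm, entry.host)
```

Paths containing spaces would break the simple `split()` used here.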
## Todo

* [source:tfsoftware/trunk/src/python/csback/csback2cron.py]

  1. ~~Implement a private function to test the cron timestamp in the Processor class.~~ No checks of the timestamps are performed. This task is delegated to the cron daemon.

  2. ~~Implement the possibility to change the frequency with which `csbackntfy` reports the current status.~~ Scheduling now works with a usual cron timestamp.

  3. ~~Allow the user to check the files of a directory with an arbitrary source directory.~~

* [source:tfsoftware/trunk/src/python/csback/csbackgen.py]

  1. ~~Implement the possibility to generate a checksumfile in a target with files of any source directory.~~

  2. ~~Use the logging module of Python here to report the current exit status. A great feature. It's convenient for debugging and verbosity, too.~~

  3. ~~Use the PID lockfilehandler.~~ See ticket:153

  4. ~~Ask Thomas which manner of path handling he would prefer. Each subdirectory contains its own checksumfile.~~ See ticket:158.

  5. ~~File exclusion equivalent to the Unix `find` command.~~

  6. Implement ssh-client feature. See ticket:154

  7. Make use of the *par2* hash algorithm. See ticket:165

* [source:tfsoftware/trunk/src/python/csback/csbackchk.py]

  1. ~~Use the logging module of Python here to report the current exit status.~~

     The exit status is reported in so-called `checksumfile.result` files. An example line looks as follows:

         2012-01-29 21:26:23,683 sirius[2520] INFO Check of file 'tmp/data.sff' was successful.

  2. ~~Use the PID lockfilehandler.~~ See ticket:153

  3. Implement ssh-client feature. See ticket:154

* [source:tfsoftware/trunk/src/python/csback/csbackntfy.py]

  1. ~~Parse csback logfile and report the status either via mail or using the Nagios interface.~~

  2. It would be nice to have options for a local SMTP server.

## Links

Links for cryptographic hash functions:

* Considerations of cryptographic hash algorithms: http://en.wikipedia.org/wiki/Cryptographic_hash_function#Cryptographic_hash_algorithms

* SHA-2 hash functions: http://en.wikipedia.org/wiki/SHA-2

Python specific:

* **Style Guide for Python Code:** http://www.python.org/dev/peps/pep-0008/

* Regular expression module: http://docs.python.org/library/re.html

* Basic tutorial for logging module: http://docs.python.org/howto/logging.html#logging-basic-tutorial

* Advanced tutorial for logging module: http://docs.python.org/howto/logging.html#logging-advanced-tutorial

## Hints

### Bookkeeping

Possibly `csback` will require some bookkeeping facility. This is a place in a file where the program keeps a note of which files have already been processed, on which date it was last executed, and so on.

I (TF) would just like to point out that the ThiesDL1 reading facility (trunk/src/conv/ThiesDL1) has a kind of bookkeeping device in the class Memory. The source code can be found in [source:trunk/src/conv/ThiesDL1/memory.h] and [source:trunk/src/conv/ThiesDL1/memory.cc]. It is used in [source:trunk/src/conv/ThiesDL1/DL1logger.cc@3084:349-370#L346].

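Purely as an illustration of what such a bookkeeping file could hold, here is a small sketch in Python that stores the processed files and the time of the last run as JSON. It is unrelated to the `Memory` class of ThiesDL1; the file layout and the names `load_state` and `record_run` are assumptions, and `os.replace` requires Python 3.3 or later.

```python
import json
import os
import time


def load_state(path):
    """Read the bookkeeping file, or return an empty state if it is missing."""
    if not os.path.exists(path):
        return {"last_run": None, "processed": []}
    with open(path) as handle:
        return json.load(handle)


def record_run(path, processed_files, now=None):
    """Add newly processed files and the run time to the bookkeeping file."""
    state = load_state(path)
    for name in processed_files:
        if name not in state["processed"]:
            state["processed"].append(name)
    state["last_run"] = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as handle:
        json.dump(state, handle, indent=2)
    os.replace(tmp_path, path)  # swap in the new file in one step
    return state
```

Writing to a temporary file first means an interrupted run leaves the previous bookkeeping file intact.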