Legacy documentation (copied from TFSoftware wiki)
csback is a software suite written in Python (http://www.python.org/) that manages backups and periodically tests their integrity.
- Reuse existing, well-tested tools and programs instead of writing the same application a second time in another language.
- Follow the Unix philosophy: write one tool for each task.
- The software should work under both Python 2 and Python 3 to avoid compatibility problems. It has been tested successfully under Python 2.7 and Python 3.x.
Notes, ideas and thoughts regarding the implementation of the
csback uses a central node to generate or update checksumfiles. This node must have access (so far via NFS) to all data that is to be backed up. This approach avoids heavy network traffic and intensive computation on low-resource systems.
- In an exemplary backup sequence csback applies the following process sequence:
  a. Copy procedures are executed using rsync.
  b. Checksumfiles are generated with checksums of the files in the rsync source directory. The checksumfile is generated within the target directory.
  c. To check the files' integrity, csbackchk calculates checksums of the files in the target directory and compares its results with the content of the checksumfile. The results are written to so-called checksumfile.result files.
  d. Finally, csbackchk.py provides a logging mechanism which sends log messages to the local system logger on TCP port 3333.
- In addition to the backup procedure described above, csback makes it possible to:
  a. only execute a copy procedure using rsync.
  b. only generate/update a checksumfile with csbackgen.py.
  c. only test the files' integrity using csbackchk.py.
  d. generate/update a checksumfile and test the files' integrity afterwards.
  e. check the logfile from time to time and report the current status via email or by using its tiny nagios interface.
- csback is able to exclude files and directories from copy, generation and test procedures using:
  a. matching regular expressions.
  b. matching access, change and modify times using the same syntax as the unix find utility (only for generation procedures).
- Checksumfiles are only read by csbackchk.py or, if the directory contains new files, extended by appending additional lines.
- Each subdirectory of a directory contains its own checksumfile, so moving a subdirectory to another location and checking the files' integrity afterwards is no problem.
csback comes with a tool called csback2cron.py to generate a crontab using the
Tool to convert a csback configuration file to a crontab file which must be installed using the
- The syntax is similar to Windows
- 4 section types are available:
test sections the keys must be declared. a.
copy: Specify a copy procedure using
backup: Provides a full backup procedure. First copy the files using rsync (optional), then generate checksumfiles with csbackgen.py and finally check the integrity of the copied files using
test: Only perform an integrity check of files as defined within a
- The syntax is similar to Windows
- Generate a checksumfile, or update it if it already exists. Lines are only ever appended to checksumfiles, so existing lines are never touched again.
- If a specified directory contains subdirectories, each subdirectory will contain its own checksumfile.
- So far, hash functions of the SHA-2 family (SHA-224, SHA-256, SHA-384, SHA-512) are allowed.
- Generating a checksumfile from arbitrary source files (located in a different source directory) is supported.
If logging is enabled, log messages are sent to the system logger of the operating system on TCP port 3333.
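The append-only update described above can be sketched as follows. This is a minimal illustration, not the csbackgen.py implementation: the function names, the checksumfile name "checksumfile", the use of the modification time as the file date and the strftime pattern are all assumptions, and paths containing whitespace are not handled.

```python
import hashlib
import os
import socket
import time

ALLOWED = ("sha224", "sha256", "sha384", "sha512")
# Assumption: strftime equivalent of the documented yyyy/MM/dd-hh:mm:ss format.
DATE_FORMAT = "%Y/%m/%d-%H:%M:%S"


def file_digest(path, algorithm="sha256", blocksize=1 << 20):
    """Return the hex digest of a file, read block by block."""
    if algorithm not in ALLOWED:
        raise ValueError("unsupported hash function: %s" % algorithm)
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(blocksize), b""):
            digest.update(block)
    return digest.hexdigest()


def update_checksumfile(directory, algorithm="sha256", name="checksumfile"):
    """Append one line per unregistered file; return the names that were added."""
    checksumfile = os.path.join(directory, name)
    known = set()
    if os.path.exists(checksumfile):
        with open(checksumfile) as handle:
            for line in handle:
                fields = line.split()
                if len(fields) >= 2:
                    known.add(fields[1])  # second field is the path
    added = []
    # Mode "a" guarantees that existing lines are never rewritten.
    with open(checksumfile, "a") as handle:
        for entry in sorted(os.listdir(directory)):
            path = os.path.join(directory, entry)
            if entry == name or entry in known or not os.path.isfile(path):
                continue
            # Assumption: the modification time stands in for the file date.
            file_date = time.strftime(DATE_FORMAT, time.localtime(os.path.getmtime(path)))
            fields = [file_digest(path, algorithm), entry, algorithm,
                      file_date, socket.gethostname(), time.strftime(DATE_FORMAT)]
            handle.write(" ".join(fields) + "\n")
            added.append(entry)
    return added
```

Calling update_checksumfile() twice on an unchanged directory adds nothing the second time, which mirrors the rule that existing lines are never touched again.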
Check the integrity of a file that is registered in a csback checksumfile.
This program takes either one or two arguments. If two arguments are passed, csbackchk takes the checksumfile of the directory passed as the second argument (PATH) and compares its checksums with the checksums of the files located in the directory passed as the first argument (SOURCEPATH). If only one argument is passed, csbackchk.py performs an integrity check of the files located in PATH. (In both cases a checksumfile must be located in PATH.)
If logging is enabled, log messages are sent to the system logger of the operating system on TCP port 3333.
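The one-argument and two-argument behaviour can be sketched as follows. This is only an illustration under assumptions, not the code of csbackchk.py: check_directory, main and the checksumfile name "checksumfile" are hypothetical, and only the first three fields of each line (checksum, path, algorithm) are used.

```python
import hashlib
import os

ALLOWED = ("sha224", "sha256", "sha384", "sha512")


def file_digest(path, algorithm):
    """Return the hex digest of a file using one of the allowed SHA-2 functions."""
    if algorithm not in ALLOWED:
        raise ValueError("unsupported hash function: %s" % algorithm)
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def check_directory(path, sourcepath=None, name="checksumfile"):
    """Compare the checksumfile in `path` with the files in `sourcepath` (or `path`).

    Returns a dict mapping each registered path to True (match) or False.
    """
    data_dir = path if sourcepath is None else sourcepath
    results = {}
    with open(os.path.join(path, name)) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            expected, relpath, algorithm = fields[:3]
            target = os.path.join(data_dir, relpath)
            results[relpath] = (os.path.isfile(target)
                                and file_digest(target, algorithm) == expected)
    return results


def main(argv):
    """One argument: PATH. Two arguments: SOURCEPATH PATH."""
    if len(argv) == 1:
        results = check_directory(argv[0])
    elif len(argv) == 2:
        results = check_directory(argv[1], sourcepath=argv[0])
    else:
        raise SystemExit("usage: [SOURCEPATH] PATH")
    for relpath, ok in sorted(results.items()):
        print("%s: %s" % (relpath, "ok" if ok else "FAILED"))
    return 0 if all(results.values()) else 1
```

In this sketch a missing file counts as a failed check, which is a design choice of the example rather than documented behaviour.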
Perform checks of the csback logfile and report the current status to the admin via email or using its tiny nagios interface.
Small observer script that reports the exit status of external programs to the csback logfile in case of failure. So far only in use for
Module which provides a class to read, write and handle a checksum file.
Tools which make use of this module are: csbackchk.py
Additionally, this module contains the functionality to exclude files based on their access, change or modification times.
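Exclusion by regular expression and by a find-like time specification could look roughly like the sketch below. The function names and parameters are hypothetical and not taken from the module. The time rule follows the convention of find's -mtime test (age in whole days, "+n" more than n, "-n" less than n, "n" exactly n) and is applied to the modification time only.

```python
import os
import re
import time

SECONDS_PER_DAY = 86400


def matches_find_time(timestamp, spec, now=None):
    """Return True if `timestamp` matches a find-like day specification."""
    now = time.time() if now is None else now
    # Like find, ignore the fractional part of the age in days.
    age_days = int((now - timestamp) // SECONDS_PER_DAY)
    days = int(spec.lstrip("+-"))
    if spec.startswith("+"):
        return age_days > days
    if spec.startswith("-"):
        return age_days < days
    return age_days == days


def is_excluded(path, patterns=(), mtime=None, now=None):
    """Return True if `path` matches a regular expression or the mtime rule."""
    if any(re.search(pattern, path) for pattern in patterns):
        return True
    if mtime is not None:
        return matches_find_time(os.path.getmtime(path), mtime, now)
    return False
```

For example, is_excluded("backup/tmp/file.swp", [r"\.swp$"]) is true because the pattern matches, and the file system is only consulted when an mtime rule is given.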
Module that handles the csback logging mechanisms. The module also provides a ResultLogger class which is used by
Checksum file format
|checksum||path||sha-2 algorithm||creation date of file||creation location checksum||creation date of checksum|
With that a typical checksum file line will look as follows:
65135b861519486f3fc171b8efcf7078d123b0d8b3cdcb305227facf8eae7659 data/datafile sha256 2007/15/09-00:22:10 pinatubo 2007/19/09-00:21:17
- checksum: Checksum of the file, created with functions of the Python hashlib module: http://docs.python.org/library/hashlib.html . Note: Only hash functions of the SHA-2 family (http://en.wikipedia.org/wiki/SHA-2) are allowed because of the security flaws found in SHA-1 hash functions in 2005.
- path: Path of the data file to be checked.
- sha-2 algorithm: String to identify the hash function digest. Allowed are sha224, sha256, sha384, sha512.
- creation date of file: Creation date of data file. Date format is: yyyy/MM/dd-hh:mm:ss.
- creation location checksum: Hostname of the computer which calculated the checksum.
- creation date of checksum: Date on which the checksum for the data file was calculated. Date format is: yyyy/MM/dd-hh:mm:ss.
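A line in this format can be split into its six fields as sketched below. The names ChecksumEntry and parse_line are hypothetical and not part of csback. The sketch assumes that fields are separated by whitespace and that paths contain no whitespace, and it keeps both dates as plain strings.

```python
import hashlib
from collections import namedtuple

ALLOWED = ("sha224", "sha256", "sha384", "sha512")

# Field order follows the checksum file format described above.
ChecksumEntry = namedtuple(
    "ChecksumEntry",
    "checksum path algorithm file_date host checksum_date",
)


def parse_line(line):
    """Parse one checksum file line into a ChecksumEntry."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError("expected 6 fields, got %d" % len(fields))
    entry = ChecksumEntry(*fields)
    if entry.algorithm not in ALLOWED:
        raise ValueError("unsupported hash function: %s" % entry.algorithm)
    # A hex digest has two characters per digest byte.
    expected_length = hashlib.new(entry.algorithm).digest_size * 2
    if len(entry.checksum) != expected_length:
        raise ValueError("checksum length does not match %s" % entry.algorithm)
    return entry
```

Applied to the example line above, parse_line returns an entry whose path is data/datafile, whose algorithm is sha256 and whose host is pinatubo.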
See also ticket:158. The way the check results are saved has changed.
Implement a private function to test the cron timestamp in the Processor class. No checks of the timestamps are performed. This task is delegated to the
Implement the possibility to change the frequency which uses csbackntfy to report the current status. Scheduling now works with a usual cron timestamp.
Allow the user to check the files of a directory with an arbitrary source directory.
Implement the possibility to generate a checksumfile in a target with files of any source directory. Use the Python logging module here to report the current exit status. A great feature: it is convenient for debugging and verbosity, too.
Use the PID lockfilehandler. See ticket:153
Ask Thomas which manner of path handling he would prefer. Each subdirectory contains its own checksumfile. See ticket:158. File exclusion equal to unix
- Implement ssh-client feature. See ticket:154
- Make use of par2 hash algorithm. See ticket:165
Use the Python logging module here to report the current exit status. The exit status is reported in so-called checksumfile.result files. An example line looks as follows:
2012-01-29 21:26:23,683 sirius INFO Check of file 'tmp/data.sff' was successful.
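A result line of this shape can be produced with the standard logging module, as sketched below. This is not the csback logging module itself: make_result_logger and the logger name are hypothetical, and the format string is an assumption inferred from the example line (default asctime, hostname, level name, message).

```python
import logging
import socket


def make_result_logger(resultfile, name="csback.result"):
    """Return a logger that appends lines like the example above to `resultfile`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep result lines out of other handlers
    # Escape "%" so the hostname cannot be mistaken for a format directive.
    hostname = socket.gethostname().replace("%", "%%")
    handler = logging.FileHandler(resultfile)
    handler.setFormatter(
        logging.Formatter("%(asctime)s " + hostname + " %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

A call such as make_result_logger("checksumfile.result").info("Check of file '%s' was successful.", "tmp/data.sff") then appends one timestamped line per check. Calling the function repeatedly with the same name would attach several handlers, so a real tool would create the logger once.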
Use the PID lockfilehandler. See ticket:153
Implement ssh-client feature. See ticket:154
Parse csback logfile and report the status either via mail or using the nagios interface.
- It would be nice to have options for a local SMTP server.
Links for cryptographic hash functions:
- Considerations of cryptographic hash algorithms: http://en.wikipedia.org/wiki/Cryptographic_hash_function#Cryptographic_hash_algorithms
- SHA-2 hash functions: http://en.wikipedia.org/wiki/SHA-2
- Style Guide for Python Code: http://www.python.org/dev/peps/pep-0008/
- Regular expression module: http://docs.python.org/library/re.html
- Basic tutorial for logging module: http://docs.python.org/howto/logging.html#logging-basic-tutorial
- Advanced tutorial for logging module: http://docs.python.org/howto/logging.html#logging-advanced-tutorial