Icinga
Host: las126.las.kit.edu, las100, las101, +Opt-In
OS: Fedora, CentOS
Software name:
Icinga2 or other monitoring software
Software installation instruction if not in repos:
Checks that are very simple to install, as far as I know:
- Temperatures
- HDD life
- Load
- Network connectivity
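A load check like the one listed above could look roughly like the following sketch. It follows the usual Nagios/Icinga plugin convention (exit code 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN, with one status line on stdout). The function names and the thresholds 4.0 and 8.0 are assumptions for illustration only; the ready-made `check_load` plugin from the monitoring-plugins package would normally be used instead.

```python
import os

# Exit codes of the Nagios/Icinga plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_load(load1, warn, crit):
    """Map a 1-minute load average to a plugin exit code."""
    if load1 >= crit:
        return CRITICAL
    if load1 >= warn:
        return WARNING
    return OK


def run_check(warn=4.0, crit=8.0):
    """Print one status line and return the exit code.

    The default thresholds are illustrative assumptions, not tuned values.
    """
    try:
        load1, _load5, _load15 = os.getloadavg()
    except OSError:
        print("LOAD UNKNOWN - load average not available")
        return UNKNOWN
    code = classify_load(load1, warn, crit)
    label = ("OK", "WARNING", "CRITICAL")[code]
    # Text after the pipe is performance data: value;warn;crit
    print(f"LOAD {label} - load1={load1:.2f} | load1={load1:.2f};{warn};{crit}")
    return code
```

As a plugin script, the file would end with `sys.exit(run_check())` so that the monitoring software can read the exit code.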
Status of our services
- DHCPd
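The status of a service such as DHCPd could be checked by asking systemd, as in this minimal sketch. It relies on `systemctl is-active`, which prints the unit state (for example `active`, `inactive` or `failed`). The unit name `dhcpd.service` and the function names are assumptions and need to be adapted to the actual hosts.

```python
import subprocess

# Exit codes of the Nagios/Icinga plugin convention.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def state_to_code(state):
    """Treat only 'active' as healthy; every other state is CRITICAL."""
    return OK if state == "active" else CRITICAL


def check_unit(unit="dhcpd.service"):
    """Query systemd for the unit state and return a plugin exit code.

    The default unit name is an assumption for illustration.
    """
    try:
        result = subprocess.run(
            ["systemctl", "is-active", unit],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        # systemctl is missing or did not answer in time.
        print(f"SERVICE UNKNOWN - could not query {unit}")
        return UNKNOWN
    state = result.stdout.strip() or "unknown"
    code = state_to_code(state)
    label = "OK" if code == OK else "CRITICAL"
    print(f"SERVICE {label} - {unit} is {state}")
    return code
```

This only tells whether the daemon is running, not whether it actually hands out leases.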
More difficult/not implemented yet, but basic features might be detectable with other modules:
- IPA functionality
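As a sketch of what "basic features" could mean for IPA: one simple approach is to test whether the standard ports for Kerberos (88), LDAP (389) and HTTPS (443) accept TCP connections. The function names and the host name in the usage note are hypothetical, and an open port only shows that something is listening, not that IPA answers correctly.

```python
import socket

# Standard ports of the services an IPA server usually exposes.
IPA_PORTS = {"kerberos": 88, "ldap": 389, "https": 443}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def check_ipa_ports(host, ports=IPA_PORTS):
    """Return a dict mapping each service name to its reachability."""
    return {name: tcp_port_open(host, port) for name, port in ports.items()}
```

Calling `check_ipa_ports("ipa.example.org")` (hypothetical host name) returns a dict of booleans, one per service, which also helps to tell "network down" apart from "IPA down".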
Probably there are already roles on Ansible Galaxy.
Possibly also interesting for:
Clients, but only as opt-in, because monitoring them raises privacy issues (admins can see how long the computer was turned on and how long a user was logged in, to name just a few).
User stories (kind of):
Clients:
- A user starts a job on his computer and cannot log in the next morning. Is the computer gone for good? Is it just still too busy to take care of things like the login manager? Are the hard drives gone because the room heated up? -> Get hints about the cause of the problem.
- A user cannot log in. Maybe the network is down, maybe IPA is down, or maybe she just typed a wrong password.
Server:
- IPA went down and nobody noticed, because sssd cached the credentials and no login errors occurred until half a year later. With monitoring, one can find out since when IPA has not been working and whether an update might have triggered it. Or one can prevent it in the first place by regularly checking the monitoring software.
- DHCPd went down and nobody noticed, because the workstations use fixed IPs.
- The Docker GitLab runners do not work, and jobs have to fail before anyone notices. Maybe a system update caused this, and not a reboot without autostart.
- sharelatex is down and one gets a mail/call from CN, because they want to collaborate on a paper that needs to be submitted the next day.
/cc @project-manager