We manage about 85 Ubuntu servers worldwide. Each of them runs a number of cron jobs that simplify our work.
At the moment, every cron job e-mails its output to a single mailbox at our HQ, which lets us check whether the job ran successfully.
Drawback of the current setup
We came up with this setup when we had around 10 servers, and back then it was manageable.
Today 85 servers send out about 400 e-mails a day, all of which have to be checked manually.
To make things worse, if a cron job doesn't run at all, the person checking may not notice the missing e-mail among all the others.
Setup we are looking for
I'm not keen on changing the cron jobs on every server, but if that's what it takes, I'm fine with it.
I would rather find a system that parses the central mailbox and verifies that the e-mails arrive on time and contain the expected data.
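As a rough illustration of the mailbox side, here is a minimal Python sketch using the standard `mailbox` module. It assumes the central mailbox can be read as an mbox file and that the servers run Ubuntu's default cron, which uses subjects of the form `Cron <user@host> command`; `parse_cron_subject` and `scan_mailbox` are hypothetical names.

```python
import mailbox
import re

# Assumed subject format of Ubuntu's default cron: "Cron <user@host> command".
# Adjust the pattern if your cron daemon or MTA rewrites the subject.
CRON_SUBJECT = re.compile(r"^Cron <(?P<user>[^@>]+)@(?P<host>[^>]+)>\s+(?P<command>.+)$")


def parse_cron_subject(subject):
    """Return (user, host, command) for a cron mail subject, or None."""
    match = CRON_SUBJECT.match(subject)
    if match is None:
        return None
    return match.group("user"), match.group("host"), match.group("command")


def scan_mailbox(path):
    """Yield (user, host, command, date_header, body) for each cron mail in an mbox file."""
    for message in mailbox.mbox(path):
        # Unfold long subjects that were wrapped over several header lines.
        subject = " ".join(str(message.get("Subject", "")).split())
        parsed = parse_cron_subject(subject)
        if parsed is None:
            continue  # not a cron mail, skip it
        payload = message.get_payload(decode=True) or b""  # None for multipart mails
        yield (*parsed, message.get("Date"), payload.decode(errors="replace"))
```

Keying on host and command (rather than on free-form subject text) is what makes it possible to match each mail to a known job later on.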
So I'm looking for a framework that:
- Allows us to define the tasks that need to run, and matches the received e-mails against those tasks to confirm that each one is actually running.
- Allows us to raise an alert if a task has failed or didn't run at all.
- Allows us to easily create new tasks from either a received e-mail or a crontab read from a remote server.
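The three requirements above could be sketched roughly as follows. This is a minimal Python illustration, not an existing framework: `Task`, `task_from_crontab_line`, `check_tasks` and `unknown_jobs` are hypothetical names, and both the per-task `max_age` window and the `ERROR` failure marker are assumptions you would tune per job.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Task:
    """A cron job we expect a mail from (hypothetical structure)."""
    host: str
    command: str
    max_age: timedelta             # assumed: alert if the last mail is older than this
    failure_marker: str = "ERROR"  # assumed: text in the mail body that signals a failure


def task_from_crontab_line(host, line, max_age):
    """Build a Task from one `crontab -l` entry, or return None if the line is not a job.

    Assumes the per-user crontab format (no user column, unlike /etc/crontab).
    Deriving max_age from the schedule is left out; the caller supplies it.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    if "=" in line.split(None, 1)[0]:
        return None  # environment setting such as MAILTO=... or PATH=...
    schedule_fields = 1 if line.startswith("@") else 5  # @daily vs "m h dom mon dow"
    parts = line.split(None, schedule_fields)
    if len(parts) <= schedule_fields:
        return None  # malformed: schedule without a command
    return Task(host=host, command=parts[schedule_fields], max_age=max_age)


def check_tasks(tasks, received, now):
    """Return alert strings for tasks that are missing, late or failed.

    `received` maps (host, command) to a list of (timestamp, body) tuples.
    """
    alerts = []
    for task in tasks:
        runs = received.get((task.host, task.command), [])
        if not runs:
            alerts.append(f"MISSING: {task.host}: {task.command} never reported")
            continue
        last_time, last_body = max(runs, key=lambda run: run[0])
        if now - last_time > task.max_age:
            alerts.append(f"LATE: {task.host}: {task.command} last reported {last_time:%Y-%m-%d %H:%M}")
        if task.failure_marker in last_body:
            alerts.append(f"FAILED: {task.host}: {task.command}")
    return alerts


def unknown_jobs(tasks, received):
    """Return (host, command) pairs that sent mail but have no Task defined yet."""
    known = {(task.host, task.command) for task in tasks}
    return sorted(key for key in received if key not in known)
```

`received` could be filled from a scan of the central mailbox, converting each `Date` header with `email.utils.parsedate_to_datetime`; in that case pass a timezone-aware `now`, e.g. `datetime.now(timezone.utc)`, so the comparison works. `unknown_jobs` covers the "create a task from a received e-mail" case by listing jobs that report in but are not registered yet.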
I'd like to hear your ideas for a solution.
Perhaps we are looking for something that will never work, or perhaps we are still running our servers too much like home systems.
So please share your thoughts, opinions, and ideas on how to make our lives as system engineers easier.