Late last year I began a new job. I am responsible for about 80 physical servers. There are 3 roles among them: a primary role, a storage role, and a CnC role. 2 hosts are CnC servers for the other 78 hosts, which are spread over 3 locations.
The primary and storage roles both have an install image available for building or rebuilding them, even though all of their software (bar some custom administration scripts) is open source and available to the world. The CnC role is performed by a custom, internal (closed source) set of applications, for which there are no .deb packages, source code, or system install image.
If one of the CnC hosts goes down, I have no way to rebuild it. So what DR options are available? Is there anything that can snapshot the system to allow a makeshift recovery?
I know a snapshot is less than ideal, but in this environment I don't have many options.
Also note that the servers all run hardware RAID1 arrays. One of my (terrible) ideas is, with some planned downtime, to pull one drive from the array, put a spare drive in its place and let the array sync onto it, then return the original disk to the array and store the freshly synced disk as a kind of snapshot.