
Thread: Snapshot a running physical host

  1. #1
    Join Date
    May 2012
    Location
    Queensland, Australia
    Beans
    51
    Distro
    Ubuntu 12.04 Precise Pangolin

    Snapshot a running physical host

    Late last year I began a new job. I am responsible for about 80 physical servers. There are 3 roles amongst these: the primary role, a storage role, and 2 hosts that act as CnC (command and control) servers for the other 78 hosts, which are spread over 3 locations.

    The primary and storage roles both have an install image available for building/rebuilding them, even though all the software (bar some custom scripts for administration) is open source and available to the world. The CnC role is performed by a custom internal (closed source) set of applications, for which there are no .deb packages, no source code, and no system install image.

    If one goes down, I have no way to rebuild the host. So what DR options are available? What is available to snapshot the system to allow a makeshift recovery?

    I know a snapshot is less than ideal, but in this environment I don't have a great deal of options.

    Also note that the servers all run hardware RAID1 arrays. One of my (terrible) ideas is, with some planned downtime, to remove a drive from the array, let the disks sync, then return the original disk to the array and store the freshly synced disk as a kind of snapshot.

    Thanks guys.
    I dream of a world where our lives can remain private, and our technology can remain open to all.

  2. #2
    Join Date
    Mar 2010
    Location
    Metro-ATL
    Beans
    Hidden!
    Distro
    Lubuntu 12.04 Precise Pangolin

    Re: Snapshot a running physical host

    "Snapshot" usually refers to a specific LVM feature. LVM snapshots can be taken on running systems.
    If you are not using LVM, then a backup is the answer. There are many backup tools. Running backups is usually not an issue unless files are open and being written to - that usually only happens with databases. So you back up everything using normal backup methods, then take extra steps to back up a consistent DB. There are lots of ways to ensure a consistent DB backup:
    * shut down the DB while the backup is going on
    * dump the DB to CSV or some other format that can be backed up easily
    * use a DB backup tool specific to the DB being run.
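    To make the "consistent DB backup" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The thread doesn't say which database is in use; the table and file names are made up, and sqlite3's Connection.backup() stands in for whatever DB-specific backup tool applies:

```python
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()

# A tiny demo database standing in for the "live" DB.
live = sqlite3.connect(os.path.join(tmp, "live.db"))
live.execute("CREATE TABLE calls (id INTEGER PRIMARY KEY, file TEXT)")
live.execute("INSERT INTO calls (file) VALUES ('rec-0001.wav')")
live.commit()

# Online, consistent copy: backup() takes a transactional snapshot
# of the database without shutting it down.
snap = sqlite3.connect(os.path.join(tmp, "snapshot.db"))
live.backup(snap)

rows = snap.execute("SELECT file FROM calls").fetchall()
print(rows)  # [('rec-0001.wav',)]
```

    The same principle applies to MySQL or PostgreSQL, just with their own tools (a transactional dump rather than copying live data files).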

    Simple. Of course, things can be as simple or complex as you choose. For example, I do not back up entire OSes here. I back up:
    * data
    * configurations
    * lists of packages installed
    * any extra programs installed outside the package manager.
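    A minimal sketch of that backup scope (data, configs, a package manifest) using only the Python standard library. The paths and file contents are made-up stand-ins, and packages.txt represents the kind of list a command like `dpkg --get-selections` would produce:

```python
import os
import tarfile
import tempfile

root = tempfile.mkdtemp()

# Stand-ins for real config and data trees.
os.makedirs(os.path.join(root, "etc"))
os.makedirs(os.path.join(root, "data"))
with open(os.path.join(root, "etc", "app.conf"), "w") as f:
    f.write("port=5060\n")
with open(os.path.join(root, "data", "rec-0001.wav"), "w") as f:
    f.write("audio\n")
# A package manifest, as produced by e.g. `dpkg --get-selections`.
with open(os.path.join(root, "packages.txt"), "w") as f:
    f.write("asterisk install\n")

# Archive exactly the three things worth keeping.
archive = os.path.join(root, "backup.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    for item in ("etc", "data", "packages.txt"):
        tar.add(os.path.join(root, item), arcname=item)

# Verify the archive lists what we expect before trusting it.
with tarfile.open(archive) as tar:
    names = sorted(tar.getnames())
print(names)
```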

    I know many places will not load programs unless a package has been created - PERIOD. If a package doesn't already exist, they make it themselves. From what I understand, it isn't too difficult to do this. I've used a tool called epkg myself, but since switching to APT, I haven't needed it.

    RAID1 is great for HA, but doesn't help at all for backups. Backups are the most important thing that any admin can accomplish. The only thing more important is the restore.

    This might be an opportunity to swap out custom C&C for something like Rex, Puppet, Chef, Salt, or one of these: https://en.wikipedia.org/wiki/Compar...ement_software

    If you really want DR, everything should be 500+ miles apart and easily able to switch over with a few DNS changes. Depending on the RTO (how quickly the service needs to be available) and RPO (how old the data can be), there are hugely different requirements. https://en.wikipedia.org/wiki/Disaster_recovery_plan is a good introduction.

    Everyone will say 5 minutes for both RTO and RPO, until you show management the budget necessary. Then, magically, a 24 hr RPO is fine. Finding the correct solution for the criticality of the data is a shared responsibility of the admin, the process owner, and the guy paying. Where I've worked, we always asked for per-hour-of-outage costs. When someone sees a system is estimated at $8M/hr, funding a $10M DR solution becomes easier.

    If the budget doesn't allow a solution that meets the RTO/RPO requirements, that is a good hint to pass the buck to a corporate officer and let him/her accept the risks - IN WRITING. Don't let them claim "nobody told me." Delivering bad news is part of being a professional.
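    The budget argument there is simple arithmetic. A toy version in Python - the $8M/hr and $10M figures come from the post above; the two-hour outage estimate is a made-up assumption:

```python
# Figures from the post; the outage duration is an assumed example.
outage_cost_per_hour = 8_000_000   # estimated cost of downtime ($/hr)
expected_outage_hours = 2          # assumed length of one incident (hrs)
dr_solution_cost = 10_000_000      # proposed DR budget ($)

loss_per_incident = outage_cost_per_hour * expected_outage_hours
print(loss_per_incident)  # 16000000

# A single two-hour incident already exceeds the entire DR budget.
print(loss_per_incident > dr_solution_cost)  # True
```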

    BTW, using virtualization can really help with disaster recovery - virtual hardware is much easier to match than physical hardware.

    Feel free to ask more questions; however, more specifics about your setup - amount of data, which DB is used, any allowed downtime, current backup methods, etc. - would be very helpful.

  3. #3
    Join Date
    Nov 2006
    Location
    Belgium
    Beans
    3,007
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: Snapshot a running physical host

    Yep, what TheFU says.

    In the short term, a decent backup and restore procedure with reasonably short RPO and RTO will buy you some peace of mind.
    Virtualization is definitely something to consider, if only because it will allow you to back up the entire system as-is (some applications/databases may still require special care).


    In the long term, what you describe is a death trap for the company: (presumably mission critical) software with no fallback whatsoever? You don't need anything even close to a disaster to put you out of business - an (ordinarily) minor human error or an (ordinarily) insignificant hardware problem will be enough. Unless you can devise a solid backup scheme and recovery plan (and get management to agree on RTO and RPO), moving away from that death trap application is the only good solution.

  4. #4
    Join Date
    May 2012
    Location
    Queensland, Australia
    Beans
    51
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: Snapshot a running physical host

    I certainly think that more information is in order. The majority of servers receive phone calls via ISDN trunks, pass the calls to a commercial PABX, and record the calls (with Asterisk). The raw audio files are then copied to servers with larger arrays, from where management can browse the calls with a custom application.

    While I call them CnC hosts, the two hosts I'm concerned about are simply responsible for ensuring connectivity between, and the health of, all the other hosts, and for assigning names to recorded files according to a particular naming scheme. They do this via custom software. The drives in these systems are 70 GB. The two hosts are in different locations in the same city, and load balance the file-naming role between them. So I am able to arrange a couple of hours of downtime if I make my case well.
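    Purely to illustrate the kind of job those two hosts do, here is a hypothetical naming function; the real scheme lives in the closed-source software and is not known from this thread, so every field below is an assumption:

```python
from datetime import datetime

def recording_name(site, trunk, call_id, when=None):
    """Hypothetical naming scheme: site, trunk, timestamp, call id.

    All fields are invented for illustration; the actual closed-source
    scheme is unknown.
    """
    when = when or datetime.now()
    return "{}-{}-{}-{:06d}.wav".format(
        site, trunk, when.strftime("%Y%m%d%H%M%S"), call_id
    )

name = recording_name("BNE1", "isdn02", 4242, datetime(2013, 5, 1, 9, 30, 0))
print(name)  # BNE1-isdn02-20130501093000-004242.wav
```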

    The worst thing is that these systems are very old. The company stopped actively developing their custom applications in 2009, and the developers either left or moved up. Now my company holds the sysadmin contract with the major company to which these systems are mission critical. From what I can gather, when internal development ended, all backup and DR ended with it. My company then got the contract and has been reluctant to change anything, as there is huge resistance to change at our client.

    Both my company and our client seem to want to retain the separation of jurisdiction that physical hosts in separate racks provide, and with the aging hardware and outdated software, there is a project to replace the entire system with a commercial one this year.

    The age of the OS (2009) makes me extremely nervous about ever adding a package, or even asking our client whether that is allowable. The custom applications seem to have been untarred from an archive rather than installed via any packaging tools, and with additional scripts all over the shop, I'd lean towards shutting down and making a backup of the entire hard drive. No LVM installed.
    I dream of a world where our lives can remain private, and our technology can remain open to all.

  5. #5
    Join Date
    Feb 2007
    Location
    West Hills CA
    Beans
    7,939
    Distro
    Ubuntu 12.10 Quantal Quetzal

    Re: Snapshot a running physical host

    +1 for changing out the custom Command and Control for an open source configuration management (CFM) tool. You can install a development environment using http://devstack.org. When the proprietary C&C dies, you will have a working CFM system in place that will act as the "backup generator" to keep the other servers going. Over time, the "backup generator" becomes the primary system. It's all curtains and smoke. You are just changing the drapes and adding a little oil to the fire.
    -------------------------------------
    Oooh Shiny: PopularPages

    Unumquodque potest reparantur. Patientia sit virtus.

  6. #6
    Join Date
    Mar 2010
    Location
    Metro-ATL
    Beans
    Hidden!
    Distro
    Lubuntu 12.04 Precise Pangolin

    Re: Snapshot a running physical host

    Quote Originally Posted by RoosterHam View Post
    Both my company and our client seem to want to retain the separation of jurisdiction that physical hosts in separate racks provide, and with the aging hardware and outdated software, there is a project to replace the entire system with a commercial one this year.

    The age of the OS (2009) makes me extremely nervous about ever adding a package, or even asking our client whether that is allowable. The custom applications seem to have been untarred from an archive rather than installed via any packaging tools, and with additional scripts all over the shop, I'd lean towards shutting down and making a backup of the entire hard drive. No LVM installed.
    Given the above information, I think you should just tell your management about the risks and let them decide whether any work should be performed. BTW, there is nothing wrong with installing software via tar.gz ... provided the skills to manage it exist inside the company. Without those skills, backups are all that will remain to recover the running processes. It is just another risk.

    Often in situations such as this - especially with older hardware that is out of warranty - the best answer is to start with fresh, current servers, virtualize everything, put in a SAN, and deploy using current best practices. The new setup can be tested disconnected from the old stuff while it is still in production. I've replaced 20+ old, out-of-warranty servers at a client with 2 new boxes, each running many VMs. The power and UPS costs dropped enough to pay for the new machines within a few years. Less heat, less cooling - we sold the 20 kVA APC too and put that money towards the SAN and a smaller, right-sized UPS. That happened 5 yrs ago and the client is ready for a server refresh now. Migrating VMs is trivial. Having a 100% backup of VMs is also trivial. Newer servers are 2x more capable, and they are considering enterprise-class SSDs for the high-disk-IO VMs. Clearly, there are VM best practices that need to be known, learned, and followed too.

    You can build a picture that management will love. Explain how all the current risks are addressed in the new architecture. Build out 2 locations 500 miles apart so that DR is built in should something bad happen in one location. Being able to swap primary locations weekly means that neither side ever becomes a 2nd-class location - DR will actually work AND be tested routinely. DR plans that get tested for 72 hours once a year usually fail. DR plans that are exercised weekly usually work perfectly when the time comes. It also means that you'll have the vital "how to fail back" problem solved too.

    Asterisk doesn't usually require much processing, so hosting lots of PBXes inside VMs works well, provided enough real-time access is possible. The local Asterisk Users Group can probably help you a bunch there. The local one here is sponsored by C/Beyond - a huge VoIP provider in the region.

    For small offices that just want to run a small PBX, a $200 dual-core Atom box, completely self-contained, is hard to beat. 20-50 people can easily be handled on that.

  7. #7
    Join Date
    Aug 2008
    Location
    Victoria, BC Canada
    Beans
    1,594

    Re: Snapshot a running physical host

    With 80 servers I would look for a better management package. I have a lot of Linux VMs, as I have been testing Ubuntu rather extensively of late after identifying some problems.

    For example, with 64 TB worth of hard disk, it fails to format after partitioning.
    SERVER: Azure datacenters, Hyper-V

  8. #8
    Join Date
    Jul 2010
    Location
    Michigan, USA
    Beans
    2,078
    Distro
    Ubuntu 14.04 Trusty Tahr

    Re: Snapshot a running physical host

    Quote Originally Posted by Vegan View Post
    For example, with 64 TB worth of hard disk, it fails to format after partitioning.
    Using what filesystem?

  9. #9
    Join Date
    Jun 2011
    Beans
    330

    Re: Snapshot a running physical host

    I think there should be two approaches. The first is to get snapshots of both servers, even if it's just a quick-n-dirty filesystem snapshot, which could be provided by a tool like Clonezilla. That way you should be able to restore the OS if something happens. Basically, as others have said, make backups. The second step should be to get off that existing system and onto something that would be easier to replace and/or maintain.
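    The quick-n-dirty snapshot amounts to imaging the disk and then verifying the copy before trusting it. A sketch of that verify step in Python, using a temp file in place of a real block device (the paths are stand-ins; a real run would image something like /dev/sda with Clonezilla or dd):

```python
import hashlib
import os
import shutil
import tempfile

def sha256_of(path, chunk=1 << 20):
    """Checksum a file in 1 MiB chunks so huge images fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

tmp = tempfile.mkdtemp()
disk = os.path.join(tmp, "disk.img")     # stand-in for the real disk
image = os.path.join(tmp, "backup.img")  # the snapshot we keep

with open(disk, "wb") as f:
    f.write(os.urandom(1 << 20))  # 1 MiB of fake disk contents

shutil.copyfile(disk, image)              # the imaging step
ok = sha256_of(disk) == sha256_of(image)  # verify before trusting it
print(ok)  # True
```

    An unverified image is only slightly better than no image at all; the checksum comparison is the part most ad-hoc imaging procedures skip.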

  10. #10
    Join Date
    Aug 2008
    Location
    Victoria, BC Canada
    Beans
    1,594

    Re: Snapshot a running physical host

    After some more testing it seems that Ubuntu does not like Hyper-V version 3 at the moment. Guess I will have to wait for the next DVD to be released and try again.

    It keeps freezing at 33% while formatting / on the setup screens.
    SERVER: Azure datacenters, Hyper-V

