Advice for Aspiring Sysops



SuperMike
December 15th, 2007, 07:43 PM
When I say sysop, I'm blending together Desktop Support Analyst, System Analyst, Systems Engineer, LAN Admin, and Network Engineer all into one. The "keeper of the systems".

Here are my tips for aspiring sysops just out of college:

* This is not the career to be in. Go back to school for something else. It's not a joke. As you read this, you'll realize that this career will chew you up and spit you out unless you go it alone as an entrepreneur, which is a big risk at times. As a grown man or woman, do you ever want to end up with days where you're sobbing in the bathroom or wanting to rip the bathroom door off in a bout of anger? Well, face it, those days will happen in this job. If you don't like it, go back to school for some other kind of career.

* Sysops don't get let go -- they get fired (or at least it feels like it). As a sysop, you have so many opportunities to get fired because you touch production systems and also because you may have professional judgment that your managers often cannot understand. Some managers may consider you so intelligent that you're a challenge to their authority and may wish you gone. And even if you manage to keep yourself disciplined and God's perfect little employee, the last day for a sysop means they'll treat you like an animal, shutting off all your access immediately, telling you to get up from your desk without taking anything, having your bag searched as you leave, and so on. All in the name of security because you had the golden keys. And keep in mind that if you have the golden keys, there is no such thing as a negative performance review -- it's just a ticket out the door. They hold you to a higher standard than most any other employee.

* Get used to long hours. Let's hope you don't have a long commute on top of your daily routine because you're going to be working a lot of long hours. Sure, it's job security, but not family and marriage security. You save your job and lose your wife and kids in a divorce settlement. Brilliant.

* It's flex time, they tell you. Yeah, flex your a__ in at 8:30am and get ready to drive in even earlier if the alerts occur. It's a fact of life, especially on Mondays, because stuff happens overnight. Got an hour long commute with a lot of potential for accidents or issues? Then this job is not for you unless you leave 30 minutes to 45 minutes earlier than that hour commute, just in case. And just because you arrived early, and worked your 9 hours (including the lunch), don't think you can leave earlier than 6pm. If you don't like that, then this job is not for you.

* Backups often fail on Saturdays and Sundays. It's the off-days usually when you do the full backups. And way too often they fail, or need a tape change, and you better like needing to drive in to replace something, troubleshoot, spend an hour on the phone with a vendor, spend a day waiting for a part to arrive and swap something out, and so on. Oh, and to boot, you often arrive to find out that the servers have other "bang" lights on (problems) when you were only anticipating problems with the backups. Don't like it? Then this job isn't for you.

* Beg for on-call rotation. If you constantly get stuck with the bill of having to come in to do something because you're the SME, then you've screwed up. You need to back up a bit, not let that happen more than twice, and push extremely hard for an on-call rotation. And I've found that per 80 servers, you need a minimum of 5 people in that on-call rotation or it gets unbearable.

* You're going to need a tracking system. I've arrived at server rooms that were tiny and just starting, and where the manager was handing out orders three times daily for work, the guys often forgot stuff and had to go back to ask questions, and it became one great big memory game and he said/she said. So, do yourself a favor and get a tracking system. The trouble is, there are some annoying ones and some great ones, and too often companies do the dumb thing of giving you an annoying one that sucks up your valuable time, sucks up the time of people needing help, and also creates lost tickets in the system. But management doesn't care -- as long as they get their pretty little reports the way they want them and with the features they want, they'll stick you with this annoying app for as long as they want.

* Get ready for the rat's nest. That's no joke. I've seen server rooms with wires all over the place. I've seen phone guys come in, ask how the hell this happened, and they have to tone everything out and run the risk of bringing down your call center by touching the wrong wire. And I've seen the exact opposite of this. You need to focus on making the wiring very orderly, and it's a constant battle for a rapidly growing business.

* Stuff's going to walk. I don't know why this is, but there's a lot of theft in data centers. I've admired a mentor before, only to find out he was stealing RAM and other expensive components from the server room and warehouse. I've been interviewed by the FBI and a private investigator as they chase stuff down. So, do yourself a favor and push for an inventory system and barcodes. And if I could ever get discreet barcodes where the bad guys don't know that something's in the inventory system and the label can't be removed, that's even better. It's fun to catch these guys on camera as they set off alarms leaving the warehouse with a laptop.

* Get used to office politics. The sysop job involves a lot of politics. You have to work with developers ironing out in which department the issue resides. There's a lot of judgment involved and questioning of that judgment. A lot is at stake -- sometimes as much as a couple of million dollars in some server rooms. You may have managers from other departments who are flared up with you and against whom you have to sic your manager.

* Telecommutes are rare. It's a server room, dude. It costs a lot of money. Someone has to go in and stick a CD in. Programmers get the telecommute jobs, not you. Forget it. Even if you have this brilliant idea of rotating who's in the office and managing to come in two days a week for meetings -- forget it. No one will buy that.

* Training opportunities are rare. I don't know why this is, but they are. Companies for some reason think that you can manage the servers by reading manuals, but developers get all the training dollars. Makes no sense to me, but it's what I've seen. I think it has to do with the fact that developers make apps that bring in cash, while sysops protect the cash that's already there. The top brass in many company divisions look at sysops and that department as nothing but a cost factory, while developers are a cash factory. It's no wonder the developers get the perks and you don't as a sysop.

* Sysops also get told to do the strangest things. Oh, such as changing the bulbs over the call center, climbing on the roof to look at the HVAC, unclogging a toilet or turning one off that's gone haywire. For some stupid reason, many companies I've seen combine Sysop with Facilities Technician and Security Guard.

* The wrong answer is no. It's yes. Always yes! Yes you can work late even if it means your wife will leave you. Yes you can unclog a toilet. Yes you can fix the wire rack from Hell.

* Other sysops will drive you nuts. Oftentimes, you're not the only sysop, but a sysop in a chain with others above and below. So, to fix a router, you may have to call another sysop. And the personalities among sysops go from being slow, to nice and smart (average), to unbelievably cranky and unforgiving. There are a lot of easy cases where many sysops should have been fired long ago.

* Get ready for some colossal screwups. Have you ever thought you backed something up but find out two years later that you can't pass a business continuity audit because your assumptions about those backups were wrong? Have you ever been asked to find a file on a server from three years ago that means as much as several hundred thousand dollars, but find you cannot? Have you ever gone home on the weekend only to return to a server room with the PBX out, the call center down, and excess water pouring out from the HVAC unit on the roof into the server room through a leak? Just let your imagination roam and it's out there. If that scares you, you're in the wrong job.

* Cleanliness is next to Godliness. If you want to maintain that angelic realm at your job, cleanliness is much more important than you were probably taught in school about this career. You need your cables straight and orderly, properly tied down, and color-coded by type of task. You need the racks put tightly together on their sides and their backs need proper ventilation. You need the server room floor swept and everything vacuumed weekly. Most often the maid doesn't have the kind of security clearance for the server room, so you're stuck with the task and you better love it. In server rooms, you're constantly learning new ways to make them better, more organized, and cleaner. This is not only important for you to lure more clients, but important during times of crisis and quick problem resolution.

* Telecom, physical security, power and electric, fire suppression, insurance, safety, temperature, and humidity are very important. Think that being a sysop is as simple as typing commands, racking servers, and cleaning stuff off? That's nowhere near half the picture. There's physical security to ensure that only the right people go into the server room. Some audits or server room tier categories won't even let business managers go into the server room because they have no business there like a sysop would. When you get serious in this industry, giving tours of your server room to clients is bad except through a large computer screen and a Star Trek-like problem management console area. Power is important because you need it to be steady, have some failsafes, test the failsafe backup generator and UPS, know their time limits, know who to call for electrical problems, and make certain all the staff are trained on this. I don't need to go further because it's a lot to say here. There's fire suppression, HVAC, temperature and humidity checks, system alarms, and much much more in being a sysop than knowing how to rack servers, install stuff, type a few commands, back them up, and keep them clean.

* When you get serious about server rooms, it's too much for just two people to handle and you'll need people who focus on key areas as subject matter experts (SME). The problem is, companies don't like to budget for all that extra help and it's a constant battle. Remember, as a sysop, you're in a department that is a cost factory -- you don't generate much revenue, if any at all. Instead, you protect revenue. So get ready to get the team together and choose who would be the best SME in a given category, even if that person isn't particularly interested or doesn't yet have knowledge in that field. And don't just find one SME, but always have each SME train another secondary SME.

* Sysops hoard knowledge. Everyone in the IT industry wants to protect their promotion and job opportunity. Many try to do this by hoarding knowledge and not sharing passcodes, discoveries, or procedures with a backup SME. Sometimes you have to work so hard and so fast as a sysop that you don't have time to keep updating a knowledge base, documentation, or even speak with a backup/secondary SME. So you're going to have to constantly review whether this is going on, identify what is key, and provide benefits and job security (when possible) to those who share their knowledge. You will need a culture of knowledge sharing, especially in times of crisis, or there is the potential that something drastic happens in the server room while the SME is on vacation and unavailable, and you start losing $20K per half hour because the call center is down.

* It's a constant learning process. I actually think being a programmer is 1000x easier than being a sysop, and I know this from personal experience. When you take into account the fact that most of the risk is on your shoulders because you touch key systems while non-sysops cannot, and when you consider the huge investment from a company, and when you have all these systems that provide warning, prevention, suppression, and reversal of problems -- there's just too much to learn and you constantly need to study. Unfortunately, companies consider you a cost factory, so you may not get great training opportunities. Instead, you'll be stuck with hard-to-read manuals, tech support phone calls, and asking a lot of questions of the vendor techs who come in the door.

* Think of your server room in quadrants. Seriously, take a rectangle for your server room, draw items in it as you see them currently in the server room, split the diagram into quadrants, and know every inch of that server room, quadrant by quadrant. For instance, see the fire suppression system. Learn it. Test it. Train a secondary SME on it. Ask tough questions and wild scenarios about it. Read the manual. Know what it's rated and how much it can handle. Call tech support on it. Read what others on the web warn about it from their own personal discoveries. Document discoveries, passcodes, and procedures for it. Improve the procedures. Keep the procedures up to date. Then, ask yourself the question about how fault tolerant that item is and whether there's a redundant mechanism somewhere else as a failsafe. The same goes for an application server. Know every piece of it and what applications it holds. Know what jobs run and under what IDs, and what directories or remote connections (as well as port numbers) are involved. Ask if there's fault tolerance and load balancing with it.

* Strive for Google-like fault tolerance and load balancing. Sure, if your employer has the cash, it would be great to be able to completely lose power in your server room, run low on power in the generator eventually, and fail over instantly to another city. Not everyone has that, but it's a good goal to strive for. Before you get there, though, you need to work within. That means that you need to take a key application that the company uses, consider all the servers involved, and have things so redundant that you can power off one server at any part of the chain of servers and instantly things fail over with almost no impact except losing just 5 transactions, if that. That requires message queue servers, special expensive database products, load balancers, RAID, sophisticated engineering, scripts ready to go, and fault tolerance to the nines. And you need hard drives and servers sitting on shelves ready to go, or have a vendor that can provide those things in 4 hour windows, if that. Now, move down to applications with lesser and lesser importance until everything is covered. Sure, this costs money, so you'll have to make a lot of business cases to the business to explain why this is important.

* Tape backups are not enough. They're slow and clunky and break a lot. However, no matter what, they're necessary. These larger tapes sound like a neat idea and make easier backups, but then tape restores are painfully slow and you have a lot at stake when you lose a tape due to corruption. What you need is disk-to-disk backup, first, and then sic the tape backup on that. However, that's expensive. You'll have to present your case to the business on why this is important.

* Everyone runs around with their head cut off in a newly growing server room, even five years down the road, and eventually, if you're lucky, the company will begin to focus on efficiency and consolidation, and especially virtualization. Every server room goes through this cycle. They need a major upgrade, so they move to a new building with new features in the data center. Then the company focuses on maximizing that investment, stuffing servers in there fast and trying to increase customer load into the systems. However, you just can't keep stuffing servers in the server room every time you need one for a project. That's where efficiency, consolidation, and virtualization are key. Good luck convincing the business that you need to take a year to plan, prepare, and roll out that change.

* Batch scripts are a way of life. This is something newbies in this industry often don't understand for some reason. And managers of sysops often think that scripting and web pages are only supposed to be done by the programming department. Guess again. How are you going to make a fast way to search your logs for keywords? Sure, use a product, but products don't cover all your requirements all the time. And how are you going to roll that SNMP data up into an average value because the SNMP plotting application that you bought doesn't have that feature for some reason?

* Know your commands, know your SNMP, know your Linux, Unix, and other interesting things. There's some technical stuff in a server room. It's not all about clicking on things in windows, dude. It's not all about purchasing products that solve your problems because they often do not meet all your requirements without extra products, customization, or hooking in batch scripts.

* Audits are now the norm. It used to be that this industry was not self-regulated. You solved problems and moved on. Now, however, not only does the law require you to audit certain things like security, and your headquarters send in auditors for business continuity reasons, but your clients may tell you that they'll leave you unless you meet a certain standard and have auditors prove it to them. So, if you were previously excited about being the know-it-all cowboy that solved your company crises all the time, get ready to have to document it, dot your i's and cross your t's, and worry about failing an audit if you don't meet certain conditions. And if you fail an audit, often this is grounds for a written warning, followed by a termination on the next audit if you do it again (even in a different category). Moreover, it can take you 1/4 of your year to handle all the audit issues and ramp up for the auditors to return. Try to worry about scaling the data center and not getting behind on your deadlines while you also worry about meeting the audit requirements. And think your manager will send in someone to help you document things or type anything for you? Muwahaha. Think again. It's up to you, bud.

* It's common to see managers who are phonies in this industry. This industry is very gruff. You're not going to get someone as nice as Dr. Phil to be your boss, or your boss's boss, or your boss's boss's boss. You often may not get a seasoned manager who knows this industry well, except from a few short years at some other company. This is because managers must be the most accountable for problems, and they are often the first ones to get scolded. You, the subject matter experts, have something that the manager often doesn't have -- a strong technical background and a lot of knowledge -- and so the company can't afford to cut you. But don't think they won't -- companies do stupid stuff too. And I've seen CTOs with PhDs who I swear are so stupid that they must have gotten their PhD by paying someone off in their host country of origin. I've seen those with "Senior" in their title who run the gamut from God-like and worthy of that title to doofuses who know how to play the game and brown-nose the right people.

* There's more job security as a sysop than a programmer, but don't let that kid you. With programming contracts that end, it's so easy to cut programmers loose -- it's the norm in that industry ever since about 1995. However, sysops often remain on, year after year after year. Above I have spoken about all the downsides of a sysop, but don't think that as you move up through the ranks you can keep your job. Eventually it becomes like Logan's Run. In that movie, you hit 30 (21 in the sci-fi novel) and they kill you in order to maintain a young, vibrant society. The same thing happens with sysops except they cut you loose. I mean, old sysops cause problems. I know because I'm one of them. They know too much and have a lot of horror stories. They become the ones who talk so easily about negatives with the team, and they're a real downer for managers. Managers may consider them a threat to their own job stability because they know too much. And old sysops do, like me, move a little slower even though they know so much. So, the fun thing that companies like to do, unfortunately, is to take old sysops and kick them out when they hit 50. And companies can get rather arrogant and cut a darn good sysop for stupid reasons even though they might have systems fail the next day and stay broken for a good three weeks to two months, costing the company a good couple of million in lost revenue.

* If you're a sysop, try to only remain in that career choice for 10 years. That's tremendous experience, but it's time to leave the company for another company, doing something else in IT or in some other career type. Otherwise, you'll end up a burned out, super-smart server room war vet that tells old horror stories, creates negative energy on the team with that, and causes your often younger manager to think things would be better with you gone (even though that's likely not the case).
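Going back to the batch scripts tip above: here's a rough sketch of the two chores I mentioned there, searching logs for keywords and averaging readings that your plotting app won't average for you. It's Python, and everything in it is made up for illustration: the function names `find_keywords` and `average_by_host`, the sample log lines, and the host names. I'm also assuming you've already gotten your SNMP readings out of whatever tool you use as plain (host, value) pairs; how you export them depends on your product.

```python
import re
from statistics import mean


def find_keywords(lines, keywords):
    """Return (line_number, text) for every line containing any keyword.

    Matching is case-insensitive. `lines` can be a list of strings or an
    open text file, since both yield one line at a time.
    """
    if not keywords:
        return []
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]


def average_by_host(samples):
    """Average numeric readings per host from (host, value) pairs."""
    by_host = {}
    for host, value in samples:
        by_host.setdefault(host, []).append(float(value))
    return {host: mean(values) for host, values in by_host.items()}


if __name__ == "__main__":
    # Hypothetical log lines; in practice you would pass an open log file.
    log = [
        "Dec 15 02:00:01 backup01 job started",
        "Dec 15 02:41:17 backup01 ERROR tape drive not ready",
        "Dec 15 02:41:18 backup01 job failed",
    ]
    for number, text in find_keywords(log, ["error", "failed"]):
        print(number, text)

    # Hypothetical readings, e.g. CPU percent already pulled from your poller.
    readings = [("web01", 40), ("web01", 60), ("db01", 75)]
    print(average_by_host(readings))
```

Nothing fancy, and that's the point: twenty-odd lines cover a gap that the product you paid for doesn't. Hook something like this into cron or your scheduler of choice and you've got the kind of glue script a server room runs on.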

Do you want to be a web developer or programmer? Then see this item (http://ubuntuforums.org/showpost.php?p=3952812&postcount=1) for advice on that.