Network Management Archives

Data Centers understaffed, says Symantec poll


Network World reports on a survey by security software vendor Symantec about data center staffing. Specifically, half of the respondents said that their data centers were either extremely or somewhat understaffed.

And of course, there’s always the usual suspect to trot out – the economy – forcing IT workers to do more with less, with cutbacks and layoffs hitting IT hard. But there’s another factor as well. It’s not just that IT staffing budgets are decreasing; the job of the network engineer is also becoming more complex, which increases the overall workload.

This is especially true in mid-sized enterprises, where new technologies that can save money but are extremely complex, like virtualization, WAN optimization, and cloud computing, are being implemented at a faster rate than in either smaller or larger enterprises.

Well, if you don’t have enough manpower in your data center, there are three solutions I can think of off the top of my head. The first is to hire more people, which may not be feasible given current budgets.

The second is to decrease the workload. In short, instead of trying to do “more with less,” accept that it’s okay to do “less with less.” Five nines of uptime give way to three nines, and applications previously handled in-house are outsourced to a cloud services provider. There are some disadvantages to doing things this way, of course.

The third is to find a way to decrease the complexity of your network – perhaps by using management tools that provide a broad overview of the network and how applications are performing. The only downside is that if you don’t use these tools correctly, an additional management tool could end up making the network even more complex instead of making the job easier.

All three of these solutions can be disruptive, at least in the short term, and monitoring your network for those disruptions is the quickest way to get to their root cause.

Though CA and CA|NetQoS are vendors of the aforementioned management and monitoring tools, I’m pretty comfortable suggesting that if you can hire more people, it might be a good idea to do that first when deciding where to spend the budget money. There are a couple of reasons for this.

First, no diagnostic, monitoring, or management tool can replace a network engineer with a good head on his or her shoulders. All a tool can show you is where the problem lies; the engineer has to come up with the solution.

Second, if you have engineers who know what they’re doing, they’ll be the ones to suggest the tools that they need, rather than buying tools first and then trying to train engineers on the proper use of the tools chosen on their behalf. A good engineer with a mediocre management tool is better than a mediocre engineer with the best stuff in the world, after all.

(Not that we don’t want you to buy the best stuff in the world - which, if you haven’t guessed our particular bias, is our stuff…)



Cynical Cloud Computing


InfoWorld’s Paul Krill recently reported that Richard Marcello, President of technology, consulting, and integration solutions at Unisys, said something that could be regarded as a bit of a PR blunder at the Cloud Computing Conference and Expo.

“We were able to eliminate a whole bunch of actually U.S.-based jobs and kind of replace them with two folks out of India to serve a 1,200-person engineering organization.”

Well, great for those two guys in India, and I’m sure it’ll go a long way towards helping the Indian economy recover from its recent invasion of Daler Mehndi clones… FROM SPACE! Still, I’m sure everyone who works in IT in the U.S. felt a chill go up their spine when reading that quote.

Now, part of the reason Unisys was able to cut those jobs was that it set up a “private cloud” within the company, which lets it provision servers in five minutes, compared to 10 days for manual provisioning. Those provisioned servers could then be managed remotely.

This is true enough, as far as it goes, but companies often seem to view IT as nothing more than a capital expenditure that should be cut as much as possible. IT is not just a capital expense – it is, and always has been, the “force multiplier” of the business. IT doesn’t just cost money; it enables your company to grow and meet new challenges. Smart IT is about developing and delivering applications, so if you have a surplus of IT capacity, consider using it either to improve the performance of existing applications or to develop applications that simplify the workload.

The real power of cloud computing and virtualization is not just that they save hardware costs, but that they free your engineers from maintenance and administration so they can do actual engineering – solving problems and improving solutions. In other words, IT doesn’t generate revenue directly, but it makes the revenue-generating parts of the business generate more revenue. If you don’t see value in that, you’re doing something wrong.



Hockey Night in the Data Center


Harwell Thrasher, author of “Boiling the IT Frog: How to make your business information technology wildly successful without having to learn anything technical,” has a blog post out about how, in the current economic situation (which has gone beyond “depression” and towards “the pit of despair”), companies are making dangerous cuts to IT staff.

He compares it to an ice hockey tactic called “pulling the goalie,” in which a team that is down by a goal late in an important game swaps out its goalie for a sixth offensive player in a desperate effort to score. Doing so is within the rules but leaves the goal undefended. In IT terms, a department that cancels its offsite backup and recovery services, stops updating its antivirus software, and lays off the only person in the company who really understands how to maintain and support its custom systems is opening itself up to a grave disaster that could seriously harm the company.

But the metaphor is flawed. Pulling the goalie in hockey may weaken the defense, but it gives the team a better shot on offense. Many IT cuts aren’t really pulling the goalie – most companies at least know to keep their antivirus software up to date – but companies may not take network performance as seriously as they once did, and they make reductions in IT without realizing that the savings can be false.

That is, it is difficult – but not impossible – to determine the cost of letting a particular application, say PeopleSoft, experience a “brownout”: still technically “up,” but performing poorly, and losing you money through lost productivity, sales, or customer satisfaction. At that point, it’s a simple equation: did the money saved by the IT cut cover the lost productivity, the lost revenue, and the irritated customers? If the answer is “no,” then it’s clearly a case of false economy.
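That “simple equation” can be written out directly. Below is a minimal Python sketch; the class, its fields, and every number in it are hypothetical estimates you would have to supply for your own business, not a standard costing model.

```python
from dataclasses import dataclass


@dataclass
class BrownoutEstimate:
    """One period where an application stayed "up" but performed poorly (hypothetical model)."""
    hours: float                # how long the brownout lasted
    users_affected: int         # people whose work slowed down
    productivity_loss: float    # fraction of each user's output lost, 0.0-1.0
    loaded_hourly_cost: float   # fully loaded cost of one user-hour
    lost_revenue: float = 0.0   # sales lost outright during the brownout
    goodwill_cost: float = 0.0  # whatever value you put on irritated customers

    def total_cost(self) -> float:
        lost_productivity = (self.hours * self.users_affected
                             * self.productivity_loss * self.loaded_hourly_cost)
        return lost_productivity + self.lost_revenue + self.goodwill_cost


def is_false_economy(it_savings: float, brownouts: list[BrownoutEstimate]) -> bool:
    """True when the brownouts cost more than the IT cut saved."""
    return sum(b.total_cost() for b in brownouts) > it_savings


# Made-up numbers: 20 hours of a sluggish application, 200 users losing 25% of their output.
quarter = [BrownoutEstimate(hours=20, users_affected=200, productivity_loss=0.25,
                            loaded_hourly_cost=50.0, lost_revenue=10_000)]
print(is_false_economy(40_000, quarter))  # → True
```

The arithmetic is trivial; the hard part is the inputs, since the productivity-loss fraction and the goodwill figure are judgment calls.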

This is especially important now that companies are starting to reconsider the “do more with less” mentality and are thinking about “doing less with less.” And indeed, this can be a viable tactic: if you only need three nines of uptime, saving money by targeting three nines instead of five can be worth it.
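To see what “three nines instead of five” actually means, it helps to convert an availability target into a yearly downtime budget. A minimal sketch, assuming a 365-day year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def downtime_budget_minutes(nines: int) -> float:
    """Allowed downtime per year at an availability of `nines` nines.

    Three nines means 99.9% available; five nines means 99.999% available.
    """
    unavailable_fraction = 10 ** -nines
    return MINUTES_PER_YEAR * unavailable_fraction


for n in (3, 4, 5):
    print(f"{n} nines: {downtime_budget_minutes(n):.1f} minutes of downtime per year")
```

Three nines works out to roughly 8.8 hours of downtime a year, while five nines allows only about 5 minutes, which gives a sense of how much more the last two nines demand.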

Network performance requirements can be cut in the same way, sort of. While it actually hurts me, emotionally, to suggest this, “the best” network performance isn’t always the most cost-effective network performance. For example, if you can save money by allowing some periods of congestion on the WAN, then it might work, so long as that congestion never exceeds an acceptable level.
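One way to make “an acceptable level” of congestion concrete is to count how many polling intervals exceed a utilization threshold. This is a hypothetical sketch: the 80% threshold and the 5% budget are placeholders for illustration, not recommendations.

```python
def congestion_within_budget(utilization, threshold=0.80, max_congested=0.05):
    """Check whether WAN congestion stays inside an agreed budget.

    utilization:   link utilization for each polling interval, 0.0-1.0
    threshold:     utilization above which an interval counts as congested
    max_congested: fraction of intervals allowed to be congested
    Returns (within_budget, observed_fraction).
    """
    if not utilization:
        raise ValueError("need at least one utilization sample")
    congested = sum(1 for u in utilization if u > threshold)
    fraction = congested / len(utilization)
    return fraction <= max_congested, fraction


# 100 hypothetical polling intervals, 5 of them running hot.
ok, observed = congestion_within_budget([0.45] * 95 + [0.92] * 5)
print(ok, observed)  # → True 0.05
```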

The problem is finding out what’s “acceptable.” That means baselining performance and understanding what kind of performance your business applications need. It’s for this reason that IT cuts should not include the network engineers who make those determinations, nor the (self-interest alert!) network monitoring solutions they depend on. IT without the former is “pulling the goalie,” while IT without the latter is sending the goalie out there without a stick, protective gear, or skates.
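Baselining can start as simply as summarizing what an application’s response times normally look like and flagging windows that fall outside that range. A minimal sketch using Python’s standard `statistics` module; the choice of the median and the 95th percentile, and the sample data, are assumptions for illustration:

```python
import statistics


def build_baseline(history_ms):
    """Summarize an application's historical response times (milliseconds)."""
    cut_points = statistics.quantiles(history_ms, n=20)  # 19 cut points, 5% apart
    return {
        "median": statistics.median(history_ms),
        "p95": cut_points[-1],  # last cut point = 95th percentile
    }


def outside_baseline(baseline, recent_ms):
    """Flag a recent window whose typical (median) response time is worse
    than what the application used to see 95% of the time."""
    return statistics.median(recent_ms) > baseline["p95"]


history = list(range(100, 200))  # hypothetical: 100-199 ms over the past month
baseline = build_baseline(history)
print(outside_baseline(baseline, [150, 160, 140]))  # a normal day → False
print(outside_baseline(baseline, [300, 320, 310]))  # a brownout → True
```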



The Robots Are Coming For You


As Halloween approaches, I’ve got a bit of a horror story to keep you up at night. 

There’s an interesting quote that’s somewhat appropriate now. Well, song lyrics anyway: “Did you feel you were tricked / by the future you picked?” I’m told they’re from a Peter Gabriel tune for a Pixar movie, but I only came across them while reading speculative fiction about quantum AI computers running 419 scams.

The thing about the future is that by the time it gets here, it’s already the present. Wait, I’m sounding like Criswell there… What I mean to say is that only a couple of years ago, the big story in technology was how IT departments were becoming centralized thanks to advances in virtualization technology that cut hardware requirements and power consumption. Now the next level is cloud computing: fundamentally, the idea that you can centralize data centers even further by pooling them with other companies’ data centers at a third-party provider.

Taken to an extreme, it’s easy to imagine a day when these cloud computing centers are consolidated even further – perhaps one on each inhabited continent. “A world market for maybe five computers” indeed…

Except, it’s not quite that easy.  The transition from in-house architecture to cloud computing resources is just about as difficult as the transition from real servers to consolidated virtual ones, and the big problem is ensuring network performance – that data gets where it needs to go quickly.  


Much as the server consolidation/virtualization problem was eased by better virtualization technologies and advances in WAN optimization, the current rush in IT tool development is in the cloud computing area (not that we don’t still have a ways to go with virtualization and consolidation). Some of these cloud computing tools are starting to appear – for example, self-managing environments.

One of the newest approaches is the concept of the "dynamic infrastructure." Rather than a simple collection of humming boxes or cards designed to push data this way or that, the dynamic infrastructure brings together virtual networking, automation and resource management with tools like application management, security and policy management to create a self-managing environment that can react to changes in workloads and other needs with minimal human interference.

Lori MacVittie, technical marketing manager for application services at F5 Networks, is one of the prime movers behind the concept, which she says will be the inevitable result of the transition to the cloud.

"When the entire data center is founded on a dynamic infrastructure, the infrastructure can react itself to changing network and application conditions and needs," she says. "When the entire ecosystem is sharing status and information about performance, every component can adjust itself dynamically to what’s needed now to improve performance or maintain availability. And it happens automatically, based on the specific needs of the business and IT."


Virtualization has underscored the need for performance management. Back when everything ran on physical servers, you could almost always fix a problem by finding out where the bottleneck lay and adding more stuff. Not always, but almost always. With virtualization, though, you’re essentially managing an interconnected ecosystem of stuff and… well, stuff that’s not stuff. “Unstuff,” to borrow a bit of Newspeak.

And this management is so complex that it has increased the demand for network engineers, yes, but it has also increased the demand for software that takes over the more tedious tasks of network engineers, automating processes where possible.
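At its core, a self-managing environment like the “dynamic infrastructure” MacVittie describes is a feedback loop: observe a metric, decide on a change, apply it, repeat. The toy sketch below covers only one small piece of that, resizing a pool of instances from a utilization reading. The proportional rule, the 50% target, and the pool limits are hypothetical choices, not F5’s or anyone else’s actual design.

```python
import math


def desired_instances(current: int, avg_utilization: float, target: float = 0.5,
                      min_n: int = 2, max_n: int = 20) -> int:
    """Proportional rule: resize the pool so average utilization moves toward `target`."""
    if current < 1:
        raise ValueError("current pool size must be at least 1")
    wanted = math.ceil(current * avg_utilization / target)
    return max(min_n, min(max_n, wanted))  # stay within the allowed range


def control_loop(readings, start: int = 4):
    """Observe, decide, act: one resizing decision per utilization reading.

    Simplifying assumption: each reading was taken after the previous resize
    took effect.
    """
    size = start
    sizes = []
    for utilization in readings:
        size = desired_instances(size, utilization)
        sizes.append(size)
    return sizes


print(control_loop([0.9, 0.2, 0.05]))  # → [8, 4, 2]
```

A real system would also need damping (so the pool doesn’t thrash back and forth) and many more signals than a single utilization number, which is where the “ecosystem sharing status” part of the idea comes in.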

But what if there is no upper limit to that automation? What if self-managing cloud computing software is exactly that – computers calculating exactly what needs to be done to preserve performance and then doing it automatically?

And that network monitoring software…. WAS ME THE WHOLE TIME!!!!!

AAAAAAAAAHHHHH!!!! 



The Re-Education of NetFlow


by Ben Erwin

NetFlow and NetFlow-esque technologies (J-Flow, cflowd, NetStream, IPFIX, etc.) have been around the network management world for quite some time. Thousands of IT shops worldwide use them to analyze traffic flowing across the network.

Recently, some vendors have made somewhat misleading statements about NetFlow’s capabilities. There are very good reasons why NetFlow is a de facto standard (and, through IPFIX, soon to be an IETF standard). Here are some quick reminders of why NetFlow is still king:


  • 100% visibility across all network links. A common misconception about NetFlow is that it samples traffic. In its standard, unsampled mode, NetFlow accounts for every packet it sees and exports flow records that give a full picture of what traffic is flowing across the network. It is true that sFlow samples traffic for flow export (and NetFlow can optionally be configured to sample), but unsampled NetFlow records everything it sees.

  • Enabling at network aggregation points. Instead of enabling NetFlow on every router, most NetFlow aficionados enable it only on the aggregation routers that see the majority of network traffic. This way, network managers can visualize their network traffic without going overboard with router configuration.

  • Granularity versus TCO. It’s true that NetFlow does not provide Application Layer (Layer 7) information. Even so, it remains the best bang for the buck for network visibility. Yes, you could deploy probes all over the network to gain Layer 7 visibility, but there’s a significant opportunity cost in time and manpower for deployment, configuration, and ongoing monitoring, and the total cost of ownership of a probe solution for Layer 7 visibility simply isn’t worth it. Many IT shops have dumped probes altogether and gone with NetFlow despite this limitation.

  • Free (if you use Cisco).  NetFlow is free on all Cisco routers.  All you have to do is enable it.  This makes it a very cost-effective solution compared to alternatives. 
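To make the sampling point concrete, here is a toy flow cache in Python. It aggregates packets into records keyed on the classic 5-tuple and can optionally apply a simple 1-in-N sampler. It is a simplification for illustration, not Cisco’s implementation: real NetFlow caches also key on fields such as the ToS byte and input interface, age flows out on timers, and export records to a collector.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IP protocol number: 6 = TCP, 17 = UDP
    length: int    # bytes on the wire


def build_flow_cache(packets, sample_every=1):
    """Aggregate packets into flow records keyed on the 5-tuple.

    sample_every=1 accounts for every packet (unsampled);
    sample_every=N looks only at every Nth packet (simple 1-in-N sampling).
    """
    cache = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for i, pkt in enumerate(packets):
        if i % sample_every:
            continue  # the sampler never sees this packet
        key = (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.protocol)
        cache[key]["packets"] += 1
        cache[key]["bytes"] += pkt.length
    return dict(cache)


web = Packet("10.0.0.1", "10.0.0.2", 51000, 443, 6, 1500)
dns = Packet("10.0.0.3", "10.0.0.2", 52000, 53, 17, 80)
packets = [web, dns, web, web, web]

print(len(build_flow_cache(packets)))                  # → 2 (both conversations)
print(len(build_flow_cache(packets, sample_every=2)))  # → 1 (the DNS lookup is never seen)
```

Unsampled, both conversations show up; with 1-in-2 sampling, the single-packet DNS lookup never makes it into the cache, which is the kind of gap people have in mind when they worry about sampled flow data.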

These are all reasons why NetFlow will continue to be top dog for network visibility. And while there are certainly improvements to be made (there is no such thing as a “perfect” machine), right now some of the best solutions for network visibility take advantage of the capabilities that NetFlow provides.



Jim Metzler looks back at his 2009 predictions


In this retrospective video interview with Jordan Weiss (part one of two), Jim Metzler looks back at some predictions he made at the beginning of the year and how they’re holding up.



Worst. Year. Ever.


By Patrick Ancipink

Gartner has made it official: 2009 was the “worst year ever” for IT. I’m here at the Gartner Symposium in Orlando, and about 15 minutes into yesterday’s opening “parade of analysts” keynote, I was really hoping the Disney location would lighten the mood a tad, but the Halloween nightmare continued for a while.

The only wealth evident was in the statistics pointing to IT hurting for another few years: Gartner predicts it will take until 2012 for broad IT spending levels to equal those of 2008. As a result, infrastructure upgrades (think servers, PCs, printers) are being delayed, creating another source of risk. And don’t forget that “trust” in the business is at an all-time low.

Some much-needed genuine humor (in contrast to the creepy, awkward laughter of the Windows 7 launch party propaganda) came in the form of VP and Gartner Fellow Andy Kyte. While his message was serious – “You all have a bloated application portfolio” – his analogy was priceless. IT and the business prefer to be in the mode of making babies (new apps) but are not into responsible parenting. That’s why there are so many orphaned applications that were funded only to be born, not to be cared for and raised to maturity. Mr. Kyte’s direction to this audience of “the world’s most important gathering of CIOs and senior IT executives”:


“Make every application a wanted application.”


The morning keynote wrapped up with less doom and some good points about the inevitability of social media (start harnessing it and spend less time trying to kill it). All in all, the morning was a solid reminder of the reality of the IT marketplace in general.

Drilling into IT operations management topics today provided some insight into what CIOs are doing, and should do, to get through the gloom. Starting with CIO priorities based on survey data, 2009 looks like this:


  1. Linking business and IT strategies and plans
  2. Reducing the cost of IT
  3. Delivering projects that enable business growth
  4. Improving IT governance
  5. Implementing IT process improvements
  6. Improving the quality of IS services

The 4th, 5th and 6th priorities are closely related to the 2nd, so it’s easy to see why the IT operations management segment, which includes network and application performance management, is among the healthiest sectors in IT. One way to think about it is that cost savings need to be tempered with some sanity (automation and process) so that application and service delivery are not overly compromised. With more dependence on the network and on IT in general, the cost of downtime and poor application performance continues to rise.

None of that is surprising, but what I found interesting is how much the priorities change from year to year. For example, just last year reducing cost was all the way down at #10 on the list. Looking at the 2012 predictions, the cost issue drops back down to #6, with the underpinning priorities (governance, process and quality) at the bottom of the list. In 2012, “leading enterprise change initiatives” rockets up the chart from #13 today to #3.

So the takeaway is that CIOs think they can hold the rudder steady for a few years, institute some much-needed process maturity, and then be in a position to contribute more back to the business.



Jim Metzler on Infrastructure Management Tools and Methodology



 
 



“President Obama, Will you save the Tiny Mars Humans?”


Monitoring your network is crucial to maintaining it, but the two are obviously not the same. You can have all the data, presented in an easy-to-understand format, run report after report, and it won’t matter if, at the end of the day, the person whose job it is to interpret the data misinterprets it.

If you look for the wrong things – for example, if you’re still primarily concerned with availability rather than latency – you can miss the most important details and come to the wrong conclusions about your network.

It reminds me of this guy, who has analyzed the Mars Rover photos on the JPL Journal Web site, and believes that there is a vast conspiracy at NASA to trample tiny humans (about 5cm in height) under the wheels of the Mars Rover.


“Next three images shows [sic] typical areas on Mars where three sizes of humans and primates live a symbiotic lifestyle. Strangely, the primates appear to be sentient…”

“Next is the Tiny humans [sic] attempt to disable a Mars Rover. The reason; it is the machine that has cause numerous deaths among the smallest Humans who cannot detect or hear the Rover coming.”

“***Warning next 5 images show scenes of death by crushing.*** Americans have Constitutional rights to know this information I have discovered from public posted JPL images…. The second image is gruesome. It shows the Rover has driven through a thickly populated tiny human’s area, killing a great number of them…. We are not at war with them. Someone will answer for these deaths.”


The photos, obviously, contain blurry images of rock formations and dirt whose silhouettes may look vaguely human-like in a Rorschach-ian way. Personally, I don’t even think they look vaguely human.

I bring this up because network data can often be an inkblot test of sorts: if someone is looking only for availability, they’re simply not going to see the problems caused by poorly performing (but still available) applications.
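The difference between the two views is easy to show. In this hypothetical sketch, the same set of probe results gives a perfect availability score while most requests are too slow to be useful; the one-second threshold is an assumed example, not a standard.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    responded: bool      # did the application answer at all?
    response_ms: float   # how long it took (meaningless if it didn't answer)


def availability(probes):
    """Fraction of probes that got any answer: the "is it up?" view."""
    return sum(p.responded for p in probes) / len(probes)


def slow_fraction(probes, threshold_ms):
    """Fraction of answered probes slower than threshold_ms: the "is it usable?" view."""
    answered = [p for p in probes if p.responded]
    if not answered:
        return 0.0  # nothing answered: that's an availability problem, not a latency one
    return sum(p.response_ms > threshold_ms for p in answered) / len(answered)


# Hypothetical probe results for one application over an afternoon.
probes = [Probe(True, 4000.0)] * 6 + [Probe(True, 300.0)] * 4
print(f"available: {availability(probes):.0%}, slower than 1 s: {slow_fraction(probes, 1000):.0%}")
# → available: 100%, slower than 1 s: 60%
```

An availability-only dashboard would show this application as perfectly healthy.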



What Google Did Right In Yesterday’s “GFail” incident.


Gmail’s web interface was down yesterday for about 100 minutes, annoying many and hurting the productivity of a number of companies that rely on Gmail.

You could chalk this up to the danger of relying on cloud apps… but Google is supposed to be the cream of the cloud providers. So what happened? Well, according to the Google Blog:


Here's what happened: This morning (Pacific Time) we took a small fraction of Gmail's servers offline to perform routine upgrades. This isn't in itself a problem — we do this all the time, and Gmail's web interface runs in many locations and just sends traffic to other locations when one is offline.


However, as we now know, we had slightly underestimated the load which some recent changes (ironically, some designed to improve service availability) placed on the request routers — servers which direct web queries to the appropriate Gmail server for response. At about 12:30 pm Pacific a few of the request routers became overloaded and in effect told the rest of the system "stop sending us traffic, we're too slow!". This transferred the load onto the remaining request routers, causing a few more of them to also become overloaded, and within minutes nearly all of the request routers were overloaded. As a result, people couldn't access Gmail via the web interface because their requests couldn't be routed to a Gmail server. IMAP/POP access and mail processing continued to work normally because these requests don't use the same routers.


The Gmail engineering team was alerted to the failures within seconds (we take monitoring very seriously). After establishing that the core problem was insufficient available capacity, the team brought a LOT of additional request routers online (flexible capacity is one of the advantages of Google's architecture), distributed the traffic across the request routers, and the Gmail web interface came back online.


On one hand, Google didn’t understand its network well enough to know the effects the change would have. On the other hand, Google did some things right: its monitoring software alerted the team to the problem before users started calling Google, the engineers quickly diagnosed it, and that led to a simple, direct fix that got Gmail up and running relatively quickly. 100 minutes may seem like a long time, but from problem to repair, it’s actually relatively short.
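The cascade Google describes is easy to reproduce in miniature. This hypothetical sketch assumes load is split evenly across request routers and that an overloaded router simply stops taking traffic. Google’s post doesn’t describe its routing at that level of detail, and the capacities and loads here are made up.

```python
def surviving_routers(total_load, capacities):
    """Return the indices of request routers still serving after overloads cascade.

    Simplifying assumption: load is spread evenly across healthy routers, and a
    router whose share exceeds its capacity stops accepting traffic, so its
    share lands on the routers that remain.
    """
    healthy = list(range(len(capacities)))
    while healthy:
        share = total_load / len(healthy)
        still_ok = [r for r in healthy if share <= capacities[r]]
        if len(still_ok) == len(healthy):
            return healthy  # stable: nobody is overloaded
        healthy = still_ok  # shed the overloaded routers and try again
    return healthy          # everyone fell over


routers = [120, 110, 105, 100, 95]  # hypothetical requests/sec each router can handle
print(surviving_routers(400, routers))                     # normal day: all five survive
print(surviving_routers(500, routers))                     # load a bit higher than planned
print(surviving_routers(500, routers + [120, 120, 120]))   # add capacity
```

With these made-up numbers, a 25% increase in load doesn’t just knock out the weakest router: its share pushes every remaining router over the edge, and the second call returns an empty list. Adding capacity, which is what Google did, stops the cascade.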


