Network Management Archives

Change Management when the Network Changes Us.


If you want to score some easy points with the geek crowd, tell them that DRM (Digital Rights Management) stinks. But with the stress of the elections, I could use a few easy points, so humor me.

When you’re evaluating a change to the network, you have to think – always – what will the real effect on network performance be? And this is important – if there’s one overarching theme this blog has had over the past two years, it is that the network does not begin and end with the router. It doesn’t even begin and end with applications. No – when you think about what the real effect on network performance will be, you have to think beyond the technical into the realms of the personal and psychological. That’s true end-to-end performance.

There are two stories that have been making the rounds on the Web recently. The first, which we’ve covered extensively, is the Australian Internet filtering software. The second is the release of the highly anticipated PC version of Fallout 3, bundled with SecuROM DRM software. This is notable for two reasons. First, Fallout 3’s publisher, Bethesda Softworks, had earlier taken a stand against DRM with the release of its other major title, Oblivion. Second, SecuROM, made by Sony, is notoriously invasive; when EA used it with its hit “Spore,” it caused a particularly nasty backlash, which included a campaign to pirate the game via BitTorrent just to spite EA.

Bethesda Softworks insists that the SecuROM software is only used for a CD check and is not nearly as restrictive as the software that came with Spore.

A common complaint about DRM schemes is that they cause more problems for the people who legally purchase a game than for those who break the DRM in order to pirate it. The worst DRM schemes make using the product a hassle, cause system instability, and are generally a pain in the butt, all while doing nothing to stop piracy. The best DRM schemes don’t get in the end user’s way, allow reasonable use and portability, and never threaten to expire. Of course, these don’t do anything to stop piracy either, but for now let’s concern ourselves only with how DRM affects the people who paid for the game.

Point is, when you’re hassling only the people who adhere to the rules, you’re creating a situation where people will get around the rules.

In the earlier post we did on Australian net filtering, we made the point that for five dollars a month, you can set up a VPN in the United States to get around all the restrictions of any of the proposed filters. However, in doing so, you’re essentially routing all traffic through the United States and back again to Australia, putting added stress on the very expensive international pipes even when accessing local content that would have been better served via local pipes.

This is obviously a large-scale version of the problem, but enterprises that deny access to particular protocols, sites, or applications (rather than de-prioritizing their bandwidth) will find that employees get around the obstacles in order to do their jobs.

But even this is a very specific example of a larger point, one that goes beyond tactics and beyond strategy to network philosophy. Even if you don’t think of it as a change to the network, when you change how users behave (even with a new HR policy printed on a piece of paper), you change how the network is being used. Changes can come from outside the network as well: when culture changes, so does the way people use the network.

All of this leads to the point: even if you aren’t planning to change the network yourself, you should keep a close eye on it with network performance monitoring tools. Change management is not just for when you change the network; change management is for when the network changes you.



The Internet: Wrong for Twitter, Wrong for America.


Alex Payne, API Lead at Twitter, recently wrote a blog post beginning “The Internet is built wrong.”

We happen to concur. There aren’t enough pneumatic tubes.

But aside from that, Payne’s criticism is that the Internet, designed years ago for pushing text, research data, and code, is poorly suited to the problems we have now. His examples are IPv4, which works well enough but can’t supply enough IP addresses for the world’s needs, and SMTP, whose simple but unsecured design has led to problems with spam and DoS attacks on e-mail boxes.

Moreover, he talks about performance:


“You needn’t do more than attempt to watch a streaming video on a busy office LAN or oversubscribed DSL circuit to understand that even the best-served markets for Internet connectivity are struggling to keep up with demand for networked content. Add to this that providing adequate security models for such content is a virtual impossibility on today’s Internet, and the need for a better approach is even clearer.”


The problem is that while Payne makes the case that the Internet doesn’t work as well as “something else” would, and that we only keep using it because it works just well enough that “something else” can’t replace it even if it’s better, he leaves the details of that “something else” to Van Jacobson and his idea of “Network Channels.”

Jacobson gave a talk at Google in 2006 on this idea.

The cynics over at Slashdot immediately considered Payne’s idea of a “content centric approach to networking” as some sort of buzzword that means “owned by the media cartels.” However, I don’t think this is what Payne means at all. (If it is, I apologize for not being cynical enough.)

As Jacobson points out, TCP/IP is a very successful technology, but it has a few problems. You can’t connect to things that move – and we’re not talking Wi-Fi. Take trains, for example – as you switch seamlessly from one cell node to another, it’s easy to make a cellphone call on a train. But it’s not so easy to have a continuous Internet session. (In fact, if you do, it’s probably through the cellphone network…)

Additionally, because today’s network protocols were designed for conversations between two applications on two machines, they don’t handle broadcasting as well as they could. But Jacobson believes that even the conversational model itself isn’t adequate to solve today’s new Internet problems.

From Jacobson’s video:


“We got a chance to look at the data on the routers downstream of NBC’s servers for the Olympics. At one time, their main router got severely congested when Bode hit the pole on the slalom. In that router there were 6,000 copies of the same data. Everybody was pulling down the URL. The poor router can’t do anything about it; you can’t optimize it, because it’s dynamic content. It’s all going out in separate conversations. All the router knows is I’ve got 6,000 separate TCP conversations. It’s the same data. If you could broadcast it, you could, at both the router and the downstream links from the server, reduce the bandwidth by three orders of magnitude, but our protocol architecture doesn’t support that. It works at the conversation level…

…Any of the measurements that I’ve seen recently say that the high 90% level of traffic is people trying to get some named chunk of data. They hand in a URL, and they want to get something back. That’s not a conversation. That’s not a conversational model… that kind of interaction is a dissemination… It’s a point to multipoint or multipoint to multipoint.”
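To put rough numbers on Jacobson’s point: with one TCP conversation per viewer, the router’s outbound link carries one copy of the content per viewer, while an idealized dissemination (multicast-style) model carries a single copy. The sketch below is illustrative arithmetic only; the 6,000 figure comes from the quote, the 1 MB clip size is an assumption, and `link_bytes` is a hypothetical helper.

```python
import math


def link_bytes(viewers: int, content_bytes: int, multicast: bool) -> int:
    """Bytes crossing the server's uplink to deliver one piece of content.

    Unicast (one TCP conversation per viewer) sends a full copy per viewer;
    an idealized multicast/dissemination model sends the content once.
    """
    return content_bytes if multicast else viewers * content_bytes


VIEWERS = 6_000          # from Jacobson's NBC Olympics example
CLIP_BYTES = 1_000_000   # assumed 1 MB clip, for illustration only

unicast = link_bytes(VIEWERS, CLIP_BYTES, multicast=False)
multicast = link_bytes(VIEWERS, CLIP_BYTES, multicast=True)
reduction = unicast / multicast
print(f"reduction: {reduction:.0f}x (about 10^{math.log10(reduction):.1f})")
# prints: reduction: 6000x (about 10^3.8)
```

The factor is simply the number of viewers, so 6,000 viewers lands between three and four orders of magnitude. A real dissemination protocol would also have to cope with viewers who join at different times, which this arithmetic ignores.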


Now, this is not to say that the traditional TCP/IP model doesn’t work. Jacobson is keen to point out that today’s networking problems exist only because TCP/IP solved the old ones extremely well. We just have new problems.

Weirdly enough, anecdotal evidence bears Jacobson’s model out. Look at BitTorrent, which is essentially multipoint-to-multipoint dissemination over TCP. It accounts for around 50% of the traffic on the Internet. Add eDonkey, another multipoint P2P app, and you get 70%. If that’s not a clear indication of changing demands, I don’t know what is.



Disasters in IT, and Ninja Networking


Other than Unix beards and “funny” T-shirts with hex code on them (which more accurately qualify as fashion disasters), the biggest project disasters in IT, according to today’s top story in Computerworld, tend to repeat themselves:


When you look at the reasons for project failure, "it's like a top 10 list that just repeats itself over and over again," says Holland, who is also a senior business architect and consultant with HP Services.


You’ve got your usual run of top-ten disasters in the article, including IBM’s Stretch project (overpromised and underdelivered), Knight-Ridder’s Viewtron (misread the market), the California and Washington State DMV overhauls and FoxMeyer’s ERP program (didn’t make sure the new system worked better than the old one), Apple’s Copland (succumbed to feature creep), Sainsbury’s warehouse automation (just plain didn’t work), Canada’s gun registration system (cost much more than anticipated due to poor planning), and three U.S. government projects (multiple failures, with perhaps more in the future).

But one thing I noticed was that it’s relatively rare (not unheard of, but relatively rare) to see networking play a prime role in the huge IT disaster stories that get passed around the campfire during IT tribe meetings. I think there are a few reasons for that. The first is that most of these blunders fall under the category of “strategic errors” as opposed to “tactical errors.” Network problems, by contrast, are usually tactical: subtle errors caused by misconfigurations and highly technical mistakes. The networking screw-up tends to be subtle and stealthy, compared to the grandiosity of all-out strategic incompetence.

In other words, network performance problems can make even the best-laid plans go astray; the worst-laid plans need no additional help.

Take, for example, a common error from the early days of VoIP: companies would roll out VoIP on the network as if it were just another data application, then find that their other applications slowed to a crawl or even stopped working.

The problem was that VoIP media is typically carried over UDP, which has no congestion control: it keeps sending at its full rate whether or not packets get through. Most other applications run over TCP, which is designed to throttle back its use of the pipeline when packets don’t get through. So when the link got congested, the VoIP traffic kept coming, TCP applications were crowded out and dropped packets, and TCP throttled back. Each time TCP tried to ramp back up, it ran into the same wall of unresponsive VoIP traffic and throttled back again, creating a vicious cycle.
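A toy model makes this asymmetry easier to see. The sketch below assumes one TCP-like flow using additive-increase/multiplicative-decrease (add one packet per round trip, halve on loss) sharing a link with a fixed-rate UDP flow standing in for VoIP. The capacity and rates are made-up numbers in packets per round trip, and `average_tcp_throughput` is a hypothetical helper, not a faithful TCP implementation.

```python
def average_tcp_throughput(capacity: float, udp_rate: float, rounds: int = 200) -> float:
    """Toy model of one AIMD (TCP-like) flow sharing a link with a fixed-rate UDP flow.

    All rates are in packets per round trip. The UDP flow never reacts to loss;
    the TCP-like flow adds one packet per round trip and halves its rate whenever
    the link is oversubscribed. Returns the TCP flow's average delivered rate.
    """
    tcp_rate = 1.0
    delivered = []
    for _ in range(rounds):
        # TCP can only deliver what the unresponsive UDP traffic leaves over.
        delivered.append(min(tcp_rate, max(capacity - udp_rate, 0.0)))
        if tcp_rate + udp_rate > capacity:
            tcp_rate = max(1.0, tcp_rate / 2)   # loss: multiplicative decrease
        else:
            tcp_rate += 1.0                     # no loss: additive increase
    return sum(delivered) / len(delivered)


for udp in (0, 30, 60, 90):
    print(f"UDP at {udp} pkts/RTT -> TCP averages {average_tcp_throughput(100, udp):.1f} pkts/RTT")
```

However much of the link the UDP flow takes, it keeps; the TCP-like flow is squeezed into the remainder and keeps bouncing off its edge. This is why QoS policies for voice usually reserve and cap its share rather than leaving the two to fight it out.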

Was this a problem with strategy? Was it some form of bureaucratic incompetence? No – it’s just that it was a very subtle effect and if you didn’t know enough about the TCP and VoIP protocols (or even if you did, but didn’t put two and two together until it was deployed) you ended up with a problem.

Networking problems may have major effects, but they’re rarely caused by major boneheaded screw-ups. I think that’s one reason IT departments spend so much money on two areas in particular, networking and security: problems in both are extremely subtle to detect and tricky to solve, security problems by malicious design and networking problems by nature.

Networking problems are subtle, strike quickly, and often leave little trace of their presence. They’re the ninjas of IT problems.

Of course, ninjas can be defeated.



This-specific-end-to-that-specific-end network performance management.


EMA analyst Dennis Drogseth had a column in Network World yesterday talking about end-to-end application management. In it, he had this to say:


You might believe, and with some real justification, that the term “end to end” is only used by vendors who custom-fit the definition to the scope of their particular product.

Does “end-to-end” application management, for instance, include the mainframe? You bet it does if you’re a vendor that manages the mainframe environment! Does it include capturing the end user experience at the end station, desktop, or mobile device? Once again, the answer is a definitive “yes” if you’re a vendor that has strong QoE (Quality of Experience) roots. Or how about insights into the code and design of the application itself? If you’re one of the few vendors that does this, you’re proud of it and wouldn’t have it any other way!


And this concerned me because if you do a Google search for [site:networkperformancedaily.com “end-to-end”], you get 122 results. The phrase “end-to-end” appears in a little more than 1 in 5 posts we’ve made to this blog.

So, what do we mean by “end-to-end?”  We’re usually using the phrase in connection with network response times and the end-user experience at the end station; NetQoS is a “vendor that has strong QoE roots.”

Now, we do have some insight into the code and design of the application.  But that isn’t the focus of our tools; the focus is to tell you whether the problem is in the network, server, or application, and if it’s in the application, give you a good idea of where to start your investigation.  (For example, an application that is slow due to unnecessary round-trip transactions behaves differently from an application that is slow due to a memory leak on the server where it is being run.) 
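One way to picture the network/server/application split is a rough heuristic over per-transaction timings. The sketch below assumes each transaction’s time has already been divided into network time and server time (how a monitoring tool measures that is out of scope here); it is not NetQoS’s method, and the `diagnose` function and its 2x growth threshold are hypothetical.

```python
from statistics import mean


def diagnose(samples):
    """Give a rough hint about where slow transactions are spending their time.

    samples: time-ordered list of (network_seconds, server_seconds) pairs,
    one per transaction. The 2x growth threshold is an arbitrary assumption.
    """
    network = [n for n, _ in samples]
    server = [s for _, s in samples]
    half = len(samples) // 2
    # Server time that keeps climbing over the window hints at something like a leak.
    if half and mean(server[half:]) > 2 * mean(server[:half]):
        return "server: processing time growing over time (possible leak)"
    # Time dominated by the network hints at round trips or transfer size.
    if mean(network) > mean(server):
        return "network: time dominated by round trips or data transfer"
    return "server/application: time dominated by processing"
```

A chatty application on a long WAN link would land in the “network” bucket even though the fix belongs in the application’s design, which is why a real tool needs more than these two numbers to point you at the right starting place.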

Drogseth is right when he says that no one vendor is optimized to do it all. In the future, one vendor might be, but then you run into the quality vs. quantity problem. Is it better to do it all adequately or to do a few things extremely well?

EMA defined five major technology spheres, and last June it polled more than 400 respondents to find out which one they believed was “most critical to end-to-end application management in 2008.” The answer was “Network Application Management,” focusing on application flows and end-to-end (as we define it) transaction capabilities.

For more information on this, I recommend reading the original article at Network World. Additionally, Drogseth promises to follow up in his next two columns.



Three Things You Can Do Today To Improve Network Performance Without Spending a Dime


For months, we’ve been waiting to see what the fallout would be from the sub-prime mortgage crisis.

Apparently, the results are not unlike a Hefty bag filled with chili con carne, dropped from the top of a skyscraper. Only instead of a Hefty bag, it’s the U.S. economy.

So, as Wall Street explodes like an explosive so explosive it could explode and create a massive explosion, technology turnaround times will probably extend a couple more years as CIOs try to figure out how to use existing tools to solve network management problems and improve performance. How do you do that?

Luckily, there are ways to do that. Cisco routers and switches already include “application-aware” technologies that don’t require any additional purchases, including IP Service Level Agreements (IP SLA), Class-Based Quality of Service (CBQoS), and Network-Based Application Recognition (NBAR).

Managing Application Response Times with Cisco IP SLA

Now, measuring real application transactions is the most accurate way to measure response times. Failing that, you can use Cisco IP SLA to create synthetic transactions. This is useful not only during an IT budget crunch; it can also provide data when you’re assessing whether to roll out a new application or measuring a service provider’s SLA edge to edge.

IP SLA operates by sending synthetic transactions between two network devices or between a network device and a server. It can be configured to send different types of synthetic transactions based on port, packet size, type of service, and even more advanced characteristics, as is the case with Voice over Internet Protocol (VoIP) tests. When it gets a response, the sender calculates the response-time metrics appropriate for the test type, and then repeats the test multiple times.
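The probe-and-measure loop that IP SLA runs inside the router can be sketched from a host. This is not IP SLA and doesn’t touch a Cisco device; it’s a minimal Python illustration assuming a TCP connection setup as the synthetic transaction, and `tcp_connect` and `run_probe` are hypothetical names.

```python
import socket
import statistics
import time
from typing import Callable


def tcp_connect(host: str, port: int, timeout: float = 2.0) -> None:
    """One synthetic transaction: open a TCP connection, then close it."""
    with socket.create_connection((host, port), timeout=timeout):
        pass


def run_probe(transaction: Callable[[], None], count: int = 5) -> dict:
    """Run a synthetic transaction `count` times; summarize response times in ms."""
    samples = []
    for _ in range(count):
        start = time.perf_counter()
        transaction()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "count": len(samples),
        "min_ms": min(samples),
        "avg_ms": statistics.mean(samples),
        "max_ms": max(samples),
    }
```

For example, `run_probe(lambda: tcp_connect("192.0.2.10", 443))` would time five connections to a hypothetical server at a documentation-range address. Passing the transaction in as a function keeps the timing loop independent of the test type, in the same spirit as IP SLA offering different kinds of synthetic transactions.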

Some SNMP polling products can collect this data automatically, store it in a database, display the results in a GUI, and provide analytical functions beyond data collection, such as calculating baselines, displaying trends, and triggering threshold alerts based on collected IP SLA data. You can also simply pull the information from the CLI, but extracting the IP SLA response-time metrics and copying them into a spreadsheet is difficult and tedious. Still, for the extremely budget-conscious, it can be done.
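A common way to turn raw samples into alerts is a statistical baseline: flag any new sample above the historical mean plus k standard deviations. The sketch below assumes that rule with k = 3 and uses made-up response times; the helper names are hypothetical, and commercial products may baseline differently (per hour of day, for instance).

```python
import statistics
from typing import List, Sequence


def threshold(history: Sequence[float], k: float = 3.0) -> float:
    """Baseline-derived alert threshold: mean plus k sample standard deviations."""
    return statistics.mean(history) + k * statistics.stdev(history)


def breaches(history: Sequence[float], new_samples: Sequence[float], k: float = 3.0) -> List[float]:
    """Return the new response-time samples (ms) that exceed the baseline threshold."""
    limit = threshold(history, k)
    return [s for s in new_samples if s > limit]


# Hypothetical response times (ms) from earlier IP SLA polls, then a fresh batch.
history = [20, 22, 21, 19, 20, 23, 21, 20]
print(breaches(history, [21, 24, 30, 45]))  # prints [30, 45]
```

Here the threshold works out to roughly 24.6 ms, so the 24 ms sample passes while 30 and 45 ms trigger alerts. The spreadsheet route described above can apply the same formula by hand.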

Deploying Quality of Service with Cisco CBQoS

QoS is a blanket term for network policies and practices that help to manage different types of data traffic that share network links. Effectively, QoS determines how different types of traffic, with different priorities, are handled whenever tradeoffs that are likely to impede performance must be made.

Now, within any enterprise, the end-user experience with certain applications will always be more critical than with others. Strategies to avoid (or at least manage) congestion could include dropping traffic, adjusting application responses, and building packet queues. CBQoS is one way to do this, and it comes with the CBQoS Management Information Base (MIB), which collects statistics about the traffic traversing the router and reports how the QoS configuration is being applied.

Here, an SNMP polling product with application-aware capabilities can get information on input and output QoS class map utilization, drop percentage, and packet counts. It can also report pre- versus post-policy traffic volume, rate, and packet count, and point out traffic marked as conforming to, exceeding, or violating defined policies.
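Rates and percentages like these are generally derived from raw MIB counters by differencing two polls. The sketch below assumes you’ve already fetched 64-bit class-map counters (pre-policy bytes, post-policy bytes, pre-policy packets, dropped packets) with whatever SNMP tool you use; the function names and argument layout are hypothetical, and it tolerates at most one counter wrap between polls.

```python
COUNTER64_MODULUS = 2 ** 64


def counter_delta(previous: int, current: int, modulus: int = COUNTER64_MODULUS) -> int:
    """Difference between two polls of an SNMP counter, allowing for one wrap."""
    return (current - previous) % modulus


def class_map_stats(pre_bytes, post_bytes, pre_pkts, drop_pkts, interval_s, link_bps):
    """Summarize one polling interval for a QoS class map.

    Each counter argument is a (previous, current) pair of raw counter values;
    interval_s is the time between polls and link_bps the link speed in bits/s.
    """
    pre = counter_delta(*pre_bytes)
    post = counter_delta(*post_bytes)
    pkts = counter_delta(*pre_pkts)
    drops = counter_delta(*drop_pkts)
    return {
        "pre_policy_bps": pre * 8 / interval_s,
        "post_policy_bps": post * 8 / interval_s,
        "utilization_pct": 100.0 * post * 8 / interval_s / link_bps,
        "drop_pct": 100.0 * drops / pkts if pkts else 0.0,
    }


stats = class_map_stats(
    pre_bytes=(0, 750_000), post_bytes=(0, 600_000),
    pre_pkts=(0, 1_000), drop_pkts=(0, 50),
    interval_s=60, link_bps=1_000_000,
)
print(stats)
```

Comparing pre- and post-policy rates for the same interval is what shows whether a policy is actually shaping or dropping a class’s traffic, which is the evidence the next paragraph says network managers otherwise lack.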

Without CBQoS, network managers don’t have a whole lot of evidence to verify that their QoS settings are actually improving network performance – in fact, they may even be inadvertently harming performance. CBQoS prevents network managers from flying blind with QoS deployments. And, like IP SLA, it’s built into Cisco IOS.

Gaining a New Level of Visibility with Cisco NBAR

From within the network device operating system, Cisco NBAR can inspect packets traversing the device and identify the corresponding application; for example, TCP traffic running on port 80 could be labeled as Google, SAP, SharePoint, Salesforce, etc. NBAR can also provide utilization, volume, and rate metrics on a per-application basis relative to the network circuit carrying the traffic.

It’s similar to NetFlow, but NetFlow identifies traffic mixes by protocol and port rather than giving application-layer visibility. NBAR identifies traffic by application, which is important in setting proper QoS policies. And because NBAR is part of Cisco’s IOS, and the data can be collected with an application-aware SNMP poller (which many of you already have), it can be more cost-effective than application discovery hardware.
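NBAR’s actual signatures and protocol discovery are Cisco’s own and far more sophisticated, but the basic difference between labeling traffic by port and labeling it by content can be sketched. The Python below assumes unencrypted HTTP requests and a made-up signature table keyed on the Host header; `SIGNATURES`, `classify_by_payload`, and the other names are hypothetical.

```python
from collections import Counter

# Hypothetical signature table: Host-header domain -> application label.
SIGNATURES = {
    "salesforce.com": "Salesforce",
    "google.com": "Google",
    "sharepoint.example.com": "SharePoint",
}


def classify_by_port(dst_port: int) -> str:
    """Port-based view: everything on TCP/80 looks the same."""
    return {80: "HTTP", 443: "HTTPS"}.get(dst_port, "other")


def classify_by_payload(payload: bytes) -> str:
    """Application view: look inside an HTTP request for its Host header."""
    for line in payload.split(b"\r\n")[1:]:
        if line.lower().startswith(b"host:"):
            host = line[5:].strip().decode("ascii", "replace").lower()
            for domain, app in SIGNATURES.items():
                if host == domain or host.endswith("." + domain):
                    return app
            return "HTTP (unclassified)"
    return "unknown"


def bytes_per_app(packets) -> Counter:
    """packets: iterable of (dst_port, payload). Returns byte totals per application."""
    totals = Counter()
    for _port, payload in packets:
        totals[classify_by_payload(payload)] += len(payload)
    return totals
```

By port alone, both a Salesforce request and a Google request are just “HTTP”; by content, they can be counted and prioritized separately, which is what makes per-application QoS policies possible.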



Interop Survey Results: IT spending up in 2009?


While banks were collapsing on Wall Street, IT pros were gathering elsewhere in New York for Interop.

We were a bit concerned, what with the economic downturn and all, as to what would happen with spending in IT in the upcoming year. So NetQoS ran a survey polling 112 respondents who attended Interop New York about how much they would spend on network management disciplines and other IT initiatives in 2009.

Here’s what we found.

A plurality of respondents, 46 percent, said that their spending on network management disciplines would stay the same. Only 15 percent of respondents said that they would spend less on network management disciplines and 39 percent actually said that they would spend more on network management disciplines.

Considering the economic woes on everyone’s mind, that’s pretty huge. It implies that network management is seen as a necessity rather than a luxury. By contrast, a plurality of respondents (28 percent) said change management was the area least likely to see a spending increase in 2009. This makes sense: no money means no new projects, which means no change and no need for change management. Plus, change management has been a heavy investment area over the past few years, so more competency has been built in this discipline at the expense of others.

Overall, 34 percent of survey respondents plan to increase overall IT spending in 2009, with 54 percent keeping it at the same level. Only 12 percent plan to cut IT spending in 2009.

Does this mean that the economy is better than perhaps we had thought? Unlikely. Instead, what I think this means is that either A) IT is seen as such a vital part of the company that companies aren’t likely to cut corners, B) the corners have already been cut so far that there isn’t much left to cut without hitting something vital, or C) IT is finally starting to make the case that spending there can reduce costs elsewhere in the company.

Look at the big trends in IT: server virtualization, datacenter consolidation, WAN application development, teleconferencing. All of these are designed to reduce cost. To some extent, IT has always been about leveraging technology to do more with less money, but there’s definitely a more pronounced emphasis on the “less money” part of that equation than on the “do more” part.

If you’re interested, we have a press release about the survey on the NetQoS main Web site.




Work Harder, Puny Earthlings!


It is a great selling point for many networking vendors to point out exactly how much money you lose when networks aren’t performing to peak efficiency – and there are real savings from faster round trip application response times.

But as Mark Gibbs at Network World points out, when you start to equate worker productivity to network performance, it gets a little hairy.


The problem with these kinds of analyses is they aren’t identifying real costs because you can’t equate a solid hour of an employee’s time with an hour of his time that’s broken up into chunks of minutes or even seconds over a long period. 

If you are calculating the value of an employee it has to be on the basis of actual productive work done and revenue derived from that work.


So, for example, shaving 2 seconds off login times each day may make people slightly happier, but those two seconds really don’t “add up over time.” In many occupations, it is not the volume of transactions that determines the value the employee brings to the company – the creatives in marketing, the go-getters in sales, the brainiacs in R&D, and the psychos in management typically aren’t affected by the extra minute of time that it takes to log in each month.

This type of mentality, that all employees earn X dollars per second and any second they are not working costs the company money, is a bit alien to me. And by “alien” I mean the kind of alien that enslaves the human race to make them build statues to their leaders and orbital brainwarp lasers. Yes, work ethic is important. But micro-managed employees are stressed and unhappy, and stressed and unhappy workers make mistakes.

Of course, if you’re waiting 1 or 2 seconds on transactions that you do repeatedly, cutting that wait does save time, and this is where latency produces a serious problem. Delay, the way a human normally thinks about it, is a function of latency times the number of transactions. Focusing on latency is good, but focusing on the end-user experience is better: a lightning-fast pipe doesn’t mean much if you’re sending data across it 30 to 40 times more often than you have to.
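The “latency times the number of transactions” rule is easy to put numbers on. The sketch below assumes a 1 ms LAN round trip, an 80 ms WAN round trip, and an application that makes 200 round trips per operation; all three figures are illustrative, not measurements, and `response_time` is a hypothetical helper.

```python
def response_time(rtt_s: float, round_trips: int, server_s: float = 0.0) -> float:
    """Approximate user-visible delay when each application turn costs one round trip."""
    return rtt_s * round_trips + server_s


LAN_RTT_S = 0.001   # assumed 1 ms LAN round trip
WAN_RTT_S = 0.080   # assumed 80 ms WAN round trip
TURNS = 200         # assumed round trips for one operation in a chatty application

print(f"chatty app on LAN: {response_time(LAN_RTT_S, TURNS):.1f} s")
print(f"chatty app on WAN: {response_time(WAN_RTT_S, TURNS):.1f} s")
# Cutting the round trips 40-fold brings the WAN case back under a second.
print(f"5 turns on WAN:    {response_time(WAN_RTT_S, TURNS // 40):.1f} s")
```

The same arithmetic explains why software written for a LAN can crawl over a WAN: the number of turns stays the same while the cost of each one grows by the ratio of the round-trip times.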

For example, any automated system, like, say, algorithmic trading, will receive tons of value from lowering latency considering that robots, like interns, are well suited for mindless, repetitive tasks because they have no souls.

But back to the point: there are some jobs in the organization where network delay actually does affect productivity. I remember working straight out of college as a data entry clerk at a medium-sized supermarket retailer in the Northeast. We used a piece of Java-based database software that was slow as hell; it would take seconds just to switch between fields on the same data entry form. If I had known then what I know now, I’d probably have said that the software was designed for a low-latency LAN and was making tons of connections across a higher-latency WAN. This caused tons of problems: what could have taken minutes took hours, and what could have taken hours took days.

Of course, back then I was just a data entry clerk. I decided not to bring this up with management, considering that I didn’t want management thinking about ways to improve network response times. After all, I had figured out by week three that my employment depended on management never realizing that, since we were just entering data that someone else printed out from Excel spreadsheets sent in by regional managers, the entire department could easily have been replaced by a very small shell script.



Doing It Wrong


Reprinted from TheDailyWTF.com:


At my company, there's a bit of a wall between Application Development and Network Operations. All "network and network-service related issues" must be reported through the porthole, a.k.a. Helpdesk. Quite often, this leads to interesting results.

"Helpdesk, Jerry speaking."

"Hey Jerry," I said, "this is Paul over in app dev. Our TerraTrade system has a defective ForEx feed that needs to be fixed right away. It's causing a bit of an outage, so if possible, can we open the ticket as 'Urgent'?"

"Not a problem," Jerry responded, "let me just have your name and number, and we'll take care of it."

I gave him a few more details and felt pretty happy that helpdesk was actually helpful. Five minutes after I hung up, an email message came in:

[URGENT] TICKET #71248 HAS BEEN ASSIGNED TO YOU

Not a moment later, my phone rang. I hesitantly picked it up.

"Hello is this Paul," the caller asked before I could even say Hello. I affirmed that it was me.

"Paul," he said, "this is Steve over at help desk. We've got an Urgent trouble ticket that just came in. It's for a, uh, Fortix feed defect. We just wanted to make sure you're on it?"

It took me a few minutes to explain to Steve that he was, in fact, assigning me the defect I had just reported.


Before you laugh too hard at the above story, consider that it’s not far removed from what many IT departments do daily: play the blame game. A user calls the help desk with a problem, and the help desk assigns it to the network team. The network team tells the help desk it’s a problem with the server, and the server team tells the help desk it’s the application. If you’re lucky, someone along that chain will know how to fix the problem, but even if you are lucky, it’s still a lot of wasted time and energy.

This line from Manish Chacko’s article, “God Help the Help Desk,” sums it up:


Imagine a man walking into a hospital, saying that he doesn't feel good, and doctors around the country are immediately called in, starting with the cardiologist, who rules out heart trouble. The man is next wheeled to a podiatrist, who rules out any problems with his feet. He's then wheeled to a gynecologist (But I'm a man... Ma'am, I'm a doctor. I think I should make that determination - and only after the tests come back.) If your diagnostic process is trial by error, you're not, technically, diagnosing.


This is why Dr. Jim Metzler has recommended time and time again that application development and network operations merge into a single “application delivery” team. The primary job of IT is to deliver applications. Focusing on the performance of your own group misses the point: how the applications perform is what matters most, and hardware, software, and networking are all part of that performance equation.



Cisco’s WAAS and the Olympics


I can’t believe I missed this the first time around.

I was so focused on how the online Olympic video was getting through the last mile, that I completely forgot to ask: How the heck are they getting it from Beijing to the U.S.?

Douglas Gourlay at Cisco has been blogging about how NBC has been using Cisco’s Wide Area Application Services (WAAS) for WAN optimization, so that NBC’s video editors can use three 155 Mbps OC-3 pipes, combined and load-balanced (with Cisco gear, of course), to get files directly from Beijing. While I’m not 100% sure that the claim that editors can work with the files “as if they were stored locally” holds true, it’s clear that WAAS is capable of some amazing stuff. We know because NetQoS has SuperAgent integration on WAAS devices and ACE load balancers, and we track stuff like that all the time.


“This reduces operating costs of housing, air travel, transportation, and food. Avoiding 800 airplane trips also supports NBC’s green initiatives for the Olympic Games.”


It also probably makes the video editors a bit grumpy that they didn’t get to go to Beijing.

What I’m curious about is what will happen after the Olympics. Just as Olympic stadiums still stand – and are used – in every host city, I’m wondering if the infrastructure that NBC has to Beijing to deliver high definition video will remain after the Olympics. As China starts to become a new superpower, more news and information is bound to come from Beijing, after all.

And if this can be done for one series of events in one major city, is it that far off from having video-heavy WANs in every city to cover every major event?



Why the Olympics stay online – because fewer people than you think are watching.


While we’ve talked quite a bit about what impact the Olympics may have on an enterprise network’s performance, we haven’t talked much about the performance of the NBC site hosting the live streaming of the Olympics. 

According to Jason Perlow at ZDNet, Limelight Networks (which hosts the streaming videos) deployed the videos by bypassing the public internet and hosting the content more locally, at the ISP. That means you’re viewing the Olympics through your ISP’s internal network, and the broader internet doesn’t even enter into the connection.

This is smart thinking, it appears to be working, and by all measures this should be applauded.  Perhaps even duplicated – if you know that multiple employees will download the same content, local hosting on the LAN is preferable to duplicate download streams tying up the more expensive, slower WAN lines.
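The local-hosting idea applies to any content many users want at once. The sketch below is a minimal LAN-side cache assuming a caller-supplied WAN fetch function; `SharedCache` is a hypothetical class, and it ignores expiry, size limits, and live streams, all of which a real proxy or CDN node would have to handle.

```python
from typing import Callable, Dict


class SharedCache:
    """Minimal LAN-side cache: repeated requests for the same URL cross the WAN once."""

    def __init__(self, fetch_over_wan: Callable[[str], bytes]):
        self._fetch = fetch_over_wan
        self._store: Dict[str, bytes] = {}
        self.wan_fetches = 0  # how many requests actually went over the WAN

    def get(self, url: str) -> bytes:
        if url not in self._store:
            self.wan_fetches += 1
            self._store[url] = self._fetch(url)
        return self._store[url]
```

With this in place, fifty employees requesting the same clip generate one WAN transfer instead of fifty; the other forty-nine are served from the LAN.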

From the enterprise end of the equation, the fact that Limelight is delivering Olympics video more effectively just means that IT managers can’t count on overloaded video servers to limit how much of that video reaches their users. IT managers still need to monitor their own networks for performance problems when a big event like the Olympics comes up.

However, it would be wrong to assume that Limelight’s strategy is the only reason why Olympic live-streaming hasn’t slowed to a trickle.

First of all, the site blocks 95.44% of the world’s population from accessing the content, because it limits the content to viewers in the United States. That’s a lot of people.

Secondly, the site requires Microsoft Silverlight. Most people don’t have Silverlight installed, and some can’t even install it on their systems. And there are certainly going to be quite a few people who just didn’t think installing Silverlight was worth the bother to watch five minutes of Olympic footage they may be only mildly interested in.

And finally – none of the really popular sports are being streamed.  Gymnastics, Women’s Beach Volleyball, Swimming (with the exception of synchronized) and most of the track and field events aren’t available live. So you’re left with judo, fencing, and the decathlon.

So while it is a true technological wonder that the lights have stayed on and the site performs admirably – it is important to recognize that Limelight has not found a magic bullet to deal with extremely high internet video demand. 


