Network Performance Archives

1 = Alive, 0 = Dead. (or: Take 10 aspirin and call me in the morning.)


Shamus McGillicuddy, not to be confused with the McGillicuddy Serious Party of New Zealand, recently wrote at Searchnetworking.com about Don Lester, a senior network engineer with Wenatchee Valley Medical Center in Wenatchee, Washington.

Lester used the application response time and network traffic analysis modules of the NetQoS Performance Center to diagnose problems with performance of a medical records application, which “affects patient care because doctors won’t be able to do things they might normally want to do to help patients.”

The article shows our products at their best, and it would be a bit shameless of us to quote the bits we really like.

Shameless and fun!

“I popped up Reporter-Analyzer first because it's one of the quickest ways for me to look at what's going to a location," he said. "I was able to see that there was a whole bunch of traffic going to one of the locations from the workstation patching server."

Through some quick investigation, Lester was able to learn that the medical center's PC technicians were supposed to push out a patch in the middle of the night using their ability to turn on machines remotely for update. But something had gone wrong, and the PCs had never been turned on. Instead, the patching server waited until morning, when users started turning on their computers. The computers started pulling patches, which slowed down the WAN link at the remote location.

The problem at the second location was completely unrelated.

"There were no signs of anything," Lester said. "The link didn't have any significant traffic at all."

He pulled up another NetQoS tool -- SuperAgent, which analyzes TCP transactions -- and saw a high level of retransmission delay occurring at the second WAN link.

"So there was a problem with that circuit," he said. "We had to work with a third party who managed the circuit because it's not something we have a lot of eyes into. We told them to fix it. And we were able to use the same tool to tell when, in fact, they had fixed it and if it was as good as it was before the malfunction…

It kinda gives you a warm, squishy feeling in your heart, doesn’t it?

…As well as maintaining critical application performance, Lester's NetQoS tools control network costs.

"It isn't unusual for us to be negotiating a [WAN] contract based on a tier of service that is defined by the average amount of bandwidth consumed over a given period of time," he said. "There have been times when the vendor in question will inadvertently overstate our usage. When that happens, we will produce usage graphs over whatever time period is appropriate, detailing our actual usage -- and ultimately reclassify our tier level and negotiate a lower price."

Lester said a more common issue that comes up is dealing with application vendors. Often, when there is a performance problem with an application, the vendors will take a number of standard corrective actions to try to solve the problem. If none of these actions works, the application vendor will often blame the network and claim that the medical center needs more bandwidth.

"We are able to tell rather easily using NetQoS tools whether or not we really have a bandwidth issue and can then share that information in a concise graphical format," Lester said. "This not only saves us the cost of unnecessarily upgrading circuits, but it also results in better service to the end user since this re-engages the vendor. The troubleshooting escalates, and the root cause is isolated and repaired."

I think that the article illustrates one of the reasons that NetQoS focuses on performance rather than fault – that is, performance data is more valuable than fault data. You could use a quick and dirty program to check on fault and availability, but for more difficult performance problems, you need more sophisticated monitoring tools.
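To show just how quick and dirty an availability check can be, here is a minimal sketch in Python. The function name `is_alive`, the timeout, and the example host and port are assumptions made for illustration; this is not how any NetQoS product works. It answers exactly one question, whether a TCP port accepts a connection, and returns 1 or 0.

```python
import socket


def is_alive(host: str, port: int, timeout: float = 2.0) -> int:
    """Return 1 if a TCP connection to host:port succeeds, else 0."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 1
    except OSError:
        # Refused, unreachable, or timed out: all of them count as "dead".
        return 0


if __name__ == "__main__":
    # Hypothetical target; substitute a server you are allowed to probe.
    print(is_alive("127.0.0.1", 80))
```

A check like this would have reported "1" for both of Lester's WAN links: neither the patch traffic nor the retransmission delay took anything down.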

(I just got this idea for a comedy sketch in which you have a doctor that only checks for availability – all patients are either “alive” or “dead.”)

We contacted Don Lester for his thoughts. He said that:

“I have a lot more than two applications I worry about, but there are two that are more important than all the others. All in all though, it does detail what everyone should hope to have in their environment. Simply the ability to troubleshoot or analyze without having to guess or act on instinct or hunches. It is a lot easier to do this kind of work with solid, factual information than to sit around pinging everything and hoping you stumble onto a smoking gun.”



Unified Communication and the Bouncing Grey Lady


We announced this morning that NetQoS Unified Communication Monitor works with Microsoft Office Communications Server 2007 Release 2 [OCS 07 R2], and while it’s easy to get into the small details of how unified communications applications place great demands on the network, and how to handle those demands, I found myself pausing for a moment.

Where, exactly, are we going with this?

And by “we,” I don’t just mean NetQoS as a company but I mean – Us. The big Us. The human condition.

That is, unified communications applications do place more demands on the network than any other type of data that’s come before them. So why, then, do we even do it in the first place?

It’s because treating voice and video as data to be sent over the network allows us to do more with the communication-as-data than we could with the analog alternatives, even if this makes the network as a whole slightly less effective due to congestion – or if you have good performance monitoring information perhaps no less effective, but perhaps more complex. (We try to simplify the complexity as best we can, by using metrics directly from OCS 07 R2, but the necessities of a mixed communications and data network are simply more complex than the needs of a pure data network alone.)

Everything is becoming binary data, and this process is not likely to stop. To those of earlier generations, the New York Times is a newspaper; to many of the young, the New York Times is a news Web page with video and audio content; text and images being one of many offerings possible. A back-of-the-envelope calculation shows that it would be cheaper, over the course of a year’s subscription, to send every New York Times subscriber a free Kindle E-book reader – at retail prices, no less – and send the newspaper to them digitally, than it costs to print out and deliver all those physical papers to the subscribers.

It doesn’t take long to realize three things: Technology is getting cheaper, paper and distribution are getting more expensive, and the market of people who read the New York Times but will only do so if they have a physical piece of paper is dying out. The Grey Lady will become data, or there will be no Grey Lady.

Every advantage of the physical paper – portability, permanency, and simplicity – is being lost as technology becomes more portable, more permanent, and simpler. Our standards for digital technology to replace the more traditional equivalent are relatively low. Assuming it takes two and a half seconds to locate an article by flipping through the pages, any system that can serve up a Web page in less than 2500ms is an improvement. Even the sheer scale involved – that you would measure the time it would take to find an article in milliseconds rather than seconds – implies an entire quantum leap from the old way of doing things.

Those of us who work closely with the Web – bloggers, Web designers, media professionals – are aware of CSS, which removes content from layout, and RSS, which removes content from context. How far can we be from a society in which all content is completely removed from any sort of context or layout? A society where everything is abstracted? Where you could download the model of a basketball, and print it out on a 3D printer. Or even, if you wish, have the New York Times printed daily on a basketball, if you so chose…

But in that world, where everything is data, network performance suddenly becomes one of the most important things in the world. The bottlenecks once caused by the unfortunate limitations of pure physics suddenly give way to a single bottleneck – that of network performance.

Digitization is an awesome and powerful force… and while it has been mostly beneficial, I think that too often we do not recognize the power of this inexorable tide – this benevolent but gargantuan inevitability.

I don’t know if I’m ready for that world. I’m not sure I want my news to bounce.



XP, Virtually


An interesting post over at Slashdot pointed me over to a Network World story by Mitchell Ashley about how Windows 7 wouldn’t be a compelling upgrade from Windows XP. The interesting aspect is that Ashley goes on to suggest that perhaps the idea of OSes might become irrelevant if you move to thin clients connected to virtualized computers over the LAN.


The future Simon paints is one where these personal and business computing environments are virtualized onto the same computer, rather than intermingled as they are today. Businesses will deliver virtualized full OS plus apps, and stand-alone virtualized apps, to computers that users own. This maintains the security of corporate data and applications, and allows the business to viably deliver a computing environment they manage on computers they don't. Obviously this vision is in line where Citrix is going with XenServer, XenApp and their newly announced Project Independence, and given my own views about desktop and application virtualization I can see merit in a lot of Citrix's vision.

While we haven't seen much of this yet, I believe it would be wise for Microsoft to continue to improve Windows 7 as an easily virtualized OS and a platform for delivering virtualized applications. Microsoft has partially moved Live Essential apps into the cloud. As they move Office and other apps online, the OS becomes thinner and thinner, less bloated with applications that entangle themselves into the registry and Windows folder of Windows 7.


You know what, I might be wrong on this – but I don’t see it happening.

This is not to say that thin clients and virtual desktops don’t have their place, but the problem with virtualizing the desktop is that offloading processing to a datacenter means, necessarily, more traffic on the LAN. Individual applications may be run remotely, sure, but operating systems – or specifically the graphical user interfaces of those operating systems – carry a lot of overhead. (If we were all using command line interfaces, the overhead would, of course, not be nearly as great.)

Additionally, companies have been moving more towards having consolidated servers connected to the end-user via the WAN – by bringing the servers closer to the datacenter, you’re also, in effect, moving the users further away. You can have virtualized desktops saturating your LAN, or virtualized servers saturating a WAN, but it would be extremely unlikely that you could have both on the same WAN.

What seems more likely is the use of virtual servers to serve up specific applications – applications that can be optimized to reduce overhead on the network. I just don’t see that happening with XP or Windows 7; perhaps only the next generation (the Microsoft Cloud OS, perhaps) might be light enough to handle virtualized desktops. Then again, a computer’s what, $300 from Dell nowadays? And are you saving that much if you have to get a thin-client appliance for each user instead of buying a general purpose PC?

There have been some worthwhile attempts to do the WAN equivalent of having a cake and eating it too - a number of WAN Optimization vendors are putting in Windows services on the WAN Optimization blades themselves.  That is, one of the ways they optimize the WAN is to keep traffic off of it – and one of the ways to keep traffic off the WAN is to take care of Windows services on the branch LAN, before it even reaches the WAN.  In a way, it’s sort of the opposite of server consolidation – but since you have to run the blade for WAN optimization purposes anyway, you’re not adding additional hardware or sucking up much more power. 

This is pretty much speculation at this point – but speculation is fun, isn’t it? I’d love to hear your comments on this.



Things getting jittery in Barcelona.


By Patrick Ancipink

So far, so good at Cisco Networkers in Barcelona this week. Despite spirits being a little tempered by the worldwide financial crisis, attendance seems to be quite good and we are noticing a radical shift in what enterprises and service providers are trying to accomplish with their networks.

One topic that’s been very popular with this audience is the concern about VoIP and video quality on converged networks. So you could say network pros are more jittery here about latency-sensitive UDP applications than they are about the macroeconomic situation.

Heh…

Moving on...

The primary concern about video has shifted from quashing recreational YouTube viewers to ensuring the network can carry video. We spoke with several different companies (with headquarters from Sweden to Qatar) that have requirements to stream video across the WAN as part of their mission. I was involved in several conversations and overheard several more where the main topic was using application-aware network management tools and techniques like IP SLA and how to determine, validate and assure QoS.

I have to say I was pleasantly surprised about the awareness of the Cisco NAM support we announced the day before yesterday. It seems like before the news even hit the wire we had several attendees asking how they could use their NAMs alongside NetQoS. There are some monster implementations of NAM out there so everybody’s happy when you can leverage the existing investment for better application response time monitoring and performance troubleshooting.

Tapas and Rioja are nothing to complain about either.



Will the network be the bottleneck for multi-processor apps?


According to Brad Reed at Network World, Gartner has published a list of the “10 most important strategic technologies of 2009.”

The list includes green IT, mashups, Web-oriented architecture, and unified communications – repeats from last year. It also includes cloud computing, “beyond-blades servers,” business intelligence systems and heterogeneous systems, which are, according to Network World, systems that mix processor types in a single system under one OS in order to incorporate the functions of several different appliances into one server system.

Careful counters will notice that that’s two short. The full list and report are only available to those who purchase them from Gartner’s Web site. It’s like the punchline to the old joke, “How do you hold a technology blogger in suspense?”

But most of the other eight rely on an unstated assumption of a well performing network. Cloud computing and mashups may be getting information from a large number of geographically diverse servers and databases to get the information needed – network connections between those servers need to be monitored and maintained, whether it’s an internal IT department that’s responsible for the infrastructure or the service provider is responsible for it.

Unified communications, of course, requires deliberate and accurate configuration so that voice, video, and data can coexist on the same network. Green IT typically means some sort of consolidation – and either data center or server consolidation requires well functioning networks, as the users move further away from the application’s data.

But it’s heterogeneous computing that has my imagination.

Since we started establishing universal standards for communication (e.g., UDP, TCP, and IP), the idea has always been that we can be more effective at performing tasks by networking computers together than any one particular computer can be. With multithreading, we can split an application up among multiple processors; with virtualization, we can host an application on any hardware; with cloud computing, we can remove the user interface from the computer that actually does the processing.

Isn’t this the next step? Programs that can be run from anywhere, and processed on any computer – or all computers – on the network, without regard to what type of hardware is in the computers or where the computers are physically located?

Well, what that would mean is that for processing-intensive tasks, such as video rendering, scientific modeling, data processing, etc., the bottleneck will move from the CPU to the network. That means that in such an environment, network performance would be the only IT performance that really matters.

Kinda gives me chills up my spine. Maybe it’s a future that never comes to pass – but I sure hope it does.



TCP Slow Start - Whiteboard Series


Technically, it’s a PowerPoint presentation, not a whiteboard sketch, but here, Robert Webb, Principal Network Consultant at NetQoS, brings a short sample of the type of training he does for the NetAnalyst program – in this case discussing TCP Slow Start.

The embedded version is low quality – you can head to the appropriate YouTube page for a high definition version of the video.



Whiteboard Series: How To Manage QoS In Your Environment, Part 1 of 3


Ben Erwin starts off a three-part Whiteboard Series installment on how to manage QoS in your environment. In this first episode, “Leveraging Cisco Tools: Using CBQoS & NetFlow to Manage QoS Policies in Your Environment” Ben goes from the Whiteboard to actual CBQoS monitoring in the NetQoS Performance Center, illustrating some of the problems that can occur with QoS, and what steps to take to resolve them.

Below you’ll find the embedded video, now in widescreen YouTube HD. (Yes, we are aware of the irony of telling you how to watch out for things like, say, excessive YouTube traffic, with an excessively large YouTube video.) A low definition version can be found here.



False security can lead to real performance problems


The Obama-Biden transition team promised last Monday, Dec. 8th, that most policy documents from meetings with outside groups – i.e., lobbyists – would be posted on the Change.gov Web site.

By Wednesday, Dec. 10th, this policy already saw some interesting results. David Kravets over at Wired’s Threat Level blog pointed out that the site has already published a paper detailing the requests of the MPAA’s lobbying organization, which include requesting filtering information from technology companies.

We’re not against the MPAA using the means available to protect their intellectual property concerns, but there are two problems with filtering: false positives, and performance degradation.

False positives are already a major problem with the content industry – back in 2003, the RIAA sent a cease and desist letter to Penn State University – they had confused work from Prof. Peter Usher at the Department of Astronomy and Astrophysics with that of Usher, the R&B pop singer.

This is also a recent problem; in October of 2007, Google launched a copyright filter for the YouTube Web site. It, too, has many false positives. For example, a fan production of the reality TV show “The Mole” was removed, presumably, because it was confused with the real thing by the filter. Judging from the production values of the fan-film, it’s very unlikely that a human censor would confuse the two.

(Fun fact I learned while researching this article: Andy Warhol made a “Batman” fan film back in 1964.)

Videos removed for copyright complaint – legitimately or not - have been catalogued (but not archived) at YouTomb, a project from MIT Free Culture.

But YouTube is one, privately operated Web site. Filtering the content as it is uploaded merely affects the time to publish, not the time to distribute. Additionally, videos can also be hosted on competing sites.

If one were to try to use filtering on the Internet as a whole, as the MPAA seems to be lobbying, it is likely that the results would be similar to the results of the tests run by the Australian government – where even the best of filters degraded network performance, and the better the filter was at avoiding false positives and false negatives, the more performance degraded. Even the best filter wasn’t very effective.

The lesson to learn from all of this is that too often, measures taken in the name of “computer security” – even if it’s to instill a false sense of security – can have serious impacts on network performance. For this reason, those in the enterprise responsible for making sure that networks remain secure and those responsible for making sure that applications remain responsive absolutely need to coordinate efforts.



BitTorrent over UDP: End of the World or just End of the Beginning?


A column in The Register claims, amongst much wailing and gnashing of teeth, that implementation of BitTorrent-over-UDP (dubbed uTP) in the new alpha version of uTorrent, one of the official BitTorrent client applications, will end the Internet as we know it and completely congest the network.  The title is “Bittorrent declares war on VoIP, gamers.”

Considering the fact that BitTorrent, Inc., has, if anything, always gone out of its way to avoid declaring war on anybody, this seemed to me a little bit odd.

I’ll admit that it kind of worried me – TCP has traffic congestion management built into the protocol, UDP does not.  When UDP and TCP exist on the same network (for example, when rolling out VoIP on a corporate network), QoS policies are needed to keep UDP from taking up all the bandwidth while TCP meekly  throttles back.   Jim McQuaid has a Whiteboard series video up about it, and called it “Nice Guys Finish Last.”

The reason that UDP is popular is that it’s a lightweight protocol that doesn’t do much handshaking.  It sends the data “that-a-way” and doesn’t particularly care if it makes it.  That makes it perfect for VoIP, gaming, and other Internet protocols where latency is more important than throughput.  TCP, with congestion control and packet confirmation built in, sacrifices latency for accuracy.  A dropped packet in a phone conversation isn’t much to worry about, but a half-second delay is extremely annoying.  On the other hand, a half-second delay in downloading a computer program isn’t much to worry about, but a dropped packet means that the program won’t run. 
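The difference is easy to see on a loopback interface. The sketch below is illustrative only: the helper names `free_port`, `udp_send`, and `tcp_send` are made up for this example, and it deliberately sends to ports where nothing is listening. UDP hands the datagram off and reports success; TCP refuses to send anything until its handshake completes.

```python
import socket


def free_port(kind: int) -> int:
    """Ask the OS for an unused loopback port of the given socket type, then release it."""
    with socket.socket(socket.AF_INET, kind) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def udp_send(port: int, payload: bytes) -> int:
    """Send one datagram; UDP does not wait for any acknowledgement."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        return s.sendto(payload, ("127.0.0.1", port))


def tcp_send(port: int, payload: bytes) -> bool:
    """TCP needs a completed handshake first, so a missing listener is noticed at once."""
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=1.0) as s:
            s.sendall(payload)
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Nobody is listening on either port.
    print(udp_send(free_port(socket.SOCK_DGRAM), b"hello"))   # byte count: the send "worked"
    print(tcp_send(free_port(socket.SOCK_STREAM), b"hello"))  # the connection attempt fails
```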

So, as a general rule, TCP runs data apps, while UDP runs real-time apps.  Rudimentary QoS policies based on giving UDP packets higher priority may not be perfect, but they can be a good start to improving performance on simpler networks. 
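A rudimentary policy of that kind can be sketched as a strict-priority queue. This is a toy model, not router configuration: the class name `PriorityScheduler`, the `PRIORITY` table, and the packet labels are all assumptions for illustration.

```python
import heapq
from itertools import count

# Assumed rule for illustration: lower number = higher priority.
PRIORITY = {"udp": 0, "tcp": 1}


class PriorityScheduler:
    """Strict-priority queue: every queued UDP packet leaves before any TCP packet."""

    def __init__(self) -> None:
        self._heap = []
        self._order = count()  # tie-breaker keeps arrival order within a class

    def enqueue(self, protocol: str, packet: str) -> None:
        heapq.heappush(self._heap, (PRIORITY[protocol], next(self._order), packet))

    def dequeue(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    q = PriorityScheduler()
    for proto, pkt in [("tcp", "file-1"), ("udp", "voice-1"),
                       ("tcp", "file-2"), ("udp", "voice-2")]:
        q.enqueue(proto, pkt)
    print([q.dequeue() for _ in range(len(q))])
    # ['voice-1', 'voice-2', 'file-1', 'file-2']
```

Note that the rule looks only at the transport protocol, so any bulk transfer that happens to run over UDP would jump the queue along with the voice traffic.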

My concern was that putting BitTorrent – a non-latency-sensitive application – on UDP would result in its traffic receiving higher priority. But after speaking to Simon Morris, the Vice President of Product Management at BitTorrent, Inc., I was assured that this wasn’t the case. Morris explained:


[Editor's Note: Some words got a little garbled in the phone conversation I had with Simon Morris, and he sent me an e-mail with clarifications. I've made corrections via strikethroughs. --ed.]


“BitTorrent obviously needs to be an accuracy-sensitive protocol but we believe it needs to also be …MORE sensitive to latency than TCP, not less sensitive.

This is to say that with uTP, we have taken UDP, implemented a layer of reliability and spent a great deal of time implementing a congestion control mechanism that is better than the one used in TCP (better = faster to detect issues, faster to react).

It’s not QoS, but rather a congestion management mechanism implemented at the end-user’s layer 7 [Application]. This will stop uTP from eating up traffic bandwidth reserved for latency sensitive apps like VoIP and gaming. What’s more, the congestion management mechanism isn’t something we implemented as an afterthought – it’s the whole point of uTP.”


In short, it seems that rather than using UDP as a way of getting around TCP’s traffic congestion features, the new protocol is rebuilding better traffic congestion features at layer 7, using the lightweight UDP protocol as a simple base. To mangle a metaphor, they’re re-inventing a better wheel.


If uTP does congestion control for BitTorrent as an application, this could provide an answer to BitTorrent critics, and ISPs who claim that BT throttling is necessary because it’s “eating up bandwidth.” I asked how uTP implemented congestion control at Layer 7 [Application] rather than Layer 4 [Transport].


Morris: “So basically, what TCP does is that it detects congestion only when it detects packet loss and then it throttles back. Because we control both ends of the transfer, we can actually measure the single-trip time between when the packet is sent and a packet arrives. (Not round-trip-times but single-trip-times…) We have essentially, an ability to monitor single trips over the internet across millions and millions of terminals, and we built an algorithm around that to do things like eliminate the discrepancy between the clock settings on different terminals - to identify where there is actual - very fine grain, down to milliseconds - changes in the speed at which packets are arriving….”

“The way that prioritization policies are set [by network operators] is extremely varied, and so, it's possible that [uTP-based] BitTorrent traffic will get a higher prioritization, but only in cases where it's not causing any type of congestion at all… [uTP] will never trample over UDP based latency sensitive traffic, nor TCP-based traffic. [uTP is] designed to throttle back if there's anything else on the line.

“I mean, it's essentially designed to be - a term that we use internally is - a "scavenger protocol." It scavenges and uses bandwidth that is not being used by other applications at all, and it's designed to throttle back very very quickly in case there is any type of congestion on the line.

“Unfortunately the way that TCP works is - just profoundly broken as a method of controlling congestion on the Internet. Especially when there are applications out there that are designed to get the most out of network bandwidth - like BitTorrent. People have tried to make TCP better, but the problem is that it's such a huge implementation task to make it happen, because you need to upgrade all of the Web servers and all of the terminals. Now, the insight here is that in many cases, we have control of both ends of the [communication]. So we can actually take the right steps in the direction of solving this problem.”
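The general idea Morris describes, reacting to rising one-way delay instead of waiting for packet loss, can be sketched in a few lines. To be clear, this is a toy model and not BitTorrent's actual uTP code: the class name `DelayBasedController`, the 100 ms target, the gain, and the starting window are all assumptions. The trick for the clock discrepancy is that the smallest delay ever observed serves as a baseline, so the unknown offset between the two clocks cancels out when you subtract it.

```python
class DelayBasedController:
    """Toy delay-based congestion controller (illustrative, not the real uTP code)."""

    def __init__(self, target_ms: float = 100.0, gain: float = 1.0,
                 min_window: float = 1.0) -> None:
        self.target_ms = target_ms    # assumed queuing-delay budget
        self.gain = gain              # assumed step size, in packets
        self.min_window = min_window
        self.window = 10.0            # packets allowed in flight
        self.base_delay = None        # smallest raw one-way delay seen so far

    def on_sample(self, sent_ms: float, received_ms: float) -> float:
        """Update the window from one packet's send and receive timestamps."""
        raw = received_ms - sent_ms   # still includes the unknown clock offset
        if self.base_delay is None or raw < self.base_delay:
            self.base_delay = raw
        queuing = raw - self.base_delay          # the offset cancels out here
        # Grow while delay is under target, shrink in proportion when over it.
        off_target = (self.target_ms - queuing) / self.target_ms
        self.window = max(self.min_window, self.window + self.gain * off_target)
        return queuing
```

With a receiver clock that runs five seconds ahead, two samples at a raw delay of 5020 ms grow the window from 10 to 12 packets, and a third sample at 5320 ms (300 ms of queuing) pulls it straight back to 10, before a single packet has been lost. A real implementation would also have to cope with clock drift and with a baseline that goes stale, which this sketch ignores.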


Those concerned about BitTorrent (either classic or newfangled) traffic on their networks might want to check out a solution designed to monitor and track the types of traffic going on the network, including information about what applications are transmitting and receiving what amounts of traffic. 



Black Friday, Cyber Monday.


Boy, what a difference a year makes.

“Black Friday” usually refers to the day after Thanksgiving, when retailers, both online and offline, start getting rushes of orders to meet Christmas demand. But unless you’re a Wall Street firm, for whom Christmas has come early, you’re probably cutting back on expenditures this holiday season.

Now, it’s probably referring to any Friday that you re-read your quarterly 401(k) or 529 statement.

Still, whether or not people –spend- more online this holiday season, they’ll probably be making a similar number of transactions – that is, Hershey bars instead of Godiva chocolate, Playstation 2s instead of Wiis, Go-Bots instead of Transformers…

And with every dollar counting, the one thing that retailers and suppliers can’t afford on Black Friday and Cyber Monday this year is a performance slowdown like the ones that hit Costco, Victoria’s Secret, Lowe's, and Macy's last year.

Additionally, even if you aren’t a retailer, Cyber Monday typically sends some Web traffic spikes over company networks as employees use the high speed connections work provides in order to make their purchases.

In either case, you can analyze network traffic flows to identify what traffic is mission critical, what is mission irrelevant, and what is mission impossible. After quantifying the impact of certain types of traffic on network performance, you can then implement quality of service policies to ensure that business critical apps have priority access to network resources.
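As a rough sketch of the first step, here is one way to total traffic by business class. Everything specific in it is hypothetical: the `PORT_CLASS` mapping, the record fields `dst_port` and `bytes`, and the class labels are invented for illustration and do not represent any real flow export format or product. Real classification would look at far more than the destination port.

```python
from collections import defaultdict

# Hypothetical mapping from destination port to business class.
PORT_CLASS = {
    1433: "critical",     # assumed: database behind a line-of-business app
    5060: "critical",     # assumed: VoIP signalling
    80: "irrelevant",     # assumed: general Web browsing
    443: "irrelevant",
}


def bytes_by_class(flows):
    """Sum bytes per business class; unknown ports land in 'unclassified'."""
    totals = defaultdict(int)
    for flow in flows:
        totals[PORT_CLASS.get(flow["dst_port"], "unclassified")] += flow["bytes"]
    return dict(totals)


def share_by_class(flows):
    """Express each class as a fraction of all bytes seen."""
    totals = bytes_by_class(flows)
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {name: amount / grand_total for name, amount in totals.items()}
```

Fed a day's worth of flow records, `share_by_class` gives the kind of number you need before writing a QoS policy: how much of the link the shopping traffic actually takes, and how much is left for everything else.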

We know that on Friday and next Monday, there will be a higher than normal volume of Internet traffic.  The trick is finding out how much of an impact it will have and preventing it from impacting application performance.


