Application Management Archives

Three Things You Can Do Today To Improve Network Performance Without Spending a Dime


For months, we’ve been waiting to see what the fallout would be from the sub-prime mortgage crisis.

Apparently, the results are not unlike a Hefty bag filled with chili con carne, dropped from the top of a skyscraper. Only instead of a Hefty bag, it’s the U.S. economy.

So, as Wall Street explodes like an explosive so explosive it could explode and create a massive explosion, technology refresh cycles will probably stretch out a couple more years as CIOs try to figure out how to use existing tools to solve network management problems and improve performance. How do you do that?

Luckily, there are ways. Cisco routers and switches already include “application-aware” technologies that don’t require any additional purchases, including IP Service Level Agreements (IP SLA), Class-Based Quality of Service (CBQoS), and Network-Based Application Recognition (NBAR).

Managing Application Response Times with Cisco IP SLA

Measuring real application transactions is the most accurate way to measure response times. Failing that, you can use Cisco IP SLA to create synthetic transactions. This is useful not only during an IT budget crunch, but also when assessing whether to roll out a new application or verifying a service provider’s edge-to-edge SLA.

IP SLA operates by sending synthetic transactions between two network devices, or between a network device and a server. It can be configured to send different types of synthetic transactions based on port, packet size, type of service, and even more advanced characteristics, as is the case with Voice over Internet Protocol (VoIP) tests. When it gets a response, the sender calculates the response-time metrics appropriate for the test type, then repeats the test at regular intervals.
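
To make the mechanics concrete, here is a minimal Python sketch of a synthetic transaction in the spirit of an IP SLA TCP-connect test. It is not Cisco’s implementation: the function name tcp_connect_probe and its parameters are hypothetical, and it simply times repeated TCP handshakes from a host to a target port, then summarizes the results.

```python
import socket
import statistics
import time


def tcp_connect_probe(host: str, port: int, count: int = 5, timeout: float = 2.0) -> dict:
    """Time `count` TCP handshakes to host:port (hypothetical helper, not a Cisco API).

    Loosely mimics a synthetic TCP-connect test: send a probe, measure the
    response time, repeat, then summarize min/avg/max in milliseconds.
    """
    samples_ms = []
    failures = 0
    for _ in range(count):
        start = time.perf_counter()
        try:
            # create_connection returns once the three-way handshake completes.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            failures += 1
            continue
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    result = {"sent": count, "failures": failures}
    if samples_ms:
        result.update(
            min_ms=min(samples_ms),
            avg_ms=statistics.mean(samples_ms),
            max_ms=max(samples_ms),
        )
    return result
```

For example, tcp_connect_probe("10.1.1.10", 80, count=10) would time ten handshakes to a (hypothetical) branch web server. Unlike IP SLA, this runs on a host rather than inside the router, so it also measures the host’s own stack.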

Some SNMP polling products can collect this data automatically, store it in a database, display the results in a GUI, and provide analytical functions beyond data collection, such as calculating baselines, displaying trends, and triggering threshold alerts based on the collected IP SLA data. You can also simply read the results from the CLI, but extracting the IP SLA response-time metrics and copying them into a spreadsheet is difficult and tedious. Still, for the extremely budget-conscious, it can be done.
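
For the extremely budget-conscious route, the tedious part is turning CLI text into something a spreadsheet can open. The Python sketch below assumes output lines shaped like “IPSLA operation id: 1” and “Latest RTT: 4 milliseconds”. That layout is an assumption: the exact wording of show ip sla statistics varies between IOS releases, so the regular expressions would need adjusting for a given router.

```python
import csv
import io
import re

# Assumed line formats; adjust to match the IOS release you are scraping.
OPERATION = re.compile(r"IPSLA operation id:\s*(\d+)")
PATTERNS = {
    "latest_rtt_ms": re.compile(r"Latest RTT:\s*(\d+)\s*milliseconds"),
    "successes": re.compile(r"Number of successes:\s*(\d+)"),
    "failures": re.compile(r"Number of failures:\s*(\d+)"),
}
FIELDS = ["operation", "latest_rtt_ms", "successes", "failures"]


def parse_ip_sla_stats(text: str) -> list:
    """Split CLI output into one dict per IP SLA operation."""
    rows = []
    current = None
    for line in text.splitlines():
        op = OPERATION.search(line)
        if op:
            current = {"operation": int(op.group(1))}
            rows.append(current)
            continue
        if current is None:
            continue  # ignore banner text before the first operation
        for field, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                current[field] = int(match.group(1))
    return rows


def to_csv(rows: list) -> str:
    """Render parsed rows as CSV; missing fields are left blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical captured output for two operations.
SAMPLE = """IPSLAs Latest Operation Statistics

IPSLA operation id: 1
        Latest RTT: 4 milliseconds
Number of successes: 12
Number of failures: 0
IPSLA operation id: 2
        Latest RTT: 37 milliseconds
Number of successes: 11
Number of failures: 1
"""

print(to_csv(parse_ip_sla_stats(SAMPLE)))
```

Saving the printed text as a .csv file gives one row per operation, which a spreadsheet can then trend over repeated captures.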

Deploying Quality of Service with Cisco CBQoS

QoS is a blanket term for network policies and practices that help manage different types of traffic sharing the same network links. Effectively, QoS determines how traffic of different priorities is handled whenever the network must make tradeoffs that could impede performance, such as during congestion.

Within any enterprise, the end-user experience with certain applications will always be more critical than with others. Strategies to avoid (or at least manage) congestion can include dropping traffic, adjusting application responses, and building packet queues. CBQoS is one way to do this, and it comes with the CBQoS Management Information Base (MIB), which collects statistics about the traffic traversing the router and reports how the QoS configuration is being applied.

Here, an SNMP polling product with application-aware capabilities can retrieve input and output QoS class-map utilization, drop percentage, and packet counts, as well as pre- versus post-policy traffic volume, rate, and packet count. It can also identify traffic marked as conforming to, exceeding, or violating defined policies.
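
Most of those figures are derived from two successive polls of cumulative counters. The Python sketch below is modeled on counters such as cbQosCMPrePolicyByte64 and cbQosCMDropPkt64 in the CISCO-CLASS-BASED-QOS-MIB, but the SNMP polling itself is left out; the ClassMapSample container, the helper names, and the sample values are hypothetical.

```python
from dataclasses import dataclass

COUNTER64_MODULUS = 2 ** 64


@dataclass
class ClassMapSample:
    """One poll of a class map's counters (hypothetical container, not a MIB object)."""
    timestamp_s: float
    pre_policy_bytes: int   # e.g. from cbQosCMPrePolicyByte64
    post_policy_bytes: int  # e.g. from cbQosCMPostPolicyByte64
    pre_policy_pkts: int    # e.g. from cbQosCMPrePolicyPkt64
    drop_pkts: int          # e.g. from cbQosCMDropPkt64


def counter_delta(old: int, new: int, modulus: int = COUNTER64_MODULUS) -> int:
    """Difference between two counter readings, tolerating one wrap."""
    return (new - old) % modulus


def class_map_stats(a: ClassMapSample, b: ClassMapSample) -> dict:
    """Pre/post-policy rates (bit/s) and drop percentage between two polls."""
    interval = b.timestamp_s - a.timestamp_s
    if interval <= 0:
        raise ValueError("samples must be in chronological order")
    pkts = counter_delta(a.pre_policy_pkts, b.pre_policy_pkts)
    drops = counter_delta(a.drop_pkts, b.drop_pkts)
    return {
        "pre_policy_bps": counter_delta(a.pre_policy_bytes, b.pre_policy_bytes) * 8 / interval,
        "post_policy_bps": counter_delta(a.post_policy_bytes, b.post_policy_bytes) * 8 / interval,
        "drop_pct": 100.0 * drops / pkts if pkts else 0.0,
    }


# Two hypothetical polls of one class map, 60 seconds apart.
before = ClassMapSample(0.0, 1_000_000, 990_000, 1_000, 10)
after = ClassMapSample(60.0, 8_500_000, 8_200_000, 7_000, 70)
print(class_map_stats(before, after))
```

A rising drop percentage in a class that is supposed to be protected is exactly the sort of evidence that QoS settings may be hurting rather than helping.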

Without CBQoS, network managers don’t have a whole lot of evidence to verify that their QoS settings are actually improving network performance – in fact, they may even be inadvertently harming performance. CBQoS prevents network managers from flying blind with QoS deployments. And, like IP SLA, it’s built into Cisco IOS.

Gaining a New Level of Visibility with Cisco NBAR

From within the network device’s operating system, Cisco NBAR can inspect packets traversing the device and identify the corresponding application. For example, TCP traffic running on port 80 could be labeled as Google, SAP, SharePoint, Salesforce, and so on. NBAR can also provide utilization, volume, and rate metrics on a per-application basis, relative to the network circuit carrying the traffic.
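
Per-application counters become useful once they are turned into rates and compared with circuit capacity. A minimal Python sketch, assuming you already have per-application byte counts for one polling interval (how they are collected is out of scope here; the application names and numbers are made up):

```python
def per_app_utilization(byte_deltas: dict, interval_s: float, link_bps: float) -> dict:
    """Convert per-application byte counts over one polling interval into
    (rate in bit/s, percent of circuit capacity), busiest application first."""
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    ranked = sorted(byte_deltas.items(), key=lambda item: item[1], reverse=True)
    result = {}
    for app, nbytes in ranked:
        rate_bps = nbytes * 8 / interval_s
        result[app] = (rate_bps, 100.0 * rate_bps / link_bps)
    return result


# Hypothetical five-minute sample on a T1 (1.544 Mbit/s) circuit.
usage = per_app_utilization(
    {"citrix": 5_790_000, "http": 28_950_000}, interval_s=300, link_bps=1_544_000
)
for app, (rate, pct) in usage.items():
    print(f"{app:8s} {rate / 1000:8.1f} kbit/s {pct:5.1f}% of link")
```

In this made-up sample, web traffic alone takes half of the T1, which is the kind of finding that feeds directly into QoS policy decisions.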

It’s similar to NetFlow, but NetFlow characterizes traffic mixes by protocol and port; it doesn’t provide application-layer visibility. NBAR identifies traffic by application, which is important when setting proper QoS policies. And because NBAR is part of Cisco IOS, and its data can be collected with an application-aware SNMP poller (which many of you already have), it can be more cost-effective than dedicated application discovery hardware.



A few of a many, or many of a few?


Ken Church, Albert Greenberg and James Hamilton of Microsoft recently put out a paper on “Delivering Embarrassingly Distributed Cloud Services.”[PDF] Like most papers of this type, it’s a dry read, but informative. It examines the tradeoff between mega data center size and micro data center diversity from the viewpoints of both total cost of ownership and performance.

The most important line in the entire report, of course, is “The trade-offs vary by application.” However, they make the argument that applications with little need for server-to-server communications will show benefits in cost, scale, reliability and performance through geo-diversification – in other words, lots of little datacenters as opposed to one big datacenter.

This seems to fly in the face of the trend toward data center consolidation, but there is a point to it: every data center needs redundancy, but a single centralized data center needs more of it than a set of smaller, distributed data centers does. As Church, Greenberg, and Hamilton put it, “the more geo-diversity, the better. N+1 redundancy becomes more attractive for large N.”
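
The arithmetic behind that quote is simple: if N sites each carry an equal share of the load and one extra site is held in reserve, the spare capacity is 1/N of the load. A single mirrored mega data center (N = 1) means paying for 100% extra; ten micro data centers need only 10%. A quick Python illustration, assuming equal-sized sites and identical load shares (simplifications of ours, not the paper’s model):

```python
def spare_capacity_fraction(n: int) -> float:
    """Extra capacity needed for N+1 redundancy, as a fraction of the load.

    N sites carry the load and one more site is held in reserve, so the
    overhead is one site's worth out of N, i.e. 1/N.
    """
    if n < 1:
        raise ValueError("need at least one active site")
    return 1 / n


for n in (1, 2, 4, 10, 20):
    print(f"N={n:2d}: {spare_capacity_fraction(n):.0%} spare capacity")
```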

The part that really interested me, though, was the networking section. (Section 3, in case you want to skip right to it.) Church, Greenberg, and Hamilton point out that in a large, centralized datacenter, you can have end-to-end control and assure a particular level of performance through supported service level agreements. On the other hand, they argue:


“[with distributed data centers] the cloud service provider has ceded control of quality to its Internet access providers, and so cannot support (or even fully monitor) SLAs on flows that cross out multiple provider networks, as the bulk of the traffic will do. However, by artfully exploiting the diversity in choice of network providers and using performance sensitive global load balancing techniques, performance may not appreciably suffer. Moreover, by exploiting geo-diversity in design, there may be attendant gains in reducing latency…”



“Many large analysis applications are best run centrally in mega data centers… Interactive applications are best run near users… [they] can be delivered with better QoS (e.g., smaller TCP round trip times…) via micro data centers.”


The argument’s sound, especially when you consider that interactive applications are probably the most latency sensitive because they need to make multiple trips to and from the client and server with every interaction.

But reducing propagation delay (or distance delay) is only one part of the performance equation. By ceding control over router performance and transmission, you have no way to diagnose network round-trip time problems if they occur, and you wouldn’t be able to fix them, short of the messy step of changing service providers, even if you could. If something goes wrong, it could negate the speed gained by diversifying servers, so moving to this model is more of a gamble than a guaranteed improvement. Granted, it’s a gamble that might make sense for some apps and some organizations; some apps, apparently, can get away with less than 100% uptime.



Whiteboard Series - Ben Erwin talks about Passive Monitoring vs. Active Monitoring



 

Behind the cut, we have a higher-quality version of this movie through Blip.TV.




Is Web 2.0 bumming a ride?


Guest Post
by Josh Hinkle
Manager, Network Management & Security,
American Heart Association

As the youngest of three siblings, I recall my brother hating to give my sister a ride to and from school. Even worse, he despised having her butt in when he was hanging out with his friends. After all, he was cool, and his little sister was, well… his little sister.

For my parents this was a great solution, because they didn’t have to be the full-time taxi service anymore. Older siblings despise this role as chauffeur because their younger siblings end up riding their coattails to after-school social activities.

At first glance, I felt like Web 2.0 was that younger sibling tagging along on years of hard work by global IT: built on an existing infrastructure, yet able to become popular seemingly overnight.

I have spent the last 12 years in Information Technology with an emphasis on network management, eight of those years at the American Heart Association. For the last two years, I’ve served as Manager of Network Management & Security at the AHA. Like most corporate network managers, I have a vested interest in enterprise application delivery. Our business, like many others, depends on enterprise applications being accessed by thousands of staff in hundreds of locations. At times our staff has been challenged with latency and remote connectivity. That’s when we turned to NetQoS to measure, alert, report and trend our network traffic in an effort to take operations to the next level. As those processes began to mature, our attention shifted to the free-riding sibling, Web 2.0.

While Web 1.0 paved the way for networking billions of people, Web 2.0 is stealing the thunder. In a matter of months, it seems everyone has gotten LinkedIn, been poked on Facebook or Twittered someone. Web 2.0 is now carpooling with enterprise traffic across the same infrastructure, competing for the same bandwidth.

AHA revolves around providing information to reduce cardiovascular disease and stroke, and Web 2.0 has increased the demand on AHA’s infrastructure. It offers a way to reach a large audience with a low investment. Certainly, Web 2.0 has the potential to inform and collaborate with millions, but the background costs of infrastructure and man-hours concern me.

Web 2.0 apps are not representative of traditional enterprise applications. First, they exist outside the bounds of the enterprise infrastructure, yet we manage them on the same WAN. Second, the interactive nature of Web 2.0 apps requires additional bandwidth. And third, Web 2.0 applications are not unlike a “human machine” that grows with every click.

Right now, the American Heart Association is engaged in a Social Media Evaluation project to determine where and how we can further leverage this new platform. Currently, we are using an application on Facebook to reach new audiences interested in the American Heart Association's Start! Walking Movement. The American Heart Association’s “You’re the Cure” Network has a Facebook site that coordinates volunteer efforts to inform public officials. Our TCS (Technology and Customers Strategy) Department started an AHA Technology Blog to discuss the technology we use and the organizational accomplishments achieved with it. Most recently, we posted a story about a customer who Googled symptoms he was having, which led him to our site on heart attacks. His doctor told us that he called 911 immediately and survived because of it.

What I’m currently proposing to senior management is for AHA to manage our network as if it were two separate networks, one for each of our two very different needs. The first network would use MPLS and provide managed bandwidth with prioritized queues for enterprise applications, and the second would offload all Internet-bound traffic from the first.

Not too long ago, this type of infrastructure investment would have seemed unjustifiable, but given new trends in Web 2.0 as a platform and evolving cost structures, it may very well become a business-driven reality. We need a network as flexible and adaptive as the business demands.

This will increase our transport costs, but we will be able to guarantee enterprise traffic on one network and adapt to evolving trends like Web 2.0, video conferencing, etc. on the other. Even with the added costs, by negotiating more volume into our transport contracts, we can lower our per-MB costs.

Not all of the benefits are measured in the bottom line, however. Our applications should see great gains in performance, and our network will be fully redundant at each site, as the MPLS network will serve as failover for Internet traffic, and vice versa.
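
The two-network proposal boils down to a path-selection policy: enterprise destinations prefer the MPLS network, Internet-bound traffic prefers the second network, and each backs up the other. A toy Python sketch of that decision, assuming hypothetical enterprise prefixes from private address space; in practice this logic would live in router configuration, not application code:

```python
import ipaddress

# Hypothetical enterprise address space carried on the MPLS network.
ENTERPRISE_PREFIXES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
]


def choose_path(destination: str, mpls_up: bool = True, internet_up: bool = True) -> str:
    """Pick the preferred path for a destination, failing over if it is down."""
    addr = ipaddress.ip_address(destination)
    is_enterprise = any(addr in net for net in ENTERPRISE_PREFIXES)
    preferred, backup = ("mpls", "internet") if is_enterprise else ("internet", "mpls")
    status = {"mpls": mpls_up, "internet": internet_up}
    if status[preferred]:
        return preferred
    if status[backup]:
        return backup
    raise RuntimeError("no path available")


print(choose_path("10.20.30.40"))                    # enterprise app -> mpls
print(choose_path("8.8.8.8"))                        # Internet -> internet
print(choose_path("8.8.8.8", internet_up=False))     # failover -> mpls
```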

I must admit, at first I considered Web 2.0 an (admittedly exciting) nuisance on my network and a menace to my plan for enterprise application delivery. But recently I created my own blog, linked my social network sites, posted YouTube videos and started speaking a second language of Web 2.0 terminology. I matured in my thinking as a network manager and now embrace the qualities of Web 2.0, much as my siblings and I matured in our appreciation for each other.

Web 2.0 may be the sibling that is bumming a ride but it has its qualities to appreciate; it may even mature into a traditional enterprise operations model.  Fasten your seatbelt and make the most of the ride.



Latency and Jitter


By Kevin Davis
Adapted from “Sources of Latency” Whitepaper

When network users call the Help Desk to report poor application performance, you don’t typically hear things like “The router’s CPU is too busy!,” “The network utilization is above 70%!,” or “The carrier path has failed-over to a sub-optimal path.” Instead, what you’re likely to hear is “The network is slow” or “The calls on my IP phone sound terrible.”

The complaints end users lodge are nearly always based on their quality of experience with the application, and that quality of experience almost always comes down to time.

Any time a significant delay occurs in the delivery of network data, application performance suffers. Depending on the type of application and how it works, variations in network delay can severely affect application performance and degrade the end-user experience.

Two important measurements of time intervals in network transmission systems are referred to as “latency” and “jitter”. Understanding the sources of latency and jitter, and how their values vary across network architectures, is critical to engineering application performance and optimizing information resources. For many regular readers, this will be old hat, but we’ll go over it again.

Network latency is the amount of time it takes for a packet to be transmitted end-to-end across a network and is composed of five variables:


Network Latency = (Distance Delay) + (Serialization Delay) + (Queue Delay) + (Forwarding Delay) + (Protocol Delay)


Serialization Delay is the amount of time it takes a network interface (such as a router’s interface or a computer’s NIC) to transmit a frame bit by bit onto the outbound media. Forwarding Delay is the amount of time it takes a network device to process a frame or packet by looking up the destination address and forwarding it to the outbound interface. Protocol Delay is the amount of time that access or transmission algorithms may add to the delay of a network frame; it is typically introduced at the endpoints of the data transmission system.

Serialization delay, on a per-packet basis, becomes insignificant at data rates above 1.544 Mbits/s – or a T1. Forwarding delay is typically insignificant in modern routers and switches (when appropriately configured – significant delay can occur in misconfigured routers.) And Protocol delay typically occurs at the access layer or the end points. So the two major variables that have the most effect on network latency are Distance Delay and Queue Delay.
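
Serialization delay is simply frame size divided by link rate. A small Python calculation, assuming a full-size 1,500-byte frame (our choice of size, for illustration), shows how quickly it shrinks as the link gets faster:

```python
def serialization_delay_ms(frame_bytes: int, link_bps: float) -> float:
    """Time to clock one frame onto the wire: bits / link rate, in milliseconds."""
    return frame_bytes * 8 * 1000 / link_bps


for name, rate in [("64 kbit/s", 64_000), ("T1", 1_544_000), ("100 Mbit/s", 100_000_000)]:
    print(f"1500-byte frame on {name}: {serialization_delay_ms(1500, rate):.3f} ms")
```

At 64 kbit/s a single full-size frame takes 187.5 ms to serialize; at T1 speed it is under 8 ms, and at 100 Mbit/s it is about a tenth of a millisecond.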

Distance Delay is simply the minimum amount of time it takes the signals that represent bits to travel along the physical medium. Signals travel through optical cable at about 5.5 µs/km, through copper cable at about 5.606 µs/km, and by satellite at about 3.3 µs/km. (Amplifying repeaters in optical cable add a few microseconds of delay, but compared to distance, this is negligible.)
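
Putting the latency formula and the per-kilometer figures together gives a minimal Python sketch. The propagation constants are the ones quoted above; the path length and the queue, forwarding, and protocol values are invented for illustration:

```python
# One-way propagation figures quoted in the text, in microseconds per km.
PROPAGATION_US_PER_KM = {"optical": 5.5, "copper": 5.606, "satellite": 3.3}


def distance_delay_ms(km: float, medium: str = "optical") -> float:
    """Minimum one-way propagation delay over `km` of the given medium."""
    return km * PROPAGATION_US_PER_KM[medium] / 1000


def network_latency_ms(distance: float, serialization: float, queue: float,
                       forwarding: float, protocol: float) -> float:
    """Network Latency = Distance + Serialization + Queue + Forwarding + Protocol (ms)."""
    return distance + serialization + queue + forwarding + protocol


# Hypothetical 2,000 km fiber path; the other components are made-up values.
d = distance_delay_ms(2000, "optical")
total = network_latency_ms(d, serialization=0.12, queue=3.0, forwarding=0.05, protocol=0.5)
print(f"distance {d:.2f} ms, total {total:.2f} ms")
```

Even with these modest assumptions, distance accounts for most of the total, which is why the text singles out Distance Delay and Queue Delay as the two variables that matter most.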

Distance delay can significantly affect applications that require a large number of network round trips to complete a transaction, for example, custom transaction-based applications and database queries. It also matters for VoIP, which begins to degrade when one-way end-to-end latency exceeds 200-220 milliseconds.

One of the biggest sources of end-user ire is database queries designed to run over a LAN that get ported to the WAN. For example, if a user executes a SQL query that requests 100 rows of a database table, one row at a time, over a link with a distance-induced round-trip latency of 60 ms, the transaction would take approximately 6 seconds (60 ms × 100 turns) to complete. The same query executed by a user on a LAN connected to the same database server would take only about 2-3 ms, because the latency due to distance across the LAN is insignificant.
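
The arithmetic in that example is round trips multiplied by round-trip time. The Python sketch below makes it explicit and adds one comparison the text implies: fetching all the rows in a single turn. The LAN round-trip figure and the single-turn case are our assumptions, and the single-turn case ignores the extra serialization time of a larger response.

```python
def transaction_time_ms(round_trips: int, rtt_ms: float, server_ms: float = 0.0) -> float:
    """Total time for a transaction that needs `round_trips` sequential turns.

    Each turn pays the full network round-trip time; `server_ms` is optional
    per-turn processing time (an assumption; the example in the text ignores it).
    """
    return round_trips * (rtt_ms + server_ms)


wan = transaction_time_ms(100, 60)      # row-at-a-time query over the WAN
lan = transaction_time_ms(100, 0.025)   # same query on a LAN (assumed ~25 µs RTT)
bulk = transaction_time_ms(1, 60)       # all 100 rows fetched in one turn
print(f"WAN: {wan / 1000:.1f} s, LAN: {lan:.1f} ms, single-turn WAN: {bulk:.0f} ms")
```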

Queue Delay is the amount of time a packet must spend in a network buffer waiting its turn to be transmitted. Network interfaces transmit one frame at a time, typically one bit at a time. As such, when two or more packets are forwarded to a network interface at the same time, or close to the same time – one packet is transmitted while the others are put in a queue on the interface buffer to await their turn at the interface. Packets that are put into the queue must wait until they can be transmitted, adding milliseconds of delay.
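
A tiny first-in, first-out simulation shows where those milliseconds come from: each packet waits for everything already ahead of it to finish serializing. This is a simplified Python sketch of a single FIFO interface (real routers may use several queues and other scheduling disciplines); the packet sizes and arrival times are made up:

```python
def fifo_queue_delays_ms(arrivals, link_bps):
    """Simulate one FIFO output interface.

    `arrivals` is a list of (arrival_time_ms, size_bytes) sorted by time.
    Returns each packet's queue delay: time spent waiting before its first
    bit is transmitted (its own serialization time is excluded).
    """
    delays = []
    link_free_at = 0.0  # time the interface finishes sending the previous packet
    for arrival_ms, size_bytes in arrivals:
        start = max(arrival_ms, link_free_at)
        delays.append(start - arrival_ms)
        link_free_at = start + size_bytes * 8 * 1000 / link_bps
    return delays


# Three 1,500-byte packets arrive together at a T1 interface, then one arrives later.
print(fifo_queue_delays_ms([(0, 1500), (0, 1500), (0, 1500), (50, 1500)], 1_544_000))
```

The second and third packets wait roughly 7.8 ms and 15.5 ms, while the late arrival finds an empty queue and waits not at all: the intermittent, congestion-driven latency the text describes.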

Increases in Queue Delay can be detected by monitoring traffic along a given network path. Typically, most intermittent increases in latency above the baseline distance latency can be attributed to network congestion. (To reduce the possibility of excessive queue delay, application servers that are members of the same application architecture should be placed on the same Ethernet switch and the same VLAN, so they do not have to compete for uplink bandwidth when congestion occurs.)

If congestion worsens and packets wait in increasingly long lines, the buffer may fill up and packets may be dropped. Packet drops, in turn, cause TCP connections to throttle back their rate of transmission.

Those are some of the main causes of latency – but what about jitter?

Jitter is a term that refers to the variance in the arrival rate of packets from the same data flow, and abnormal jitter values can negatively impact real-time applications like VoIP and video. Jitter is typically created by three different mechanisms in a network: variance in Serialization Delays due to variance in packet sizes, variance in per-packet Queue Delay due to packet spacing from multiple sources at a common outbound interface, or packets taking different routes from source to destination – perhaps due to per-packet load sharing or routing issues.
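
The text defines jitter loosely as variance in packet arrival. One widely used concrete measure is the interarrival jitter estimator from RTP (RFC 3550), which smooths the change in transit time between consecutive packets. Here is a Python sketch of that estimator; it is a swapped-in standard technique, not necessarily what any given monitoring product computes, and the timestamps are invented:

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """RFC 3550-style running jitter estimate, in the same units as the inputs.

    For consecutive packets i-1 and i, D is the change in one-way transit time;
    the estimate J moves 1/16 of the way toward |D| for each new packet.
    """
    jitter = 0.0
    for i in range(1, len(send_times_ms)):
        transit_prev = recv_times_ms[i - 1] - send_times_ms[i - 1]
        transit_now = recv_times_ms[i] - send_times_ms[i]
        d = transit_now - transit_prev
        jitter += (abs(d) - jitter) / 16
    return jitter


sent = [0, 20, 40, 60, 80]          # a voice stream sending every 20 ms
steady = [50, 70, 90, 110, 130]     # constant 50 ms transit: no jitter
bursty = [50, 75, 88, 118, 131]     # variable queuing delay along the path
print(interarrival_jitter(sent, steady), round(interarrival_jitter(sent, bursty), 3))
```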

The most effective way to deal with jitter is to use low-latency queuing for VoIP and video traffic on network interfaces with large serialization and/or queue delays. In addition, endpoints (such as IP phones) can use jitter buffers, or playout delay buffers, to deliver received packets to the end consumer at a constant rate. These buffers are typically 30-50 ms deep, so they can absorb jitter of up to roughly that amount on any single one-way path. While these buffers technically add 30-50 ms of latency, they significantly reduce jitter. Since people don’t start to notice latency in VoIP or video-over-IP applications until it reaches about 200 ms, keeping network latency under 150 ms leaves enough headroom for the buffer, and jitter can be significantly reduced this way.
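
To see the trade-off between buffer depth and added latency, here is a simplified fixed (non-adaptive) playout buffer in Python. The scheduling rule and the timestamps are assumptions for illustration; real IP phones often adapt their buffer depth dynamically.

```python
def late_packets(send_times_ms, recv_times_ms, playout_delay_ms):
    """Count packets that miss their playout slot in a fixed jitter buffer.

    Playout starts at (first packet's arrival + playout_delay_ms), then every
    packet plays at the same spacing it was sent with. A packet arriving
    after its slot is useless to the listener and is counted as late.
    """
    base = recv_times_ms[0] + playout_delay_ms - send_times_ms[0]
    late = 0
    for sent_ms, received_ms in zip(send_times_ms, recv_times_ms):
        if received_ms > sent_ms + base:
            late += 1
    return late


sent = [0, 20, 40, 60, 80]
received = [50, 75, 88, 118, 131]  # transit varies between 48 and 58 ms
for buffer_ms in (0, 5, 10):
    print(buffer_ms, "ms buffer ->", late_packets(sent, received, buffer_ms), "late")
```

Each extra millisecond of buffer removes late packets but adds a millisecond of end-to-end delay, which is why overall latency has to stay low enough to leave room for the buffer.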



Podcast: Dr. Jim Metzler on the Next Generation NOC


In a few minutes, Jim Metzler of Ashton, Metzler, and Associates will be delivering his keynote on the Next Generation NOC at NetQoS Symposium 2008 at Barton Creek Resort in Austin. Last week, we pre-recorded a podcast with Dr. Metzler about the speech he is about to give and what he means by a "next generation NOC."

He talks about the changing role of the NOC and about moves in enterprises toward integrating what were once separate stovepipe functions to focus on application delivery.

The podcast is below.



Podcast: Dr. Jim Metzler talks about Handbook of Application Delivery 2008 and NetQoS Symposium.


Today, in this podcast, we speak with Dr. Jim Metzler of Ashton, Metzler, and Associates about his handbook, "The Handbook of Application Delivery 2008," and his upcoming keynote speech at NetQoS Symposium 2008.




Symposium Preview: Kevin Davis on Time-based Troubleshooting.


Kevin Davis, a senior consultant at NetQoS, will be presenting a few training sessions at Symposium about SuperAgent, the end-to-end response time module of the NetQoS Performance Center. This will include a training session about how to use time-based network metrics in troubleshooting.  He talks about his upcoming training session below.

In the session, I’m going to be covering the importance of using time-based metrics in troubleshooting, because end users complain foremost about time. For example, they’ll say “the application is running slow,” or they believe “the network is slow.” To users, everything is based on time; that’s what they’re complaining about. And they’re correct.

It may seem counterintuitive, but thinking about performance in terms of “time” is very new to many people, because most people are used to reading utilization graphs. With utilization graphs, however, we don’t know whether 70, 80, or 90 percent utilization is actually impacting the user experience. I mean, we buy networking equipment, routers, switches, firewalls, servers, and we want them to be highly, or efficiently, utilized. Seeing high utilization could indicate a problem, or it could just indicate that you haven’t over-purchased. So you can have a link at 90% utilization or a router at 90% CPU utilization, but you won’t know whether that’s impacting the end user without a time-based metric.

It’s time-based data that tells you how users are being impacted. Sure, utilization data (interface utilization, memory utilization, I/O utilization) can often tell you what is causing the impact. But time-based data shows you the degree of the impact: the real-world effect on end users. With a time-based instrument, such as NetQoS SuperAgent, you can find out where the delay increase is occurring, and whether it’s based in the network, server, or application.

In fact, you can take a look at time-based data and make a determination very quickly as to which entity is creating the performance issue – the beautiful thing about SuperAgent, in particular, is that it trends by time 24/7, so not only can you determine how your important business applications are being impacted today, but you can go back and look at recurring patterns in performance issues.  You can see if today is worse than yesterday or last week or last month.

In the session, I’ll also be going over how to architect the data center for performance. The placement of servers that participate in the same application architecture is critical to the health and performance of the application, and indeed of the data center. We’ll also talk about how different protocol implementations, for example, Microsoft’s TCP/IP stack, can enhance or degrade application performance.

This matters most for servers that serve the same application. For example, a front-end Web server and a back-end Oracle database really should be on the same switch and the same VLAN. That way they receive optimum service from the network. If their traffic does leave the switch, it has to contend for bandwidth on the switch uplinks, and it will be switched and routed multiple times.

Based on measurements from customer environments and from our own laboratories, when two servers are on different switches, there can be up to 18 milliseconds of delay between them. Using the network engineer’s rule of thumb of one millisecond per 100 miles, putting two servers on different switches, or on two different VLANs on the same switch, in effect makes them look 1,800 miles apart, as if one server were in Los Angeles and the other in Memphis.
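
The conversion in that paragraph is a single multiplication, but it is worth carrying one step further, because a Web-to-database conversation rarely makes just one trip. A Python sketch using the 18 ms figure and the 1 ms per 100 miles rule of thumb from the text; the 50 round trips per page is a hypothetical value:

```python
MILES_PER_MS = 100  # rule of thumb from the text: about 1 ms per 100 miles


def equivalent_distance_miles(delay_ms: float) -> float:
    """Express an added delay as the distance that would cause it."""
    return delay_ms * MILES_PER_MS


def added_time_ms(delay_ms: float, round_trips: int) -> float:
    """Extra time for a transaction that crosses the added delay `round_trips` times."""
    return delay_ms * round_trips


print(equivalent_distance_miles(18))  # the 18 ms cross-switch worst case
print(added_time_ms(18, 50))          # hypothetical: 50 Web-to-database turns per page
```

Under those assumptions, a page that makes 50 database calls picks up nearly a second of extra delay purely from server placement.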



Cisco Beefs Up WAN and Application Acceleration Materials


by Patrick Ancipink
Director of Product Marketing, NetQoS

There’s been a lot of growth (and attendant hype) in technology areas like WAN optimization and application acceleration over the past few years, and for good reason. Anything that helps companies speed up and reduce the risk of strategic IT initiatives like consolidating data centers, turning up new branches or serving an increasingly mobile and scattered user community will be popular.

To help cope with the increasing reliance on the WAN and keep latency in check, there is a dizzying array of vendors and products out there. But if you’re trying to determine precisely which techniques and technologies to implement for your specific needs, the array of vendors quickly goes from “dizzying” to “disorienting” and finally to “nauseating.”

Cisco’s been in this Tilt-a-Whirl™ of a market for a while (and NetQoS has been right there with them) and they’ve taken some big steps recently to provide a more holistic approach that centers on building an “application aware” network, rather than trying to highlight one type of implementation against another for a narrow set of capabilities.

NetQoS began working closely, and exclusively, with Cisco to help customers evaluate, measure, and prove the effectiveness of WAN optimization and application acceleration deployments. As customers move from pilot phases into full production, before/after measurements and comprehensive monitoring are critical to ensure that customers are getting the benefits they intended and that their deployments deliver the application performance they need.

To help get the word out, Cisco just launched a new section of its web site today that contains a wealth of information about what it calls “WAN and Application Optimization.” The downloadable Cisco WAN and Application Optimization Technical Overview Presentation puts Cisco technologies (and complementary ones, NetQoS included) into a useful context, with a methodical approach and framework built around four steps: Profile and Baseline, Optimize, Evolve, and Operate. A whole Campbell’s factory of Cisco alphabet-soup technologies is included (WAAS, ACE, NBAR, NetFlow, CBQoS, IP SLA, PfR) to show how they work in concert and what role each plays in the bigger picture.

There’s also the Cisco WAN and Application Optimization Solution Guide, a very in-depth publication (like 227 pages deep) targeted at “technical personnel involved in the specification, design, and implementation of specific WAN and application optimization solutions.” We here at NetQoS are proud to have contributed several sections to the book on the methodology and implementation of network performance monitoring for WAN optimization and application acceleration.

(If you are looking for some lighter fare, the video on the site tells a nice story in about 6 minutes, including an airshow, snowmobiles, windsurfers, and skydiving: interesting choices for demonstrating the criticality of serving video over the WAN. Then again, some company somewhere has to make the recreational products, I suppose.)



I watch NBC on PCP. No, wait, I meant P2P!


Verizon and NBC are working on serving up TV shows to home computers. The problem is that high-definition video (and I've done some HD video work for the Web; shameless plug) takes a whole mess of bandwidth.

Now, the obvious solution for NBC would be to move to some sort of peer-to-peer distribution system, right? I mean, it works for Linux distros.

The problem is that a normal peer-to-peer connection doesn't distinguish between cheap local links (that is, links on the same ISP, in roughly the same geographic area) and expensive remote links. So while P2P provides a more cost-effective solution, it doesn't provide the most cost-effective solution for the ISP.

A third party, Pando, has developed a P2P system for pre-authorized, pre-approved content, and has come up with a way to force peer-to-peer connections to look for local nodes first. This increases the efficiency of the system, lowers the cost, and generally improves the performance of the streaming or downloading video.
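
Pando's actual algorithm isn't described here, so the following is only a minimal Python sketch of the general idea of locality-aware peer selection: rank candidate peers so that those on the same ISP, and ideally in the same area, are tried before remote ones. The Peer fields, the three-tier cost, and the addresses are assumptions; the locality data itself is exactly the kind of information an ISP would have to supply.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    address: str
    isp: str      # hypothetical: which ISP the peer is on
    region: str   # hypothetical: coarse location within that ISP


def rank_peers(peers, my_isp, my_region):
    """Order candidate peers cheapest-first: same ISP and region, then same ISP,
    then everyone else. Python's sort is stable, so ties keep their original order."""
    def cost(peer):
        if peer.isp == my_isp and peer.region == my_region:
            return 0
        if peer.isp == my_isp:
            return 1
        return 2
    return sorted(peers, key=cost)


candidates = [
    Peer("203.0.113.7", "OtherNet", "west"),
    Peer("198.51.100.4", "Verizon", "east"),
    Peer("198.51.100.9", "Verizon", "west"),
]
print([p.address for p in rank_peers(candidates, "Verizon", "west")])
```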

This is exactly the type of thing we talk about when we say that how the application is coded can have a huge impact on the application performance over the WAN. Sometimes instead of needing more bandwidth, you need to find a way to make the apps work more efficiently.

In this case, decentralized P2P systems were developed after Napster's demise. Though they were much less likely to be shut down by the RIAA, they were also much less efficient, and that approach dominated P2P development for years. But for offering only pre-authorized content, a centralized system, especially one that takes advantage of the structure of the physical network, makes a certain bit of sense.

Later this year, NBC will be offering its shows to Verizon customers via Pando's P2P service, which they're calling P4P. The name is a logical outgrowth: P2P, or "peer to peer," versus P4P, or "peer for peer." P3P was discarded because it sounded too much like PCP. And if a kid with a lisp goes around school saying, "I downloaded the latesth Methallica album on P3P," and a teacher hears, "I downloaded the latest Metallica album on PCP," well, that's just not going to be a story that ends well, now, is it?

There's only one problem with Pando's plan: Each ISP will have to give up information about its subscribers in order to participate - that is, the Pando platform requires knowing which nodes are "local" and which nodes are "remote" in order to optimize for the local connections:

For other ISPs to reap the benefits Verizon did in the test, they too would have to share information about their networks with file-sharing companies, and that they normally keep that information close to their chests.
"That's one of the objectives we have to solve -- how are we going to consolidate this data and distribute it?" Pasko said, adding that the result of the test gives ISPs plenty of incentive to collaborate.

(Okay, maybe there are two problems: No offense to NBC, but when your biggest hit is a veritable case study in game theory… you need some new shows.)


