Network Performance Archives

Data Consolidation and Performance – Why Networks Fail (to Perform)


Part 4 in a series adapted from Joel Trammell’s Keynote Speech at NetQoS Symposium 2008

Is anyone out there contemplating a data center consolidation project, or in the middle of one? How are you going to ensure that performance stays consistent when what you've effectively done (and most executives in IT don't think about this) is move the users farther away? In a networking sense, that's what you're doing in a data center consolidation. So the ability to deliver consistent and acceptable application performance is going to be very important.

So, do you know what applications are affected? Do you know what applications are even in each of your data centers? Do you have a current baseline of performance? Do you know the traffic volumes that those applications take up so that you can properly size the infrastructure? If you have fifty Exchange servers and you're consolidating down to three locations, can you figure out what traffic is going to go to those three locations in order to properly size those links and size those servers? Do you understand the interdependencies of all the applications?
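As a rough illustration of the sizing question, here is a minimal sketch that totals baseline traffic by destination site. The server names, sites, traffic figures, and the 30% headroom are all made up for the example; real numbers would come from your own flow data.

```python
from collections import defaultdict

def peak_load_per_site(servers, headroom=0.3):
    """Sum each server's measured peak traffic (Mbps) by its target site.

    `servers` is a list of (server_name, peak_mbps, target_site) tuples.
    `headroom` is an assumed safety margin, not a recommendation.
    """
    totals = defaultdict(float)
    for _name, peak_mbps, site in servers:
        totals[site] += peak_mbps
    return {site: round(mbps * (1 + headroom), 1) for site, mbps in totals.items()}

# Hypothetical inventory: three servers out of the fifty being consolidated
inventory = [
    ("exch-01", 40.0, "Austin"),
    ("exch-02", 25.0, "Austin"),
    ("exch-03", 60.0, "London"),
]
print(peak_load_per_site(inventory))
```

Adding up individual peaks is deliberately conservative, since the servers' busy periods rarely line up exactly.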

With multi-tier applications, it's not uncommon to think you're moving the entire application when in reality some of the servers got left behind thousands of miles away. We have numerous examples of analyzing multi-tier applications for people where one of the tiers didn't get moved, or didn't end up in the same physical data center, and is therefore now a thousand miles away. Then they wonder why response time and performance suddenly slowed to a crawl.
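To put a rough number on that "slowed to a crawl," here is a back-of-the-envelope sketch. It assumes light in fiber covers about 200 km per millisecond and counts only propagation delay; the 200 round trips per transaction is a hypothetical figure for a chatty application, and real paths add routing, queuing, and server time on top.

```python
FIBER_KM_PER_MS = 200.0  # assumed: light in fiber covers roughly 200 km per millisecond
KM_PER_MILE = 1.609344

def added_delay_ms(distance_miles, round_trips):
    """Extra time spent purely on propagation for a chatty transaction."""
    one_way_ms = distance_miles * KM_PER_MILE / FIBER_KM_PER_MS
    return 2 * one_way_ms * round_trips

# A tier left 1,000 miles away, with a hypothetical 200 round trips per transaction
print(round(added_delay_ms(1000, 200)), "ms")
```

Under those assumptions, each transaction picks up a little over three seconds that were not there when the tiers sat side by side.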

So again, response time is the key network performance driver. Traffic flows are very important for understanding capacity issues and understanding how much traffic is generated by these consolidations. Device statistics now become important as well within the data center. Am I overloading routers that have been updated as part of the data center consolidation efforts? Finally, packet analysis should be employed to understand the effects consolidation has on multi-tier applications and where the servers are physically located.


Just a Flag in the Database – Why Networks Fail (to Perform)


Part 3 of a series adapted from Joel Trammell’s Keynote Speech at NetQoS Symposium 2008

My favorite story for a changed application: A network team was fighting with a thorny application performance issue for a long time and they saw the application had changed its bandwidth usage by an order of magnitude overnight. They went back to the application developers and asked, "Guys, why didn't you give us a heads up that you had changed this application so dramatically?" An application developer said, "What do you mean changed dramatically? We just flipped a flag in the database."

"What do you mean you flipped a flag in the database?"

"Well, we made the graphics field, you know, available to the user.” So, it went from a text based application to one showing JPEG images that could be on the order of a MB in size, one for every page on that application.
To the application developer, this was not a change of any significance in the application.

But to the networking team, this was a major issue. Often, the application development team is not going to know what's going to be a little change to them and a big change to the network engineer. That’s why it’s important to quantify the normal behavior and performance for your key existing applications so that you can detect this change.

Do you have some sort of alarm in place for variations from that normal performance? Can you detect an unusual traffic flow that might indicate a change in usage or a change in architecture? Perhaps the server team has repositioned servers; that's going to have a dramatic effect on your network, though they haven't “changed” the network at all, right? Can you reconstruct the transaction timing to understand why things have changed?

So, the key driver in detecting a changed application is going to be response time. You know what a normal baseline response time is, and ideally you're going to see an alarm when performance departs from that baseline. Then you're going to want to understand whether traffic flows have changed, so anomaly detection may help you in this case, and of course packet analysis may again help you understand specifically how the application has changed.
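The baseline-and-alarm idea can be sketched in a few lines. This is one simple approach (mean plus a multiple of the standard deviation), not a description of how any particular product does it; the sample response times and the three-sigma threshold are assumptions for illustration.

```python
from statistics import mean, stdev

def is_anomalous(baseline_ms, observed_ms, sigmas=3.0):
    """Flag an observation more than `sigmas` standard deviations above baseline."""
    threshold = mean(baseline_ms) + sigmas * stdev(baseline_ms)
    return observed_ms > threshold

# Hypothetical response times (ms) collected while the application behaved normally
baseline = [110, 120, 115, 125, 118, 122, 117]

print(is_anomalous(baseline, 125))  # within normal variation
print(is_anomalous(baseline, 900))  # the "flag in the database" kind of jump
```

A real baseline would usually account for time of day and day of week, since "normal" at 9 a.m. Monday is not "normal" at midnight Saturday.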


Performance Edge Journal – Volume 3


The third edition of Performance Edge Journal has just been posted for download (some registration required). The publication is edited by Network Performance Daily’s Brian Boyko and is devoted to the diverse networking and application delivery responsibilities that today's network professionals must tackle.

In this edition, industry expert Dr. Jim Metzler contributes recent research, presenting some very interesting expectations and realities of today's NOC and how it's not just about monitoring network availability anymore. Find out what non-NOC IT pros think the NOC staff is really doing.

Performance Edge Journal also explores the risk-reward balance when considering incumbent technology. When is it prudent to stick with what you have? When does it make sense to invest in more modern technology?

Unified Communications adoption and ensuring VoIP quality continue to be issues of concern, so we included one article on some of the things to watch for when managing UC and another that goes into detail on how to diagnose and deal with that annoying echo we sometimes get on phone calls. (Did you know that phone echo can never be caused by the digital stream?)

Other topics covered in this issue of the Performance Edge Journal include how technology has impacted the U.S. general election; anomaly detection software that provides early warnings of threats to optimal network performance; a case study that looks at how OSF HealthCare improved network visibility to address existing performance issues and prepare for critical initiatives such as VoIP, MPLS, and new network intensive imaging applications; a white paper discussing latency issues specific to financial trading environments; and finally a 2009 recreational network traffic calendar for planning of high traffic network times.


Interview with ‘Bullied’ Network Engineer on Australian Gov’t Net Filters


Australia’s federal government plans to require Australian ISPs to use filtering software to remove “illegal” content from Australia’s Internet. They’re spending around $77M (USD) to implement the program, which the government had led people to believe would be optional. Instead, it will be mandatory.

Mark Newton, a network engineer with Internode in Australia (but not working on behalf of or speaking for Internode), did an analysis of the data gathered from Australian government trials of filtering software. He concluded that, among other things, more accurate filters degrade Internet speeds by more than 70%, and less accurate filters can have up to a 15% false positive rate.

In retaliation, Belinda Dennett, a policy advisor to Australia’s communications minister, Senator Stephen Conroy (Labor), wrote an e-mail to Newton’s employer, asking them to rein in the network engineer’s dissent.

We called Sen. Conroy’s office but we were not able to get a response before press time.

We have an audio interview in podcast form with Mark Newton below, with a transcript below the cut.

[Ed. Note: Due to problems with rendering in Internet Explorer 7, we've temporarily disabled the flash player version of the podcast. You can download the podcast as an MP3 file here.]


This-specific-end-to-that-specific-end network performance management.


EMA analyst Dennis Drogseth had a column in Network World yesterday talking about end-to-end application management. In it, he had this to say:


You might believe, and with some real justification, that the term “end to end” is only used by vendors who custom-fit the definition to the scope of their particular product.

Does “end-to-end” application management, for instance, include the mainframe? You bet it does if you’re a vendor that manages the mainframe environment! Does it include capturing the end user experience at the end station, desktop, or mobile device? Once again, the answer is a definitive “yes” if you’re a vendor that has strong QoE (Quality of Experience) roots. Or how about insights into the code and design of the application itself? If you’re one of the few vendors that does this, you’re proud of it and wouldn’t have it any other way!


And this concerned me because, if you do a Google search for: [site:networkperformancedaily.com “end-to-end”], you get 122 results. The phrase “end-to-end” appears in a little more than 1 in 5 posts we’ve made to this blog.

So, what do we mean by “end-to-end?”  We’re usually using the phrase in connection with network response times and the end-user experience at the end station; NetQoS is a “vendor that has strong QoE roots.”

Now, we do have some insight into the code and design of the application.  But that isn’t the focus of our tools; the focus is to tell you whether the problem is in the network, server, or application, and if it’s in the application, give you a good idea of where to start your investigation.  (For example, an application that is slow due to unnecessary round-trip transactions behaves differently from an application that is slow due to a memory leak on the server where it is being run.) 

Drogseth is right when he says that no one vendor is optimized to do it all.  In the future, there could be, but then you run into the quality vs. quantity problem.  Is it better to do it all adequately or to do a few things extremely well?

EMA defined five major technology spheres, and last June, they polled more than 400 respondents to find out which of them they believed “most critical to end-to-end application management in 2008.”  The answer was “Network Application Management,” focusing on application flows and end-to-end (as we define it) transaction capabilities. 

For more information on this, I recommend you read the original article up at Network World. Additionally, Drogseth promises to follow up in his next two columns.


What do YOU think of the NetQoS Performance Center?


Peter Sevcik and Rebecca Wetzel, analysts with NetForecast, would like to know (and of course we at NetQoS are always looking for feedback). They have published an article about NetQoS on their “App Performance View” blog on NetworkWorld.com, part of a series of blog posts they are writing about “tools that monitor application performance in real-world environments.” At the end, they ask customers to comment about their experience with the NetQoS Performance Center, and we would like to encourage all of you who have experience with any NetQoS products to respond: What do you think of the NetQoS Performance Center?

You may respond anonymously if you wish, and the comment can be as short or as long as you want to make it. Feedback is encouraged on this blog as well.

And if you are just thinking about deploying any of our products, check out the customer comments already posted at the end of the Network World post!


Three Things You Can Do Today To Improve Network Performance Without Spending a Dime


For months, we’ve been waiting to see what the fallout would be from the sub-prime mortgage crisis.

Apparently, the results are not unlike a hefty bag filled with chili con carne, dropped from the top of a skyscraper. Only instead of a hefty bag, it’s the U.S. economy.

So, as Wall Street explodes like an explosive so explosive it could explode and create a massive explosion, technology turnaround times will probably extend a couple more years as CIOs try to figure out how to use existing tools to solve network management problems and improve performance. How do you do that?

Luckily, there are ways. Cisco routers and switches already include “application-aware” technologies that don’t require any additional purchases, including IP Service Level Agreement (IP SLA), Class Based Quality of Service (CBQoS), and Network Based Application Recognition (NBAR).

Managing Application Response Times with Cisco IP SLA

Now, measuring real application transactions is the most accurate method for measuring response times. But, failing that, you can use Cisco IP SLA to create synthetic transactions. This is not only useful during an IT budget crunch but can also provide useful data when assessing whether or not to roll out a new application, or when measuring a service provider’s SLA edge-to-edge.

IP SLA operates by sending synthetic transactions between two network devices or between a network device and a server. It can be configured to send different types of synthetic transactions based on port, packet size, type of service, and even more advanced characteristics, as is the case with Voice over Internet Protocol (VoIP) tests. When it gets a response, the sender calculates the response-time metrics appropriate for the test type, then repeats the test multiple times.
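To make the send, time, and repeat loop concrete, here is a host-side sketch of the same idea. This is not IP SLA itself and does not touch a router; it simply times a TCP connection setup from a workstation. The function names are invented for the example, and the host and port in the comment are placeholders.

```python
import socket
import time

def tcp_connect_probe(host, port, timeout=2.0):
    """Return a callable that performs one synthetic transaction: a TCP connect."""
    def probe():
        with socket.create_connection((host, port), timeout=timeout):
            pass
    return probe

def run_probe(probe, repeats=5):
    """Run the probe `repeats` times and summarize the response times in ms."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        probe()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "count": len(samples),
        "min_ms": min(samples),
        "avg_ms": sum(samples) / len(samples),
        "max_ms": max(samples),
    }

# Example (hypothetical host and port):
# print(run_probe(tcp_connect_probe("app.example.com", 443)))
```

Separating the probe from the timing loop means the same loop could time any other kind of synthetic transaction you care to write.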

Some SNMP polling products can collect data automatically, store it in a database, display the results in a GUI, and provide analytical functions beyond data collection, such as calculating baselines, displaying trends, and triggering threshold alerts based on collected IP SLA data. There’s also the possibility of simply getting the information from the CLI, but extracting the IP SLA response-time metrics and copying them to a spreadsheet can be difficult and tedious. However, for the extremely budget-conscious, it can be done.

Deploying Quality of Service with Cisco CBQoS

QoS is a blanket term for network policies and practices that help to manage different types of data traffic that share network links. Effectively, QoS determines how different types of traffic, with different priorities, are handled whenever tradeoffs that are likely to impede performance must be made.

Now, within any enterprise, the end-user experience with certain applications will always be more critical than it is with others. Strategies to avoid (or at least manage) congestion could include dropping traffic, adjusting application responses, and building packet queues. CBQoS is one way to do this, and it comes with the CBQoS Management Information Base (MIB), which collects statistics about the traffic traversing the router and reports how the QoS configuration is being applied.

Here, an SNMP polling product with application-aware capabilities can get information on input and output QoS class map utilization, drop percentage, and packet counts, as well as pre-versus-post QoS traffic volume, rate, and packet count. It can also point out traffic marked in conformance, in excess, and in violation of defined policies.
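As a small worked example of the pre-versus-post comparison, the sketch below derives a drop percentage for one traffic class from two successive counter readings. The tuple layout and the sample values are assumptions for illustration, not actual MIB object names, and counter wrap is ignored.

```python
def drop_percent(prev, curr):
    """Percentage of packets dropped by policy between two polls.

    `prev` and `curr` are (pre_policy_packets, post_policy_packets)
    counter readings for one traffic class; counter wrap is not handled.
    """
    pre = curr[0] - prev[0]
    post = curr[1] - prev[1]
    if pre <= 0:
        return 0.0
    return 100.0 * (pre - post) / pre

# Hypothetical readings taken five minutes apart
earlier = (1000, 990)
later = (6000, 5740)
print(drop_percent(earlier, later), "% dropped in this class")
```

A class that keeps showing a high drop percentage is exactly the kind of evidence you need to judge whether a QoS policy is helping or hurting.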

Without CBQoS, network managers don’t have a whole lot of evidence to verify that their QoS settings are actually improving network performance – in fact, they may even be inadvertently harming performance. CBQoS prevents network managers from flying blind with QoS deployments. And, like IP SLA, it’s built into Cisco IOS.

Gaining a New Level of Visibility with Cisco NBAR

From within the network device operating system, Cisco NBAR can inspect packets traversing the device and identify the corresponding application – for example, TCP traffic running on port 80 could be labeled as Google, SAP, SharePoint, SalesForce, etc. NBAR can also provide utilization, volume, and rate metrics on a per-application basis relative to the network circuit carrying the traffic.
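Here is a minimal sketch of what "relative to the network circuit" means in practice: converting per-application byte counts over a polling interval into a share of circuit capacity. The application names, byte counts, and the 10 Mbps circuit are invented for the example.

```python
def utilization_by_app(bytes_by_app, interval_s, circuit_bps):
    """Convert per-application byte counts into percent of circuit capacity."""
    return {
        app: 100.0 * (byte_count * 8 / interval_s) / circuit_bps
        for app, byte_count in bytes_by_app.items()
    }

# Hypothetical counts for one five-minute poll on a 10 Mbps circuit
observed = {"SAP": 75_000_000, "SharePoint": 37_500_000}
for app, pct in utilization_by_app(observed, 300, 10_000_000).items():
    print(f"{app}: {pct:.1f}% of circuit")
```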

It’s similar to NetFlow, but NetFlow reports traffic mixes by protocol rather than providing application-layer visibility. NBAR identifies traffic by application, which is important in setting proper QoS policies. And because NBAR is part of Cisco’s IOS, and the data can be collected with an application-aware SNMP poller (which many of you already have), it can be a more cost-effective solution than application discovery hardware.


Nick Carr takes on Colbert


First off, congrats to Nick Carr – we’ve talked with him (and disagreed with him!) often on the blog and we’re thrilled that he managed to go toe-to-toe with Stephen Colbert on last night’s show.

And, thanks to the Colbert Report’s online presence, here’s an embedded player with that interview.


Although the book plugged is “The Big Switch,” the majority of the interview talks more about the implications of dwindling attention spans due to the Internet’s “hyper” hyperlinked nature – a topic not covered in “The Big Switch,” but instead in the cover article Carr wrote for the Atlantic Monthly, “Is Google Making Us Stupid?”

The idea, as we’ve mentioned before, is that Carr believes the end result of the attention getting behavior of the Internet is that it will “scatter our attention and diffuse our concentration.”


“When the Net absorbs a medium, that medium is recreated in the Net's image. It injects the medium's content with hyperlinks, blinking ads, and other digital gewgaws, and it surrounds the content with the content of other media it has absorbed. A new e-mail message, for instance, may announce its arrival as we're glancing over the latest headlines at a newspaper's site. The result is to scatter our attention and diffuse our concentration.”


During the interview, Colbert made a play of ignoring Carr to check his iPhone. Now, that does happen in real life, but I’d say that’s more an indication of individual rudeness than of culture spinning on a dime over the concept of hypertext.

The same criticisms that Carr makes of the Internet could be made of the newspaper – you’re trying to read one thing but it’s broken up, put next to all these other interesting articles, and ads designed to catch your attention… with all these… analog gewgaws, how is one supposed to be informed?

Back when the Atlantic Monthly article first came out, we mentioned that limits on network performance limit the ability to communicate complex thought. But we missed an opportunity to get less academic, more practical, and closer to the issues in a corporate network environment.

While we disagree with Carr’s diagnosis that the Internet causes short attention spans, (I’m a pro-blogger at a tech company, raised on Nintendo and MTV – I’m the poster-child for the 21st century digital boy, and still I managed to summon the concentration to read the book Nick Carr wrote…) we do agree that human attention spans are short.

When I worked at a supermarket retailer, back in the early 2000s, as I’ve mentioned (and complained about) we were using a java-based networking app that took one to two seconds to input each number and move to the next field, and processing the entire report took minutes. The network performance was absolutely horrible, and as I pointed out before, we would have mentioned it in the hopes of having the performance improved somehow, except that we all realized that our jobs were essentially superfluous anyway and that we could all be replaced by a very small shell script that could parse the orders as they came in instead of printing them out and having us enter in all of them by hand.

Of course the lot of us at the data entry farm had CNN.com, Slashdot, and All Your Base Are Belong To Us and Hamsterdance open while we waited for the pages to load. (It was a simpler time back then.)

Of course, if we didn’t have outside Internet access, we could very well have distracted ourselves offline with desktop toys or conversation. We did that often anyway – as I said, it just took forever for those fields to come up.

I’ve also heard, second and third-hand, stories of other companies who are shocked to find that employees are going on to do other tasks while they wait for reports to generate, fields to come up, and pages to load – so if you’re honestly worried about dwindling attention spans, it might be better to not curse Google or the Internet, but to go in and actually improve things where you can.


A few of a many, or many of a few?


Ken Church, Albert Greenberg and James Hamilton of Microsoft recently put out a paper on “Delivering Embarrassingly Distributed Cloud Services.”[PDF] Like most papers of this type, it’s a dry read, but informative. It looks at the tradeoff between mega-data center size and micro-data center diversity from the viewpoints of both total cost of ownership and performance.

The most important line in the entire report, of course, is “The trade-offs vary by application.” However, they make the argument that applications with little need for server-to-server communications will show benefits in cost, scale, reliability and performance through geo-diversification – in other words, lots of little datacenters as opposed to one big datacenter.

This seems to fly in the face of the trend toward data center consolidation, but there is a point to it: every data center needs redundancy, and a single centralized data center needs proportionally more of it than a set of smaller data centers that can back each other up. As Church, Greenberg, and Hamilton put it, “the more geo-diversity, the better. N+1 redundancy becomes more attractive for large N.”
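The arithmetic behind that quote is easy to sketch. Assuming N equal-sized sites plus one spare of the same size (a simplification of my own, not a model taken from the paper), the spare represents 1/N of the working capacity:

```python
def spare_overhead_percent(n_sites):
    """Extra capacity, as a percent of working capacity, for one spare among N equal sites."""
    if n_sites < 1:
        raise ValueError("need at least one working site")
    return 100.0 / n_sites

for n in (1, 2, 5, 20):
    print(f"N={n}: one spare adds {spare_overhead_percent(n):.0f}% capacity")
```

With a single mega data center, a full spare doubles what you build; with twenty small ones, the same protection against losing one site costs about 5% extra.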

The part that really interested me, though, was the networking section. (Section 3, in case you want to skip right to it.) Church, Greenberg, and Hamilton point out that in a large, centralized datacenter, you can have end-to-end control and assure a particular level of performance through supported service level agreements. On the other hand, they argue:


“[with distributed data centers] the cloud service provider has ceded control of quality to its Internet access providers, and so cannot support (or even fully monitor) SLAs on flows that cross out multiple provider networks, as the bulk of the traffic will do. However, by artfully exploiting the diversity in choice of network providers and using performance sensitive global load balancing techniques, performance may not appreciably suffer. Moreover, by exploiting geo-diversity in design, there may be attendant gains in reducing latency…”



“Many large analysis applications are best run centrally in mega data centers… Interactive applications are best run near users… [they] can be delivered with better QoS (e.g., smaller TCP round trip times…) via micro data centers.”


The argument’s sound, especially when you consider that interactive applications are probably the most latency sensitive because they need to make multiple trips to and from the client and server with every interaction.

But reducing the propagation delay (or distance delay) is merely one part of the performance equation. By ceding control over router performance and transmission, you have no way of diagnosing network round trip time problems if they occur, and wouldn’t be able to fix them – short of the messy step of changing service providers – even if you did. If something goes wrong, it could negate the speed increases gained by diversifying servers, so moving to this model is more of a gamble than a guarantee of improvement. Granted, it’s a gamble that might make sense for some apps and some organizations – some apps, apparently, can get away with less than 100% uptime.


Google Chrome and Network Performance – it’s bigger than you think.


When Google Chrome was released, our genuine reaction around the office was something like this:

[Image: ourreaction.jpg]

Okay, so the last thing the world needs is yet another browser. Between IE, Firefox, Safari, Opera, Flock, Konqueror, Epiphany, Camino, Galeon, SeaMonkey, OmniWeb, and, of course, Wii Internet Channel, Web applications developers already have their hands full.

However, if you work in IT, you are either in the business of developing applications or delivering applications. And sometimes the bottleneck in application delivery is the browser. You can have the best network in the world, with only a couple hundred milliseconds of overall delay – but if it takes seconds to render the JavaScript on the front-end, it’s almost academic. At any rate, the end-user probably can’t tell the difference between delays on the network and delays in the client-side browser.

There are two things that make Chrome stand out – the first is running each tab, and each plug-in, as a separate process, with protected memory address space. Problems in one tab will not crash the entire browser.

The other is advances in JavaScript execution. By running JavaScript in separate processes, buggy JavaScript can’t hang the whole browser, as it would if JavaScript ran in a single thread in one browser process. This scenario should come as no surprise to anyone who has used Firefox and watched a single buggy JavaScript site force a restart of every tab in the browser.

But Chrome also comes with a JavaScript virtual machine, which speeds up JavaScript-based Web applications by compiling the JavaScript code directly into machine code for your processor and OS rather than interpreting it. Again, faster delivery of the application, when the browser is the bottleneck.

There are a few nay-sayers out there that are looking at this from a bottom line point of view – that Google is trying to enter into the browser wars and try to own the space – basically, if you use Google’s browser, even if it’s open-source, you’ll view Google’s advertisements, and make Google money. That’s true enough. But what we really should be taking from this is that even if Google’s code wasn’t open-sourced – and it is – these innovative ideas would eventually make their way into other Web browsers in order to stay competitive. Firefox will likely incorporate changes at least by the next full release, and Microsoft, Apple, and Opera Software will do so if they want to remain competitive.

I’m skeptical that Google Chrome will make it onto enough desktops that Google becomes a key competitor in the Browser Wars. Then again, Mosaic was the first widely popular Web browser, and no one uses it today – but we certainly use a lot of the technological ideas behind Mosaic. It really was a quantum leap forward, and though I may be overly optimistic about it, Chrome really is a quantum leap forward in Web application development.

The point is not Google Chrome. The point is the technology behind Google Chrome.


