Capacity Planning Archives

Cisco’s MXE 3000 and video optimization


My day job is covering networking innovations and trends, but I moonlight as a video editor, director, and producer, so I was personally really excited to hear about the new Cisco Media Experience Engine (MXE) 3000. My question list includes questions about bitrate, framerate, dynamic re-encoding, and “can I borrow one for the weekend, pretty pretty please?”

Network World has a picture of it, which looks like a 1U blade with a DVD-ROM drive. According to the Cisco FAQ, it’s designed to be used in the data center.

But what does it do, exactly, and how will it impact network performance?

Ultimately, the MXE is a transcoding device that resides in the Data Center. For non-video geeks, transcoding is what happens when you take a video that is in one computer format, and want to turn it into another video format. For example, when you take your digital camcorder’s DV tape and burn it to a DVD, part of that process is your computer converting from the DV format to the format used in DVDs – MPEG2. That conversion is called transcoding – moving from one codec to another.

The question, of course, I would have really liked to ask Cisco: What is the advantage of putting the transcoding software and appliances in the data center, compared to, say, buying a Mac XServe, putting it in a closet somewhere, enabling a remote desktop, and using a program like Final Cut Studio’s Compressor to accomplish many of the same pre-processing and encoding tasks that the MXE can accomplish?

This is an especially important question because one of the key goals of the MXE is to limit traffic congestion on the WAN by reducing large videos into smaller ones. For example, videos may be recorded using an HD camera in HDV, which records at 25 Mbits/s in the MPEG2 codec. However, you could save on bandwidth by reducing the movie from the original 25 Mbits/s to around 3 Mbits/s in the H.264 codec, which preserves video quality at lower file sizes, with the tradeoff being the extra processing power needed to both encode and decode the image. You could cut that down even further if you don’t need HD detail.
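To put rough numbers on that saving, here is a minimal Python sketch. The ten-minute clip length and the T1-speed WAN link are assumptions chosen for illustration, and the transfer time is a best case that ignores protocol overhead and competing traffic.

```python
def video_size_megabytes(bitrate_mbps: float, duration_s: float) -> float:
    """Size in megabytes (10**6 bytes) of a clip encoded at a constant bitrate."""
    return bitrate_mbps * duration_s / 8.0


def transfer_time_s(size_megabytes: float, link_mbps: float) -> float:
    """Best-case seconds to move a file over a link, ignoring protocol overhead."""
    return size_megabytes * 8.0 / link_mbps


CLIP_SECONDS = 10 * 60  # assumed: a ten-minute clip
WAN_MBPS = 1.544        # assumed: a T1-speed WAN link

hdv_mb = video_size_megabytes(25.0, CLIP_SECONDS)   # HDV source in MPEG2
h264_mb = video_size_megabytes(3.0, CLIP_SECONDS)   # transcoded to H.264

if __name__ == "__main__":
    for name, size in (("HDV", hdv_mb), ("H.264", h264_mb)):
        minutes = transfer_time_s(size, WAN_MBPS) / 60
        print(f"{name}: {size:.0f} MB, about {minutes:.0f} min over the WAN")
```

Under these assumptions the source clip is 1,875 MB and the transcoded clip is 225 MB, an 88% reduction before the file ever touches the WAN.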

So, yes, if having an MXE means that raw video travels on the high-bandwidth, low-latency LAN down to the MXE, where it is converted to a smaller file for travel on the low-bandwidth, high-latency WAN, it could be huge.

What seems to be strange, though, is that Cisco suggests, in the online promotional video for the product, sending the large source video through the WAN to be transcoded. I’m not sure that that would work out as well as Cisco thinks it will. Even with a device like the MXE, keeping track of your network’s capacity and monitoring your traffic flows and response times end-to-end remains important for the simple reason that not all video is optimized for the network. As of press time, we were unable to hear back from Cisco directly, and that was a little disappointing.

So what is the advantage of the design decision to put this in the data center? I’m sure there is one – I just haven’t been able to get with Cisco yet to find out exactly what it is.

Additionally, the MXE transcodes files, not streams. This means that video-over-IP won’t be affected by the device. What I’d really like to see is a device that can transcode streaming video on the fly – using higher resolutions and bitrates when the link is relatively uncongested, and reducing it when there is other traffic on the network with higher QoS priority. That would be a killer app for videoconferencing, and this might be a good first step towards that goal.



Tracking YouTube Traffic with NetFlow: How It's Done


By David Oliver

We did have the opportunity to do this blog post as a video recording and put it on YouTube, but we realized that, ironically, as the post is all about how companies use NetFlow to track YouTube, because YouTube can, in many cases, suck down bandwidth, it was probably best just to write this out in text.

As we mentioned a week ago, YouTube is now supporting high definition content, with a high bandwidth to match. Now, I've done a little bit of research into how YouTube actually works. So I thought I’d explain to all those companies out there who don’t yet have their own solutions some ideas about how to track and manage YouTube and other streaming media data – as well as give users out there an idea of exactly how companies can track your YouTube usage at work.

Anyway, when you make a request for a video on YouTube, you are directed to YouTube’s servers via one of four IP addresses that are easily found on Google or other search engines.  From there you're going to be relayed to the Limelight network, which will actually feed you the video in the flash-based player.  You can see the flows to and from that initial IP address for the HTTP GET of that video. 

There are many solutions for providing visibility into traffic on the network by looking at the Cisco NetFlow data (which is already on most Cisco routers).  I’m going to refer to NetQoS’s own solution,  ReporterAnalyzer, when I talk about tracking NetFlow data. 

What we can do with ReporterAnalyzer is monitor the Internet-facing link, and create and use custom reports looking for YouTube’s specific IP addresses. If you see a substantial amount of data being transferred, that's a good marker of YouTube video traffic.

You can rely on those custom reports and run them anytime you want, but companies can also monitor YouTube in real time. By mapping HTTP Port 80 traffic that involves one of YouTube's IP addresses to some other ephemeral port (and naming it something catchy, like "YouTube"), it'll actually show up as its own protocol in both real-time reporting and flow forensics. You could use that data to create custom reports, to get a comprehensive list of users, and to sort YouTube use by volume.

The other thing you can do is use analyses to know when YouTube traffic accounts for more than, say, 10% of any of your links' traffic. The analysis will then go through on a link-by-link basis and tell you about violations, helping you further localize the source of that traffic. You can also configure it to alert you when and only when YouTube traffic on a particular link passes a threshold that you set.
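The per-link threshold check and the per-user ranking can be sketched in a few lines of Python. This is an illustration of the idea only, not ReporterAnalyzer's interface or a NetFlow parser: the flow records are hypothetical dictionaries, the link names are made up, and the "YouTube" prefix is a documentation address range standing in for the real addresses you would look up.

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network

# Placeholder prefix (a documentation range), NOT a real YouTube address block.
YOUTUBE_NETWORKS = [ip_network("203.0.113.0/24")]


def is_youtube(addr: str) -> bool:
    """True if the address falls inside one of the watched prefixes."""
    ip = ip_address(addr)
    return any(ip in net for net in YOUTUBE_NETWORKS)


def youtube_share_by_link(flows):
    """Return {link: fraction of that link's bytes to or from the watched prefixes}."""
    total = defaultdict(int)
    youtube = defaultdict(int)
    for f in flows:
        total[f["link"]] += f["bytes"]
        if is_youtube(f["src"]) or is_youtube(f["dst"]):
            youtube[f["link"]] += f["bytes"]
    return {link: youtube[link] / total[link] for link in total if total[link]}


def links_over_threshold(flows, threshold=0.10):
    """Links where the watched traffic exceeds the given share (10% by default)."""
    shares = youtube_share_by_link(flows)
    return sorted(link for link, share in shares.items() if share > threshold)


def top_youtube_users(flows):
    """Internal hosts ranked by bytes exchanged with the watched prefixes."""
    usage = defaultdict(int)
    for f in flows:
        if is_youtube(f["dst"]):
            usage[f["src"]] += f["bytes"]
        elif is_youtube(f["src"]):
            usage[f["dst"]] += f["bytes"]
    return sorted(usage.items(), key=lambda item: item[1], reverse=True)


# Hypothetical flow records, shaped like a simplified flow export.
SAMPLE_FLOWS = [
    {"link": "austin-wan", "src": "10.0.0.5", "dst": "203.0.113.10", "bytes": 400},
    {"link": "austin-wan", "src": "203.0.113.10", "dst": "10.0.0.5", "bytes": 90_000},
    {"link": "austin-wan", "src": "10.0.0.7", "dst": "198.51.100.20", "bytes": 300_000},
    {"link": "dallas-wan", "src": "10.1.0.9", "dst": "198.51.100.20", "bytes": 500_000},
    {"link": "dallas-wan", "src": "203.0.113.11", "dst": "10.1.0.9", "bytes": 20_000},
]

if __name__ == "__main__":
    print(links_over_threshold(SAMPLE_FLOWS))
    print(top_youtube_users(SAMPLE_FLOWS))
```

In the sample data only the first link crosses the 10% mark, because the small HTTP GET is dwarfed by the video coming back in the other direction.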

(The other option is to try to block it entirely, but that's an engineering nightmare.  Any employee smart enough to provide good value to a company - particularly a high tech company - will likely be smart enough to know how to circumvent blocks through proxies and other means.)

Custom reports to find correct addresses and to localize YouTube traffic may take a couple minutes. The entire real-time application mapping process takes maybe another 15 minutes. I can be showing real-time data specific to YouTube traffic just a few minutes after configuration of application mapping. (If your boss asks in the morning for something to track YouTube usage, the company can get YouTube tracking up and running by that afternoon - if the boss just wants a quick snapshot of the current YouTube traffic volume, it could take as little as five minutes through custom reports.)

Of course, this isn't limited to YouTube. You can use similar methods and techniques to find and track streaming audio feeds, other video sites, etc. Any TCP flow is going to create some sort of NetFlow data. Based on the source or destination address, you can localize that. So as long as ReporterAnalyzer has visibility of that destination address, it can report on it. As you know, there are a multitude of media-based streaming sites, all of which are going to have their own IP address range, which you can find pretty easily. You can then further localize and label them so that when you pull up reports, they're already differentiated from other traffic.

While YouTube is great, at NetQoS we’ve found that YouTube traffic congesting corporate networks is a common issue. For any company, WAN links are a finite resource and need to be managed. It's something that's a concern because you're sizing your network around capacity needs for the business. YouTube is (usually) non-business traffic, but it's going to share that limited resource. The more you share a resource, the less is available for the requirements you originally scoped it for.




David Oliver is a Product Manager at NetQoS



Black Friday, Cyber Monday.


Boy, what a difference a year makes.

“Black Friday” usually refers to the day after Thanksgiving, when retailers, both online and offline, start getting rushes of orders in order to fulfill Christmas demand. But unless you’re a Wall Street firm, for whom Christmas has come early, you’re probably cutting back on expenditures this holiday season.

Now, it’s probably referring to any Friday that you re-read your quarterly 401(k) or 529 statement.

Still, whether or not people –spend- more online this holiday season, they’ll probably be making a similar number of transactions – that is, Hershey bars instead of Godiva chocolate, Playstation 2s instead of Wiis, Go-Bots instead of Transformers…

And with every dollar counting, the one thing that retailers and suppliers can’t afford on Black Friday and Cyber Monday this year is performance slowdowns like the ones that hit Costco, Victoria’s Secret, Lowe's, and Macy's last year.

Additionally, even if you aren’t a retailer, Cyber Monday typically sends some Web traffic spikes over company networks as employees use the high speed connections work provides in order to make their purchases.

In either case, you can analyze network traffic flows to identify what traffic is mission critical, what is mission irrelevant, and what is mission impossible. After quantifying the impact of certain types of traffic on network performance, you can then implement quality of service policies to ensure that business critical apps have priority access to network resources.

We know that on Friday and next Monday, there will be a higher than normal volume of Internet traffic.  The trick is finding out how much of an impact it will have and preventing it from impacting application performance.



Bandwidth “shortage” has 1950s precedent.


Most Americans can barely remember a time when there wasn’t enough electricity running to their house or apartment. Oh, certainly, we can remember times when we’d blow a fuse or trip a circuit breaker; but they were rare and usually happened under excessive strain – when the hairdryer and the air conditioner were on at the same time as the electric stove or George Foreman Lean Mean Fat-Reducing Grilling Machine.

In the 1950s, according to this advertisement uncovered by Modern Mechanix, running out of juice was a real problem because the wiring in houses built decades earlier simply didn’t have enough capacity to run all those appliances. Well, of course it was – and this brings me back to my undergraduate days as a history major at the New Jersey Institute of Technology. (Yes, NJIT had a history major. No, it wasn’t a big class…)

Now, the second industrial revolution was years prior – during the 1910s through the 1920s – but consumer adoption of technological advances was stunted for a period of about 40 years. The Great Depression killed disposable income. When the war came and finally did bring some income, there was significant rationing, and many companies specializing in electronics and electromechanical devices were building for the war effort. In a sense, when money started flowing in during the late 40s and 1950s, consumer demand had been “pent up.” Yeah, consumerism went overboard in the 1950s, but can you blame ‘em?

Fast forward to today and replace appliances with applications – there’s a reason that both have the same root word, the Latin “applicare” – and you can see the parallels.

Of course, things are a bit more complicated than in the 1950s – there are only a few types of electricity – AC/DC, 110v/220v… etc. The difference is that it was relatively easy to tell whether or not the problem you were having was caused by a lack of electricity. Does the thing light up? Is it working? If the answer is no to both, you’ve got an electricity problem. If the answer to the first is yes, and the answer to the second is no, you’ve got a busted appliance. (If the answer to the first is no and the second yes, you need to replace the lightbulb.)

Today it’s a little bit harder to diagnose the problems you have with application performance – application, server, and network problems can all look very similar – especially to the end-user. Sometimes you spend time and energy working on one area only to find out it’s not the problem.

But sometimes, yes, the network is the problem, and more capacity is needed. The important thing is to be sure about it rather than just guessing.

There’s another lesson here for consumer applications as well. In the 1950s, the way to solve the problem of new demand on the electrical grid was to upgrade the wiring. Maybe we should be doing that to solve some of our own broadband “shortage” problems, instead of resorting to things like “bandwidth caps” and “aggressive traffic shaping.”

Because it’s not just about not being able to see the latest cat-on-treadmill video. Videoconferencing via Skype or another method puts people in face-to-face communication. CERN scientists needed to upgrade the Web’s infrastructure to share the massive amounts of data the LHC would create.

But one of the most telling things is what’s going on in Austin right now.

Our local NBC affiliate, KXAN, and our local cable provider, Time Warner Cable, haven’t reached an agreement to show KXAN programming on cable. One of TWC’s gambits is running an advertisement on television showing people how to connect their laptop computers up to their television so that they can watch the streaming video of the NBC shows that they’re missing.

Which, of course, begs the question: if you can get television shows via the Internet directly from the networks themselves, why do you need cable TV or network affiliates in the first place? This move may backfire. One of the reasons I don’t have cable is that I’ve been hooking my TV up to my computer for years now… then again, I have pretty good broadband service.



What network performance taught me about optimizing a lemon


David Oliver talks about his experiences running the 24 Hours of LeMons race in Houston, and how knowing about network performance helped him optimize his junker.








Cisco ships Mexican folk music instead of VPN software. Easy mistake: They’re so similar…


According to The Register, Cisco installation CDs for VPN networks contained music.

Specifically, music that sounded exactly like this.

Now, Mexican folk music of the “narcocorridos” variety has a rich tradition and requires extreme skill to produce, and is greatly enjoyed by many music aficionados. But still, if you’re going to come up with a piece of music designed to surprise the hell out of everyone, you could probably choose no better music in the world.

Knowing Cisco, there’s no way that this was deliberate; but this brings to mind two things: First, is there someone out in Baja California with a copy of VPN software in his or her hand, wondering to themselves: “¿Dónde está mi música?”

Second, will this start a trend of “narcocorrido-rolling” network engineers?

Cisco is doing everything they can to recover from this error, and in a statement, said:


Cisco is aware that some customers have received defective VPN Client CDs as part of recent orders.

Manufacturing is aware of this problem and is actively reshipping new media to impacted customers.

Defective VPN Client CDs can be identified by the following marking on the back of the media which ends in "MX21511/4"


Of course the moral of the story is that you need to test before you deploy. In this case, it was a little embarrassment, and we all pretty much just have a chuckle about it. But deploying technology on the network without knowing the full effects is just asking for trouble.

I mean, what would have happened if the music actually installed? Is your enterprise prepared to handle accordion configuration?



Scalability isn’t just about numbers


Scalability is one of the more overused terms in networking – which makes it hard to explain why it’s important. Well, I mean, beyond the main concept of: “More scalability means you can hook up more computers to it!”

True, the size of the deployment is probably the best way to objectively prove scalability – for example, NetQoS has one ReporterAnalyzer deployment monitoring over 20,000 WAN links. No small feat. But scalability isn’t just the quantity of computers hooked up to the box, but also how much of the quality of the data you maintain when you’ve got tons of computers hooked up to the box. Or to put it another way, scalability means that even in large deployments, you get all the data at high granularity.

Talking about scalability in pure device count is sort of like talking about network performance purely in terms of fault. It is possible to have poor scalability without having no scalability, when you sacrifice detail for device count.

Another key of scalability that many people don’t think about is performance of the device itself. It would be ironic to purchase a device to monitor network performance that had a very slow UI because it strained under the load of monitoring thousands of links.

One of NetQoS’s many accomplishments over the past six months has been getting a patent on a memory management method and system which allows us to manage hundreds of thousands of combinations in a very small memory footprint.

Memory management is a major part of scalability, because allocating memory at runtime is relatively expensive in terms of processor resources. Put another way: the more efficiently you use memory, the harder you can push the processor on other tasks. For this reason, scalability requires efficient memory usage.

In addition to our own products, we also use it in our integrations into Cisco Wide Area Application Services (WAAS) – we’re able to integrate code there with little impact to the host systems.



Why the Olympics stay online – because fewer people than you think are watching.


While we’ve talked quite a bit about what impact the Olympics may have on an enterprise network’s performance, we haven’t talked much about the performance of the NBC site hosting the live streaming of the Olympics. 

According to Jason Perlow at ZDNet, Limelight Networks (which hosts the streaming videos) deployed the videos by going around the public internet and hosting the content more locally – at the ISP. That means you’re viewing the Olympics through your ISP’s internal network, and the broader internet doesn’t even enter into the connection.

This is smart thinking, it appears to be working, and by all measures this should be applauded.  Perhaps even duplicated – if you know that multiple employees will download the same content, local hosting on the LAN is preferable to duplicate download streams tying up the more expensive, slower WAN lines.

From the enterprise end of the equation, the fact that Limelight is delivering Olympics video more effectively just means that IT managers cannot count on the streaming servers going down from being unable to handle the demand – IT managers still need to monitor their own networks for performance problems when a big event like the Olympics comes up.

However, it would be wrong to assume that Limelight’s strategy is the only reason why Olympic live-streaming hasn’t slowed to a trickle.

First of all, the site blocks 95.44% of visitors from accessing the content – because it limits the content only to those in the United States.  That’s a lot of people.

Secondly, the site requires Microsoft Silverlight. Most people don’t have Silverlight installed. Some can’t even install it on their systems. And there are certainly going to be quite a few people who just didn’t think installing Silverlight was worth the bother to watch five minutes of Olympic footage they may be mildly interested in.

And finally – none of the really popular sports are being streamed.  Gymnastics, Women’s Beach Volleyball, Swimming (with the exception of synchronized) and most of the track and field events aren’t available live. So you’re left with judo, fencing, and the decathlon.

So while it is a true technological wonder that the lights have stayed on and the site performs admirably – it is important to recognize that Limelight has not found a magic bullet to deal with extremely high internet video demand. 



Latency and Jitter


By Kevin Davis
Adapted from “Sources of Latency” Whitepaper

When network users call the Help Desk to report poor application performance, you don’t typically hear things like “The router’s CPU is too busy!,” “The network utilization is above 70%!,” or “The carrier path has failed-over to a sub-optimal path.” Instead, what you’re likely to hear is “The network is slow” or “The calls on my IP phone sound terrible.”

Complaints that end-users lodge are nearly always based on their quality of experience using the application. And their quality of experience is almost always reliant on time.

Anytime a significant delay occurs in the delivery of network data, application performance suffers. Depending on the type of application and how it works, variances in network delay can have a severe impact on application performance, thereby degrading end-users’ experiences.

Two important measurements of time intervals in network transmission systems are referred to as “latency” and “jitter”. Understanding latency and jitter sources and how their values vary in network architectures is critical to engineering application performance and optimizing information resources. For many regular readers, this will be old-hat, but we’ll go over it again.

Network latency is the amount of time it takes for a packet to be transmitted end-to-end across a network and is composed of five variables:


Network Latency = (Distance Delay) + (Serialization Delay) + (Queue Delay) + (Forwarding Delay) + (Protocol Delay)


Serialization Delay refers to the amount of time it takes for a network interface (such as a router’s interface or computer’s NIC) to perform bitwise transmission of a frame onto the outbound media. Forwarding Delay is the amount of time it takes a network device to process a frame/packet by performing a destination address lookup and forwarding the frame/packet to the outbound interface. Protocol Delay is the amount of time that access or transmission algorithms may contribute to the delay of a network frame, and is typically introduced at the endpoints of the data transmission system.

Serialization delay, on a per-packet basis, becomes insignificant at data rates above 1.544 Mbits/s – or a T1. Forwarding delay is typically insignificant in modern routers and switches (when appropriately configured – significant delay can occur in misconfigured routers.) And Protocol delay typically occurs at the access layer or the end points. So the two major variables that have the most effect on network latency are Distance Delay and Queue Delay.
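As a rough illustration of why serialization delay fades at higher data rates, the sketch below computes it for a single frame and sums the five components from the formula above. The 1,500-byte frame size and the sample component values are assumptions for the example, not figures from the whitepaper.

```python
def serialization_delay_ms(frame_bytes: int, link_bps: float) -> float:
    """Milliseconds needed to clock one frame onto the wire at a given line rate."""
    return frame_bytes * 8 / link_bps * 1000.0


def network_latency_ms(distance: float, serialization: float, queue: float,
                       forwarding: float, protocol: float) -> float:
    """Sum of the five delay components, all expressed in milliseconds."""
    return distance + serialization + queue + forwarding + protocol


FRAME_BYTES = 1500  # assumed: a full-size Ethernet payload

if __name__ == "__main__":
    for label, rate in (("64 kbit/s", 64_000),
                        ("T1", 1_544_000),
                        ("100 Mbit/s", 100_000_000)):
        delay = serialization_delay_ms(FRAME_BYTES, rate)
        print(f"{label}: {delay:.2f} ms per frame")

    # Hypothetical one-way path: the distance term dominates the total.
    total = network_latency_ms(
        distance=30.0,
        serialization=serialization_delay_ms(FRAME_BYTES, 1_544_000),
        queue=2.0,
        forwarding=0.05,
        protocol=0.5,
    )
    print(f"total: {total:.2f} ms")
```

At 64 kbit/s a single full-size frame takes 187.5 ms to serialize; at T1 speed it takes under 8 ms, and at LAN speeds it is a fraction of a millisecond.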

Distance Delay is simply the minimum amount of time that it takes the signals that represent bits to travel across the physical medium. Optical cable carries bits at ~5.5 µs/km, copper cable at ~5.606 µs/km, and satellite links at ~3.3 µs/km. (There are a few additional microseconds of delay from amplifying repeaters in optical cable, but compared to distance, the delay is negligible.)
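Those per-kilometer figures translate directly into a distance delay estimate. The short sketch below uses the values quoted above; the 4,000 km path length is an assumed example, and real cable routes are usually longer than the straight-line distance between two sites.

```python
# Propagation figures from the text, in microseconds per kilometer.
US_PER_KM = {"optical": 5.5, "copper": 5.606, "satellite": 3.3}


def distance_delay_ms(path_km: float, medium: str = "optical") -> float:
    """One-way distance delay in milliseconds for a path of the given length."""
    return path_km * US_PER_KM[medium] / 1000.0


def round_trip_ms(path_km: float, medium: str = "optical") -> float:
    """Distance delay for one full network turn (there and back)."""
    return 2 * distance_delay_ms(path_km, medium)


if __name__ == "__main__":
    path_km = 4000  # assumed: a long-haul fiber path
    print(f"one way:    {distance_delay_ms(path_km):.1f} ms")
    print(f"round trip: {round_trip_ms(path_km):.1f} ms")
```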

Distance delay can have a significant impact on application performance for applications that require a large number of network round trips in order to complete a transaction – for example, custom transaction-based applications, database queries, and VoIP, which begins to degrade when one-way end-to-end latency exceeds 200-220 milliseconds.

One of the biggest sources of end-user ire is database queries designed to run over a LAN being ported to the WAN. For example, if a user executes a SQL database query that requests 100 rows of a database table, one row at a time, over a link with a latency due to distance of 60 ms, it would take approximately 6 seconds (60 ms * 100 turns) to complete the transaction. The same query executed by a user on a LAN connected to the same database server would take less than 2-3 ms to be completed, as the latency due to distance across the LAN is insignificant.
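The arithmetic behind that example is simple enough to sketch. The function below multiplies the number of application turns by the per-turn latency; the batched case, where all 100 rows come back in a single turn, is an added comparison and assumes the application can be changed to fetch rows that way.

```python
def transaction_time_s(turns: int, latency_ms_per_turn: float,
                       server_ms_per_turn: float = 0.0) -> float:
    """Seconds to finish a transaction that needs `turns` sequential network turns."""
    return turns * (latency_ms_per_turn + server_ms_per_turn) / 1000.0


if __name__ == "__main__":
    # One row per turn over the WAN, as in the example above.
    print(transaction_time_s(turns=100, latency_ms_per_turn=60))  # 6.0
    # Assumed alternative: the same 100 rows fetched in one turn.
    print(transaction_time_s(turns=1, latency_ms_per_turn=60))
```

The link is no faster in the second case; the application simply crosses it once instead of a hundred times.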

Queue Delay is the amount of time a packet must spend in a network buffer waiting its turn to be transmitted. Network interfaces transmit one frame at a time, typically one bit at a time. As such, when two or more packets are forwarded to a network interface at the same time, or close to the same time – one packet is transmitted while the others are put in a queue on the interface buffer to await their turn at the interface. Packets that are put into the queue must wait until they can be transmitted, adding milliseconds of delay.

Increases in Queue Delay can be measured and detected by monitoring traffic along a given network path. Typically, most intermittent increases in latency above the baseline distance latency can be attributed to network congestion. (In order to reduce the possibility of excessive queue delay, application servers that are members of the same application architecture should be placed on the same Ethernet switch and on the same VLAN to ensure they do not have to compete for uplink bandwidth when problems like the one pictured above occur.)

Worse still, if the congestion continues and packets wait in increasingly longer lines within the queue, the buffer may become full and the packets may be dropped. Packet drop, in turn, causes TCP connections to throttle back on the rate of transmission.

Those are some of the main causes of latency – but what about jitter?

Jitter is a term that refers to the variance in the arrival rate of packets from the same data flow, and abnormal jitter values can negatively impact real-time applications like VoIP and video. Jitter is typically created by three different mechanisms in a network: variance in Serialization Delays due to variance in packet sizes, variance in per-packet Queue Delay due to packet spacing from multiple sources at a common outbound interface, or packets taking different routes from source to destination – perhaps due to per-packet load sharing or routing issues.

The most effective way to deal with jitter is by using low-latency queuing for VoIP and video traffic on network interfaces with large serialization and/or queue delays. In addition, endpoints (such as IP phones) can use jitter buffers or playout delay buffers in order to deliver received packets at a constant rate to the end consumer. These buffers are typically 30-50 ms in depth, and thus they attempt to manage jitter values within these values on any single one-way path. While these buffers technically add 30-50ms in latency, they significantly reduce jitter. Since human beings don’t start to notice latency in VoIP or VideoIP applications till it hits about 200ms, keeping network latency under 150 milliseconds leaves room for a 30-50ms buffer, so jitter can be significantly reduced using this method without the added delay becoming noticeable.
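A simplified model of a fixed playout buffer shows how the trade-off works. This sketch is illustrative only: the arrival times are invented, the 20 ms packet interval is an assumption, and real endpoints typically use adaptive buffers rather than the fixed deadline schedule modeled here.

```python
def interarrival_deviation_ms(arrivals_ms, send_interval_ms):
    """Deviation of each inter-arrival gap from the nominal sending interval."""
    gaps = [later - earlier for earlier, later in zip(arrivals_ms, arrivals_ms[1:])]
    return [gap - send_interval_ms for gap in gaps]


def late_packets(arrivals_ms, send_interval_ms, buffer_ms):
    """Indices of packets that miss their playout deadline with a fixed buffer.

    Playout of packet i is scheduled at:
    first arrival + buffer depth + i * sending interval.
    """
    first = arrivals_ms[0]
    late = []
    for i, arrival in enumerate(arrivals_ms):
        deadline = first + buffer_ms + i * send_interval_ms
        if arrival > deadline:
            late.append(i)
    return late


def within_latency_budget(network_latency_ms, buffer_ms, budget_ms=200.0):
    """True if network latency plus buffer depth stays within the noticeable limit."""
    return network_latency_ms + buffer_ms <= budget_ms


# Hypothetical arrival times (ms) for packets sent every 20 ms.
ARRIVALS_MS = [0, 22, 38, 95, 101, 120]

if __name__ == "__main__":
    print(interarrival_deviation_ms(ARRIVALS_MS, 20))
    print(late_packets(ARRIVALS_MS, 20, buffer_ms=30))
    print(late_packets(ARRIVALS_MS, 20, buffer_ms=40))
    print(within_latency_budget(150, 50))
```

With these sample arrivals, a 30 ms buffer lets one delayed packet miss its playout slot, while a 40 ms buffer absorbs it, at the cost of 10 ms more end-to-end delay.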



Waiting for Firefox


It’s Download Day. At 10:00 a.m. PDT, or noon for us in Austin, Firefox 3.0 was released to the public in what the Mozilla Foundation has dubbed “Download Day.” In fact, they’re attempting to set a Guinness World Record for “most downloads in a 24 hour period.”

So, it was a bit of a concern to us because with all those people downloading Web browsers, there would be sure to be traffic spikes on our network. But the “Download Day” promotion is such a huge success that Mozilla is having trouble keeping their own server up. 

At 10:16 a.m. PDT, I can see a “The server at www.spreadfirefox.com is taking too long to respond” error.  Mozilla.org is also unable to resolve. 

At 10:30 a.m. PDT, it’s still not connecting, and I decide to stop hitting refresh and go and eat lunch. Mmm.  Roast Beef. 

At 11:30 a.m. PDT, Spreadfirefox.com is still not resolving, but Mozilla.org does. That doesn’t last, however: when I go to download Firefox, I get a “Http/1.1 Service Unavailable” error. I bring up a copy of “Waiting for Godot” in another browser window.

It is 12:00 noon on the Pacific.  Spreadfirefox.com is still not resolving. 

12:30 p.m. PDT.  Still not working.  I clean off my work desk, something I’ve been putting off for a wh—ew, is that mayonnaise?  (I hope that’s mayonnaise.)

1:00 p.m. PDT. No Firefox, but my desk is now clean. (My closet is now dangerous.) Time to catch up on my RSS feeds to find out if there are any interesting leads that I can investigate. Hmm. Wine 1.0 is out, but that really doesn’t have a lot to do with network performance. Reddit seems to have problems with Firefox too. But somebody has to be getting the browser – there are over 8000 downloads a minute according to the counting tracker. Wait. Some users report the counts running backward… what, are people uploading it back?

1:45 p.m. PDT. Aha!  Finally.  The page resolves and I begin my download… and it redirects me to Firefox 2.0.0.14.  Great.

1:55 p.m. PDT. I download Opera 9.5.

2:00 p.m. PDT. Mozilla’s page finally shows a link to Firefox 3.0 – but still shows the logo for Firefox 2. The 7.1 MB download starts at around 50 kBytes/s – which is pretty lame compared to the usual 700 kBytes/s I can get when I download from work.

2:15 p.m. I install Firefox 3.0 and launch it.  It’s nice.  It’s certainly more responsive and uses less memory.  However, my Tab Mix Plus extension isn’t compatible, and furthermore, there’s no option to undo closed tabs.  All in all, a disappointment – if it were a restaurant, it would be infamous for slow service and bad food.

Leaving aside the whole “Undo Closed Tabs” issue, you would think that an organization actively trying to beat the world record for the most downloads in a 24 hour period might, you know, be prepared enough to make sure the servers don’t go down?

Additionally, Mozilla has been promoting “Download Day” for some time now, so it makes sense for IT departments to be prepared for the onslaught of downloads coming into the network from users upgrading their PCs to the latest version of the browser – and keep track of the impact that traffic has on the user experience for more mission-critical apps.


