Commentary Archives

Jim Metzler on Network Management Trends




Conclusion of FCC-commissioned Harvard study: Open Access Makes Better Broadband


U.S. broadband lags behind that of its international competitors, and yet another study recently showed by how much.

This is not news.  What is news is that the study was commissioned by the FCC and executed by Harvard University’s Berkman Center, and that it concluded that the most successful countries in broadband deployment have done one thing very differently from the U.S. – they have made their main carriers open up their networks to competing service providers.

In other words, since the barrier to entry for broadband is so high, requiring existing carriers to lease out access to their networks creates an incentive for competition in the broadband market, leading to lower prices, better service, and better performance.

By contrast, the FCC, early in this decade, decided not to require open access, based on an idea that forcing broadband providers to lease out their lines would create a disincentive towards investing in higher capacity networks. 

But, according to the study:


“The emphasis other countries place on open access policies appears to be warranted by the evidence.

We find that in countries where an engaged regulator enforced open access obligations, competitors that entered using these open access facilities provided an important catalyst for the development of robust competition which, in most cases, contributed to strong broadband performance across a range of metrics. Today these competitors continue to play, directly or through successor companies, a central role in the competitiveness of the markets they inhabit.”


The FCC is now issuing a call for public comments on the study. 



Ars Technica vs. Nemertes Research


In May of this year, Nemertes Research president Johna Till Johnson wrote in Network World that “The Internet Sky Really Is Falling.”

The next day, we came out with a story about that column, in our much more irreverent style, entitled “That’s great, it starts with an earthquake: Is the Internet dying?”


In that article, we questioned the conclusions that they drew from the evidence. To sum up, those conclusions were:




  • Nemertes believed that YouTube restricting high-definition video in developing countries was a sign of Internet demand outstripping backbone capacity. We pointed out that such restrictions were due to local traffic problems and the lack of profitable business models in many developing markets.



  • Nemertes also pointed out that many cable carriers were instituting bandwidth caps and pay-per-byte pricing. We pointed out that we did an entire series on why usage caps don’t help with traffic congestion, and that ISPs that roll them out typically do so in generally non-competitive markets where they have other business interests (like cable TV and phone service) that compete with Internet access, and that there were plenty of counter-examples of companies (like Verizon and Cablevision) offering more bandwidth without caps.



  • And Nemertes pointed out the IPv4 shortage, for which there was already a solution, IPv6. (Though adoption rates have been slow, it does not mean the Internet will halt – simply that IPv6 changeovers will be more expensive the longer the delay.)


But the one thing we didn’t question was the claim by Nemertes that Internet traffic will grow “exponentially” while Internet backbone capacity will grow “linearly,” leading Nemertes to the conclusion that there will come a day when there will be Internet “brownouts.”


Recently, Johna Till Johnson published another column, this time in ComputerWorld, outright claiming that net neutrality legislation would mean the end of the Internet. That’s not hyperbole on my part – the headline is literally: “Hello net neutrality, goodbye Internet.”


And Ars Technica, a Condé Nast publication, decided to take another look at Nemertes’ evidence.


Essentially, Nemertes now claims (in the October article) that Internet growth creates a strain on last-mile access lines (Cable/DSL/FiOS) that are “excruciatingly expensive to upgrade,” and that network neutrality would mean that you can’t charge different rates for different traffic, so backbone providers and carriers would start charging by the bit – or at least capping and charging for overages. Since bandwidth providers would now charge each other for the traffic on their networks, they would either raise subscriber rates dramatically or disconnect from the Internet entirely, literally killing the Internet as the entire thing breaks down into walled tiers like the early-1990s CompuServe, AOL, and Prodigy.


Ars Technica, on the other hand, points out that the “excruciatingly expensive to upgrade” last-mile bandwidth isn’t exactly excruciatingly expensive compared to the profits that Internet service providers already generate with net neutrality and, in most cases, without caps. Verizon, for example, is paying $18 billion for FiOS upgrades, but that’s the most expensive upgrade in the market, and Verizon finds it financially feasible to do so in a net-neutral market. For most ISPs, DOCSIS 3.0 (for Cable) and FTTN (for DSL) are very cheap solutions to increasing last-mile bandwidth.


As for the idea of the Internet fracturing, Ars Technica pointed out that ISP networks all exchange roughly the same amount of bandwidth; and an even trade is an even trade no matter how much it costs. There are many ways to recoup costs – but raising the rates on a competitor who can then turn around and raise rates on you doesn’t make any sense at all.


Or as Sevcik and Wetzel put it in Network World:



“Backbone ISPs and access ISPs must play nicely with each other to satisfy their customers' needs. Why for heaven's sake would they hurt their customers and themselves by balkanizing?”


What’s most worrying, however, is that Ars Technica wrote that Nemertes’ idea of Internet growth outstripping capacity may be flawed.


According to the University of Minnesota MINTS project, the year-over-year growth of Internet traffic is not “50-100%” as Nemertes claimed in the ComputerWorld article, but “50-60%.” (Technically, “50-60%” is within the range of “50-100%,” but it’s like estimating that a man who could be 5 to 6 feet tall is “between 5 and 10 ft. tall.”) In Canada, where ISPs have to reveal traffic numbers due to network neutrality research by the Canadian government, they find that growth is slowing year over year: 53% growth in 2006, but 32% growth in 2008.
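To see why the width of that range matters, here is a minimal sketch of how the quoted annual growth rates compound. The three-year horizon and the function name are assumptions chosen for illustration; only the percentages come from the figures quoted above.

```python
def traffic_multiple(annual_growth_pct, years):
    """Return how many times larger traffic is after compounding annual growth."""
    return (1 + annual_growth_pct / 100) ** years


# Hypothetical three-year horizon; the rates are the ones quoted above.
for rate in (50, 60, 100):
    print(f"{rate}% a year for 3 years -> {traffic_multiple(rate, 3):.2f}x")
```

At 50% a year, traffic a little more than triples in three years; at 100% a year, it grows eightfold. Capacity planning for the top of the “50-100%” range is a very different job from planning for the bottom of it.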


We’ve found that when it comes to enterprise networks and IT in general, Nemertes Research is a valuable research organization. But in 2007, Nemertes made a prediction – a reasonable one, given the evidence at the time – about the Internet that did not come to pass. Instead of re-examining that prediction, they continue to insist – on openly contested arguments – that they were indeed right all along, even as, less than 3 months away from the ominous “2010” date, the Internet has managed to keep up with the demand of high-bandwidth YouTube HD files, Netflix streaming, Skype video calling, video game downloads, and other high-throughput applications.


I think that what is actually happening, rather than demand for bandwidth outstripping supply, is that the supply of bandwidth creates its own demand, and that the new demand comes primarily from new applications. That is, HD video on the net is only in demand now that the networks have been shown to be able to handle that kind of capacity. YouTube didn’t start until there was enough capacity on the Internet to make SD video distribution feasible. Only later, when capacity grew, did YouTube roll out high quality video, and still later (after Vimeo proved it was feasible) did YouTube roll out 720p video content. When the network capacity can handle streaming 1080p video, then that will be the new standard. But no one is going to roll out 1080p video until the network can handle it.


This is not to be confused with the issues faced by enterprises when trying to allocate resources to business critical traffic over recreational traffic – where supply of recreational network traffic can be artificially restricted through QoS policies and traffic shaping in order to, presumably, lower the strain that recreational traffic puts on the network. Even so, most smart companies engage in capacity planning, making sure they have the bandwidth available to use new applications before those applications are rolled out. Teleconferencing, for example, is a business application that requires a great deal of bandwidth – but it’s of no use to an organization – and therefore not demanded – if the company network can’t support it. Or in other words, if the money saved from teleconferencing isn’t equal to or greater than the increase in network costs, smart companies are not likely to invest in teleconferencing.


In short, the sky is not falling. But keep an eye on your patch of it, anyway.



Fact-Checking


Did you watch The Daily Show last night?  Or just the first 10 minutes of it anyway?  It was really good – even by the normally high standards of The Daily Show. Jon Stewart took CNN to task for failing in the most basic of its journalistic responsibilities – fact-checking. 

If you’re in the U.S., you can watch it here, though, quite obviously, if you’re at work, your employer may consider it recreational network traffic. 

For those of you who can’t see the video just yet, here’s a quick summation: CNN bothered to fact-check a sketch about President Obama on Saturday Night Live (a comedy show whose political comedy has often been based on satire and hyperbole) but fails to fact-check most of the statements on its own programs, including statistics spouted by guests and claims made in press releases. It will even present two talking heads arguing not over policy but over a statement of fact – and not tell viewers who was actually correct. 

In short, they don’t fact-check what they put out on the channel.  In many ways this is worse than Jayson Blair at the New York Times or Stephen Glass at the New Republic – as those journalistic travesties were the result of concentrated efforts to fool established fact-checking mechanisms, while CNN seems – well, not to give a hoot.  This is not due to any political bias on CNN’s part; it’s just due to intellectual laziness.

CNN could fire some of its reporting staff and hire some network engineers as journalists.  A good network engineer knows how to fact-check.

For example, WAN optimization vendors are likely to claim that their solution will reduce traffic by X percent.  Those claims are usually true but based on a lab test, so it’s usually better to verify than to trust: quantify how much traffic changes before and after a WAN optimization solution is installed, validating (or invalidating) the vendor’s claims.  Mileage may, indeed, vary.
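As a rough illustration of that before-and-after check, here is a minimal sketch. The function names, the five-point tolerance, and the traffic numbers are all hypothetical; in practice the two measurements would come from whatever monitoring tool you already use, taken over comparable periods.

```python
def percent_reduction(before, after):
    """Percentage by which traffic dropped between two measurements (same units)."""
    if before <= 0:
        raise ValueError("baseline traffic must be positive")
    return (before - after) / before * 100


def claim_holds(before, after, claimed_pct, tolerance_pct=5.0):
    """True if the measured reduction reaches the claim, allowing a small tolerance."""
    return percent_reduction(before, after) >= claimed_pct - tolerance_pct


# Hypothetical numbers: 800 GB/week before, 520 GB/week after, vendor claimed 50%.
measured = percent_reduction(800, 520)
verdict = "holds up" if claim_holds(800, 520, claimed_pct=50) else "does not hold up"
print(f"Measured reduction: {measured:.1f}%; the 50% claim {verdict}")
```

The tolerance is a judgment call: set it to zero to hold the vendor to the exact figure, or widen it if week-to-week traffic on your network is noisy.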

Similarly, network monitoring can be used to make sure that your network service provider is living up to their service level agreements. 

Those are just two of the obvious ways in which fact-checking and verification are important to network engineers. Troubleshooting, too, is nothing but checking the possible causes of a problem until you’re left, by a process of elimination, with the cause. 

All of these things are why it’s important for engineers to have network monitoring software and to know how to use it properly.  Which brings us to my last point, which is to engage in a bit of navel gazing.

Network Performance Daily is the company blog of NetQoS, and by definition, it has got a bias towards our company and towards our products.  I try to disclose this whenever there’s the appearance of a conflict of interest.  I try to treat the blog like a journalistic outlet (my M.A. is in Journalism, and I used to be Associate Editor at the Daily Texan) when it comes to reporting – and the idea has always been to give you information that our customers would find interesting and relevant, and on those days when we can’t find anything interesting and relevant, we at least try to make you laugh a little bit.  But we do hold ourselves to a professional standard; and when we make mistakes or make a point that later needs clarification, we correct, clarify, and apologize.  (This happened very recently with the FCC Net Neutrality speech coverage, but we’ve made, and corrected, errors with Vint Cerf’s Interplanetary Internet, for example.) But the point is this: one can still be entertaining and interesting while adhering to a journalistic standard. 

Anyway, to sum all this up: In order to make informed decisions about how to manage a network, you need to have information about the network; not speculation about the network, and not wild guesses about the network.

In order to make informed decisions about how to manage a country (through the democratic process), you need to have the facts – not speculation about the facts, and not wild guesses about statistics.  The problem is not that CNN has delved too far towards entertainment; it is possible to inform while entertaining.  Which makes it all the more tragic that CNN chooses not to. 



In Soviet Swarm Programming Language, World “Hello”s You!


Distributed computation has been around a while in different forms – Beowulf clusters, for example – but Ian Clarke, the developer of Freenet and founder of Revver, has started working on a programming language, based on Scala, called “Swarm,” which he hopes will be a distributed programming language that can run on almost any operating system.

Because it runs on an application level, any computer can be a part of Swarm. You run Swarm on any computer you like, and you can access the computation of other computers running Swarm on the network; or, theoretically, on the public Internet. And Swarm allows a programmer to code an application for multiple CPUs and multiple computers with the same code that would be written for one CPU on one computer.

Now, there are projects such as SETI@Home or Folding@Home which do similar grid-computing tasks, but both are based on a model of breaking up the data into bite-sized chunks, moving that data to individual machines, where the information is processed, and then sending the output back to the central server.

Swarm is trying to flip that on its head. With Swarm, you can run the program wherever the data resides. So if you had a piece of data on Computer A, and a piece of data on Computer B, and you wanted to do a calculation that required both A and B’s data, you wouldn’t need to copy the data over the network – the program would execute on both A and B, returning the result of the calculations on B’s data to Computer A. Swarm is designed to manage which software runs with which data on which computer – without the programmer having to think about it beforehand.

Combine this with the latest advances in dynamic allocation of virtual servers according to need, and you start to really chip away at a whole bunch of scalability problems that have traditionally plagued massively-multi-user-applications… that is, Web apps.

Now, here’s the question: CPU latency is measured in picoseconds, while network latency is measured in milliseconds. How do you figure out what computations will actually benefit from being offloaded to another computer – i.e., which computations are so far back in the stack that it would be better for them to go for a round trip across the Ether than to just wait patiently for the stack to clear? It seems to me that network latency monitoring would be very important for such an application.

For example, let’s use some of the NetQoS Network Estimation Tools (shameless plug) to determine how fast we can theoretically get a calculation going over the network. So, figuring a router latency of 0.5 milliseconds on both ends, a server latency of 2ms, a link speed of 64000, and a (very short) link distance of 10 miles – you’re looking at 132 ms of latency altogether – assuming point-to-point protocol.

In that 132ms, a 2.4 GHz quad-core computer can perform 1.26 billion calculations locally. That seems like a lot – and it is. But you actually start saving time once you hit 1.26 billion plus one calculations. For some applications, that might be worth it.
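The break-even arithmetic above can be sketched in a few lines. This assumes one calculation per clock cycle per core and ignores the time the remote machine spends computing, both of which are simplifications; the function names are made up for illustration.

```python
def local_operations_during(latency_ms, clock_hz, cores):
    """Operations a CPU could finish locally while one network round trip completes.

    Assumes one operation per clock cycle per core (a simplification).
    Integer inputs keep the arithmetic exact.
    """
    return latency_ms * clock_hz * cores // 1000


def worth_offloading(task_operations, latency_ms, clock_hz, cores):
    """True only if the task is bigger than what could be done locally in the wait."""
    return task_operations > local_operations_during(latency_ms, clock_hz, cores)


# 132 ms of latency, 2.4 GHz, four cores, as in the estimate above.
budget = local_operations_during(132, 2_400_000_000, 4)
print(f"{budget:,} operations fit into a 132 ms round trip")
print(worth_offloading(budget + 1, 132, 2_400_000_000, 4))
```

The budget works out to about 1.27 billion operations, in line with the rough 1.26 billion figure above. Anything smaller than that is cheaper to run where it already is, which is why the latency number you feed in matters so much.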

But other than pure speed, there’s another reason to consider running Swarm – and that is that applications coded with Swarm should have the ability to continue running on other servers – preserving the application in the case of fault or insufficient resources on the primary computer.

Right now, Swarm is more theory than fact, and there’s a lot of work to be done before it can be practical. But anything that requires less data to be sent over the network is something to keep an eye on when trying to preserve network performance.



Billions and Billions


YouTube, according to a blog post by its CEO and co-founder, Chad Hurley, serves up one billion video viewings daily.

And you thought your business had a lot of YouTube traffic!

To give a sense of YouTube’s rapid growth: it took only three years and two months for YouTube to grow by a literal order of magnitude.  This article from TechCrunch in July 2006 showed an impressive 100 million videos served daily. 

Now, to put that 1 billion number in perspective, the current ratings for the top 20 network primetime series – calculated weekly, not daily – total a mere 298,013,000 viewers. (Though, to be fair, those shows tend to last 44 minutes, not a maximum of 10 minutes.)

Now, you know about the effect of YouTube traffic on enterprise network performance; and the importance of making sure YouTube traffic does not interfere with the mission critical applications.  Network performance monitoring, of course, is essential.

But YouTube’s growth isn’t just a bandwidth issue – it is a great cultural change which heralds an impact as big as, if not bigger than, the rise of mass media in the 1930s. 

First, there’s the creation aspect – anyone can create a video and put it online.  There are no guarantees for distribution like there are with mass media, but neither are there barriers to entry to the market.  I myself created a number of short documentaries, and placed them on YouTube.  Some of them reached 100,000 views.  Had I gone the film-festival route, far fewer people (on the order of hundreds, rather than thousands) would have seen the videos. 

But more than that is the idea that viewers are willing to accept that entertainment does not need to be centralized - or displayed on the TV.  The average length of a video on YouTube is 2 minutes, 46.1 seconds.  I don’t have the percentage of videos watched in the U.S. compared to the world, but the U.S. uploads 34.5% of the videos – which is probably a good estimate of consumption as well, barring a better metric. 

So let’s work out the math… 2,768,333,333 minutes of video daily, 34.5% of which is viewed in the U.S., meaning 955,075,000 minutes of video daily in the U.S., divided by a U.S. population of 304,059,724… that comes out to 3 minutes, 8 seconds of YouTube watching – or a little more than a video a day, for each American.
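For anyone who wants to check the math, here it is as a short script. The inputs are the figures quoted above; the variable names are mine, and using the U.S. upload share as a stand-in for the U.S. viewing share is the same assumption made in the text.

```python
DAILY_VIEWS = 1_000_000_000          # video views per day worldwide
AVG_VIDEO_SECONDS = 2 * 60 + 46.1    # average video length: 2 minutes, 46.1 seconds
US_SHARE = 0.345                     # U.S. upload share, assumed to match viewing share
US_POPULATION = 304_059_724

total_minutes = DAILY_VIEWS * AVG_VIDEO_SECONDS / 60
us_minutes = total_minutes * US_SHARE
minutes_per_person = us_minutes / US_POPULATION
videos_per_person = minutes_per_person / (AVG_VIDEO_SECONDS / 60)

# Split the per-person figure into whole minutes and leftover seconds.
whole_minutes, seconds = divmod(minutes_per_person * 60, 60)

print(f"{total_minutes:,.0f} minutes watched worldwide per day")
print(f"{us_minutes:,.0f} minutes watched in the U.S. per day")
print(f"{int(whole_minutes)} min {seconds:.0f} s per American, "
      f"or about {videos_per_person:.2f} videos")
```

Swap in a better estimate of the U.S. viewing share, if you have one, by changing `US_SHARE`.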


This is not a problem that will go away – this is a cultural shift that will only get bigger with time.  Ignoring it will lead to disaster, especially considering that the baby boomers who grew up on TV are leaving the workforce, and the Millennials who grew up on the Internet are entering it.  Managing YouTube traffic requires a proactive approach to traffic monitoring and policymaking. 



Jim Metzler on Infrastructure Management Tools and Methodology



 
 



Palmskype


Lifesize, a video telepresence maker-of-thingies, just announced support for a new video telepresence thingy called the Passport.

Hook up the Passport to a 720p HDTV (or other HDMI enabled monitor) and a 1mbit/up Internet connection, and you have a teleconferencing system.  Since many places have both HDTVs and 1mbit/up connections, this vastly opens up the number of places that you can do teleconferencing from. 

Certain practical limitations apply, of course.  For example, while it might be theoretically possible to use a bar’s wi-fi connection and HDTV for teleconferencing, I do not think it’s a good idea to do so when a game is on.

The standards list includes a number of protocols, so it should interoperate pretty well with existing teleconferencing equipment – meaning you can have your salespeople check in from the road, telecommuters checking in from home, etc.

One interesting side-note is that the device supports Skype at 720p30 resolution.  I’m not sure what the resolution of Skype phone calls is now, but I think it maxes out at 640x480p30 for the desktop client – obviously, that’s going to cause a bit of a bump in Skype traffic for organizations that use Lifesize. (You are monitoring this stuff in your organization, right?)

But there’s another issue with the Skype calling.  That is, Skype has become a de facto standard for teleconferencing among consumers (though they’d just call it “video chat”).  At $2500, the Passport is pricey, but not too pricey for some early adopters who want to use it as a personal telepresence device.  I could see an upper-middle-class family dropping $5000 on it (one on each end) to keep tabs on a kid at college, for example. All of this requires sufficient network performance – about 1mbps, as mentioned earlier – and many “broadband” networks in the U.S. do not have that kind of speed.  At any rate, if these things get popular, we’re talking about increased demand for broadband speeds and increased usage of networks for latency sensitive communications. 



Fast* Broadband


*delivered really slowly.


The Washington Post has an article on a phenomenon that we’re all familiar with – that advertised broadband speeds don’t always match up to the performance that the end-user actually receives. 



Actual broadband speeds lag advertised speeds by as much as 50% to 80%.

So more than half the time, and sometimes as much as eight out of ten times, consumers are paying for slower Internet access speed than they signed up for.


Now, with congestion, infrequent outages, problems on the other end of the connection, and other vagaries of Internet performance, the fact that a customer’s effective Internet speed varies widely isn’t a surprise. 

What is a surprise is that companies do not monitor the performance of their own networks – or that they do, but give consumers bad data – either promoting a peak speed as the “speed” of the network, or promoting an impossible speed. 

Really, though, do you think it would hurt sales that much to re-label a “15mbps” offering as “7-15mbps?”  (Hmm, maybe it would, if the ISP can’t consistently deliver 7mbps.) 
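A quick back-of-the-envelope check of that parenthetical, taking the 50% to 80% shortfall quoted above at face value. The function name is made up for illustration, and applying the shortfall to a 15mbps plan is my assumption, not something from the article.

```python
def delivered_range_mbps(advertised_mbps, min_shortfall_pct, max_shortfall_pct):
    """Return (worst, best) delivered speed for a given range of shortfalls."""
    best = advertised_mbps * (100 - min_shortfall_pct) / 100
    worst = advertised_mbps * (100 - max_shortfall_pct) / 100
    return worst, best


worst, best = delivered_range_mbps(15, 50, 80)
print(f"A 15mbps plan running 50% to 80% short delivers {worst:g}-{best:g}mbps")
```

In that worst-case range, a “15mbps” plan would deliver between 3 and 7.5 mbps, so a “7-15mbps” label could still be optimistic.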


"This speaks to consumer empowerment. And if you are advertising one speed but delivering another, that takes power away," Kelsey said. "Consumers can't make accurate decisions based on quality of service from one provider off another."


Now, there’s the truth-in-advertising approach – add qualifications, like a speed range, or a parenthetical like 15mbps (during off-peak times) – but I think the “up to” disclaimer is good if there’s someplace – say, the order form for the service, or the company Web site where you sign up for the service – that explains exactly what your real performance is after you sign up, as well as the performance of the average customer at each speed.  Heck, you could even have one of those LED billboards like they have for state lotteries that show you how much that day’s jackpot is worth. 

We’ve talked before about how we believe that broadband caps are not a solution to the problem and would greatly degrade the overall network performance of the Internet.  That’s still true.  We’re especially suspicious of any sort of “gas gauge” that would tell customers how much they’ve downloaded – and nothing else.  But a true network performance monitoring solution, giving ISP customers true information that is actually relevant to their performance would be very welcome. 

Imagine, if you will, if you could go to your ISP’s web page, log in, and get this information:


  • Your average Internet Speed over the past two weeks is X/down, Y/up.

  • Peak Congestion Times are X:00am to Y:00pm.

  • X% of your Internet usage occurs during peak times.

  • Your average Internet Speed during peak times is X/down, Y/up.

  • Your average Internet Speed during off-peak times is X/down, Y/up.

  • At that average speed, you can stream video at Xmbps.  This is (low/medium/high) quality for standard definition and (low/medium/high) quality for high definition video.

  • Your latency is Xms round trip to our servers. You can expect (low/medium/high) quality for voice calls and video chat, and (low/medium/high) quality for computer gaming.

  • Recommendations for improving your Internet Experience:


    • Try to watch streaming video during off-peak times, or set your computer to download the video during off-peak times instead. 

    • Set peer-to-peer programs to use less bandwidth during peak hours.

    • Try to find gaming servers located closer to your geographic location to cut down on lag.

  • We noticed a number of anomalous behaviors these past two weeks.  Please check your system for malware and viruses.


That’s not “techie” information – it’s all information the end-user can use, and it lets the user know exactly what they’re paying for. 
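The peak and off-peak averages in a report like that are not hard to produce from periodic speed measurements. Here is a minimal sketch; the sample data, the 6pm-to-11pm peak window, and the function name are all invented for illustration, and a real ISP would be working from far more samples.

```python
from statistics import mean

# Hypothetical samples: (hour of day, download mbps, upload mbps).
SAMPLES = [
    (3, 14.2, 1.9), (9, 12.8, 1.8), (19, 6.1, 0.9),
    (20, 5.4, 0.8), (21, 7.0, 1.0), (23, 13.5, 1.7),
]
PEAK_HOURS = range(18, 23)  # assumed peak window: 6pm up to, but not including, 11pm


def summarize(samples, peak_hours):
    """Average down/up speed for peak and off-peak samples, plus the peak sample share."""
    peak = [s for s in samples if s[0] in peak_hours]
    off_peak = [s for s in samples if s[0] not in peak_hours]

    def averages(rows):
        # Return (down, up) averages, or (None, None) when there is nothing to average.
        if not rows:
            return (None, None)
        return (mean(r[1] for r in rows), mean(r[2] for r in rows))

    return {
        "peak": averages(peak),
        "off_peak": averages(off_peak),
        "peak_sample_share_pct": 100 * len(peak) / len(samples),
    }


report = summarize(SAMPLES, PEAK_HOURS)
print(f"Peak: {report['peak'][0]:.1f}/down, {report['peak'][1]:.1f}/up")
print(f"Off-peak: {report['off_peak'][0]:.1f}/down, {report['off_peak'][1]:.1f}/up")
print(f"{report['peak_sample_share_pct']:.0f}% of samples fall in peak hours")
```

Note that the last figure is the share of samples taken during peak hours, not the share of bytes transferred; a usage-weighted version would need per-sample traffic volumes as well.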



Cloud of Confusion


Peter Kretzman at CTO/CIO Perspectives points out a serious problem with tech journalism in his article on cloud computing.  Sometimes the message gets oversimplified. 


Mainstream media drifts into this oversimplification in part because they’re leery of delving into technical arcania (virtualization, scalable architectures, APIs) that many of their readers can’t relate to. Yet, there’s actually no need, when you try to explain its real impact, to make cloud computing sound geeky and complicated; it’s not, at least at core.


Kretzman specifically points out examples from Business Week and NPR, which equate consumer-facing Web 2.0 technologies and SaaS apps such as GMail, YouTube, and Flickr with “cloud computing” as a whole.

To be sure, these applications certainly are “cloud computing,” but they are just one example of what cloud computing is – and not the best example, because they seem to limit cloud computing to Web-based applications with data storage on the Internet. 

Cloud computing is more accurately described as renting IT resources when they are needed instead of owning them.  Sometimes you are talking about applications – using Google Apps instead of Microsoft Exchange.  Sometimes you’re talking about servers – using the computing power of big iron to supplement your medium-sized iron for those stubborn tasks, like trying to find the Higgs Boson, mapping known space, or making the best possible Fantasy Football picks.  And sometimes you’re talking about the network – using co-location to lower latency and provide more throughput to highly trafficked sites like CNN.com, YouTube.com, or ICanHasCheezburger.com.

Hmm… application, server, network… where have I seen that collection before?  Oh – that’s right.  Those are the three sources of network performance problems – sometimes the problem is in the application, sometimes it’s in the server, and sometimes it’s in the network.  The problem with cloud computing from a network performance perspective is that when application, server, and network were in-house, you could find out which one of them was the problem relatively quickly and start fixing it; once they move into the cloud, that gets harder. 

But I digress.  The real reason I’m worried?  Cloud computing doubles the applications – and sometimes the networks – needed to do business.  Poor application performance could be caused by a poorly coded application – or is Firefox having a memory leak?  And the application path no longer runs from client to server and back.  Now you have client, to server, to cloud server, back to server, to client.  Fixing performance problems in this environment isn’t impossible, but it requires good monitoring solutions and an even better brain in the engineer doing the manual monitoring.


