Commentary Archives

Cloud 99.999


Gmail is currently down for some of the service’s users – and so the world anxiously holds its breath waiting for the resolution.  This follows on the heels of a larger outage two weeks ago.

It is worth noting that Google’s paid user base is covered by a 99.9 percent uptime commitment.  It’s not the usual five nines associated with most network applications – although in light of recent events…

So, let’s say that we’re talking about a total of two days of downtime over the two years since Gmail was launched to the public on February 7, 2007.  The last outage had Gmail skating on that 99.9% edge; this outage pushes it down to 99.7%.
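For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes "two years" means 730 days, which is my rounding and not a figure from Google; the function names are hypothetical and exist only for this illustration.

```python
# Availability arithmetic for the round numbers quoted in the post.
# Assumption: "two years" is treated as 730 days.

def availability(downtime_days: float, period_days: float) -> float:
    """Return uptime as a percentage of the period."""
    return 100.0 * (1.0 - downtime_days / period_days)

def allowed_downtime_hours(sla_percent: float, period_days: float) -> float:
    """Return the downtime budget, in hours, that an SLA permits over the period."""
    return (1.0 - sla_percent / 100.0) * period_days * 24.0

PERIOD_DAYS = 730

print(f"{availability(2, PERIOD_DAYS):.1f}%")
print(f"{allowed_downtime_hours(99.9, PERIOD_DAYS):.1f} hours")
print(f"{allowed_downtime_hours(99.999, PERIOD_DAYS) * 60:.1f} minutes")
```

Under those assumptions, two days of downtime works out to roughly 99.7% availability, a 99.9% commitment leaves a budget of about 17.5 hours over the two years, and five nines leaves about ten and a half minutes.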

But wait, that outage was the third downtime in six months.

Now, don’t get me wrong, Gmail is a wonderful application.  It’s taken the portability of its predecessors and added amazing flexibility – I can get my Gmail email through the Web, on my iPhone, browse headlines through an RSS feed, download to my home Thunderbird client, and even use it as a way to store all those darn game serial numbers in case I lose the sticker that comes on the case. 

Cloud apps provide anywhere, everywhere access, with zero-brain-required “installation,” configuration, and maintenance.  They do a great job of separating content (the stuff you want) from context (how it is presented) so that you can have it presented in a number of formats.  These are all great things, and they explain the appeal of the cloud platform.

For a home user like me, 98% is reliable enough.  But that’s just it – cloud models (currently) can’t provide the same kind of reliability that we’ve come to expect from LAN and WAN applications.  Google didn’t promise “five nines” – they promised three, and delivered two.  I don’t think that it’s a matter of being oversubscribed or of improper maintenance – Google is Google, for crying out loud – but that a cloud application, by definition, contains all the problems associated with any app accessible over a wide area network (in this case, the Internet) as well as the problems associated with serving multiple customers from multiple locations.  The number of mail users a Google or a Yahoo has to support dwarfs the number of users supported at even the largest of enterprises.

Moving to the Cloud is basically putting your application into a shared resource pool, and it seems to be part of a larger trend.  Virtualization lets us consolidate multiple servers onto one machine.  WAN Optimization lets us consolidate multiple machines into one data center.  What the cloud does is consolidate multiple companies’ IT departments into one shared pool.

The point is – when you go to the cloud, you sacrifice a little reliability for greatly increased flexibility.  This is not to discourage you from making the switch, but just be aware of the risks and, more importantly, be aware of your needs.  In a cloud environment, monitoring network bandwidth remains important because cloud providers will need tools to assess traffic and end-user responsiveness, so that they can adjust their computing capacity to handle the traffic without expensive, unnecessary overprovisioning.



But what I really want to do is direct… packets.


The latest rumors, reported in TechCrunch and other places, imply strongly that Cisco is in talks to buy Pure Digital, makers of those little flash-based “Flip” mini-cams. I own three of them myself, but that’s because I like to do things like suction cup them to cars, duct tape them to my helmet while sliding down a 45mph luge, ride with them in a human-sized hamster ball, etc.

The interesting thing about this acquisition is that the Flip camera has made it much simpler for the average user to record high-definition video for uploading to YouTube and other sites. Lots of people are doing exactly that – and that’s a lot of bandwidth traveling across the Internet.

Cisco’s interest in Pure Digital may seem a mismatch – Cisco is known as a networking company, whereas the Flip is a consumer gadget. On the other hand, the Flip is a high-bandwidth gadget – and Cisco can stimulate the demand for its networking hardware and software by stimulating the supply of high-bandwidth applications.

Cisco CEO John Chambers has been aggressively pushing into the consumer space, with some enterprise technologies, such as Telepresence, almost tailor made for the consumer market – assuming you can get the economies of scale to work. Cisco also has a digital media network-attached storage device. The key, it seems, is to get more people using the network and more information on the network in order to feed the need for networking devices. Not so much a “razor and blades” model as a “stubble-growth serum and razor” model.

Of course, Cisco also bought the consumer-router brand Linksys in 2003, and Scientific Atlanta in 2005. Scientific Atlanta deals mostly in set-tops, cable modems, and digital interactive subscriber systems for video over IP and VoIP.



First Sell, then Code


On The Daily WTF, there’s a great story about a network-oriented coder at the (name changed) “AQ&V.”

Here’s a quick summary:

“Mike” takes a job with AQ&V, and finds that the “network health” team, tasked with finding out about outages and performance problems before the customer does, was not actually spending any time resolving issues, because each member of the team was spending all their time using internal apps to enter tickets and generate reports.

No one on the team actually solved the problems. 

Mike then decided to code a kludge of a script which would automate most of the repetitive tasks that the network team was doing, freeing them up to actually fix some of the problems.  But when he tried to implement it, he found that the password to the development machine had been changed and that management had decided that they didn’t want him to actually solve problems – they just hired him to throw more manpower at entering trouble tickets… without solving them.

In the anecdote, as presented, it’s clear that AQ&V has a lot of problems with network performance – obviously, if you’re not fixing problems, problems will remain unresolved. 

But it’s unsurprising that Mike wasn’t able to make headway in solving the problem – the problem wasn’t just that nothing was actually getting fixed, but that management didn’t care that nothing was getting fixed. 

It’s not just that management blocked Mike’s kludge of a solution – but that they refused to acknowledge that there was a problem to begin with.  What Mike should have done was try to get management on board first, before writing a single line of code.  The first step in trying to improve any process in IT is convincing the people with the power to make decisions that the current process is flawed. 

If Mike had been unable to convince management that the process needed to be fixed, he would have been no worse off than he was at the end of the story; if he had been able to convince management that there was a problem, then he might have been able to bring more resources to bear and, instead of writing a kludge, actually spend time and effort on improving the process with a more stable solution to the problem.

We’re technical people and because of that, we tend to think that the solutions to problems are often technical.  The right code, the right script, and the problems are solved.  The problem is that enterprise computing isn’t just technical knowledge, it’s also social interaction – meeting needs, and convincing people that needs exist and should be met.  Because of this, tools that help you take information from your network and present it in a simple, easy-to-understand way can be just as important as tools which directly solve problems – the former to convince management of the need for the latter.

This story reminds me of rudimentary ITIL principles and the requirement for the “people” part to participate in the process change. Incident Management may be the first step most organizations get under control – but that’s not just entering tickets.  It’s also finding the quickest way to restore service.  Problem management – solving problems – is much easier when Incident Management is done right because you’ll have priorities and begin to see patterns, but “AQ&V” wasn’t even scratching the surface on Incident Management, so there was little chance to evolve. The situation was like a bad sitcom.



“The world will look up and shout, ‘Verify our SLA.’ And I’ll whisper, ‘No.’”


Okay, hard decision time.  Do I go grab the very last ticket to see “Watchmen” at the last showing tonight alone, or do I spend a sleepless night tossing and turning, while having nightmares about blue men and conspiracy theorists as I wait to see “Watchmen” with my friends the next day? 

Ha Ha!  Just kidding!  I don’t have any friends!

It should not surprise anyone that I have read “Watchmen,” or that I consider it the most culturally significant work of comic fiction produced since William Hogarth’s “A Rake’s Progress.”  But it also speaks to me on a personal level.  That is, the core themes of “Watchmen” have quite a lot to do with network performance.

Okay, maybe “network performance” isn’t on quite as “personal” a level as, say, a romantic relationship or a spiritual experience.  But it’s still important.  Anyway…

Who, indeed, watches the watchmen?  For example, carriers claim that by going to MPLS you would get better performance.  But there’s not a lot of data provided by the service providers to verify those performance gains.  Without that ability to quantify how the carrier network is performing, you have no idea if providers are living up to their service level agreements – or whether you’re better off switching to MPLS in the first place.

Additionally, without watching carefully, you can have well-meaning IT teams affecting the network – changes which seem trivial to an application developer may actually cause major repercussions for the network traffic – generating and transmitting graphics required for a CAPTCHA, for instance, when previously you were only dealing with a text-based application.

The most extreme case we’re familiar with is when one team of application developers flipped a single flag in the database, making a graphics field available to the user, sending 1 MB worth of graphics per page on the application.   This one “little” change caused network traffic to balloon dramatically over a single night. 

It is not that carriers or application developers are mean or untrustworthy – just that they may not know the impact of what they do on the network; and as such, you should.

This means, of course, monitoring the overall round-trip time from end to end on an application-by-application basis, and monitoring for sudden changes in the network.
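To make "monitoring for sudden changes" a little more concrete, here is a rough sketch of one way it could be done: compare each new round-trip sample against a rolling baseline for that application. The application names, sample values, window size, and threshold are all made up for illustration; this is not how any particular monitoring product does it.

```python
from statistics import mean, stdev

def sudden_changes(rtt_ms, window=5, threshold=3.0):
    """Return indexes of samples that jump sharply above the preceding window.

    A sample is flagged when it exceeds the mean of the previous `window`
    samples by more than `threshold` standard deviations.
    """
    flagged = []
    for i in range(window, len(rtt_ms)):
        baseline = rtt_ms[i - window:i]
        mu = mean(baseline)
        # Floor sigma at 1 ms so a perfectly flat baseline does not flag tiny jitter.
        sigma = max(stdev(baseline), 1.0)
        if rtt_ms[i] > mu + threshold * sigma:
            flagged.append(i)
    return flagged

# Hypothetical per-application round-trip times in milliseconds.
samples = {
    "crm": [42, 40, 41, 43, 42, 41, 44, 180, 175, 178],
    "mail": [20, 21, 19, 20, 22, 21, 20, 21, 19, 20],
}
for app, rtts in samples.items():
    print(app, sudden_changes(rtts))
```

Note that this flags only the onset of the jump: once the slow samples enter the baseline window, the new level starts to look normal. That is fine for spotting the moment something changed, but a real deployment would also want a longer-term baseline.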



Someone still needs to run the network.


Well, amidst all the bad news about the economy, some good news.  Denise Dubie points out that, assuming that the company in question hasn’t completely failed, “Bear Stearns” style, high-tech talents such as network engineering remain in demand, according to a poll by Robert Half Technology and Bluewolf.

Network administrators were in demand by 65% of CIOs polled, and while it’s hard to believe, according to Bluewolf, salaries of those with networking expertise will spike in the coming months, with a salary increase of 14%.

Here’s the downside: Project managers will have salary decreases over the next year, according to the same poll.

I know this is a short post, but with so much bad news coming out about the economy, I just wanted to report on something economic that elicited a “Yay!” for once.



Network Performance When TV Dies


Over the past year or so, I have become enraptured with the possibilities of storytelling presented by reality television.  Last night, I wrote an e-mail to a pen-pal about how one reality show explored the themes of trust vs. mistrust, selflessness vs. selfishness, and rationality vs. instinct – like I was analyzing a novel for an English lit course.  And no, I didn’t do so ironically.

I know.  I scare myself too.

So I like television, don’t get me wrong; but Paul Graham is 100% right in his latest essay: “Why TV Lost.” 


It's clear now that even by using the word "convergence" we were giving TV too much credit. This won't be convergence so much as replacement. People may still watch things they call "TV shows," but they'll watch them mostly on computers.


For me, this has already happened.  Over the past year, there have been two shows I’ve watched “live” instead of waiting for the video stream to appear online – “The Mole,” and “The Amazing Race.” The only reason I’ve bothered doing so (and I use that verb because it is a bother) is simply because I want to be able to immediately join in the online conversations and social applications of my fellow fans after air.  To me, the television is the beginning of the whole entertainment experience – and the least essential part.  The most essential part is the network – the interconnections between people that make everything fascinating.

For most shows, however, I don’t bother turning on the television at all – I watch them on the computer.  I have a 46” HDTV, but 99.404% of the time, it’s acting as a computer monitor – not as a television.  For me and others of my generation, “TV” ceases to have any meaning – it’s the same bits as text, as pictures, as music, as games, and as computer programs.

In Graham’s essay, he goes into the details of why the move from TV to computer is irreversible, but it mostly boils down to three things: A) broadband made video downloads possible, B) video downloads are more convenient, and C) computers allow you to connect to other people and converse with them, something television has only ever been able to create a weak facsimile of.

Now, as far as network performance goes, A and B are the most important.  We expect broadband connections to deliver video – whether from YouTube or iTunes.  And one of the reasons that video communication is the “next big thing” in the enterprise is that we’ve had years of experience with video over IP through YouTube and other sites.  Video has become so ubiquitous on the Web that blocking video in order to conserve bandwidth often means that information is simply more difficult to get to, and we no longer think of network communication as “VoIP” – we now think in terms of unified communications.

From a network engineer’s standpoint, the technology in the home determines what people expect from technology in the workplace – anything less than that, and to the end-user, “the network is slow.”

It is C, however, that is the most interesting from a sociologist’s standpoint.  Remember how I said that I had become a fan of reality television?  One of the strange things about this genre of entertainment is that, unlike the big sitcom star or the talk show host insulated from people and press by an army of spokesmen and mail-openers, many of the “reality stars” and “pseudo-celebrities” are much more open and engaging with the audiences that watch them.  Check out a reality TV forum – stars appearing on TV are there, communicating with the audience directly, using the Internet to create interactivity.

Decry reality TV as lowbrow if you want, but the shift from the tube to “YouTube” is outright fascinating.  Maybe Reality TV has gotten a foothold because it is extremely cheap to produce compared to other fare, but it, perhaps more than any other genre, starts to get what people want – they want interconnectivity and conversations.

And because people want interconnectivity and conversations, people expect sufficient network performance to telecommunicate – video and audio – with whomever.  The most dreaded foe in any online game is “lag,” because poor network performance kills our ability to communicate.

During the editorial process for this post, the theory was advanced that reality TV succeeds because people will watch whatever is on television, no matter what’s on television.  I don’t think that’s so – and if it might have been true at one time, I do not think that it is true today for those who prefer to get video entertainment via the computer.  We are the “conversation generation,” and we have created a new medium to fill our needs.  The question is whether we can continue to keep the network supporting our conversations.



Jimmy Ray Purser doesn’t like Network Management Software


He thinks it sucks.

No, correction. According to his latest post in Network World, he thinks it “Sucks!!!” With three exclamation points.


What is it with NMS that feels like were are [sic] riding in the back seat from Wisconsin to Florida with our stinky second cousin Bert. Anytime, I sit in a vendor meeting and they are trying to hock their NMS off on me I can picture signs for "See Rock City" or "Wall Drug" for the east coast to west coast survivors.


As a blogger for a company that makes network management software (well, technically, network performance monitoring and management software), I figured I should respond.


But still, NMS promises are kinda like being chased by a [sic] angry Shih Tzu with your friends watching. With networks becoming more and more application layer driven, we need something with a little more power. For example;

- Flow based management is cool but what if I need more then [sic] just the conversation, I need packet capture/inspection THEN correlate those together with a verifiable SLA? Now what?

- 1GB 10GB 40GB 100GB? How do I monitor that? Not with a plain Jane NIC for sure. Heck at 1GB my NIC buffer size is only 64K which is just fair for 100M. Plus add in fragmentation and CPU interrupts and you can see your accuracy goes down fast. What's that? Your are using jumbo frames also...oh man...


Tough call, but I think you’re probably looking at something like a combo of NetQoS SuperAgent to handle the flow data, and NetQoS Gigastor to do the packet capture/inspection and monitor the large links through storing the data so that problems can be examined for a short time after they’ve happened. On a 100GB link, you’re talking hours rather than days, but still.
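To put rough numbers on "hours rather than days," here is a back-of-the-envelope sketch. It reads the "GB" link speeds above as gigabits per second and assumes a hypothetical 48 TB capture store; neither figure comes from a product spec sheet, so treat the output as illustrative only.

```python
def retention_hours(storage_tb: float, link_gbps: float, utilization: float = 1.0) -> float:
    """Hours of full packet capture that fit in `storage_tb` terabytes.

    link_gbps is the link rate in gigabits per second; utilization is the
    average fraction of that rate actually carried (0.0 to 1.0).
    """
    bytes_per_second = link_gbps * 1e9 / 8 * utilization
    return storage_tb * 1e12 / bytes_per_second / 3600

# Hypothetical 48 TB capture store; the post does not state a capacity.
STORAGE_TB = 48
for gbps in (1, 10, 40, 100):
    print(f"{gbps:>3} Gb/s: {retention_hours(STORAGE_TB, gbps):6.1f} hours")
```

With those assumptions, a saturated 1 Gb/s link fills the store in a little over four days, while a saturated 100 Gb/s link fills it in about an hour. Real links are rarely saturated, so passing a lower `utilization` stretches the window accordingly.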

Now, this isn’t a pitch (I don’t trust Shih Tzu dogs), but seriously, we were wondering about the same problems, and didn’t have a solution until we integrated Gigastor into our suite.

But that’s not really the point, Jimmy Ray. (Do you mind if I call you Jimmy Ray?) The point is partially that what you’re saying about network management software doesn’t jibe with the numbers we’ve got on our end.

We use maintenance renewal rates as a way to benchmark customer satisfaction. If customers think our product “sucks,” they’ll typically get someone else that doesn’t “suck.” Last quarter, the renewal rate was north of 90%. This is pretty good considering the fact that the economy is so bad that dogs are beginning to worry about inflation.

More than half of these customers own multiple products as well – no single metric is adequate, as you pointed out, but through a combination of metrics, you can get the data you need.

What I’m trying to say, Jimmy Ray, is not that you’re wrong about saying that Network Management Software sucks. What I’m trying to say is that if we do suck, we’re sucking in such a way that it’s a blind spot for us – we thought we were doing an excellent job.



Stimulus, Response


Okay, you know the President’s got some serious chops when you’re out at Fry’s on a Tuesday night, frantically searching for a mini-plug to RCA adapter cable, and everyone – customers and staff – seems to be crowded around the big screen TVs watching a Presidential address.  Love or hate his policies, the guy can draw an audience.

We’re trying not to step on anyone’s toes here or take political positions that could needlessly alienate anyone in the audience who holds strong opinions.  (I, for example, remain a staunch supporter of the McGillicuddy Serious Party, despite the fact that it has been completely disbanded.) But this blog is all about network performance, and we’ve talked repeatedly (here, here, here, here, and especially here) about improving U.S. broadband performance and availability.  If a politician comes out and says: “Hey, we need to improve the performance of the Internet,” there shouldn’t be anything wrong with giving them their due.

Now, is the President’s way of improving broadband the best way to improve broadband?  Hell if I know. 

What I do know is that $7.2 billion of the $787 billion stimulus bill has been allocated for broadband expansion.

Popular Mechanics has an in-depth article on the broadband-improvement funds – with some interesting conclusions.  For example, the Pew Research Center found that broadband adoption is around 55 percent, yet the cable industry claims that 92 percent of homes have access to its high-speed Internet service.

Glenn Derene at Popular Mechanics concludes, therefore, that “broadband stimulus in America has less to do with pushing cables out to rural areas than it does with finding a way to make broadband more affordable.”


“The biggest problem with broadband service in America is not a lack of availability, it's a lack of competition. Most users have only one or two options for service, and while prices have come down slightly, they are still relatively high for Americans who feel increasingly pinched. Pew's study found an average monthly broadband bill of $34.50, down 4 percent from the previous year, but it also showed a gradual migration away from cable service, which tends to be faster and more expensive, to cheaper and slower DSL service. So the broad language could end up defining an "underserved" area as simply an area without enough competition to make service affordable”


PM also points out that broadband is important for consumer devices, personal cloud computing (think Google Docs, MapQuest and the like), the filing of governmental forms online (cheaper for the taxpayer than paper forms), the dissemination of news via YouTube and blogs, the access of electronic medical records, the tracking of energy usage, etc.  Ultimately, anyone unable or unwilling to use these services essentially loses out on a first-world lifestyle – and ultimately, as PM puts it…


“…Americans who either cannot afford or are not capable of using reliable, robust Internet connections are at risk of becoming increasingly marginalized as citizens.”



Protection on the network allows for more on the desktop


Today was a frustrating day. I was hoping to have a video to show you, but the best-laid plans…

Essentially, I was thwarted because part of the video would have involved using one of the company’s projectors, and it seemed that the computers I had administrative access to didn’t have sound, and the computers that had sound didn’t grant me administrative access – access I would need to install Macromedia Flash.

(I’m appalled. I mean, why don’t I have root access to all the company’s computers? I mean, I’m the company blogger, for crying out loud. Without me, the company wouldn’t have a blog, and without the blog the company would… uh… erm… maybe have a Google Pagerank of 5 instead of 6? Anyway, the point is, I provide a highly demented demanded service for the company…)

Anyway, we’ll have to reschedule that video. But it does make me realize the relationship between network performance and network security – and that is, that if the application your end-user needs will not run, you’re essentially looking at an effective network performance of zero percent for that application. While there are tons of monitoring solutions for determining up/down status, and one really groovy monitoring solution for application performance, not being able to use the application in the first place is a frustration that probably won’t show up in the NOC, but is there nonetheless.

It also makes me think about the “cry wolf” scenario with security. Look at the dreaded UAC for Windows Vista. I have a Vista box. First thing I did was disable UAC. (Second was install Firefox.) Too often when people interact with computer security, it is preventing them from doing something useful, rather than protecting them from something harmful.

So the more you can take security away from the forefront and put it in the server, the less people have to interact with it, the less they get pissed off, and the less likely they are to ignore warnings when something bad does happen. This, of course, requires keen insight into the behavior of the network and ways to detect anomalous behavior in real time.

As for my personal problem with computer security – it’s not really that bad. I consider myself lucky that that’s the worst I have to deal with. I’ll just send in an IT request to set up the audio on the projector next week.



Spectrials and Spectribulations


Australia drops Internet filtering plans, New Zealand backtracks on S92, and in Sweden, the Pirate Bay is on trial. 

The world is interesting these days.

First, Australia: According to the Sydney Morning Herald, the Australian Government’s plan to introduce mandatory Internet filtering (which we’ve covered previously) was effectively defeated when independent Senator Nick Xenophon switched to supporting the Opposition/Greens coalition.  And the Herald was not pulling punches on the measure’s failure.


The Communications Minister, Stephen Conroy, has consistently ignored advice from a host of technical experts saying the filters would slow the internet, block legitimate sites, be easily bypassed and fall short of capturing all of the nasty content available online… Even the trials have been heavily discredited… Senator Conroy originally pitched the filters as a way to block child porn but - as ISPs, technical experts and many web users feared - the targets have been broadened significantly since then….

This week, a national telephone poll of 1100 people, conducted by Galaxy and commissioned by online activist group GetUp, found that only 5 per cent of Australians want ISPs to be responsible for protecting children online and only 4 per cent want Government to have this responsibility.


As our interview last October with Mark Newton, a network engineer from Australia, showed, the problems with the filters were twofold: in addition to blocking legitimate content, they also degraded network performance severely.

One of the problems was the mission creep of the blacklist, which started with “illegal content” but broadened to include mature but legal content, and last week, there was an uproar after it was discovered that an anti-abortion group’s Web page had been placed on the blacklist.

Meanwhile, across “the ditch,” in New Zealand, the National/ACT/United Future coalition government has delayed action on a piece of legislation known in shorthand as “S92” – Section 92 of the Copyright Amendment Act.  This law would cut off Internet access for people accused of copyright infringement.  The accusation would not need to be proven, and Section 92 contains no punishment for a mistaken or malicious accuser. 

In response, protestors started an “Internet Blackout” protest, with black protest signs, completely black Twitter profile icons, and black Web sites.  When the law was delayed, the protestors restored their icons.  Media 7 News Reporter Russell Brown wrote about the incident on his blog, “Hard News.”


The protest virtually came out of nowhere last week -- it was conceived in a schoolroom in Warkworth on the Saturday, and enacted on the Monday. The mobilisation involved was really remarkable.

The protest was also a success because it has fostered a new voice on copyright and creative issues; one that isn't an industry or technical body. The Creative Freedom Foundation's petition has 12,000 names -- and 8000 of the signatories identify themselves as artists. That's remarkable. It has attracted worldwide attention.


And finally, in Sweden, the Pirate Bay trial is ongoing.  The prosecution had dropped about half of the charges by the second day of the trial, and two days ago, the charges in the case were altered, making the case look more and more likely to go in the Pirate Bay’s favor.

The thing that connects all of these cases – and I may be wrong here – is that I think they represent a trend: the public is becoming more informed about the impact of legislation on the Internet’s capabilities, and on network performance specifically.  People are becoming informed – and, more importantly, active and engaged in Internet issues – to secure a network that performs to the best of its capabilities.

Of course, I could be wrong. 



