by Jeff Hicks
Voice over IP (VoIP) is a hot topic in enterprise networking, mostly because it's a challenge. A VoIP implementation employs a number of different protocols and has a unique set of performance requirements that strain any data network. Examining the VoIP protocols gives a basic idea of the performance requirements that VoIP places on the network.
First, there's call setup, which sets up everything needed to make the telephone connection between the caller and the recipient (or “callee”). This requires protocols that enable dial tone, number lookup, ringing, and busy signals before the call even occurs. In addition, the call setup protocols handle things that happen after the call, such as resource cleanup and statistics reporting.
Call setup protocols use either TCP or UDP to transfer data during the setup and takedown phases of a telephone call. The messages are sent back and forth between the caller, recipient, and call server using well-known ports. For calls that travel between the VoIP network and the Public Switched Telephone Network (PSTN), the call server will converse with a VoIP gateway using the call setup protocol. There are many different call setup protocols, some standardized and some proprietary. Let’s discuss a few of these.
H.323, a family of telephony-based standards for multimedia, is widely deployed and is the oldest of the call setup protocols. It commonly runs on VoIP gateways to connect the VoIP network to the PSTN.
H.323 has been refined over many years. As a result, it is robust and flexible, but one cost for its robustness is high overhead: a calling session includes lots of handshakes and data exchanges for each function it performs. It uses TCP for communication, and setting up a call with H.323 can require many back and forth TCP flows. It can also require additional configuration on the VoIP gateway because the gateway maintains information about how calls are routed.
The Media Gateway Control Protocol (MGCP) is another commonly used call setup protocol, using UDP on port 2427. MGCP differs from some other call setup protocols in that it is not typically used by the phones to control the call. It’s more commonly used to let a call server control a VoIP gateway to the PSTN. Because the call server controls the gateway, the bulk of the call control intelligence and routing information resides in the call server, instead of the gateway.
Session Initiation Protocol (SIP) is a lightweight protocol that does much of the job of H.323 with much less overhead. More vendors, including Cisco and Avaya, are offering SIP phone/endpoint support. SIP client interfaces are shipped with Microsoft Windows XP.
Although SIP can use either TCP or UDP for transport, most implementations use UDP and port 5060. SIP messages are text based and generally follow a request-response structure, much like HTTP.
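To make the "text based, like HTTP" point concrete, here is a minimal sketch that splits a SIP request into its start line, headers, and body. The sample INVITE is a simplified, hypothetical message (the addresses and Call-ID are made up, and a real request carries more mandatory fields), and `parse_sip_message` is an illustrative helper, not part of any SIP library.

```python
def parse_sip_message(raw: str):
    """Split a SIP message into (start_line, headers, body).

    Assumes CRLF line endings and one header per line (no folded headers).
    """
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Simplified, hypothetical INVITE for illustration only.
request = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP client.example.com:5060\r\n"
    "From: <sip:alice@example.com>\r\n"
    "To: <sip:bob@example.com>\r\n"
    "Call-ID: 12345@client.example.com\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

start_line, headers, body = parse_sip_message(request)
method, uri, version = start_line.split(" ", 2)
print(method, uri, version)
print(headers["cseq"])
```

The start line plays the same role as an HTTP request line: a method, a target URI, and a protocol version, followed by colon-separated headers and a blank line.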
In addition to those standardized call setup protocols, certain vendors have provided their own proprietary protocols. One example of this is Skinny Client Control Protocol (SCCP). SCCP provides a simple, lightweight call setup protocol for Cisco devices and passes messages using TCP and port 2000.
There is no single dominant call setup protocol today. However, the current trend is toward SIP as the call setup protocol of choice.
Next are the protocols involved in the VoIP conversation. The conversation portion of the call must be converted from analog to digital, translated into packets, sent across the network in packet format, reassembled, and converted from digital back to analog.
Codecs encode and decode the audio at both ends of the conversation so that it can be sent and received across the network. Different codecs have different bandwidth requirements and different characteristics that can impact network performance. Some codecs, like G.711, employ no compression, so they require more bandwidth. Other codecs, like G.729, use lossy compression, which reduces the amount of bandwidth required but can impact the voice quality.
Once the codec has its payload ready, it's up to another protocol, the Real-time Transport Protocol (RTP), to transfer the data to its recipient. RTP is the dominant protocol and is used almost exclusively for the transfer of VoIP conversations (with the exception of Skype, which uses proprietary protocols). Widely used for streaming audio and video, RTP is designed for applications that need real-time performance and send data in one direction with no acknowledgments.
Since a VoIP call is bidirectional, there are two RTP streams carrying the conversation, one in each direction. The path that these RTP streams take through the network and the impairments encountered along the way are important factors in determining the quality of voice conversations.
RTP is an application protocol that uses UDP for transport: all the RTP fields sit inside the UDP payload. These fields specify the payload type; a sequence number, which helps the receiver put packets back in order and detect loss; a timestamp, which is used to determine jitter and reconstruct the timing of the original message; and a source ID, which allows the software at the receiving side to distinguish among multiple simultaneous incoming streams.
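The fields described above live in a 12-byte fixed header at the start of the UDP payload. The following is a minimal sketch of unpacking that header with Python's `struct` module; `parse_rtp_header` and the sample packet are illustrative assumptions, and the sketch ignores optional CSRC entries and header extensions.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Unpack the 12-byte fixed RTP header from the start of a UDP payload."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header is 12 bytes")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronization source (SSRC) identifier.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hypothetical packet: version 2, payload type 18 (the static type
# assigned to G.729), followed by 20 placeholder payload bytes.
sample = struct.pack("!BBHII", 0x80, 18, 4660, 160, 0xDEADBEEF) + bytes(20)
header = parse_rtp_header(sample)
print(header)
```

Here the SSRC is the "source ID" mentioned above, and the sequence number and timestamp are what a receiver would use to reorder packets and estimate jitter.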
While the RTP header is important to support the real-time nature of the protocol, the accumulation of headers adds a lot of overhead, especially considering the size of VoIP codec payloads. A typical G.729 packet payload, for example, is 20 bytes. To transfer those 20 bytes of data, the packet needs 20 bytes of IP header, 8 bytes of UDP header, and 12 bytes of RTP header, bringing the packet total to 60 bytes. And that’s before you add Layer 2 headers, such as Ethernet framing, which usually adds another 18 bytes.
This means real bandwidth consumption can be higher than it first appears. The G.729 codec, for example, has a data payload rate of 8 kbps, so when packets are sent at 20-ms intervals, the payload is 20 bytes per datagram. Add the 40 bytes of IP, UDP, and RTP headers plus 18 bytes of Layer 2 headers, and each datagram is 78 bytes. At 50 datagrams per second, the bandwidth required is really around 31.2 kbps!
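The arithmetic behind that figure can be sketched in a few lines. The function name and parameters below are illustrative; the 18-byte Layer 2 figure is the Ethernet value quoted above, and the G.711 rate of 64 kbps is an added assumption for comparison, not a number from this article.

```python
def voip_bandwidth_kbps(codec_rate_kbps, interval_ms,
                        ip_udp_rtp_bytes=40, l2_overhead_bytes=18):
    """Estimate one-way bandwidth for a single RTP stream, headers included."""
    # Bytes of encoded speech carried in each packet.
    payload_bytes = codec_rate_kbps * 1000 / 8 * interval_ms / 1000
    packets_per_second = 1000 / interval_ms
    total_bytes = payload_bytes + ip_udp_rtp_bytes + l2_overhead_bytes
    return total_bytes * 8 * packets_per_second / 1000


g729 = voip_bandwidth_kbps(8, 20)    # 20-byte payload every 20 ms
g711 = voip_bandwidth_kbps(64, 20)   # assumes G.711 at 64 kbps
print(f"G.729: {g729:.1f} kbps, G.711: {g711:.1f} kbps")
```

Because a call has one RTP stream in each direction, the total for a conversation is roughly double the one-way figure.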
You may ask: why don't all codecs just increase the speech packet size to produce larger datagrams, so that header overhead is not an issue? The problem is that a larger speech packet adds more delay, which can hurt call quality. In addition, a larger speech packet places more voice data in a single datagram, so each lost datagram removes a bigger piece of the conversation, which hurts call quality as well.
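A rough sketch of this trade-off for G.729 follows, using the header sizes given earlier (20 bytes IP, 8 UDP, 12 RTP, 18 Ethernet). The function name and the set of intervals are illustrative assumptions. The interval itself is a floor on the added delay, since a packet cannot be sent until that much speech has been collected.

```python
HEADER_BYTES = 20 + 8 + 12 + 18  # IP + UDP + RTP + Ethernet, per packet


def packetization_tradeoff(codec_rate_kbps, interval_ms):
    """Return (payload_bytes, bandwidth_kbps) for one RTP stream."""
    # kbps * ms gives bits of speech per packet; divide by 8 for bytes.
    payload_bytes = codec_rate_kbps * interval_ms / 8
    packets_per_second = 1000 / interval_ms
    bandwidth_kbps = (payload_bytes + HEADER_BYTES) * 8 * packets_per_second / 1000
    return payload_bytes, bandwidth_kbps


for interval_ms in (10, 20, 40, 60):
    payload, bandwidth = packetization_tradeoff(8, interval_ms)
    print(f"{interval_ms} ms per packet: {payload:.0f}-byte payload, "
          f"{bandwidth:.1f} kbps, {interval_ms} ms of speech lost per dropped packet")
```

Bandwidth falls as the interval grows, but both the packetization delay and the amount of speech lost with each dropped datagram grow with it.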
Jeff Hicks is a software architect at NetQoS

