SIP & VoIP: Assessing the Security Aspects of SIP-based Communications

[SIP]
[VoIP]
[Computer Systems Security]
[Networking]
[Internet]

Abstract

The ubiquity of the Internet has driven the development of many disruptive technologies in recent decades. Transitioning communication systems from circuit switching to packet switching was a natural step in the evolution of communication technology. The transmission of voice and multimedia over the global network depends on reliable and sophisticated protocols capable of, among other things, discovering, authenticating, and linking communication endpoints. However, the excitement and demand that surround emerging technologies are almost always accompanied by security concerns. In this context, this research explores the Session Initiation Protocol (SIP) and its applicability to Voice-over-IP (VoIP) solutions, while assessing the security aspects of Denial of Service (DoS) attacks on SIP-based systems.

SIP & VoIP: Assessing the Security Aspects of SIP-based Communications

The Internet has given individuals and enterprises instant access to all kinds of media. The now retiring traditional telephony networks are being replaced by Internet-based Voice over IP (VoIP) systems that offer equivalent, if not better, audio quality, compact and sometimes cheaper equipment, very low calling prices, and numerous extra features such as automatic call answering, call blocking, call forwarding, call waiting, fax capabilities, multi-way calling, contact lists, distinctive ringing, local number portability, voicemail, and many more. However, since data networks were not originally designed for real-time, media-rich telephony services, new communication protocols and Internet standards had to be defined and enforced to enable the migration from circuit-switched to packet-switched voice transmission and conferencing services.

The need for a new standard protocol that could enable extended interactivity through real-time media and data sharing, by managing how computers find and connect to each other, drove the creation of the Session Initiation Protocol (SIP). SIP is a signaling protocol, currently defined by the Internet Engineering Task Force (IETF) in RFC 3261, capable of setting up, maintaining, and tearing down sessions between computers. Other protocols exist for supporting VoIP implementations (e.g., H.323 and H.248), but they are not as simple, robust, and popular as SIP, which is why SIP became the focus of this research.


SIP, however, does not support all the existing features of VoIP systems by itself. Instead, it collaborates with many other standards (e.g., protocols, processes, and services) to accomplish this task, such as the Transmission Control Protocol (TCP), Network Time Protocol (NTP), User Datagram Protocol (UDP), Stream Control Transmission Protocol (SCTP), Domain Name System (DNS), Hypertext Transfer Protocol (HTTP), Trivial File Transfer Protocol (TFTP), Simple Network Management Protocol (SNMP), Dynamic Host Configuration Protocol (DHCP), Resource Reservation Protocol (RSVP), Session Description Protocol (SDP), Real-time Transport Protocol (RTP), and Transport Layer Security (TLS).

This paper will describe in detail how SIP supports VoIP implementations, while assessing the security aspects of Denial of Service (DoS) attacks on SIP-based systems.

The SIP Protocol

The dependency of the SIP protocol on other Internet processes and protocols to support VoIP applications has been present since its origins. According to Porter, Kanclirz, & Baskin (2006), the original draft for the Session Invitation Protocol (SIPv1) was submitted to the IETF on February 22, 1996 by Mark Handley and Eve Schooler.

On that same day, Henning Schulzrinne submitted to the IETF a draft specifying the Simple Conference Invitation Protocol (SCIP) – a protocol used to enable session management for point-to-point and multicast sessions.

SIPv1 – The Session Invitation Protocol


SIPv1 was text-based and used UDP as its transport protocol. It used the Session Description Protocol (SDP) as the basic mechanism to manage multimedia sessions. In SIPv1, workstations needed to be registered with an address server before sessions could be established. If a user was working remotely or on a workstation other than their own, that workstation could be temporarily registered with the address server, enabling the user to establish sessions with other registered computers. However, there was no way to know whether the user was available for connection establishment. If the user was available, the work performed by SIPv1 was done once a session was established. At that point, SIPv1 was still very immature as a signaling protocol, accounting only for session establishment among computers. There were no conference controls available and no mechanism to tear down sessions.

SCIP was HTTP-based and used TCP as its transport protocol, allowing for both synchronous and asynchronous communications. SCIP took a more universal approach to identifying registered computers: it used e-mail-like addresses instead of plain IP addresses as workstation identifiers within the registration servers. This allowed for greater mobility over both synchronous and asynchronous connections. It also defined a new format for handling multimedia sessions. SCIP extended beyond session establishment by defining mechanisms to modify session parameters in active sessions. SCIP was also responsible for closing the session and “cleaning up the house” (i.e., resetting parameters and deallocating used resources) once the session was over.

SIPv2 – The Session Initiation Protocol


By the end of 1996, the SIPv1 and SCIP protocols were merged to give birth to the Session Initiation Protocol (SIP), also known as SIPv2. The SIPv2 protocol was HTTP-based and allowed for both UDP and TCP sessions, using SDP to describe multimedia sessions. By December 1997, to ease the work of reviewing and enhancing the protocol, the IETF decided to split SIPv2 into a basic specification and a set of extensions. By February 1999, the protocol was published for the first time as an IETF standard (RFC 2543).

RFC 3261 – The current SIP standard

In June 2002, after the review of RFC 2543 was completed, the SIP protocol was republished as an IETF standard (RFC 3261). The new SIP protocol was based on HTTP and SMTP and used TCP, UDP, and SCTP as transport protocols.

According to IETF.org (2002), the new SIP standard is responsible for establishing, managing, and terminating multimedia sessions among multiple computers, as well as locating and inviting computers (participants) to these sessions. RFC 3261 summarizes the inner workings of the SIP protocol as follows:

“SIP invitations used to create sessions carry session descriptions that allow participants to agree on a set of compatible media types. SIP makes use of elements called proxy servers to help route requests to the user's current location, authenticate and authorize users for services, implement provider call-routing policies, and provide features to users. SIP also provides a registration function that allows users to upload their current locations for use by proxy servers…” (IETF.org, 2002)


The computers participating in a SIP session are identified by their Uniform Resource Identifier (URI), based on the following formats: sip:username:password@host:port for unencrypted sessions or sips:username:password@host:port for TLS-encrypted sessions. Applications running on these computers (e.g., messaging, games, VoIP, and media-rich conferences) that depend on real-time communications can use SIP to interact and share data with each other. SIP, however, is not responsible for the actual transport of the media streams.
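The URI formats above can be illustrated with a minimal parsing sketch in Python. The names SipUri and parse_sip_uri are hypothetical and introduced only for illustration; the pattern is deliberately simplified (it ignores IPv6 literals and does not interpret URI parameters or headers), so it should be read as an approximation of the format rather than a complete RFC 3261 parser.

```python
import re
from typing import NamedTuple, Optional


class SipUri(NamedTuple):
    """Hypothetical container for the parts of a sip: or sips: URI."""
    scheme: str               # "sip" (unencrypted) or "sips" (TLS)
    user: Optional[str]
    password: Optional[str]
    host: str
    port: Optional[int]


# Simplified pattern: scheme ":" [user [":" password] "@"] host [":" port]
# Anything after ";" or "?" (URI parameters, headers) is accepted but ignored.
_SIP_URI = re.compile(
    r"^(?P<scheme>sips?):"
    r"(?:(?P<user>[^:@]+)(?::(?P<password>[^@]*))?@)?"
    r"(?P<host>[^:;?]+)"
    r"(?::(?P<port>\d+))?"
    r"(?:[;?].*)?$",
    re.IGNORECASE,
)


def parse_sip_uri(text: str) -> SipUri:
    """Split a SIP URI into its components, or raise ValueError."""
    match = _SIP_URI.match(text.strip())
    if match is None:
        raise ValueError(f"not a SIP URI: {text!r}")
    port = match.group("port")
    return SipUri(
        scheme=match.group("scheme").lower(),
        user=match.group("user"),
        password=match.group("password"),
        host=match.group("host"),
        port=int(port) if port else None,
    )
```

For example, parse_sip_uri("sips:bob@example.org") yields the scheme "sips", the user "bob", and no explicit port, leaving the caller to apply a default port.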

“The Real-time Transport Protocol RTP is an application-level protocol which is intended for delivery of delay-sensitive content, such as audio and video, through different networks.” (Falk & Fries, 2008) “The data transport is augmented by a control protocol (RTCP) to allow monitoring of the data delivery in a manner scalable to large multicast networks, and to provide minimal control and identification functionality.” (IETF.org, 2003) Thus, SIP relies on the RTP protocol for the transport of real-time media.

Once a SIP session is established, a direct virtual path, supported by the RTP protocol, is established between the endpoints of the session. This way, media is exchanged in real time through the RTP path while signaling messages are exchanged in parallel using the SIP protocol throughout the life of the session.

Now that we have learned a little more about the evolution of the SIP protocol and its current implementation, we are ready to investigate the protocol more deeply. To avoid any confusion among the different SIP standards, the term SIP will be used from now on to refer to the current version of the SIP standard, as published in IETF’s RFC 3261.


SIP and the Open Systems Interconnection Model (OSI)

The OSI model is used to describe communication protocols through a set of layers, specifying how the various protocols exchange data among computers within a network and how the protocols relate to each other. SIP is considered an application-layer protocol. This means that user applications interact directly with the protocol to make use of its functionality.

SIP Architecture


As mentioned before, the rich set of features available through SIP to support VoIP applications depends on many other protocols, services, and network devices. To better understand the security vulnerabilities of the SIP protocol and VoIP applications, we first need to understand the SIP architecture.

SIP Components


Following is a brief description of the components that make up the SIP architecture:

     - User Agents (UAs): the participants of a session (e.g., a software application such as a softphone, a PDA, or an IP phone). UAs can communicate with other components through either client/server or peer-to-peer architectures. Each participant is equipped with the following components:

          o User Agent Client (UAC): subcomponent responsible for issuing SIP requests.

          o User Agent Server (UAS): subcomponent responsible for responding to UAC requests.

     - SIP Servers: computers capable of locating and identifying participants, as well as facilitating (routing) the exchange of SIP messages among them. These servers can run in either stateful or stateless mode, depending on whether or not they cache the SIP messages they process. SIP Servers can communicate with other components through either client/server or peer-to-peer architectures.

          o Registrar Server: stores the locations of the active UAs on the network, based on their IP addresses and logged-in usernames. The registrar server also provides participant information to other servers through a service (daemon) known as the Location Service.

          o Proxy Server: forwards requests (SIP messages) among UAs and SIP servers. The proxy server makes use of the information stored in the Registrar servers to locate the participants and verify their statuses (e.g., online, offline, away, or busy).


          o Redirect Server: redirects UAC requests to the UAs they are trying to reach. Instead of forwarding messages as the proxy server does, the redirect server returns the URI of the target UA to the UAC and tells the UAC to contact that UA directly (in a peer-to-peer way). If the targeted UA is a participant registered in more than one location, the redirect server will split the session across that participant’s various locations. For example, if UAA places a call to a multi-location UABn (where “n” is the index of the targeted location), the phones at all of UAB’s locations will ring. Only the first participant to accept the call (for example, UAB0, UAB1, or UAB2, in the case of a participant registered in three different locations) will be used for the session; the other UAs will be discarded for that specific session.
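The difference between the proxy and redirect roles described above can be sketched with a toy location service in Python. All names here (LocationService, redirect_server, proxy_next_hops) are hypothetical and exist only for illustration; real servers also handle expirations, authentication, and transactions, none of which is modeled. The status codes 302 (Moved Temporarily) and 404 (Not Found) are standard SIP response codes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LocationService:
    """Toy store mapping a user's public SIP address to contact URIs."""
    bindings: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, address: str, contact: str) -> None:
        # What a Registrar would do after accepting a REGISTER request.
        contacts = self.bindings.setdefault(address, [])
        if contact not in contacts:
            contacts.append(contact)

    def lookup(self, address: str) -> List[str]:
        return list(self.bindings.get(address, []))


def redirect_server(locations: LocationService,
                    address: str) -> Tuple[int, List[str]]:
    """Answer the caller with the target's contacts instead of forwarding."""
    contacts = locations.lookup(address)
    if not contacts:
        return 404, []       # Not Found: no registered location
    return 302, contacts     # Moved Temporarily: caller contacts these itself


def proxy_next_hops(locations: LocationService, address: str) -> List[str]:
    """Return the contacts to which a proxy would forward the request."""
    return locations.lookup(address)
```

In this sketch a redirect server hands the contact list back to the caller, while a proxy keeps the lookup result and forwards the request on the caller's behalf.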

SIP Requests and SIP Responses


The HTTP-like characteristic of the SIP protocol allows for text-based messages that are exchanged among UAs (UACs and UASs) and SIP servers. Following is a list of the main SIP messages:

     - REGISTER: used by the UAs to register both their SIP and IP addresses with the SIP Registrar Servers.

     - INVITE: used to establish a SIP session between UAs.

     - OPTIONS: used to query a UA’s capabilities (e.g., session parameters and media types to be exchanged).

     - NOTIFY: transmits information about the message’s originator UA.

     - SUBSCRIBE: used to verify the presence status of a specific UA (e.g., online, away, offline, available, or busy).

     - ACK: used to acknowledge the exchange and correct processing of SIP messages.

     - BYE: terminates a session.


     - CANCEL: cancels a pending request without terminating the session.
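Because SIP requests are HTTP-like text, the general shape of a request can be shown with a short Python sketch. The function build_request and its parameters are hypothetical, and the sketch emits only a minimal header set for a body-less request such as OPTIONS; an INVITE would normally also carry a Contact header and an SDP body, which are not modeled here.

```python
import uuid

# Request methods discussed in the text.
SIP_METHODS = {"REGISTER", "INVITE", "OPTIONS", "NOTIFY",
               "SUBSCRIBE", "ACK", "BYE", "CANCEL"}


def build_request(method: str, target_uri: str, from_uri: str,
                  local_host: str, cseq: int = 1,
                  transport: str = "UDP") -> str:
    """Assemble a minimal, body-less SIP request as text (illustrative)."""
    if method not in SIP_METHODS:
        raise ValueError(f"unsupported SIP method: {method!r}")
    # RFC 3261 requires Via branch values to start with "z9hG4bK".
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]
    tag = uuid.uuid4().hex[:8]
    call_id = f"{uuid.uuid4().hex}@{local_host}"
    lines = [
        f"{method} {target_uri} SIP/2.0",          # request line
        f"Via: SIP/2.0/{transport} {local_host};branch={branch}",
        "Max-Forwards: 70",
        f"To: <{target_uri}>",
        f"From: <{from_uri}>;tag={tag}",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} {method}",
        "Content-Length: 0",
    ]
    # Lines end with CRLF; an empty line separates headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n"
```

A call such as build_request("OPTIONS", "sip:bob@example.com", "sip:alice@example.com", "client.example.com") returns a message whose first line is the request line and whose headers end with an empty line.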

After SIP messages are sent by either the UAs or the SIP servers, a response is sent back to the message’s originator. These responses, also known as SIP responses, are grouped as follows:

     - (1xx) Informational responses: informs the sender that its request was received and is being processed.

     - (2xx) Success: informs the sender that its message was acknowledged and accepted.

     - (3xx) Redirection: informs the sender that further action, such as redirecting the message, may be necessary before its SIP message can be processed.

     - (4xx) Client Error: informs the sender that its request was malformed or cannot be processed by this server.

     - (5xx) Server Error: informs the sender that its request cannot be processed by one of the available SIP servers. The message may be forwarded to the next active and available server for a processing retry.

     - (6xx) Global Failure: informs the sender that its request cannot be processed by any of the available SIP servers. The message will not be redirected for reprocessing.

Please note that the list of specific SIP responses is extensive. Thus, only the main response categories are described here, and the full list is omitted for brevity.
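The response classes above map directly onto the first digit of the status code, which the following Python sketch makes explicit. The function name classify_status_line is hypothetical; the class names follow RFC 3261, which calls the 6xx class “Global Failure”.

```python
from typing import Tuple

# First digit of the status code -> response class (names per RFC 3261).
RESPONSE_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def classify_status_line(status_line: str) -> Tuple[int, str]:
    """Return (status code, class name) for a SIP response status line."""
    parts = status_line.strip().split(" ", 2)
    if len(parts) < 2 or parts[0] != "SIP/2.0":
        raise ValueError(f"not a SIP/2.0 status line: {status_line!r}")
    try:
        code = int(parts[1])
    except ValueError:
        raise ValueError(f"non-numeric status code: {parts[1]!r}") from None
    if not 100 <= code <= 699:
        raise ValueError(f"status code out of range: {code}")
    return code, RESPONSE_CLASSES[code // 100]
```

For instance, the status line "SIP/2.0 180 Ringing" falls in the Informational class, while "SIP/2.0 486 Busy Here" is a Client Error.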

Security Concerns on VoIP and SIP-based implementations


The features provided by VoIP implementations far outpace the features offered by traditional circuit-switched telephony systems. However, “although VoIP was designed to be secure, VoIP technology faces many security threats nowadays.” (Garuba, Jiang, & Zhenqiang, 2008) Also, with the introduction of VoIP, “the need for security is compounded because now we must protect two invaluable assets, our data and our voice” (Nist.gov, 2005).

The dependence of VoIP on the SIP protocol and SIP extensions makes it vulnerable to most network attacks. As an application layer protocol, the SIP protocol is dependent on the lower layers of the OSI model, which makes it susceptible to any weaknesses that may affect these layers, including, among other things, Internet Protocol (IP) infrastructure vulnerabilities, operating system vulnerabilities, configuration vulnerabilities, and general network attacks. For the purposes of this research, we will focus on the effects of DoS attacks on SIP-based systems.

Denial of Service (DoS) Attacks


Denial of Service attacks are carried out by overloading the network components, in this case, UAs, SIP servers, and other network devices, with malicious traffic causing disruption of network services, degrading Quality of Service (QoS) and, occasionally, bringing down the entire network. Attacks may target any active port, through either DoS or Distributed Denial of Service (DDoS) attacks. In VoIP and SIP-based implementations, attackers may target specific ports, such as:

     - Port 5060: used for non-encrypted SIP traffic over both TCP and UDP.

     - Port 5061: used for SIP traffic encrypted with TLS, which runs over TCP.

It is worth noting that, depending on the network configuration and the vendor of the network components, the port numbers may vary.

General DoS Attacks


Following is a list of some of the main DoS attacks:

     - Ping Flood Attack: the target is overloaded with Internet Control Message Protocol (ICMP) Echo Request packets; this attack is normally used when the attacker has more bandwidth than the victim. The target may then respond with ICMP Echo Reply packets, which consume both bandwidth and CPU cycles.

     - Ping of Death (PoD) Attack: the target is sent malicious traffic crafted so that packets exceed 65,535 bytes in size. According to IETF.org (1981), the IP protocol (RFC 791) would be violated if a packet with such characteristics were sent through the network, unless the packet is sent in a fragmented format. Once the fragments arrive and are reassembled by the targeted computer, a buffer overflow can occur, causing the system to crash.

     - Smurf Attack: this attack exploits poorly configured network devices on the public Internet by flooding them with maliciously crafted Internet Protocol (IP) packets. Instead of targeting a specific host, these packets are broadcast to all hosts on a particular network via its broadcast address, which amplifies the traffic. This quickly consumes network bandwidth, blocking the processing of valid packets.

     - SYN Flood Attack: ports listening over the TCP protocol are overloaded with crafted SYN messages, which are normally responsible for establishing the connection between a client and a server machine. Once a server receives a SYN message, it extracts the IP address of the message’s sender and prepares to establish a connection. In this case, since the sender’s address is forged, the acknowledgement message needed to establish the connection never returns to the server, leaving it in a “wait” state. If too many connections are opened, the server will cease to accept connection requests for as long as the attack continues.

     - Permanent DoS (PDoS) Attack: exploits security flaws in the target machines, generally embedded devices with weaker internal security, to gain administrative access to their interfaces. Once access is acquired, the attacker updates the firmware of the networked devices with malformed images, breaking the devices.

     - Banana Attack: overloads communication ports on targeted devices through a feedback mechanism, where outgoing traffic from the target is redirected to itself or to other devices in an out-of-sequence way, affecting and possibly disabling external access to these devices.

     - Distributed Reflected DoS (DRDoS) Attack: Echo broadcast requests are crafted with a spoofed sender address, which will be the IP address of the targeted device, and sent to a huge number of computers on a poorly configured network. The computers are expected to reply to the targeted machine, flooding it and disrupting its operation. This attack is also considered as an amplification attack.

     - Malware propagation: viruses and worms may cause DoS or DDoS conditions on QoS-dependent networks as a consequence of the burst of network traffic generated by the malware’s replication and propagation efforts.

     - Unintentional DoS: the targeted computer is flooded by a sudden burst of requests, as a result of unpreparedness to handle unforeseen popularity of its services due to extensive or even unexpected advertisement campaigns.
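The 65,535-byte limit mentioned for the Ping of Death attack follows from the IPv4 header layout: the Total Length field is 16 bits wide, while the 13-bit Fragment Offset field counts in units of 8 bytes, so the last possible offset is 8,191 x 8 = 65,528 bytes. A fragment placed at or near that offset with a large enough payload would therefore reassemble past the limit. The Python sketch below shows the kind of bounds check a receiver can apply before reassembly; the function names and the 20-byte default header length are assumptions made for illustration, not taken from any particular IP stack.

```python
MAX_IPV4_DATAGRAM = 65_535     # largest value of the 16-bit Total Length field
FRAGMENT_UNIT = 8              # Fragment Offset is counted in 8-byte units
MAX_FRAGMENT_OFFSET = 0x1FFF   # 13-bit field, i.e. at most 8191 units


def fragment_end(offset_units: int, payload_len: int) -> int:
    """Byte position, within the reassembled payload, where a fragment ends."""
    if not 0 <= offset_units <= MAX_FRAGMENT_OFFSET:
        raise ValueError("fragment offset does not fit in 13 bits")
    if payload_len < 0:
        raise ValueError("payload length cannot be negative")
    return offset_units * FRAGMENT_UNIT + payload_len


def is_oversized(offset_units: int, payload_len: int,
                 header_len: int = 20) -> bool:
    """True if accepting this fragment would exceed the IPv4 size limit."""
    return header_len + fragment_end(offset_units, payload_len) > MAX_IPV4_DATAGRAM
```

Under these assumptions, a fragment at the maximum offset carrying 1,480 bytes of payload would be flagged, whereas an ordinary first fragment of the same size would not.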

From General DoS Attacks to VoIP and SIP-based Implementations


The list of possible DoS attacks is long, and their effects can be even greater on the enterprises and individuals whose businesses depend on VoIP and SIP implementations. The DoS attacks described above can be easily adapted to target the specifics of VoIP environments. To illustrate how the general DoS attacks translate to VoIP, let’s consider a few examples:

     - PoD Attack Targeting UA devices: as presented in the general description of the PoD (Ping of Death) attack, maliciously crafted data packets with sizes bigger than 65,535 bytes could be sent over UDP to port 5060 of the UA devices (e.g., IP phones), causing them to crash.

     - Malware infection on VoIP environments: since both SIP and RTP (the supporting protocol used to perform the actual exchange of real-time media streams among UAs) are extremely time-dependent (QoS dependency), a malware infection could affect the network’s quality of service by increasing the network traffic as a consequence of the malware’s replication process.

     - Registrar Server Flooding Attack: by capturing and resending SIP REGISTER messages to the Registrar Servers, an attacker can push these servers to the limit of allowed registrations per second, leaving them unable to register new UAs and consequently blocking the SIP protocol from establishing new sessions for these UAs.

     - Packet Replay Attack: by capturing and resending SIP packets in an out-of-sequence format, an attacker can add delays to the VoIP network, affecting the quality of service. This attack is similar in many ways to the previously described Banana attack.

     - DoS Attack against SIP Supporting Services: by attacking SIP supporting services (e.g., DHCP and DNS), an attacker can affect the SIP protocol and consequently the VoIP application. For example, in a DHCP-based network, if the DHCP server is attacked and disabled, UAs and SIP Servers will become unable to exchange data, maintain existing sessions, or establish new sessions.

     - Deceiving DoS Messages Attack: by sending valid but fake SIP messages to UAs and SIP servers, it is possible to cause busy conditions and session disconnections, much as SYN flood attacks do.

     - SIP Packet of Death Attack: by flooding the UAs and SIP Servers with random and out-of-sequence TCP, UDP, and ICMP packets, an attacker can cause the VoIP network to collapse due to increased CPU usage, bandwidth consumption, and blocked services.

These are just a few examples of attacks derived from the extensive list of general DoS attacks that are used against VoIP and SIP implementations. This list is not intended, by any means, to be a complete reference.

DoS Attacks – Countermeasures


To realize the benefits of VoIP and SIP-based implementations, it is necessary to enforce the security of the underlying network infrastructure and supporting protocols. In both IP-based networks and SIP-based networks, security starts by assessing, formulating, implementing and enforcing proper security baselines. Even though DoS attacks are generally very effective against QoS dependent networks, general countermeasures exist and must be exercised. Following is a list of possible countermeasures:

     - Ensure proper configuration and patching of UAs, SIP Servers, user applications, and other network devices in accordance with well-established security baselines.

     - Separate network domains into independent logical groups, for both media and data traffic, through the use of Virtual Local Area Networks (VLANs), which simplifies the security configuration process, rendering each group relatively immune to DoS attacks within other logical groups.

     - Enable, whenever possible, port security and Media Access Control (MAC) Address filtering on all distribution switches.

     - Disable, whenever possible, unnecessary services and ports on all devices that comprise the VoIP network. Deploy and enforce the principle of least privilege on all of these devices. Also, “carefully consider the impact of blocking services that you may be using.” (Cert.org, 2003)

     - Enforce, if possible, the concept of sinkholes, where malicious data can be forwarded to and isolated in logical groups to be analyzed, processed, and discarded without adding any delays to the pertinent VoIP logical groups.

     - Establish and enforce data packet filtering through the deployment of firewalls, Intrusion Detection Systems (IDSs), and Intrusion Prevention Systems (IPSs) on the VoIP network.

     - Make sure, whenever possible, that the supporting protocols and services (e.g., TCP, UDP, NTP, SCTP, DNS, HTTP, TFTP, and SNMP) are properly configured and dedicated to the VoIP infrastructure.

     - Enable strong authentication and encrypted communications, whenever possible, on VoIP networks by activating security mechanisms such as Transport Layer Security (TLS), HTTP Digest authentication, and the Secure Real-time Transport Protocol (SRTP).

     - Deploy and enforce physical security, whenever possible, to all sensitive devices on the network.

     - Establish and enforce proper monitoring and auditing procedures, while enabling logging as much as possible, in a logically isolated domain, to avoid adding delays to the VoIP network.

     - Make sure Session Border Controllers (SBCs) with built-in security are deployed at the edges of the VoIP network.

It is worth noting that there is no silver bullet when dealing with VoIP and SIP security. In fact, even though “Many international research groups, among which the IETF, are focusing their activities on VoIP security problems; at the state of art, a security standard is not available.” (Casola, Rak, Mazzeo, & Mazzocca, 2005)
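As one concrete illustration of the filtering countermeasures above, a registrar or border element could limit how many REGISTER requests it accepts per source address, which addresses the registrar flooding scenario described earlier. The Python sketch below uses a token-bucket limiter; the class names, the default rate and burst values, and the choice of keying on the source IP address are all assumptions made for illustration. Source addresses can be spoofed over UDP, and the bucket table here grows without bound, so this is a starting point under stated assumptions rather than a complete defense.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows `rate` events per second on average, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: float, now: float) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst      # start full so a normal client is not delayed
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to the time elapsed, capped at the burst size.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class RegisterLimiter:
    """Hypothetical per-source limiter for incoming REGISTER requests."""

    def __init__(self, rate: float = 1.0, burst: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate
        self._burst = burst
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, source_ip: str) -> bool:
        now = self._clock()
        bucket = self._buckets.get(source_ip)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._burst, now)
            self._buckets[source_ip] = bucket
        return bucket.allow(now)
```

A server would call allow() with the sender's address for each REGISTER it receives and drop or reject the request when the call returns False. Tuning the rate and burst to the expected re-registration interval of legitimate UAs is left to the deployment.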

Conclusion


This paper explored the Session Initiation Protocol (SIP) and its applicability to Voice-over-IP (VoIP) solutions, while assessing the security aspects of Denial of Service (DoS) attacks on SIP-based systems. The evolution and current implementation details of the SIP protocol, as well as the SIP architecture that supports VoIP implementations, were also explored.

Since VoIP and SIP-based implementations are not much different, security-wise, from other communication protocols within a computer network, they are vulnerable to almost all the same weaknesses and threats that affect any other protocol. In this context, DoS attacks were explained and their effects were translated to VoIP implementations. A set of guidelines to counter such attacks was also provided.

The marriage between VoIP and SIP allows for an incredible set of possibilities on packet-switched telephony and the exchange of media-rich experiences. However, to realize its full potential, it is necessary to enforce the security of the underlying network infrastructure and supporting protocols. Still, there is no silver bullet in computer systems security. Only by assessing, formulating, implementing and enforcing proper security policies can the full potential of this technology flourish.

References


Casola, V., Rak, M., Mazzeo, A., & Mazzocca, N. (2005). Security design and evaluation in a VoIP secure infrastructure: a policy based approach. Information Technology: Coding and Computing, 2005. ITCC 2005. International Conference on Volume: 1A.

Cert.org. (2003). Multiple vulnerabilities in implementations of the Session Initiation Protocol (SIP). Retrieved December 07, 2011, from
http://www.cert.org/advisories/CA-2003-06.html.

Falk, R., & Fries, S. (2008). Security Governance for Enterprise VoIP Communication. 10.1109/SECURWARE.2008.25 IEEE. Retrieved December 08, from http://doi.ieeecomputersociety.org/10.1109/SECURWARE.2008.25.

Garuba, M., Jiang, Li, & Zhenqiang, Yi. (2008). Security in the New Era of Telecommunication: Threats, Risks and Controls of VoIP. Information Technology: New Generations, 2008. ITNG 2008. Fifth International Conference.

IETF.org. (2002). Session Initiation Protocol, IETF RFC 3261. Retrieved December 08, from http://www.ietf.org/rfc/rfc3261.txt.

IETF.org. (2003). A Transport Protocol for Real-Time Applications, IETF RFC 3550. Retrieved December 06, 2011 from http://www.ietf.org/rfc/rfc3550.txt.

Nist.gov. (2005). NIST Security Considerations for Voice Over IP Systems, National Institute of Standards and Technology NIST SP 800-58. Retrieved December 09, from http://csrc.nist.gov/publications/nistpubs/800-58/SP800-58-final.pdf.

Porter, Thomas, & Kanclirz, Jan, Jr., & Baskin, Brian. (2006). Practical VoIP Security. Syngress: Waltham, MA.