What is Session Initiation Protocol (SIP) and how does it work?

Vendors, carriers, and PBX manuals all keep saying “SIP”, but many people still think it is just another word for VoIP or a single port to open.

SIP (Session Initiation Protocol) is a text-based signaling protocol that sets up, modifies, and terminates real-time sessions like voice, video, messaging, and presence over IP. It negotiates how we talk; RTP/SRTP carries what we hear.

[Figure: SIP signaling (INVITE, 180 Ringing, 200 OK, BYE) vs RTP/SRTP media flow]

Think of SIP as the call-control language: it makes phones ring, finds where users are currently registered, negotiates codecs and media endpoints via SDP, supports mid-call changes (hold, transfer, add video), and cleans up sessions when calls end. The core behavior is defined by the SIP core specification (RFC 3261) [1]. SIP is used by IP phones, SIP intercoms, IP PBXs, softphones, and SIP trunks.
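
Because SIP is text-based, a message can be pulled apart with ordinary string handling. The sketch below is a minimal illustration, not a full parser: it assumes CRLF line endings, ignores folded headers, compact header names, and repeated headers such as multiple Via lines, and the example addresses and tags are made up.

```python
def parse_sip_message(raw: str) -> dict:
    """Split a SIP message into start line, headers, and body (simplified)."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; values such as Via contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Responses begin with the protocol version, requests with a method name.
    is_response = start_line.startswith("SIP/2.0")
    return {
        "start_line": start_line,
        "is_response": is_response,
        "headers": headers,
        "body": body,
    }


# Illustrative request; every value here is invented for the example.
example = (
    "INVITE sip:102@company.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776asdhds\r\n"
    "From: <sip:101@company.com>;tag=1928301774\r\n"
    "To: <sip:102@company.com>\r\n"
    "Call-ID: a84b4c76e66710@192.0.2.10\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

msg = parse_sip_message(example)
print(msg["start_line"])
print(msg["headers"]["cseq"])
```

For real traffic, a maintained SIP stack is a safer choice than hand-rolled parsing, since production messages use many header forms this sketch skips.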


How does SIP set up, manage, and terminate VoIP calls?

When people look at a SIP trace for the first time, they see only long text blocks with headers and codes. It feels impossible to link that to a simple phone call.

SIP sets up calls with INVITE + SDP, confirms establishment with 200 OK + ACK, adjusts sessions with re-INVITE/UPDATE, and ends them with BYE. SIP moves signaling; media flows separately over RTP or SRTP.

[Figure: SIP registration and call setup workflow]

SIP call flow in simple steps

Here’s the “story” of a typical SIP call:

  1. Register (optional, but common)
    Your phone sends REGISTER to a registrar. This binds your SIP identity (the address of record, or AoR, such as sip:101@company.com) to a reachable Contact address.

  2. Invite
    You dial. Your phone sends INVITE to the PBX/SBC/proxy, usually with SDP listing:

    • audio/video codecs it supports
    • where it wants to receive RTP (IP/port)

  3. Ringing / progress
    The far side (or PBX) responds with 1xx provisional responses such as:

    • 100 Trying
    • 180 Ringing
    • sometimes 183 Session Progress (often used for early media)

  4. Answer
    When the call is accepted, the callee sends 200 OK with its own SDP (its chosen codecs and RTP address/ports).

  5. ACK
    The caller sends ACK to confirm that it received the final response to the INVITE. The session is now established.

  6. Media
    Audio/video flows over Real-time Transport Protocol (RTP) [2] (or Secure RTP (SRTP) [3]) using the negotiated details from SDP.
    Note: media can also start before the 200 OK in some designs (early media with 183 + SDP).

  7. Mid-call changes
    Hold/resume, codec changes, adding video, or periodic session refreshes (session timers) commonly use re-INVITE or UPDATE, usually with new SDP.

  8. Hang up
    Either side sends BYE. The other side answers 200 OK, and the dialog ends.
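
The steps above can be sketched as a small state machine seen from the caller's side. This is a teaching model under simplifying assumptions, not the RFC 3261 transaction state machine: the state names and event strings are invented for the example, error responses and CANCEL are left out, and the call is treated as finished as soon as a BYE is sent or received.

```python
from enum import Enum


class CallState(Enum):
    IDLE = "idle"
    CALLING = "calling"          # INVITE sent, no ringing indication yet
    RINGING = "ringing"          # 180 or 183 received
    ANSWERED = "answered"        # 200 OK received, ACK not yet sent
    ESTABLISHED = "established"  # ACK sent, media can flow
    TERMINATED = "terminated"


# (current state, event) -> next state. Event names are hypothetical labels.
TRANSITIONS = {
    (CallState.IDLE, "send INVITE"): CallState.CALLING,
    (CallState.CALLING, "recv 100"): CallState.CALLING,
    (CallState.CALLING, "recv 180"): CallState.RINGING,
    (CallState.CALLING, "recv 183"): CallState.RINGING,
    (CallState.CALLING, "recv 200"): CallState.ANSWERED,
    (CallState.RINGING, "recv 180"): CallState.RINGING,
    (CallState.RINGING, "recv 200"): CallState.ANSWERED,
    (CallState.ANSWERED, "send ACK"): CallState.ESTABLISHED,
    (CallState.ESTABLISHED, "send BYE"): CallState.TERMINATED,
    (CallState.ESTABLISHED, "recv BYE"): CallState.TERMINATED,
}


def run_call(events):
    """Replay a list of events and return the final call state."""
    state = CallState.IDLE
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {event!r} in state {state.name}")
        state = TRANSITIONS[key]
    return state


final = run_call(
    ["send INVITE", "recv 100", "recv 180", "recv 200", "send ACK", "send BYE"]
)
print(final.name)
```

Reading a trace against a table like this makes it easier to spot where a call stalled, for example an INVITE that never gets past 100 Trying.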

Key SIP methods you actually use

Method | What it’s for
REGISTER | Bind identity (AoR) to current reachable Contact
INVITE | Start a session or renegotiate media (with SDP)
ACK | Confirm the final response to an INVITE
BYE | End an active dialog
CANCEL | Stop a call that’s still ringing/not answered
OPTIONS | Capability/reachability “ping”
UPDATE / re-INVITE | Change/refresh media mid-call
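
As an illustration of the OPTIONS "ping", the sketch below only builds the request text; it does not send anything. The function name, the monitor@example.invalid identity, and the addresses are assumptions made for the example. The z9hG4bK prefix on the Via branch is the marker RFC 3261 requires.

```python
import uuid


def build_options(target_uri, local_ip, local_port=5060,
                  from_uri="sip:monitor@example.invalid"):
    """Build the text of a minimal SIP OPTIONS request (not sent anywhere)."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]   # RFC 3261 branch prefix
    tag = uuid.uuid4().hex[:8]
    call_id = f"{uuid.uuid4().hex}@{local_ip}"
    lines = [
        f"OPTIONS {target_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <{from_uri}>;tag={tag}",
        f"To: <{target_uri}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
    ]
    # Headers end with an empty line; there is no body.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_options("sip:pbx.example.invalid", "192.0.2.10")
print(request)
```

Any response at all, even an error, usually shows that the far end is reachable and speaking SIP, which is why monitoring tools often use this method.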

Transactions vs dialogs

SIP has two important scopes:

  • Transaction: one request + its responses (e.g., INVITE → 100/180/200)
  • Dialog: the ongoing relationship for the call, identified by the Call-ID plus the From and To tags, with CSeq ordering the requests inside it

A call can fork (ring multiple devices) and still be “one call” from the user’s viewpoint, but multiple dialogs can exist briefly during forking.
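
A rough sketch of why forking produces several dialogs: the dialog identifier combines the Call-ID with both tags, and each answering device adds its own To tag. The helper names and header values below are hypothetical, and the tuple is written from the caller's point of view (on the callee's side, local and remote tags swap roles).

```python
import re


def tag_of(header_value):
    """Return the tag parameter of a From/To header value, or None."""
    match = re.search(r";\s*tag=([^;\s>]+)", header_value)
    return match.group(1) if match else None


def dialog_id(call_id, from_header, to_header):
    """Dialog identifier as (Call-ID, From tag, To tag)."""
    return (call_id, tag_of(from_header), tag_of(to_header))


call_id = "a84b4c76e66710@192.0.2.10"
from_hdr = "<sip:101@company.com>;tag=1928301774"

# Two devices answer the same forked INVITE, each with its own To tag.
desk_phone = dialog_id(call_id, from_hdr, "<sip:102@company.com>;tag=desk-7f3a")
softphone = dialog_id(call_id, from_hdr, "<sip:102@company.com>;tag=soft-91bc")

print(desk_phone)
print(softphone)
print(desk_phone == softphone)
```

Both tuples share the Call-ID and From tag, so a trace filtered only by Call-ID shows one "call" even though two dialogs exist.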


What is SDP and why is it inside SIP?

Many SIP problems aren’t “SIP problems” — they’re SDP problems.

SDP (Session Description Protocol) is the blob inside SIP messages that describes media: codecs, IPs, ports, and attributes like DTMF, SRTP, and direction (sendrecv/recvonly).

The canonical format is described in RFC 4566, the SDP specification [4] (since obsoleted by RFC 8866, which keeps the same basic line format).

Typical SDP items you see:

  • m= media line (audio/video + port + protocol)
  • c= connection line (IP address)
  • codec payloads (e.g., G.711, Opus)
  • telephone-event payload for DTMF (RFC 4733, which replaced RFC 2833)
  • SRTP attributes (keys or DTLS fingerprints depending on mode)
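
To make the items above concrete, here is a minimal sketch that pulls the media lines, connection address, codec mappings, and direction out of an SDP body. It assumes a simple unicast offer: ports in the "49170/2" form, session-level direction attributes, and most other attributes are ignored, and the sample SDP is invented for the example.

```python
def parse_sdp(sdp: str) -> list:
    """Return one dict per m= line with address, port, codecs, direction."""
    session_addr = None
    media = []
    for line in sdp.splitlines():
        if len(line) < 2 or line[1] != "=":
            continue
        kind, value = line[0], line[2:].strip()
        if kind == "c":
            # c=<nettype> <addrtype> <connection-address>
            addr = value.split()[2]
            if media:
                media[-1]["address"] = addr   # media-level override
            else:
                session_addr = addr           # session-level default
        elif kind == "m":
            # m=<media> <port> <proto> <fmt> ...
            parts = value.split()
            media.append({
                "type": parts[0],
                "port": int(parts[1]),
                "proto": parts[2],
                "formats": parts[3:],
                "address": session_addr,
                "codecs": {},
                "direction": "sendrecv",      # SDP default when unspecified
            })
        elif kind == "a" and media:
            if value.startswith("rtpmap:"):
                payload_type, encoding = value[len("rtpmap:"):].split(None, 1)
                media[-1]["codecs"][payload_type] = encoding
            elif value in ("sendrecv", "sendonly", "recvonly", "inactive"):
                media[-1]["direction"] = value
    return media


sample_sdp = "\r\n".join([
    "v=0",
    "o=- 2890844526 2890844526 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 8 101",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "a=rtpmap:101 telephone-event/8000",
    "a=fmtp:101 0-16",
    "a=sendrecv",
])

audio = parse_sdp(sample_sdp)[0]
print(audio["address"], audio["port"], audio["codecs"])
```

In this sample, PCMU and PCMA are the two G.711 variants and payload type 101 carries DTMF as telephone-event.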

If SDP advertises the wrong IP/port (common behind NAT), you get classic symptoms:

  • rings but one-way audio
  • connects but no audio
  • video works one way only

What is the difference between SIP and VoIP?

Many sales pages write “SIP phones” and “VoIP phones” as if they are two different planets.

VoIP is the overall concept: voice over IP. SIP is one popular signaling protocol used in many VoIP systems to set up and control calls. SIP is a piece of VoIP, not a replacement for it.

[Figure: Data center with overlaid IP network topology to cloud]

A simple stack view:

Part of the system | What it does | Examples
Codecs | Encode/decode audio/video | G.711, G.722, Opus, H.264
Media transport | Carry media packets | RTP, SRTP, RTCP
Signaling | Set up/control sessions | SIP (also H.323 or MGCP in some systems)
Call control | Features & routing logic | PBX/softswitch/SBC policies

What ports does SIP use?

People often treat SIP like “open port 5060 and you’re done.” That’s rarely true.

SIP signaling commonly uses UDP/TCP 5060, or TLS on 5061, but the bigger issue is RTP/SRTP: media uses dynamic UDP ports negotiated in SDP (or anchored by an SBC).

Typical defaults:

  • SIP over UDP/TCP: 5060
  • SIP over TLS: 5061
  • RTP/SRTP: dynamic UDP range (varies by PBX/vendor)

So firewall rules are usually about both:

  • allowing SIP to the right server(s)
  • allowing RTP/SRTP media ranges (or forcing media relay via SBC)
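
One way to sanity-check a rule set is to write it down as data and test sample traffic against it. The sketch below is only a model, not real firewall syntax, and the 10000-20000 RTP range is an assumed placeholder: the real range depends on your PBX or SBC configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    proto: str
    first_port: int
    last_port: int
    purpose: str


# Assumed values for illustration; check your PBX/SBC for the real RTP range.
RULES = [
    Rule("udp", 5060, 5060, "SIP over UDP"),
    Rule("tcp", 5060, 5060, "SIP over TCP"),
    Rule("tcp", 5061, 5061, "SIP over TLS"),
    Rule("udp", 10000, 20000, "RTP/SRTP media (assumed range)"),
]


def matching_rule(proto, port):
    """Return the first rule that allows this protocol/port, or None."""
    for rule in RULES:
        if rule.proto == proto and rule.first_port <= port <= rule.last_port:
            return rule
    return None


for proto, port in [("udp", 5060), ("tcp", 5061), ("udp", 15000), ("udp", 30000)]:
    rule = matching_rule(proto, port)
    print(proto, port, "->", rule.purpose if rule else "blocked")
```

The last sample shows the classic failure mode: signaling is allowed, but media negotiated on a port outside the permitted range is dropped.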

Will SIP work behind NAT, firewalls, and SIP ALG?

Many VoIP issues have nothing to do with codecs or PBX rules. They come from NAT and “helpful” routers.

SIP works behind NAT, but you must manage address/port rewriting for SIP headers and SDP, and you often need an SBC/B2BUA or NAT traversal tools. SIP ALG is a frequent cause of one-way audio and random breakage.

[Figure: Cloud-connected VoIP gateway and remote devices]

Why NAT breaks SIP so often

NAT changes source IP/ports on the outside, but SIP/SDP may still announce private addresses like 192.168.x.x. If the far end follows that announcement, it sends media to an unreachable private IP.
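
A quick way to spot this in a trace is to check whether the SDP connection address is an RFC 1918 private address. The sketch below is a minimal check under stated assumptions: it looks only at c= lines with literal IPv4 addresses, skips hostnames and multicast forms, and the function names and sample addresses are invented for the example.

```python
import ipaddress

# RFC 1918 private IPv4 ranges.
PRIVATE_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def sdp_connection_addresses(sdp):
    """Collect the address field of every c= line in an SDP body."""
    addresses = []
    for line in sdp.splitlines():
        if line.startswith("c="):
            parts = line[2:].split()
            if len(parts) == 3:
                addresses.append(parts[2])
    return addresses


def private_media_addresses(sdp):
    """Return c= addresses that a remote peer on the internet cannot reach."""
    flagged = []
    for addr in sdp_connection_addresses(sdp):
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            continue  # hostname or multicast/TTL form, not handled here
        if ip.version == 4 and any(ip in net for net in PRIVATE_NETS):
            flagged.append(addr)
    return flagged


behind_nat = "v=0\r\nc=IN IP4 192.168.1.50\r\nm=audio 40000 RTP/AVP 0\r\n"
rewritten = "v=0\r\nc=IN IP4 203.0.113.7\r\nm=audio 40000 RTP/AVP 0\r\n"

print(private_media_addresses(behind_nat))
print(private_media_addresses(rewritten))
```

A private address is not always a fault, since an SBC or media relay further along may rewrite it, but it is a useful first thing to look for when audio goes missing.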

What actually fixes it

Tool / approach | What it solves | Common place
SBC / B2BUA | Rewrites SIP + SDP, anchors media, enforces policy | Network edge / cloud edge
Symmetric RTP / rport | Helps SIP responses and media return through simple NAT mappings | PBX + endpoints
Session Traversal Utilities for NAT (STUN) [5] | Helps a client discover public mapping | Softphones/WebRTC, some SIP clients
TURN | Relays media when direct path fails | WebRTC / mobile-heavy deployments
Interactive Connectivity Establishment (ICE) [6] | Tries multiple candidate paths automatically | WebRTC, some modern SIP endpoints

SIP ALG: why it hurts

SIP ALG tries to rewrite SIP/SDP on the router. Problems:

  • it often can’t handle vendor differences
  • it can’t inspect SIP over TLS, so it does nothing useful for encrypted signaling
  • it may rewrite ports incorrectly

In many professional deployments, the fix is to disable SIP ALG and let the PBX/SBC do the SIP-aware handling.


How do I secure SIP with TLS, SRTP, and SBCs?

Once calls cross the internet, security stops being optional.

Secure SIP by encrypting signaling with TLS, encrypting media with SRTP (or DTLS-SRTP where appropriate), using strong authentication, and placing SBCs at borders for protection, NAT traversal, and interop.

[Figure: Secure VoIP network map beside IP phone]

What each layer protects

  • SIP over TLS: protects SIP headers and SDP from snooping/tampering in transit, on each hop between SIP elements
  • SRTP: protects the actual audio/video payload
  • Session Border Controller (SBC) [7]: reduces attack surface, blocks floods, hides internal topology, normalizes SIP behavior between carriers and endpoints
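
On the client side, the TLS layer can be sketched with Python's standard ssl module. This is a minimal illustration assuming a server with a publicly trusted certificate on port 5061; the function names are hypothetical, and open_sip_tls is defined but not called here because it needs a real server.

```python
import socket
import ssl


def sip_tls_context():
    """TLS client settings for SIP signaling: verify the server, TLS 1.2+."""
    context = ssl.create_default_context()  # verifies certificate and hostname
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def open_sip_tls(host, port=5061, timeout=5.0):
    """Open a verified TLS connection to a SIP server (needs a live server)."""
    raw_sock = socket.create_connection((host, port), timeout=timeout)
    return sip_tls_context().wrap_socket(raw_sock, server_hostname=host)


context = sip_tls_context()
print(context.minimum_version.name, context.check_hostname)
```

Turning off certificate verification to "make it work" removes most of the protection, so a failing handshake is better fixed on the certificate side.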

Practical hardening checklist

Control | What to do | Why
TLS for SIP | Use sips: / TLS where supported | Stops credential and routing leakage
SRTP | Prefer SRTP end-to-end when possible | Prevents eavesdropping on media
Strong auth | Long random passwords, lockouts | Stops brute-force registrations
Rate limits | SBC/edge rules for scans/floods | Reduces downtime from attacks
Least exposure | Don’t expose PBX directly if possible | Shrinks threat surface
Logs & alerts | Monitor REGISTER failures, call spikes | Early warning of abuse
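
The "Logs & alerts" row can be sketched as a simple counter over registration failures per source address. The event format below is hypothetical: it assumes your PBX or SBC logs have already been reduced to (source IP, outcome) pairs, where "auth_failed" means a rejected password rather than the normal first 401 challenge. The threshold of 5 is an arbitrary example value.

```python
from collections import Counter


def flag_register_abuse(events, threshold=5):
    """Return source IPs with at least `threshold` failed registrations."""
    failures = Counter(
        source for source, outcome in events if outcome == "auth_failed"
    )
    return sorted(src for src, count in failures.items() if count >= threshold)


# Invented sample: one scanner hammering the registrar, one user with a typo.
events = [("198.51.100.9", "auth_failed")] * 6 + [
    ("192.0.2.20", "auth_failed"),
    ("192.0.2.20", "ok"),
]

print(flag_register_abuse(events))
```

In practice the flagged addresses would feed a lockout or rate-limit rule at the SBC or edge, in line with the checklist above.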

Conclusion

SIP is the signaling backbone of many VoIP systems: it finds users, rings devices, negotiates media with SDP, supports mid-call changes, and ends sessions cleanly — while RTP/SRTP carries the actual voice and video. With good NAT handling (often via SBC), sensible firewall rules, and TLS + SRTP, SIP becomes reliable, interoperable, and secure enough for real-world PBXs, SIP trunks, and SIP intercom deployments.

Footnotes


  1. Defines SIP requests, responses, dialogs, and routing behaviors in the official standard.

  2. Details RTP packet structure, sequencing, and timing used to carry real-time audio/video streams.

  3. Describes how SRTP encrypts and authenticates voice/video media to prevent eavesdropping and tampering.

  4. Explains the SDP format used to advertise codecs, IPs, ports, and media attributes in calls.

  5. Shows how STUN helps endpoints discover public NAT mappings for better connectivity and fewer one-way-audio issues.

  6. Explains ICE candidate gathering and selection for reliable NAT traversal in modern real-time communications.

  7. Overview of SBC roles like topology hiding, policy enforcement, interop normalization, and attack mitigation at SIP borders.

About The Author
DJSLink R&D Team

DJSLink is China's top manufacturer of SIP audio and video communication solutions.
Over the past 15 years, we have not only provided reliable, secure, clear, high-quality audio and video products and services, but we also take care of the delivery of your projects, ensuring your success in the local market and helping you to build a strong reputation.
