SIP (Session Initiation Protocol) is a text-based signaling protocol that sets up, modifies, and terminates real-time sessions like voice, video, messaging, and presence over IP. It negotiates how we talk; RTP/SRTP carries what we hear.

Figure: SIP signaling vs RTP media flow illustration

Think of SIP as the call-control language: it makes phones ring, finds where users are currently registered, negotiates codecs and media endpoints via SDP, supports mid-call changes (hold, transfer, add video), and cleans up sessions when calls end. The core behavior is defined by the SIP core specification (RFC 3261) ¹. SIP is used by IP phones, SIP intercoms, IP PBXs, softphones, and SIP trunks.

How does SIP set up, manage, and terminate VoIP calls?

When people look at a SIP trace for the first time, they see only long text blocks full of headers and codes. It feels impossible to link that to a simple phone call.

SIP sets up calls with INVITE + SDP, confirms establishment with 200 OK + ACK, adjusts sessions with re-INVITE/UPDATE, and ends them with BYE. SIP moves signaling; media flows separately over RTP or SRTP.

Figure: SIP registration and call setup workflow chart

SIP call flow in simple steps

Here’s the “story” of a typical SIP call:

1. Register (optional, but common)
   Your phone sends REGISTER to a registrar. This binds your SIP identity (AoR, like sip:101@company.com) to a reachable Contact address.
2. Invite
   You dial. Your phone sends INVITE to the PBX/SBC/proxy, usually with SDP listing:
   - audio/video codecs it supports
   - where it wants to receive RTP (IP/port)
3. Ringing / progress
   The far side (or PBX) responds with 1xx like:
   - 100 Trying
   - 180 Ringing
   - sometimes 183 Session Progress (often used for early media)
4. Answer
   When the call is accepted, the callee sends 200 OK with its own SDP (its chosen codecs and RTP address/ports).
5. ACK
   The caller sends ACK to acknowledge the 200 OK. The INVITE handshake is complete and the dialog is confirmed.
6. Media
   Audio/video flows over Real-time Transport Protocol (RTP) ² (or Secure RTP (SRTP) ³) using the negotiated details from SDP.
   Note: media can also start before the 200 OK in some designs (early media with 183 + SDP).
7. Mid-call changes
   Hold/resume, codec changes, and adding video commonly use re-INVITE or UPDATE with new SDP. Session-timer refreshes use the same methods, while NAT bindings are usually kept open by keepalive traffic rather than by renegotiation.
8. Hang up
   Either side sends BYE. The other side answers 200 OK, and the dialog ends.
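The setup step above can be sketched in Python as a function that assembles an INVITE carrying an SDP offer. This is a minimal illustration, not a working user agent: the helper names `build_sdp_offer` and `build_invite`, the addresses, and the codec list are all assumptions made for the example, and a real phone would also handle authentication, routing headers, and retransmissions.

```python
import uuid


def build_sdp_offer(ip: str, rtp_port: int) -> str:
    """Return a minimal audio-only SDP offer (hypothetical codec choice)."""
    lines = [
        "v=0",
        f"o=- 0 0 IN IP4 {ip}",
        "s=-",
        f"c=IN IP4 {ip}",                      # where we want to receive media
        "t=0 0",
        f"m=audio {rtp_port} RTP/AVP 0 8 101",  # offered payload types
        "a=rtpmap:0 PCMU/8000",
        "a=rtpmap:8 PCMA/8000",
        "a=rtpmap:101 telephone-event/8000",    # DTMF events
        "a=sendrecv",
    ]
    return "\r\n".join(lines) + "\r\n"


def build_invite(caller: str, callee: str, local_ip: str, rtp_port: int) -> str:
    """Assemble an INVITE request with the SDP offer as its body."""
    sdp = build_sdp_offer(local_ip, rtp_port)
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # branch values start with this cookie
    headers = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:5060;branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{callee}>",                  # no To tag yet: dialog not established
        f"Call-ID: {uuid.uuid4().hex}@{local_ip}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller.split('@')[0]}@{local_ip}:5060>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(sdp.encode('utf-8'))}",
    ]
    # A blank line separates the headers from the body.
    return "\r\n".join(headers) + "\r\n\r\n" + sdp
```

Calling `build_invite("101@company.com", "102@company.com", "192.0.2.10", 40000)` returns the request text; printing it gives something close to what the first message of a trace looks like.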

Key SIP methods you actually use

Method	What it’s for
REGISTER	Bind identity (AoR) to current reachable Contact
INVITE	Start a session or renegotiate media (with SDP)
ACK	Acknowledge the final response to an INVITE (completes the handshake after a 2xx)
BYE	End an active dialog
CANCEL	Cancel a pending INVITE that has not yet received a final response (still ringing)
OPTIONS	Capability/reachability “ping”
UPDATE / re-INVITE	Change/refresh media mid-call
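When reading a trace, the first line of each message tells you whether it is a request (method + target) or a response (status code + reason). The following Python sketch separates the two; `parse_start_line` is a hypothetical helper name and the function only handles SIP/2.0 start lines.

```python
def parse_start_line(line: str):
    """Classify a SIP start line as a request or a response.

    Returns ("request", method, uri) or ("response", status_code, reason).
    """
    parts = line.strip().split(" ", 2)
    if len(parts) != 3:
        raise ValueError(f"not a SIP start line: {line!r}")
    if parts[0] == "SIP/2.0":
        # Status line: SIP/2.0 <code> <reason phrase>
        return ("response", int(parts[1]), parts[2])
    if parts[2] == "SIP/2.0":
        # Request line: <METHOD> <request-URI> SIP/2.0
        return ("request", parts[0], parts[1])
    raise ValueError(f"not a SIP/2.0 start line: {line!r}")
```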

Transactions vs dialogs

SIP has two important scopes:

Transaction: one request + its responses (e.g., INVITE → 100/180/200)
Dialog: the ongoing relationship for the call, identified by the Call-ID plus the From and To tags; CSeq numbers order the requests sent within it

A call can fork (ring multiple devices) and still be “one call” from the user’s viewpoint, but multiple dialogs can exist briefly during forking.
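To make the two scopes concrete, here is a small Python sketch that pulls a dialog identifier and a transaction key out of raw message text. It is a simplification under stated assumptions: the helper names are invented for this example, only long-form header names are handled, and the transaction key is taken as the top Via branch plus the CSeq method.

```python
import re


def parse_headers(message: str) -> dict:
    """Map lower-cased header names to values (first occurrence wins)."""
    head = message.split("\r\n\r\n", 1)[0]
    headers = {}
    for line in head.split("\r\n")[1:]:        # skip the start line
        name, _, value = line.partition(":")
        headers.setdefault(name.strip().lower(), value.strip())
    return headers


def _tag(header_value: str):
    """Return the tag parameter of a From/To header, or None if absent."""
    match = re.search(r";\s*tag=([^;>\s]+)", header_value)
    return match.group(1) if match else None


def dialog_id(message: str):
    """Call-ID + From tag + To tag: stays the same for the whole call."""
    h = parse_headers(message)
    return (h["call-id"], _tag(h["from"]), _tag(h["to"]))


def transaction_key(message: str):
    """Top Via branch + CSeq method: ties a response to its request."""
    h = parse_headers(message)
    branch = re.search(r";\s*branch=([^;,\s]+)", h["via"]).group(1)
    method = h["cseq"].split()[1]
    return (branch, method)
```

An initial INVITE has no To tag, so its dialog identifier is incomplete; the 200 OK adds the To tag while keeping the same branch, which is why both messages share a transaction key but only the response names the full dialog.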

What is SDP and why is it inside SIP?

Many SIP problems aren’t “SIP problems” — they’re SDP problems.

SDP (Session Description Protocol) is the message body carried inside SIP requests and responses that describes media: codecs, IPs, ports, and attributes like DTMF, SRTP, and direction (sendrecv/recvonly).

The canonical format is described in RFC 4566, the SDP specification ⁴ (since obsoleted by RFC 8866, which keeps the same basic format).

Typical SDP items you see:

m= media line (audio/video + port + protocol)
c= connection line (IP address)
a=rtpmap lines that map payload type numbers to codecs (e.g., G.711, Opus)
telephone-event payload for DTMF (RFC 4733, which replaced RFC 2833)
SRTP attributes (keys or DTLS fingerprints depending on mode)

If SDP advertises the wrong IP/port (common behind NAT), you get classic symptoms:

rings but one-way audio
connects but no audio
video works one way only
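When chasing these symptoms, the first thing to extract from a trace is what each side actually advertised. The Python sketch below reads the address, port, codecs, and direction from an SDP body. `parse_sdp` is a hypothetical helper, and it deliberately ignores less common cases (port ranges written as `port/count`, session-level direction attributes, multicast addresses).

```python
def parse_sdp(sdp: str) -> dict:
    """Extract per-media address, port, codecs, and direction from SDP text."""
    session_ip = None
    media = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if len(line) < 2 or line[1] != "=":
            continue
        kind, value = line[0], line[2:]
        if kind == "c":
            ip = value.split()[2]              # "IN IP4 192.0.2.10" -> address
            if media:
                media[-1]["ip"] = ip           # media-level c= overrides session
            else:
                session_ip = ip
        elif kind == "m":
            mtype, port, proto, *formats = value.split()
            media.append({
                "type": mtype, "port": int(port), "proto": proto,
                "formats": formats, "ip": None, "codecs": {},
                "direction": "sendrecv",       # default when no attribute is given
            })
        elif kind == "a" and media:
            if value.startswith("rtpmap:"):
                payload_type, encoding = value[len("rtpmap:"):].split(None, 1)
                media[-1]["codecs"][payload_type] = encoding
            elif value in ("sendrecv", "sendonly", "recvonly", "inactive"):
                media[-1]["direction"] = value
    for m in media:
        if m["ip"] is None:
            m["ip"] = session_ip               # fall back to session-level c=
    return {"session_ip": session_ip, "media": media}
```

Comparing the parsed `ip` and `port` with where RTP packets really come from (in a packet capture) usually shows quickly whether the SDP is advertising an address the other side cannot reach.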

What is the difference between SIP and VoIP?

Many sales pages write “SIP phones” and “VoIP phones” as if they are two different planets.

VoIP is the overall concept: voice over IP. SIP is one popular signaling protocol used in many VoIP systems to set up and control calls. SIP is a piece of VoIP, not a replacement for it.

Figure: Data center with overlaid IP network topology to cloud

A simple stack view:

Part of the system	What it does	Examples
Codecs	Encode/decode audio/video	G.711, G.722, Opus, H.264
Media transport	Carry media packets	RTP, SRTP, RTCP
Signaling	Set up/control sessions	SIP, (also H.323, MGCP in some systems)
Call control	Features & routing logic	PBX/softswitch/SBC policies

What ports does SIP use?

People often treat SIP like “open port 5060 and you’re done.” That’s rarely true.

SIP signaling commonly uses UDP/TCP 5060, or TLS on 5061, but the bigger issue is RTP/SRTP: media uses dynamic UDP ports negotiated in SDP (or anchored by an SBC).

Typical defaults:

SIP over UDP/TCP: 5060
SIP over TLS: 5061
RTP/SRTP: dynamic UDP range (varies by PBX/vendor)

So firewall rules are usually about both:

allowing SIP to the right server(s)
allowing RTP/SRTP media ranges (or forcing media relay via SBC)
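One way to reason about those two rule groups is to write them down as data and test flows against them. The Python sketch below does that. The `Rule` type and `match_rule` helper are invented for this example, and the 10000-20000 media range is only an assumed placeholder: check your PBX or SBC configuration for the real range.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    proto: str
    first_port: int
    last_port: int
    purpose: str


# Signaling defaults from the list above, plus an ASSUMED media range.
RULES = [
    Rule("udp", 5060, 5060, "SIP"),
    Rule("tcp", 5060, 5060, "SIP"),
    Rule("tcp", 5061, 5061, "SIP over TLS"),
    Rule("udp", 10000, 20000, "RTP/SRTP media (assumed range)"),
]


def match_rule(proto: str, port: int, rules=RULES) -> Optional[Rule]:
    """Return the first rule that allows this protocol/port, or None."""
    for rule in rules:
        if rule.proto == proto.lower() and rule.first_port <= port <= rule.last_port:
            return rule
    return None
```

A call that signals fine on UDP 5060 but negotiates an RTP port outside the allowed range would show up here as `match_rule("udp", port)` returning `None`, which is the "rings but no audio" case in firewall terms.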

Will SIP work behind NAT, firewalls, and SIP ALG?

Many VoIP issues have nothing to do with codecs or PBX rules. They come from NAT and “helpful” routers.

SIP works behind NAT, but you must manage address/port rewriting for SIP headers and SDP, and you often need an SBC/B2BUA or NAT traversal tools. SIP ALG is a frequent cause of one-way audio and random breakage.

Figure: Cloud-connected VoIP gateway and remote devices diagram

Why NAT breaks SIP so often

NAT changes source IP/ports on the outside, but SIP/SDP may still announce private addresses like 192.168.x.x. If the far end follows that announcement, it sends media to an unreachable private IP.
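That mismatch is easy to detect mechanically. The following Python sketch flags the case where SDP advertises an RFC 1918 private address while the packet itself arrived from a non-private source. The function names are hypothetical, and the check is a heuristic for troubleshooting, not a complete NAT detector (it ignores IPv6 and carrier-grade NAT ranges).

```python
import ipaddress

# The three RFC 1918 private IPv4 ranges.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(ip: str) -> bool:
    """True if the address falls in one of the private IPv4 ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RFC1918_NETWORKS)


def likely_nat_problem(sdp_ip: str, observed_source_ip: str) -> bool:
    """True when SDP announces a private address but the signaling packet
    was received from a non-private source, so media sent to the
    advertised address would not reach the device."""
    return is_rfc1918(sdp_ip) and not is_rfc1918(observed_source_ip)
```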

What actually fixes it

Tool / approach	What it solves	Common place
SBC / B2BUA	Rewrites SIP + SDP, anchors media, enforces policy	Network edge / cloud edge
Symmetric RTP / rport	Lets responses (rport) and media (symmetric RTP) return through the NAT mapping the device already opened	PBX + endpoints
Session Traversal Utilities for NAT (STUN) ⁵	Helps a client discover public mapping	Softphones/WebRTC, some SIP clients
TURN	Relays media when direct path fails	WebRTC / mobile-heavy deployments
Interactive Connectivity Establishment (ICE) ⁶	Tries multiple candidate paths automatically	WebRTC, some modern SIP endpoints
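To show what "discover public mapping" means at the packet level, here is a Python sketch of the two STUN pieces involved: building a Binding Request and decoding the XOR-MAPPED-ADDRESS attribute a server returns. The function names are invented for this example, only IPv4 is handled, and the network exchange and response parsing around the attribute are left out.

```python
import os
import socket
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442       # fixed value in every STUN message header
BINDING_REQUEST = 0x0001        # message type
XOR_MAPPED_ADDRESS = 0x0020     # attribute type carrying the public mapping


def build_binding_request(transaction_id: Optional[bytes] = None) -> bytes:
    """Return a 20-byte STUN Binding Request with no attributes."""
    tid = os.urandom(12) if transaction_id is None else transaction_id
    if len(tid) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    # type (2 bytes), attribute length (2 bytes), magic cookie (4 bytes)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid


def decode_xor_mapped_address(value: bytes) -> Tuple[str, int]:
    """Decode the value of an IPv4 XOR-MAPPED-ADDRESS attribute."""
    _reserved, family, xport = struct.unpack("!BBH", value[:4])
    if family != 0x01:
        raise ValueError("only IPv4 is handled in this sketch")
    port = xport ^ (MAGIC_COOKIE >> 16)          # port is XORed with the cookie's top 16 bits
    (xaddr,) = struct.unpack("!I", value[4:8])
    ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
    return ip, port
```

The address and port that come back are what the NAT exposed for that socket, which a client can then place in its SDP instead of its private address.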

SIP ALG: why it hurts

SIP ALG tries to rewrite SIP/SDP on the router. Problems:

it often can’t handle vendor differences
it cannot inspect or rewrite SIP carried over TLS, so encrypted and unencrypted signaling end up being handled differently
it may rewrite ports incorrectly

In many professional deployments, the standard practice is to disable SIP ALG and let the PBX/SBC do the SIP-aware handling.

How do I secure SIP with TLS, SRTP, and SBCs?

Once calls cross the internet, security stops being optional.

Secure SIP by encrypting signaling with TLS, encrypting media with SRTP (or DTLS-SRTP where appropriate), using strong authentication, and placing SBCs at borders for protection, NAT traversal, and interop.

Figure: Secure VoIP network map beside IP phone

What each layer protects

SIP over TLS: protects SIP headers and SDP from snooping/tampering in transit
SRTP: protects the actual audio/video payload
Session Border Controller (SBC) ⁷: reduces attack surface, blocks floods, hides internal topology, normalizes SIP behavior between carriers and endpoints
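For the signaling layer, a client-side TLS connection to a SIP server can be sketched with Python's standard `ssl` module. The helper names, the TLS 1.2 floor, and the 5-second timeout are assumptions for illustration; a real deployment may also need client certificates or a private CA, which are not shown.

```python
import socket
import ssl


def make_sip_tls_context() -> ssl.SSLContext:
    """Client context that verifies the server certificate and hostname."""
    ctx = ssl.create_default_context()            # certificate verification on by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # assumed policy: refuse older TLS
    return ctx


def open_sip_tls(host: str, port: int = 5061, timeout: float = 5.0) -> ssl.SSLSocket:
    """Open a TLS connection to a SIP server (5061 is the usual TLS port)."""
    ctx = make_sip_tls_context()
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        return ctx.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()                               # do not leak the TCP socket on failure
        raise
```

SIP messages are then written to and read from the returned socket exactly as they would be over TCP; the media still needs SRTP separately.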

Practical hardening checklist

Control	What to do	Why
TLS for SIP	Use `sips:` / TLS where supported	Stops credential and routing leakage
SRTP	Prefer SRTP end-to-end when possible	Prevents eavesdropping on media
Strong auth	Long random passwords, lockouts	Stops brute-force registrations
Rate limits	SBC/edge rules for scans/floods	Reduces downtime from attacks
Least exposure	Don’t expose PBX directly if possible	Shrinks threat surface
Logs & alerts	Monitor REGISTER failures, call spikes	Early warning of abuse
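Two rows of the checklist (strong auth and monitoring REGISTER failures) lend themselves to a short Python sketch. The class and function names, the thresholds, and the password length are all assumptions chosen for the example; an SBC or PBX would normally provide this as built-in policy.

```python
import secrets
import time
from collections import defaultdict, deque
from typing import Optional


def generate_sip_password(num_bytes: int = 24) -> str:
    """Return a long random password from the OS's secure random source."""
    return secrets.token_urlsafe(num_bytes)


class RegisterFailureTracker:
    """Sliding-window count of failed REGISTER attempts per source IP."""

    def __init__(self, max_failures: int = 5, window_seconds: float = 60.0):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)

    def record_failure(self, source_ip: str, now: Optional[float] = None) -> bool:
        """Record one failure; return True if the source should be blocked."""
        now = time.monotonic() if now is None else now
        attempts = self._failures[source_ip]
        attempts.append(now)
        # Drop attempts that have aged out of the window.
        while attempts and now - attempts[0] > self.window_seconds:
            attempts.popleft()
        return len(attempts) >= self.max_failures
```

Feeding each failed REGISTER (for example, a 401 or 403 seen repeatedly from one address) into `record_failure` gives a simple signal for lockouts or alerts.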

Conclusion

SIP is the signaling backbone of many VoIP systems: it finds users, rings devices, negotiates media with SDP, supports mid-call changes, and ends sessions cleanly — while RTP/SRTP carries the actual voice and video. With good NAT handling (often via SBC), sensible firewall rules, and TLS + SRTP, SIP becomes reliable, interoperable, and secure enough for real-world PBXs, SIP trunks, and SIP intercom deployments.

Footnotes

¹ Defines SIP requests, responses, dialogs, and routing behaviors in the official standard.
² Details RTP packet structure, sequencing, and timing used to carry real-time audio/video streams.
³ Describes how SRTP encrypts and authenticates voice/video media to prevent eavesdropping and tampering.
⁴ Explains the SDP format used to advertise codecs, IPs, ports, and media attributes in calls.
⁵ Shows how STUN helps endpoints discover public NAT mappings for better connectivity and fewer one-way-audio issues.
⁶ Explains ICE candidate gathering and selection for reliable NAT traversal in modern real-time communications.
⁷ Overview of SBC roles like topology hiding, policy enforcement, interop normalization, and attack mitigation at SIP borders.

About The Author

DJSLink R&D Team
