SIP calling looks simple until calls fail at the worst moment. One-way audio, random registration drops, and DTMF not working can turn a clean rollout into daily firefighting.
SIP calling uses SIP messages to set up, manage, and end calls, while the actual voice travels separately over RTP. SIP negotiates codecs and media ports via SDP, then endpoints stream audio directly or through an SBC.

SIP calling: signaling sets the call, RTP carries the voice
SIP (Session Initiation Protocol) [1] is the signaling layer. It is the part that says “who is calling whom,” “should the phone ring,” “which codecs are allowed,” and “when is the call ended.” The voice itself is usually not inside SIP. The voice is carried by RTP (Real-time Transport Protocol) [2], which runs in parallel once SIP finishes negotiating the session.
The basic call flow in plain language
A normal SIP call follows a predictable sequence:
- The caller sends an INVITE to the callee (often through a PBX, proxy, or SBC).
- The callee (or server) answers with progress messages like 100 Trying and 180 Ringing.
- The final acceptance is 200 OK, usually containing SDP that describes the chosen codec and media IP/port.
- The caller confirms with ACK.
- Media starts flowing via RTP in both directions.
- The call ends with BYE and a final 200 OK response.
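The sequence above can be sketched as a small state machine. This is a simplified illustration, not a full RFC 3261 transaction model: the state names and the `advance` and `run_call` helpers are hypothetical, and real calls also involve retransmissions, CANCEL, and error responses.

```python
# Simplified model of the basic call flow; state names are illustrative only.
TRANSITIONS = {
    ("idle", "INVITE"): "calling",
    ("calling", "100 Trying"): "calling",
    ("calling", "180 Ringing"): "ringing",
    ("calling", "200 OK"): "answered",
    ("ringing", "200 OK"): "answered",
    ("answered", "ACK"): "established",  # RTP flows in this state
    ("established", "BYE"): "terminating",
    ("terminating", "200 OK"): "ended",
}


def advance(state: str, message: str) -> str:
    """Return the next call state, or raise if the message is unexpected."""
    try:
        return TRANSITIONS[(state, message)]
    except KeyError:
        raise ValueError(f"unexpected {message!r} in state {state!r}") from None


def run_call(messages: list[str]) -> str:
    """Feed a message sequence through the model and return the final state."""
    state = "idle"
    for message in messages:
        state = advance(state, message)
    return state


flow = ["INVITE", "100 Trying", "180 Ringing", "200 OK", "ACK", "BYE", "200 OK"]
print(run_call(flow))  # ended
```

A trace that stops in "answered" (200 OK seen, no ACK) or never leaves "calling" points at a signaling problem rather than a media problem.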
SDP (Session Description Protocol) [3] is the “menu” inside SIP messages. It lists codec options, packetization, IP/port for RTP, and sometimes SRTP keying details. Most call problems come from one of these areas:
- SIP signaling cannot reach the far end (routing, auth, ports).
- RTP media cannot reach the far end (NAT, firewall, wrong IP in SDP).
- The negotiated codec/DTMF mode is incompatible.
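To make the SDP "menu" concrete, here is a minimal sketch that pulls the media address, audio port, and codec list out of an SDP body. The `parse_sdp` helper and the sample SDP (which uses a documentation IP address) are assumptions for illustration; a production parser would also handle media-level `c=` lines, IPv6, and multiple `m=` sections.

```python
EXAMPLE_SDP = """\
v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0 8 101
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=ptime:20
"""


def parse_sdp(sdp: str) -> dict:
    """Extract the connection address, audio port, and codec map from SDP text."""
    info = {"address": None, "audio_port": None, "payload_types": [], "codecs": {}}
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("c=IN IP4 "):
            info["address"] = line.split()[2]
        elif line.startswith("m=audio "):
            parts = line.split()
            info["audio_port"] = int(parts[1])
            info["payload_types"] = [int(pt) for pt in parts[3:]]
        elif line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            info["codecs"][int(pt)] = encoding
    # RTP-event DTMF is on offer only if telephone-event appears in the codec map.
    info["offers_rtp_dtmf"] = any(
        enc.startswith("telephone-event") for enc in info["codecs"].values()
    )
    return info


info = parse_sdp(EXAMPLE_SDP)
print(info["address"], info["audio_port"], info["codecs"])
```

The three fields map onto the three problem areas: the address and port are where RTP must be reachable, and the codec map shows whether both sides share a codec and a DTMF method.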
Why SIP works well for phones, PBX, and intercoms
SIP scales because it separates control from media. Phones and SIP intercoms can register to a PBX for inbound reachability and can also place outbound calls through trunks. A PBX can fork calls (ring multiple devices), route by dial plan, and provide features like transfer, paging, and hunt groups.
In practical deployments, it helps to think in two planes:
- Control plane (SIP): identities, authentication, routing, and features.
- Media plane (RTP/SRTP): audio quality, jitter, packet loss, and encryption.
| Plane | Typical ports | What breaks first | What to test |
|---|---|---|---|
| SIP signaling | 5060 UDP/TCP, 5061 TLS | Registration, call setup, ringing | SIP logs, REGISTER/INVITE traces |
| RTP media | Dynamic UDP range | One-way/no audio, DTMF issues | RTCP stats, firewall pinholes |
| Security | TLS + SRTP | Compliance gaps, MITM risk | Certs, cipher suites, keying mode |
SIP calling becomes predictable when signaling and media are treated as separate systems with separate failure modes.
If the goal is reliable phone + intercom projects, the next step is understanding how registration and SIP trunks connect everything into one dial plan.
How do SIP registration and trunks connect PBX, phones, and intercoms?
SIP networks fail when roles are mixed up. Phones “register,” trunks “peer,” and intercoms sometimes do both. Clarity here saves a lot of time.
Registration connects endpoints to a PBX by publishing their current contact address, while SIP trunks connect PBXs to carriers or other PBXs. Phones and intercoms usually register as extensions; trunks usually authenticate or IP-peer to exchange inbound/outbound calls.

Registration: how an endpoint becomes reachable
Registration is the process where a phone or SIP intercom tells the PBX: “this is my current IP/port; send calls for extension 101 here.” The endpoint sends REGISTER requests [4] to a registrar on the PBX (or hosted platform). The PBX typically answers the first REGISTER with an authentication challenge (401), the endpoint retries with credentials, and the PBX accepts with 200 OK. The PBX stores the Contact binding with an expiry timer, and the endpoint re-registers before that timer runs out to keep the binding alive.
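When the challenge uses classic MD5 digest authentication (an assumption here; some platforms use other algorithms), the endpoint answers the 401 by hashing its credentials together with the server's nonce. The sketch below shows that calculation. The extension, realm, password, and nonce are made-up values, and `digest_response` is a hypothetical helper name.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Hex MD5 of a UTF-8 string, as used by HTTP/SIP digest authentication."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce,
                    qop=None, nc=None, cnonce=None) -> str:
    """Compute the 'response' value for an Authorization header."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    if qop:
        # With qop=auth, the nonce count and client nonce are mixed in as well.
        return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


# Hypothetical values: extension 101 answering a 401 from pbx.example.com.
response = digest_response(
    username="101",
    realm="pbx.example.com",
    password="not-a-real-password",
    method="REGISTER",
    uri="sip:pbx.example.com",
    nonce="4f2a9c0d1e",
)
print(response)
```

Because the realm and nonce are part of the hash, a registration that keeps looping on 401 usually means a wrong password, a wrong auth username, or a realm the endpoint did not expect.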
Registration is ideal for:
- SIP phones on desks
- Indoor stations and door intercoms
- Remote devices behind NAT (with keepalives)
For intercoms, registration also supports predictable inbound calling (call the door station) and feature control (DTMF for relay, paging, busy lamp, etc.).
Trunks: how the PBX reaches the outside world (or another domain)
A SIP trunk is a PBX-to-provider (or PBX-to-PBX) connection. Instead of “registering like a phone,” trunks often work as:
- Registration-based trunks (PBX registers to provider with credentials)
- IP-auth trunks (provider trusts calls from the PBX public IP, often with ACLs)
- Mutual TLS trunks (strong identity, cert-based trust)
Trunks are where DID numbers, inbound routes, and outbound caller ID policies live. They often require SBC-like behavior: NAT awareness, codec control, and security enforcement.
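For IP-auth trunks, the core check is simply whether the source address of an inbound request belongs to the provider. A minimal sketch of that ACL logic follows. The network ranges are documentation prefixes standing in for a real carrier's published signaling addresses, and `is_trusted_trunk_source` is a hypothetical helper, not a PBX API.

```python
import ipaddress

# Hypothetical provider signaling ranges (documentation prefixes, not a real carrier).
TRUSTED_TRUNK_NETWORKS = [
    ipaddress.ip_network("198.51.100.0/28"),
    ipaddress.ip_network("203.0.113.16/29"),
]


def is_trusted_trunk_source(source_ip: str) -> bool:
    """Return True if the packet's source address falls inside a trusted range."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in TRUSTED_TRUNK_NETWORKS)


for ip in ("198.51.100.5", "198.51.100.20", "203.0.113.17"):
    print(ip, is_trusted_trunk_source(ip))
```

In practice the same list usually lives in the firewall or SBC as well, so untrusted INVITEs are dropped before they ever reach the PBX.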
Common connection patterns that work well
In mixed projects with SIP phones + intercoms, three patterns are common:
1) Local PBX + SIP endpoints (register)
Phones and intercoms register to the PBX. The PBX routes internal calls and uses a trunk for PSTN.
2) Hosted PBX + remote endpoints (register)
Endpoints register over the internet using TLS/SRTP. NAT traversal and keepalives matter.
3) SBC in front of PBX
Endpoints and trunks terminate on an SBC, which handles NAT, encryption, and policy. This often reduces “random” one-way audio in the field.
| Element | Identity model | Typical auth | Best practice |
|---|---|---|---|
| Phone / Intercom | Extension (AOR + Contact) | Digest / TLS client auth | Short keepalive, stable registration refresh |
| PBX | Dial plan controller | N/A | Normalize codecs, DTMF, and RTP ranges |
| SIP trunk | Carrier peer | IP ACL / registration / mTLS | Use SBC, lock down inbound IPs |
| SBC (optional) | Security + NAT boundary | Certificates + policy | Terminate TLS/SRTP, hide topology |
For SIP intercom deployments, a practical approach is to register each intercom as a normal extension and treat PSTN access as a trunk function only. It keeps roles clean and troubleshooting fast.
Next comes the question that causes most interoperability pain: ports, codecs, and DTMF. This is where “default settings” break multi-vendor systems.
What SIP ports, codecs, and DTMF settings should I configure?
Most “SIP doesn’t work” tickets end up being “RTP blocked,” “wrong codec,” or “DTMF mode mismatch.” A small set of settings prevents most of that.
Configure signaling on 5060 (UDP/TCP) or 5061 (TLS), open a defined RTP UDP port range for media, allow a small codec set for interoperability, and standardize DTMF as RTP events (RFC 2833/4733) unless a platform requires SIP INFO.

SIP signaling ports
- 5060 UDP: common default, efficient, but easier to spoof if exposed to the internet.
- 5060 TCP: useful when NAT devices mishandle UDP or when message size grows.
- 5061 TLS: encrypted signaling. Preferred for internet-facing deployments.
A clean rule: use TLS externally, and restrict 5060/5061 exposure to known IPs (SBC, PBX, provider).
RTP media ports (the real firewall work)
RTP uses dynamic UDP ports negotiated in SDP. Many PBXs and SBCs let you define a port range (example ranges seen in the field: 10000–20000, 20000–40000). The exact range is not universal, so the safest move is to:
- Set a fixed RTP range on the PBX/SBC
- Ensure endpoints match or can accept that range
- Open that UDP range between the correct network zones
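When sizing a fixed range, a rough capacity estimate helps. The sketch below assumes each call leg uses one RTP port plus one RTCP port (no rtcp-mux) and that the PBX/SBC anchors media for two legs per call; both are assumptions that vary by platform, and the function names are illustrative.

```python
def rtp_capacity(first_port: int, last_port: int, legs_per_call: int = 2) -> int:
    """Estimate concurrent calls a UDP range supports on a media-anchoring PBX/SBC."""
    if first_port > last_port:
        raise ValueError("first_port must not exceed last_port")
    total_ports = last_port - first_port + 1
    ports_per_call = 2 * legs_per_call  # RTP + RTCP for every leg
    return total_ports // ports_per_call


def port_in_range(port: int, first_port: int, last_port: int) -> bool:
    """Check that a media port taken from SDP falls inside the opened range."""
    return first_port <= port <= last_port


print(rtp_capacity(10000, 20000))         # 2500
print(port_in_range(49170, 10000, 20000))  # False
```

An SDP port outside the opened range, as in the second call, is a common cause of audio that fails only across a firewall boundary.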
Codec selection that avoids surprises
A small codec policy works best:
- G.711 (PCMU/PCMA): highest compatibility, higher bandwidth (~80–90 kbps per call including overhead at 20 ms ptime).
- Opus: excellent quality and resilience, flexible bitrate; great when both ends support it.
- G.729: lower bandwidth, but some platforms still license it per channel, and its speech-oriented compression handles non-speech audio (in-band DTMF tones, music on hold) poorly.
Packetization time (ptime) commonly defaults to 20 ms, which is a solid balance of latency and overhead.
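The bandwidth figures follow from simple arithmetic: each packet carries ptime milliseconds of audio plus fixed IPv4, UDP, and RTP headers. Here is a small sketch of that calculation, per direction. The function name and the 18-byte Ethernet overhead (header plus frame check sequence, no VLAN tag) are assumptions for illustration.

```python
def call_bandwidth_kbps(codec_bitrate_bps: int, ptime_ms: int,
                        l2_overhead_bytes: int = 0) -> float:
    """Per-direction bandwidth of one RTP stream in kbps."""
    packets_per_second = 1000 / ptime_ms
    payload_bytes = codec_bitrate_bps / 8 * ptime_ms / 1000
    header_bytes = 20 + 8 + 12  # IPv4 + UDP + RTP
    packet_bytes = payload_bytes + header_bytes + l2_overhead_bytes
    return packet_bytes * 8 * packets_per_second / 1000


print(call_bandwidth_kbps(64000, 20))      # G.711 at IP level: 80.0
print(call_bandwidth_kbps(64000, 20, 18))  # G.711 on Ethernet: 87.2
print(call_bandwidth_kbps(8000, 20))       # G.729 at IP level: 24.0
```

Raising ptime lowers the header overhead per second of audio but adds latency, and a single lost packet then removes more speech.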
DTMF settings (critical for door release and IVR)
DTMF can be carried in several ways:
- RTP events (RFC 2833/4733) [5]: best interoperability for VoIP. Recommended default.
- SIP INFO: used by some systems, but can break across proxies or when not normalized.
- In-band: depends on codec fidelity; often unreliable with compressed codecs.
For SIP intercoms controlling relays, RTP events are usually the safest. If a platform insists on SIP INFO, keep it consistent end-to-end and avoid transcoding gateways that drop INFO.
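For reference, an RTP event is a 4-byte payload: the event code, an end flag with a volume field, and a duration in timestamp units. The sketch below packs and unpacks that payload. The helper names are hypothetical, and the surrounding RTP header, marker bit, and retransmission of the final packet are left out.

```python
import struct

# Event codes for DTMF digits in telephone-event payloads.
EVENT_CODES = {**{str(d): d for d in range(10)}, "*": 10, "#": 11,
               "A": 12, "B": 13, "C": 14, "D": 15}


def encode_dtmf_event(digit: str, duration: int, end: bool = False,
                      volume: int = 10) -> bytes:
    """Pack one telephone-event payload: event, E/R/volume byte, duration."""
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    flags = (0x80 if end else 0) | volume  # E bit is the top bit; R bit stays 0
    return struct.pack("!BBH", EVENT_CODES[digit.upper()], flags, duration)


def decode_dtmf_event(payload: bytes) -> tuple:
    """Unpack a payload into (digit, end_flag, volume, duration)."""
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    digit = next(d for d, code in EVENT_CODES.items() if code == event)
    return digit, bool(flags & 0x80), flags & 0x3F, duration


# '#' held for 200 ms at an 8000 Hz clock is 1600 timestamp units.
payload = encode_dtmf_event("#", duration=1600, end=True)
print(payload.hex())               # 0b8a0640
print(decode_dtmf_event(payload))  # ('#', True, 10, 1600)
```

When a door relay ignores a digit, decoding captured payloads like this shows whether the event arrived at all and whether the end flag was ever set.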
| Item | Recommended default | When to change | Symptom if wrong |
|---|---|---|---|
| SIP transport | TLS/5061 externally | Legacy endpoints | Random registration failure, security risk |
| RTP range | Fixed UDP range on PBX/SBC | Multi-zone firewalls | One-way/no audio |
| Codec set | G.711 + Opus (optional) | Low bandwidth links | No audio, transcoding artifacts |
| ptime | 20 ms | High-loss links (sometimes) | Latency or choppy audio |
| DTMF | RFC 2833/4733 | Platform requires INFO | Door open fails, IVR ignores digits |
A practical checklist for SIP devices (phones + intercoms) is: restrict codecs, lock ptime, standardize DTMF, and pin RTP ranges. It heads off most multi-vendor interoperability problems.
Once ports and codecs are correct, the next failure mode is NAT. NAT is where signaling may work but audio fails, and SIP ALG often makes it worse.
How do NAT, STUN, and SIP ALG affect audio and signaling?
SIP can register successfully and still produce one-way audio. That is the classic sign of NAT and SDP problems, not “bad SIP credentials.”
NAT rewrites IP/ports and can cause SIP/SDP to advertise unreachable private addresses; STUN/ICE help endpoints discover public mappings; SIP ALG tries to rewrite SIP/SDP but often breaks modern VoIP and should usually be disabled in favor of SBCs and proper NAT traversal.

Why NAT breaks media more than signaling
Signaling (SIP) often goes to a known server IP/port, so NAT creates a stable outbound mapping. Media (RTP) uses dynamic ports and may be peer-to-peer. If the SDP advertises a private IP (like 192.168.x.x) to a remote endpoint, the far end sends RTP to an unreachable address. Result: one-way audio or no audio.
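A quick way to spot this problem in a captured message is to check whether the SDP connection address is an RFC 1918 private address. The following is a minimal sketch with a hypothetical helper name; it checks IPv4 `c=` lines only and skips hostnames.

```python
import ipaddress

RFC1918_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def private_sdp_addresses(sdp: str) -> list[str]:
    """Return IPv4 c= addresses that a peer on the public internet cannot reach."""
    flagged = []
    for line in sdp.splitlines():
        parts = line.strip().split()
        if len(parts) == 3 and parts[0] == "c=IN" and parts[1] == "IP4":
            try:
                addr = ipaddress.ip_address(parts[2])
            except ValueError:
                continue  # a hostname, not a literal address
            if any(addr in net for net in RFC1918_NETWORKS):
                flagged.append(parts[2])
    return flagged


lan_sdp = "v=0\nc=IN IP4 192.168.1.50\nm=audio 16384 RTP/AVP 0 101\n"
wan_sdp = "v=0\nc=IN IP4 203.0.113.5\nm=audio 16384 RTP/AVP 0 101\n"
print(private_sdp_addresses(lan_sdp))  # ['192.168.1.50']
print(private_sdp_addresses(wan_sdp))  # []
```

A private address is harmless when both endpoints share a LAN or when a media-anchoring server corrects for it; it matters when the SDP crosses a NAT boundary unchanged.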
Common NAT-friendly behaviors that help:
- rport and symmetric response (server replies to source port)
- SIP keepalives to maintain mappings
- Symmetric RTP (send RTP back to the source address/port seen)
- Short REGISTER refresh (balanced to avoid load)
STUN, TURN, and ICE in VoIP reality
- STUN: tells a client its public mapped address. Works well for many NAT types, but not for symmetric NATs that assign a different mapping per destination.
- TURN: relays media through a server. Heavier, but reliable when direct media fails.
- ICE (Interactive Connectivity Establishment) [6]: negotiates the best candidate path (host, STUN-reflexive, TURN-relayed). Common in WebRTC and modern softphones.
For many enterprise SIP phones, STUN support is limited or optional. For softphones and WebRTC clients, ICE is often the default approach.
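ICE ranks its candidates with a priority formula that favors direct paths over relayed ones. The sketch below uses the type preference values recommended in the ICE specification and assumes a single network interface (local preference 65535) and the RTP component (ID 1); the function name is illustrative.

```python
# Recommended type preferences: higher means "try this kind of path first".
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_pref: int = 65535,
                       component_id: int = 1) -> int:
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)."""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component_id)


candidates = ["relay", "host", "srflx"]
ranked = sorted(candidates, key=candidate_priority, reverse=True)
print(ranked)                      # ['host', 'srflx', 'relay']
print(candidate_priority("host"))  # 2130706431
```

Priority only orders the connectivity checks. The path that actually carries media is the highest-ranked candidate pair that passes those checks, which is why TURN still wins when nothing direct works.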
SIP ALG: why it causes “works but broken”
SIP ALG inspects and rewrites SIP/SDP to “help” NAT traversal. In theory, it fixes private IPs and opens pinholes. In practice, many ALGs:
- Rewrite headers inconsistently
- Mangle SDP ports
- Break re-INVITEs and mid-call changes
- Cannot inspect SIP over TLS, so behavior differs between encrypted and unencrypted signaling
- Conflict with ICE/symmetric RTP behaviors
The most reliable pattern is: disable SIP ALG and use one of these instead:
- An SBC at the edge
- A PBX that supports NAT-aware contact handling and symmetric RTP
- Proper firewall rules with predictable port ranges
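One way to confirm that an ALG is in the path is to compare a request as the endpoint sent it with the same request as the server received it, using captures from both sides. The sketch below diffs two such copies line by line. The messages are abbreviated, made-up examples, `rewritten_lines` is a hypothetical helper, and it assumes both captures have the same number of lines in the same order.

```python
SENT = """\
INVITE sip:200@pbx.example.com SIP/2.0
Via: SIP/2.0/UDP 192.168.1.50:5060;branch=z9hG4bK776asdhds
Contact: <sip:101@192.168.1.50:5060>
c=IN IP4 192.168.1.50
m=audio 16384 RTP/AVP 0 101
"""

RECEIVED = """\
INVITE sip:200@pbx.example.com SIP/2.0
Via: SIP/2.0/UDP 192.168.1.50:5060;branch=z9hG4bK776asdhds
Contact: <sip:101@203.0.113.7:1024>
c=IN IP4 192.168.1.50
m=audio 16384 RTP/AVP 0 101
"""


def rewritten_lines(sent: str, received: str) -> list[tuple[str, str]]:
    """Pair up lines that differ between the message as sent and as received."""
    sent_lines = sent.strip().splitlines()
    recv_lines = received.strip().splitlines()
    return [(s, r) for s, r in zip(sent_lines, recv_lines) if s.strip() != r.strip()]


for before, after in rewritten_lines(SENT, RECEIVED):
    print(f"- {before}\n+ {after}")
```

In this example the Contact header was rewritten to a public address while the SDP `c=` line still carries the private one, which is the kind of inconsistent rewrite that leaves signaling working and audio broken.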
| Situation | Best approach | What to avoid | Typical symptom |
|---|---|---|---|
| Phones behind NAT to hosted PBX | TLS + keepalive + SBC | SIP ALG | Random one-way audio |
| WebRTC clients | ICE (STUN/TURN) | Forcing direct RTP only | Calls connect, audio fails |
| Site-to-site PBX | IPsec + SBC | Overlapping ALGs | Mid-call drops on re-INVITE |
| Mixed VLANs/firewalls | Fixed RTP range | Wide-open any-any UDP | Security risk + still unstable |
For SIP intercom projects, NAT issues often show up as: registration works, but door station audio is one-way when calling outside the LAN. The fix is almost always SDP correctness, RTP pinholes, and disabling SIP ALG.
After NAT is handled, security becomes the next question: how to encrypt signaling and media and satisfy compliance needs without breaking interoperability.
How does SRTP secure calls and meet compliance requirements?
Unencrypted VoIP is easy to intercept on shared networks. That risk grows in multi-tenant buildings and cloud deployments. Encryption needs to be designed, not bolted on.
SRTP encrypts RTP media to protect voice content and adds integrity and replay protection; it is often paired with SIP over TLS for signaling. Compliance goals are met by encrypting in transit, controlling keys, enforcing strong identity, and logging security events without storing sensitive media unnecessarily.

What SRTP protects (and what it does not)
SRTP (Secure Real-time Transport Protocol) [7] secures the media stream:
- Confidentiality (encryption)
- Integrity (tamper detection)
- Replay protection (blocks reused packets)
SIP over TLS protects signaling metadata in transit (dialed numbers, headers, SDP). Without TLS, SRTP may still protect audio, but SIP messages can leak sensitive call details and can be modified by attackers.
Keying methods: why interoperability matters
SRTP requires both ends to agree on keys. Common approaches include:
- SDES-SRTP: keys are carried in SDP. Easy to deploy, but keys must be protected by TLS to be safe.
- DTLS-SRTP: keys negotiated via DTLS handshakes. Common in WebRTC and some modern endpoints.
- SRTP via SBC: the SBC terminates and re-encrypts, which is not end-to-end but is practical for compliance and interop.
For many enterprise PBX systems, “TLS + SDES-SRTP” is a common, workable baseline. For WebRTC, DTLS-SRTP is typical.
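With SDES, the key material travels in an `a=crypto` line in the SDP, which is why TLS on the signaling path matters. The sketch below parses such a line and checks that the inline key has the length expected for two common AES-128 suites (16-byte master key plus 14-byte master salt). The helper name is hypothetical and the key is a dummy value built from sequential bytes, never something to use on a real call.

```python
import base64

# Master key + master salt lengths in bytes for two common SDES crypto suites.
SUITE_KEY_SALT_BYTES = {
    "AES_CM_128_HMAC_SHA1_80": 16 + 14,
    "AES_CM_128_HMAC_SHA1_32": 16 + 14,
}


def parse_crypto_line(line: str) -> dict:
    """Parse 'a=crypto:<tag> <suite> inline:<base64>[|params]' and check key length."""
    line = line.strip()
    if not line.startswith("a=crypto:"):
        raise ValueError("not an a=crypto line")
    tag, suite, key_params = line[len("a=crypto:"):].split()[:3]
    if not key_params.startswith("inline:"):
        raise ValueError("only the inline key method is handled here")
    key_b64 = key_params[len("inline:"):].split("|")[0]
    key_material = base64.b64decode(key_b64)
    expected = SUITE_KEY_SALT_BYTES.get(suite)
    return {
        "tag": int(tag),
        "suite": suite,
        "key_bytes": len(key_material),
        "length_ok": expected is not None and len(key_material) == expected,
    }


dummy_key = base64.b64encode(bytes(range(30))).decode()  # illustration only
crypto_line = f"a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:{dummy_key}|2^20|1:32"
print(parse_crypto_line(crypto_line))
```

Anyone who can read this line can decrypt the media, so an `a=crypto` attribute seen in a plain UDP or TCP capture means the SRTP keys were exposed on the wire.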
Compliance: focus on controls, not buzzwords
Compliance requirements vary, but the technical controls that usually matter are stable:
- Encrypt signaling (TLS) and media (SRTP) in transit
- Restrict who can connect (ACLs, mTLS where possible, SBC policies)
- Manage certificates and rotation
- Log authentication events and configuration changes
- Protect recordings with access control and encryption at rest (if recordings exist)
In regulated environments, an SBC often becomes the policy enforcement point. It can require TLS, enforce cipher suites, prevent downgrade to plain RTP, and provide audit-friendly telemetry.
Practical SRTP settings that reduce failure
- Keep a clear fallback policy: either require SRTP or allow fallback only on trusted LAN segments.
- Avoid mixed keying modes across domains unless the SBC normalizes them.
- Ensure time and certificates are correct on endpoints (a wrong clock makes certificate validation fail, which breaks TLS).
- Validate DTMF under SRTP (RTP events still work, but interop should be tested).
| Security layer | Recommended baseline | Benefit | Common pitfall |
|---|---|---|---|
| SIP signaling | TLS (5061) | Protects SIP headers/SDP | Bad certs, wrong time, MITM warnings |
| Media | SRTP | Protects voice content | Mixed keying modes, forced fallback |
| Edge control | SBC policy + ACLs | Stops scanning and abuse | Exposing 5060 to the internet |
| Operations | Logs + rotation | Supports audits and response | No visibility when issues happen |
For SIP phones and intercoms, SRTP works best when it is treated as a standard requirement, not an optional feature. The deployment becomes simpler when every device is expected to speak TLS/SRTP, and exceptions are isolated behind controlled gateways.
Conclusion
SIP calling uses SIP/SDP to negotiate and control sessions while RTP carries the audio. Reliable deployments standardize registration/trunk roles, lock ports/codecs/DTMF, handle NAT without SIP ALG, and secure media with TLS + SRTP.
Footnotes
[1] Official SIP standard for message flows, dialogs, and core signaling behavior.
[2] Learn how RTP transports real-time voice packets, timing, and sequence handling.
[3] SDP reference for codec offers, media attributes, and negotiated IP/port details.
[4] Details REGISTER bindings, Contact refresh rules, and registrar processing expectations.
[5] Standard for carrying DTMF digits as RTP events across proxies and SBCs.
[6] ICE explains practical NAT traversal using STUN/TURN candidates and connectivity checks.
[7] SRTP standard for encrypting RTP media with integrity and replay protection.








