What is a codec in my VoIP system?

Bad audio turns a normal call into a support ticket. People repeat themselves, customers lose trust, and teams blame “the internet” without a clear fix.

A codec (coder-decoder) is the audio algorithm that turns voice into digital packets for sending over IP, then turns those packets back into sound on the other end. It sets the trade between bandwidth, delay, and perceived voice quality.

Figure: One SIP trunk channel equals one concurrent external call path.

Codec basics that actually matter in daily operations

A codec does three jobs, not one

A codec does more than “compress audio.” The sending endpoint samples voice, the codec encodes it into frames, and those frames travel inside RTP packets ¹. On the far end, the codec rebuilds the audio in real time. This is why codec choices show up as real business outcomes. A codec with long frames or look-ahead can sound fine but add delay. A low-bitrate codec can save bandwidth but sound thin. Many issues that feel like “SIP problems” are really media problems.

A codec also lives inside a call negotiation. Session Initiation Protocol (SIP) ² sets up the call. Session Description Protocol (SDP) ³ lists what each side can do. The final codec is the overlap between both endpoints. That overlap may be limited by your SIP trunk, your PBX, your intercom, and even the far-end carrier. So the best codec on paper may never be used in the real call.
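The overlap idea can be sketched in a few lines of Python. This is a simplified illustration, not a real SDP parser: it assumes each side is reduced to a list of codec names, that the first codec in the offerer's preference order that the answerer also supports gets used, and the function name `negotiate` and the example lists are hypothetical. Real answerers are free to apply their own preference order.

```python
def negotiate(offer, answerer_supported):
    """Return the first codec in the offerer's preference order that the
    answerer also supports, or None if there is no overlap."""
    supported = {codec.lower() for codec in answerer_supported}
    for codec in offer:
        if codec.lower() in supported:
            return codec
    return None


# Hypothetical example: the phone prefers Opus, the carrier trunk does not offer it.
phone_offer = ["opus", "G722", "PCMU", "PCMA"]
carrier_supported = ["PCMU", "PCMA", "G729"]

print(negotiate(phone_offer, carrier_supported))
```

With these example lists the call lands on PCMU, which is the “we configured Opus but calls use G.711” situation in miniature.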

The last piece is resilience. Some codecs include strong packet loss concealment and can hide small loss. Some support forward error correction. Some behave better with jitter buffers. This matters more than raw bitrate when the network is lossy or jittery.

Codec concept	What it changes	What to watch in production
Bitrate	Bandwidth per call	WAN capacity at peak
Frame size / ptime	Latency and overhead	End-to-end delay, jitter
Packet loss handling	Stability under loss	Choppy audio, clipping
Negotiation (SIP/SDP)	What gets used	“We configured Opus but calls use G.711”
Transcoding	CPU and quality hit	High PBX load, added delay

When codec basics are clear, sizing and troubleshooting become simpler. It becomes easy to separate “call setup” issues from “media quality” issues.

If the codec is the engine of call audio, the next step is to compare the engines people see the most: G.711, G.729, and Opus.

How do audio codecs like G.711, G.729, and Opus differ?

VoIP audio can sound perfect in one office and awful in another. The same SIP trunk can deliver clear voice on LAN, then break on a congested WAN.

G.711 favors consistent quality and maximum compatibility, G.729 favors low bandwidth with tighter quality limits, and Opus is adaptive and wideband, often sounding best when both sides support it and the network changes.

Figure: Whiteboard explaining call path, channels, SIP trunk, and line appearance.

G.711: simple, stable, and everywhere

G.711 (PCMU/PCMA) ⁴ is the “default safe choice” in many deployments. It uses 64 kbps for audio payload, and it is very widely supported. In real networks, the total bandwidth per call is higher once RTP/UDP/IP overhead is included: roughly 80 kbps at 20 ms ptime over IPv4, before Layer 2 framing. That overhead changes with ptime. Still, G.711 is predictable. It also avoids the “robot voice” artifacts that can happen with stronger compression.

G.729: low bandwidth, more sensitive to loss

G.729 ⁵ is often used when bandwidth is tight. The payload bitrate is 8 kbps, far lower than the 64 kbps of G.711. That helps with remote sites and small uplinks. But heavy compression can sound worse in noisy places, and it may degrade faster when packets drop. Licensing is also a practical point. G.729 historically had patent licensing concerns. Many patents have expired, but vendor terms and included licenses still vary, so it is worth checking your stack.

Opus: adaptive, wideband, and modern

The Opus interactive speech and audio codec ⁶ can run narrowband, wideband, or higher. It can also adapt its bitrate and protect audio with in-band forward error correction. That makes it strong for mixed networks and modern endpoints. The real limit is support. Many SIP carriers still prefer G.711 and may not accept Opus on trunks, even if your phones do.

Codec	Typical use case	Strength	Common downside
G.711 (PCMU/PCMA)	SIP trunks, general office	Compatibility and consistent sound	Higher bandwidth per call
G.729	Remote sites, limited WAN	Low bitrate	Can sound thin, can be touchy under loss
Opus	UC apps, modern endpoints, some PBXs	Great quality, adapts to conditions	Not always supported on trunks

In my intercom projects, G.711 stays popular because every PBX and SBC understands it. Opus is excellent when the whole path supports it. G.729 helps when the uplink is the real bottleneck and voice quality goals are modest.

Which codec settings affect bandwidth, MOS, and call quality?

Teams often change the codec and stop there. Then they get the same audio complaints. Codec choice matters, but settings often matter more.

Bandwidth and perceived quality depend on bitrate, packetization time (ptime), voice activity detection, and whether transcoding occurs. MOS trends also track packet loss, jitter, and delay, so codec settings must match network reality.

Figure: Capacity planning graphic for required SIP channels based on peak traffic and headroom.

ptime and packet overhead: the hidden bandwidth lever

Packetization time (ptime) is how much audio goes into one RTP packet. Common values are 20 ms and 30 ms. Smaller ptime means more packets per second. That raises overhead and can raise CPU load. Larger ptime reduces overhead but adds delay and can make packet loss more painful because each lost packet contains more audio.
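The overhead effect is easy to estimate. The sketch below assumes IPv4 with a 20-byte IP header, an 8-byte UDP header, and a 12-byte RTP header with no extensions, and it ignores Layer 2 framing, VPN encapsulation, and SRTP tags, so real links carry somewhat more. The function name `call_bandwidth_kbps` is made up for illustration.

```python
# Assumed header sizes: IPv4 (20) + UDP (8) + RTP (12), no extensions.
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12


def call_bandwidth_kbps(payload_kbps, ptime_ms):
    """Estimate one-way IP-level bandwidth for a single RTP stream."""
    packets_per_second = 1000 / ptime_ms
    # Bytes of encoded audio carried in each packet.
    payload_bytes = payload_kbps * 1000 / 8 * ptime_ms / 1000
    packet_bytes = payload_bytes + IP_UDP_RTP_HEADER_BYTES
    return packet_bytes * 8 * packets_per_second / 1000


for name, kbps in (("G.711", 64), ("G.729", 8)):
    for ptime in (20, 30):
        print(f"{name} @ {ptime} ms: {call_bandwidth_kbps(kbps, ptime):.1f} kbps")
```

Under these assumptions G.711 comes out at 80 kbps with 20 ms ptime and about 74.7 kbps with 30 ms, while G.729 comes out at 24 kbps and about 18.7 kbps. The headers are a small share of a G.711 packet but the larger share of a G.729 packet, which is why ptime matters more for low-bitrate codecs.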

Bitrate, wideband, and what users hear

Mean Opinion Score (MOS) ⁷ is a perception score, not a strict math result. Still, patterns show up. Wideband codecs like G.722 or Opus wideband often sound clearer because they carry more voice frequency detail (roughly 50 Hz to 7 kHz, versus about 300 Hz to 3.4 kHz for narrowband). That improves user satisfaction, especially for intercoms where clarity matters, like elevators, emergency phones, or noisy gates. Wideband does not always cost more bandwidth: G.722 runs at 64 kbps, the same payload rate as G.711. It does need endpoints that can capture and play it well, and every hop in the path has to support it.
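Monitoring tools usually report an estimated MOS, not a listening-test result. One common way to get that estimate is the ITU-T G.107 E-model, which computes a transmission rating R from delay, loss, and codec impairments and then maps R to MOS. The sketch below shows only that final mapping step. It is an illustration of the published conversion, not something this article's measurements rely on, and the function name `r_to_mos` is hypothetical.

```python
def r_to_mos(r):
    """Map an E-model R factor to an estimated MOS (ITU-T G.107 conversion)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


# 93.2 is the E-model's default R value with no added impairments.
print(round(r_to_mos(93.2), 2))
```

Because the rating drops as delay and loss rise, the same codec can score very differently on two network paths.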

Transcoding: the quality and CPU tax

Transcoding happens when one side uses one codec and the other side uses another. The PBX or SBC must convert the media. This adds CPU and a bit of delay. It can also add small quality loss. In busy systems, transcoding becomes a hidden limiter. It can also break call recording expectations if the recorder is tied to one media format.
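A quick way to see how exposed a system is to this is to count call legs whose codecs differ. The sketch below assumes you can export active calls as pairs of codec names (one per leg); the data shape, the function names, and the example list are all hypothetical and not tied to any particular PBX or SBC.

```python
def needs_transcoding(leg_a_codec, leg_b_codec):
    """A bridged call needs transcoding when its two legs use different codecs."""
    return leg_a_codec.upper() != leg_b_codec.upper()


def count_transcoded(calls):
    """Count calls in a list of (leg_a_codec, leg_b_codec) pairs that need transcoding."""
    return sum(1 for leg_a, leg_b in calls if needs_transcoding(leg_a, leg_b))


# Hypothetical snapshot of active calls: (endpoint leg, trunk leg).
active_calls = [
    ("PCMU", "PCMU"),
    ("opus", "PCMU"),
    ("G729", "PCMU"),
    ("PCMA", "PCMA"),
]

print(count_transcoded(active_calls), "of", len(active_calls), "calls need transcoding")
```

Comparing that count against the transcoding sessions your platform is rated for shows how close the system is to the hidden limit.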

Setting	What it changes	Simple guidance
ptime (20/30 ms)	Overhead and latency	Start at 20 ms for most voice
Wideband vs narrowband	Clarity and natural voice	Use wideband on LAN when possible
VAD/DTX + CNG	Bandwidth during silence	Use for WAN savings, test first
Codec order / preference	What is negotiated	Prefer one codec end-to-end
Transcoding policy	CPU and delay	Avoid unless needed

When diagnosing call quality, it helps to separate “codec quality” from “network quality.” A great codec still fails with high jitter and loss. A basic codec can sound great on a stable network.

Should I use variable bitrate, PLC, and jitter buffers for stability?

Most call drops are not codec issues. Most “robot audio” complaints are jitter and loss issues. That is why stability features matter.

Variable bitrate can help when bandwidth changes, PLC helps hide small packet loss, and jitter buffers smooth timing gaps. The best result comes from balanced settings that reduce loss effects without adding too much delay.

Figure: SIP trunk headquarters topology and licensing models with resilience options.

Variable bitrate: helpful, but only in the right place

Variable bitrate (VBR) is common with Opus and some modern stacks. It lets the codec spend more bits on complex audio and fewer bits on simpler audio. With Opus, the target bitrate can also be raised or lowered during a call, which helps keep quality stable when the network is changing. But VBR can also create bursts that stress a tight uplink if QoS is not set. On links that are already near full, a steady bitrate can be easier to plan.

PLC: the “hide the hole” feature

Packet loss concealment (PLC) is one of the most valuable features in real VoIP. It guesses what the missing audio might have been and fills in a short gap. It works well for small, random loss. It does not save a call with heavy loss, but it often turns a “complaint call” into a “fine call.”
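To make the idea concrete, here is a deliberately naive concealment sketch: repeat the last good frame and fade it out while the gap lasts. Real codecs use more sophisticated methods (pitch-aware waveform substitution, for example), so treat this as an illustration only. The function name `conceal`, the `attenuation` factor, and the use of `None` to mark a lost frame are assumptions made for the example.

```python
def conceal(frames, frame_len=160, attenuation=0.5):
    """Fill lost frames (marked as None) with a fading copy of the last good frame.

    frames: list of sample lists, with None where a packet was lost.
    """
    output = []
    last_good = None
    gain = 1.0
    for frame in frames:
        if frame is not None:
            last_good = frame
            gain = 1.0  # reset the fade once real audio arrives
            output.append(list(frame))
        elif last_good is not None:
            gain *= attenuation  # fade further for each consecutive loss
            output.append([int(sample * gain) for sample in last_good])
        else:
            output.append([0] * frame_len)  # nothing to repeat yet: play silence
    return output


# Two-sample frames keep the example readable; None marks lost packets.
print(conceal([[100, -100], None, None, [50, 50]], frame_len=2))
```

The fade is why concealment works for short gaps only: after a few consecutive losses the repeated frame has decayed to near silence, and the listener hears the hole.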

Jitter buffers: the delay vs stability trade

A jitter buffer stores a small amount of audio so playback stays smooth even when packets arrive unevenly. Bigger buffers handle worse jitter but add delay. Smaller buffers keep speech snappy but can expose timing gaps. Adaptive jitter buffers can work well, but they can also “chase the problem” if the network is very unstable.
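The trade can be shown with a small simulation of a fixed jitter buffer. It assumes packets are sent every ptime, that playout starts a fixed `buffer_ms` after the first packet arrives, and that a packet arriving after its playout slot counts as late (and is effectively lost). The delay values are invented for the example and the function name `late_packets` is hypothetical.

```python
def late_packets(delays_ms, buffer_ms, ptime_ms=20):
    """Count packets that miss their playout slot with a fixed jitter buffer.

    delays_ms: one-way network delay of each packet, in send order.
    """
    first_arrival = delays_ms[0]  # packet 0 is sent at t = 0
    late = 0
    for seq, delay in enumerate(delays_ms):
        arrival = seq * ptime_ms + delay
        deadline = first_arrival + buffer_ms + seq * ptime_ms
        if arrival > deadline:
            late += 1
    return late


# Invented delays with two spikes (70 ms and 95 ms) on a ~40 ms path.
delays = [40, 45, 70, 42, 95, 41]
for buffer_ms in (20, 40, 60):
    print(f"buffer {buffer_ms} ms -> {late_packets(delays, buffer_ms)} late packets")
```

With these numbers a 20 ms buffer drops both spikes, 40 ms drops one, and 60 ms drops none, but every extra millisecond of buffer is also added to the delay of the whole conversation.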

Feature	Helps with	Risk	Practical starting point
VBR (Opus)	Changing network conditions	Bandwidth bursts	Use with QoS and headroom
PLC	Small packet loss	Limited under heavy loss	Keep enabled
Fixed jitter buffer	Predictable jitter	Can be too small/large	Use on stable links
Adaptive jitter buffer	Variable jitter	Adds delay swings	Use on WAN, monitor delay
FEC (Opus)	Loss recovery	More bandwidth	Enable only if loss is real

In DJSLink-style deployments with SIP intercoms and emergency phones, stability is the priority. PLC stays on. Jitter buffers are tuned for the network path. Variable bitrate is great for Opus, but only if the uplink and QoS can handle it.

How do I choose codecs for SIP trunks, intercoms, and recording?

Many systems fail here because one codec strategy is forced across every device. A trunk wants compatibility. An intercom wants clarity and stability. Recording wants consistency and storage control.

Choose codecs by call path: trunks for compatibility (often G.711), intercoms for clarity and robustness (G.711/G.722, sometimes Opus), and recording for consistency (avoid transcoding, pick a format your storage and compliance rules can support).

SIP trunks: prioritize interoperability and predictable calls

Most carriers accept G.711 without drama. Many also accept G.729, but quality and licensing details differ by provider and platform. Opus may be possible on some trunks, but it is not a safe assumption. For trunk design, stability beats novelty. A clean “G.711 end-to-end” plan avoids transcoding and reduces surprises. If bandwidth is tight, G.729 can help, but testing under loss is important. In some networks, a higher bitrate codec with better PLC can sound better than a low bitrate codec under real loss.

Intercoms and emergency endpoints: prioritize intelligibility

Intercom audio is often captured in noisy places. Wideband can help, but only if the microphone and speaker are decent and the path supports it. For many access control and gate intercoms, G.711 still wins because it is supported everywhere and stays stable. For indoor stations or higher-end endpoints, G.722 can add clarity on LAN. For modern deployments with full control of both ends, Opus can be excellent, but it should be validated with the PBX and SBC.

Recording: prioritize “no surprises”

Recording is where transcoding pain shows up. If calls negotiate different codecs, the PBX may transcode for recording. That adds CPU and can reduce capacity during peak hours. A clean approach is to align trunk codec, endpoint codec, and recording codec when possible. If storage is cheap, recording in a higher quality format can help audits and speech analytics. If storage is tight, compressed formats help, but they can complicate later processing.
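The storage side of that decision is simple arithmetic. The sketch below assumes recordings are stored at the codec's payload bitrate with no container overhead, and that 16-bit linear PCM at 8 kHz works out to 128 kbps per channel. The function name `storage_mb_per_hour` and the format list are illustrative.

```python
def storage_mb_per_hour(bitrate_kbps, channels=1):
    """Estimate storage for one hour of recorded audio, in megabytes (10^6 bytes)."""
    bytes_per_second = bitrate_kbps * 1000 / 8 * channels
    return bytes_per_second * 3600 / 1_000_000


# Payload bitrates in kbps; 16-bit PCM at 8 kHz is 8000 * 16 = 128 kbps.
formats = {"G.729": 8, "G.711": 64, "PCM 16-bit 8 kHz": 128}
for name, kbps in formats.items():
    print(f"{name}: {storage_mb_per_hour(kbps):.1f} MB per hour (mono)")
```

Under these assumptions an hour of mono audio takes about 3.6 MB as G.729, 28.8 MB as G.711, and 57.6 MB as 16-bit PCM; recording both directions as separate channels doubles each figure.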

Scenario	Best default choice	Why	What to avoid
Carrier SIP trunk	G.711	Highest compatibility	Forcing Opus when carrier does not support it
Low-bandwidth site	G.729 or Opus (if supported)	Saves bandwidth	Transcoding every call on a small PBX
Gate/intercom to PBX	G.711 (or G.722 on LAN)	Clear and stable	Mixed codecs without a plan
Call recording	Match trunk codec or record PCM	Predictable and low CPU	Recording that requires constant transcoding

A simple rule works well: use one main codec per call path, keep codec lists short, set a sane ptime, and avoid transcoding unless it is a deliberate decision.

Conclusion

A codec is the voice engine of VoIP. Pick it by call path, tune ptime and buffers, avoid transcoding, and let the network and device limits guide the final choice.

Footnotes

¹ RTP standard describing how real-time audio frames are carried over IP networks.
² SIP standard for call signaling and session setup used by VoIP systems.
³ SDP standard explaining how endpoints advertise codecs and media parameters during call setup.
⁴ ITU-T reference for PCM voice coding, including µ-law and A-law used in telephony.
⁵ ITU-T reference for 8 kbit/s CS-ACELP speech coding and related annex options.
⁶ Opus specification describing adaptive bitrate, wideband audio, and interactive VoIP use cases.
⁷ MOS terminology guidance for interpreting and reporting subjective voice-quality scores consistently.

About The Author

DJSLink R&D Team
