What is Comfort Noise Generation (CNG) in VoIP?

Dead-silent calls feel broken. Users think the line dropped, then they talk over each other, and support teams chase “network issues” that are not real.

Table of Contents

1 Why CNG exists in real VoIP networks

1.1 Silence is not neutral for humans

1.2 How CNG is delivered without streaming noise

1.3 Where CNG sits in the media pipeline

2 How does CNG work with VAD and silence suppression?

2.1 The three-part handshake: VAD → DTX → CNG

2.2 Why SID update rate matters

2.3 Interactions with NAT and keepalives

3 What comfort noise level suits SIP phones and intercoms?

3.1 Think in “perceived continuity,” not absolute voltage

3.2 Practical tuning approach that works across brands

3.3 Special notes for SIP intercoms

4 How do codecs G.711, G.729, and Opus implement CNG?

4.1 G.711: simple waveform, optional CN payload behavior

4.2 G.729: bandwidth saver with common Annex B usage

4.3 Opus: flexible, modern, and implementation-dependent

5 Why do I hear hiss with CNG and how to fix it?

5.1 The most common causes in the field

5.2 Fix steps that work in a predictable order

5.3 Quick troubleshooting table for support teams

6 Conclusion

7 Footnotes

Comfort Noise Generation (CNG) adds a controlled, low-level background noise during silence so VoIP calls feel continuous. It works with VAD and DTX to save bandwidth without making the line sound dead.

Figure: Two callers with comfort noise keeping a VoIP conversation connected.

Why CNG exists in real VoIP networks

Silence is not neutral for humans

Silence in packet voice is not the same as silence in PSTN. Traditional phone lines always had some background noise from the analog circuit. People got used to that soft “air.” When VoIP uses silence suppression, the receiver can output absolute digital silence. That can feel like a disconnect, even when the call is active. In customer support, this is one of the most common false alarms: “the call drops when nobody talks.”

CNG solves that perception gap. It generates a small noise bed that matches the remote environment so the listener hears continuity. This matters a lot in SIP door intercoms and speakerphones, where talk spurts and pauses are frequent. A visitor speaks, pauses to listen, and silence can feel like the intercom stopped working.

How CNG is delivered without streaming noise

CNG is efficient because it does not transmit full audio during silence. Instead, the sender uses VAD to detect non-speech frames. When silence is detected, the sender stops sending normal voice payload and sends compact updates that describe the background noise. In many VoIP systems these updates are called Silence Insertion Descriptor (SID) frames ¹. The receiver uses SID parameters to synthesize noise locally. This is why bandwidth drops during silence but the call still sounds “alive.”
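To make the "compact update" concrete, here is a minimal sketch of the RTP comfort noise (CN) payload from RFC 3389. It assumes the layout in that RFC: one noise-level byte holding a magnitude in -dBov (0 to 127, top bit unused), optionally followed by quantized spectral coefficients, which this sketch leaves undecoded. The names `ComfortNoisePayload`, `parse_cn_payload`, and `build_cn_payload` are illustrative and not taken from any particular stack.

```python
from dataclasses import dataclass

# Static RTP payload type assigned to CN at 8 kHz (RFC 3551).
CN_STATIC_PAYLOAD_TYPE = 13


@dataclass
class ComfortNoisePayload:
    level_dbov: int        # 0 is the loudest possible level, -127 the quietest
    spectral_bytes: bytes  # optional quantized coefficients, kept raw here


def parse_cn_payload(payload: bytes) -> ComfortNoisePayload:
    """Split a CN payload into its noise level and optional spectral bytes."""
    if not payload:
        raise ValueError("CN payload needs at least the noise-level byte")
    # The level is sent as a positive magnitude in the low 7 bits.
    magnitude = payload[0] & 0x7F
    return ComfortNoisePayload(level_dbov=-magnitude, spectral_bytes=payload[1:])


def build_cn_payload(level_dbov: int, spectral_bytes: bytes = b"") -> bytes:
    """Build a CN payload for a level between -127 and 0 dBov."""
    if not -127 <= level_dbov <= 0:
        raise ValueError("level must be between -127 and 0 dBov")
    return bytes([-level_dbov]) + spectral_bytes
```

A one-byte payload such as `build_cn_payload(-60)` is enough to tell the receiver how loud the noise bed should be, which is why traffic during silence is so small compared with full speech frames.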

Where CNG sits in the media pipeline

In a good audio chain, CNG should not interfere with echo cancellation or mixing. A practical order on the sending side is:

1) Capture near-end mic
2) Run acoustic echo cancellation (AEC) and noise suppression
3) Decide VAD / DTX state
4) Generate or update comfort noise parameters
5) Send RTP (Real-time Transport Protocol) ² packets (speech or SID updates)

In conferencing, CNG needs extra care. If every participant injects comfort noise, the mixer can sum multiple noise beds and create a higher hiss floor. This is why some conference bridges prefer to generate one comfort noise bed centrally, or they reduce CNG energy when many silent participants exist.
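The summing effect can be estimated with basic power addition. The sketch below assumes the comfort noise beds are uncorrelated, so their powers add, and that all levels share one dB reference. The function names are illustrative.

```python
import math


def summed_noise_floor_db(levels_db):
    """Combine uncorrelated noise beds by adding their linear powers."""
    if not levels_db:
        return float("-inf")
    total_power = sum(10 ** (level / 10) for level in levels_db)
    return 10 * math.log10(total_power)


def per_stream_level_for_target(target_db, stream_count):
    """Level each of N equal streams needs so the mix lands on target_db."""
    if stream_count < 1:
        raise ValueError("stream_count must be at least 1")
    return target_db - 10 * math.log10(stream_count)
```

Two equal noise beds raise the floor by about 3 dB and eight raise it by about 9 dB, which is the arithmetic behind reducing per-stream CNG energy or generating a single bed at the bridge.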

Component	Job	What goes wrong when mis-set
VAD	Detect speech vs silence	Speech gets cut or noise triggers “talking”
DTX / silence suppression	Stop sending full-rate speech	NAT bindings can age out if no keepalives
SID updates	Describe background noise	“Pumping” or sudden noise jumps
CNG synthesis	Generate matching noise locally	Hiss too loud, dead silence, or clicks

CNG is not a “nice-to-have.” It is part of making VoIP feel natural, especially for SIP phones, SIP intercoms, and paging endpoints that live in noisy spaces.

If CNG is understood as a controlled illusion, setup becomes simpler: keep it subtle, keep it stable, and keep transitions smooth.

Now the next question is the core mechanics: how CNG works with VAD and silence suppression in real RTP streams.

A clear mental model prevents most bad tuning decisions.

How does CNG work with VAD and silence suppression?

Silence suppression saves bandwidth, but it can make calls feel unstable. If users think the call dropped, they talk over each other and call quality feels worse.

VAD decides when speech stops, DTX stops sending full speech packets, and CNG fills the silence with synthetic noise. The sender sends small SID updates, and the receiver generates noise until speech resumes.

Figure: Full VAD and CNG processing block diagram for comfort noise generation.

The three-part handshake: VAD → DTX → CNG

Voice Activity Detection (VAD) ³ is the gatekeeper. It looks at short frames, often 10–30 ms, and labels them speech or non-speech. When VAD says “non-speech,” discontinuous transmission (DTX) ⁴ kicks in. DTX is the policy that reduces transmission during silence, and many VoIP products call it silence suppression. Instead of sending 50 RTP packets per second of near-zero audio (the rate at 20 ms packetization), the endpoint sends either nothing or very small periodic updates.
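As a rough illustration of the gatekeeper role, here is a toy energy-threshold VAD with a hangover counter. It is a sketch, not how any specific codec's VAD works: real detectors also use spectral features and adaptive thresholds. The threshold of -45 dB and the hangover of 3 frames are assumed values for illustration only.

```python
def vad_with_hangover(frame_levels_db, threshold_db=-45.0, hangover_frames=3):
    """Label each frame True (send as speech) or False (eligible for DTX).

    frame_levels_db holds one level per frame, in dB relative to full scale.
    After the level drops below the threshold, the next hangover_frames
    frames are still treated as speech so word endings are not cut.
    """
    decisions = []
    remaining_hangover = 0
    for level in frame_levels_db:
        if level > threshold_db:
            remaining_hangover = hangover_frames
            decisions.append(True)
        elif remaining_hangover > 0:
            remaining_hangover -= 1
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions
```

A longer hangover protects trailing syllables at the cost of sending a few extra speech frames, which is usually a good trade on intercoms where pauses are short.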

CNG is the listener-side result. The receiver generates a noise bed so silence does not sound like a hard mute. Good receivers also cross-fade when switching from synthetic noise to real speech to avoid clicks or sudden level jumps at the start of a talk spurt.
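The receiver side can be sketched in the same spirit. The example below generates white Gaussian noise at a target level and applies a linear cross-fade between two equal-length blocks. This is a simplification under stated assumptions: real comfort noise generators usually shape the noise spectrum from the SID parameters, and the function names, the 16-bit full-scale value, and the linear fade are all illustrative choices.

```python
import math
import random


def generate_comfort_noise(num_samples, level_db, full_scale=32767, seed=None):
    """Return white noise whose RMS is roughly level_db relative to full scale."""
    rng = random.Random(seed)
    sigma = full_scale * 10 ** (level_db / 20)
    samples = []
    for _ in range(num_samples):
        value = int(round(rng.gauss(0.0, sigma)))
        # Clamp to the 16-bit range in case of rare large excursions.
        samples.append(max(-full_scale - 1, min(full_scale, value)))
    return samples


def crossfade(outgoing, incoming):
    """Linearly fade from the outgoing block to the incoming block."""
    if len(outgoing) != len(incoming):
        raise ValueError("blocks must have the same length")
    count = len(outgoing)
    mixed = []
    for index in range(count):
        weight = (index + 1) / count  # incoming weight rises to 1.0
        mixed.append(int(round((1 - weight) * outgoing[index] + weight * incoming[index])))
    return mixed
```

In use, the last block of synthetic noise would be passed as `outgoing` and the first decoded speech block as `incoming`, so the listener never hears a hard switch between the two.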

Why SID update rate matters

Background noise is not constant. A quiet office can turn into keyboard clicks. A lobby can shift from calm to busy. If SID frames are too infrequent, the receiver keeps using stale parameters, and the comfort noise can “pump” or jump when an update finally arrives. If SID frames are too frequent, bandwidth savings drop and some devices behave poorly.

A practical approach is to keep SID updates periodic and allow extra updates when noise changes a lot. Many systems do this automatically. When manual settings exist, stable is better than aggressive.
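One way to picture "periodic plus extra updates on change" is the small decision function below. It is a sketch only: the 50-frame maximum interval, the 3 dB change threshold, and the 5-frame minimum spacing are assumed values, not numbers from any codec standard, and `should_send_sid` is a hypothetical name.

```python
def should_send_sid(frames_since_last_sid, current_level_db, last_sent_level_db,
                    max_interval_frames=50, change_threshold_db=3.0,
                    min_interval_frames=5):
    """Decide whether the sender should emit a SID update for this silent frame."""
    # Periodic refresh so the receiver never runs on very old parameters.
    if frames_since_last_sid >= max_interval_frames:
        return True
    # Extra update when the noise level moved noticeably, but rate-limited
    # so brief spikes do not flood the link with SID frames.
    level_changed = abs(current_level_db - last_sent_level_db) >= change_threshold_db
    return level_changed and frames_since_last_sid >= min_interval_frames
```

The minimum spacing is what keeps the policy "stable rather than aggressive": a gust of wind that lasts two frames never reaches the receiver as a noise jump.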

Interactions with NAT and keepalives

Silence suppression can reduce RTP traffic enough that NAT mappings age out on some routers. This can cause one-way audio after long silence, especially on consumer-grade NAT with short UDP timeouts. In deployments with SIP intercoms behind NAT, it helps to ensure:

SIP keepalives (REGISTER refresh and/or CRLF keepalive)
RTP keepalive behavior if supported
Reasonable UDP timeout settings on the edge firewall

The goal is simple: silence suppression should not accidentally close the media path.
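A quick sanity check for this is to compare the longest gap between packets during silence with the NAT's UDP timeout. The helper below is a sketch: the safety factor of 2 is an assumed margin, and the timeout has to come from the actual router or firewall configuration.

```python
def longest_safe_gap_s(nat_udp_timeout_s, safety_factor=2.0):
    """Longest packet gap that still leaves a margin before the NAT timeout."""
    if nat_udp_timeout_s <= 0 or safety_factor < 1:
        raise ValueError("timeout must be positive and safety_factor at least 1")
    return nat_udp_timeout_s / safety_factor


def media_path_at_risk(nat_udp_timeout_s, longest_packet_gap_s, safety_factor=2.0):
    """True when silence gaps are long enough that the NAT binding may expire."""
    return longest_packet_gap_s > longest_safe_gap_s(nat_udp_timeout_s, safety_factor)
```

For example, with a hypothetical 30-second UDP timeout, a stream that can go 20 seconds without any SID or keepalive packet is flagged, while one that sends something every 5 seconds is not.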

Stage	Sender behavior	Receiver behavior	Key risk
Speech	Send full RTP with codec payload	Decode and play	Normal jitter/loss handling
Silence begins	VAD switches state, DTX reduces packets	Cross-fade into comfort noise	Word endings clipped if hangover is too short
Long silence	Send periodic SID updates	Generate noise from SID	NAT timeout or stale noise parameters
Speech resumes	Resume full RTP	Fade out CNG and play speech	First syllable clipped if VAD reacts slowly, clicks if transitions are abrupt

When these pieces are aligned, a call stays natural and efficient. When they are not aligned, users complain about “random hiss,” “cut words,” or “audio drops after silence.”

Next is a practical question that comes up in every installation checklist: what comfort noise level should be used for phones and intercoms.

The answer is not one number, but there are safe targets.

What comfort noise level suits SIP phones and intercoms?

Comfort noise that is too low feels like a dead line. Comfort noise that is too high sounds like a hiss problem. Both create support tickets.

A good CNG level is subtle and close to the real background noise floor. For SIP phones, it should be barely noticeable. For SIP intercoms in noisy areas, it should match the environment but never mask speech onset.

Figure: Engineer measuring lobby intercom room tone levels with a SIP desk phone.

Think in “perceived continuity,” not absolute voltage

Many devices do not expose comfort noise in dBFS or Vrms. They expose it as Low/Medium/High or Auto. In those cases, the target is still the same: make silence feel continuous but not distracting. The best CNG is the one nobody notices.

In quiet offices, comfort noise should be very low. In loud lobbies, comfort noise can be higher, but it should not become a constant hiss that operators hear all day. On door intercoms, the environment can change fast. Wind and street noise may spike, then drop. If the system chases those changes too aggressively, the noise bed becomes unstable.

Practical tuning approach that works across brands

A simple setup method avoids guessing:
1) Put the endpoint in its real environment.
2) Start a call and stay silent for 10–15 seconds.
3) Listen on the far end with good speakers or a headset.
4) Increase CNG only until silence feels “connected.”
5) Speak softly and check that the first syllable is not masked or clipped.

If the far end hears a strong hiss during silence, CNG is too high, or the noise model is wrong. If the far end hears pure dead silence and thinks the call dropped, CNG is too low or disabled.
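When a recording of the silent portion is available, the listening test can be backed by a number. The sketch below measures the RMS level of raw 16-bit little-endian mono PCM relative to full scale. The input format is an assumption, so adapt the unpacking if the capture uses a different sample format.

```python
import math
import struct


def pcm16_level_db(pcm_bytes):
    """RMS level of 16-bit little-endian mono PCM, in dB relative to full scale."""
    count = len(pcm_bytes) // 2
    if count == 0:
        raise ValueError("need at least one 16-bit sample")
    samples = struct.unpack("<%dh" % count, pcm_bytes[:count * 2])
    rms = math.sqrt(sum(s * s for s in samples) / count)
    if rms == 0:
        return float("-inf")  # true digital silence
    return 20 * math.log10(rms / 32768.0)
```

Measuring the same 10–15 second silent stretch before and after a CNG change gives a repeatable comparison, and a result of negative infinity confirms the "pure dead silence" case.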

Special notes for SIP intercoms

Intercoms often use speakerphone mode. That means AEC, noise suppression, and automatic gain control (AGC) are active, and they can reshape the noise floor. If AGC raises gain during silence, the perceived comfort noise rises with it. For that reason, it helps to keep AGC moderate and avoid excessive mic gain.

In DJSLink-style deployments, a common pattern is to keep CNG set to Auto or Low, then focus on microphone placement and noise suppression quality. When the physical audio is clean, CNG can stay subtle.

Environment	Suggested CNG approach	What to watch
Quiet office SIP phones	Very low / Auto	Dead silence perception vs faint hiss
Call center headset	Often disable or very low	Headsets make hiss more obvious
Lobby intercom	Low to medium, stable	Noise “pumping” when crowd changes
Outdoor door station	Auto with smoothing	Wind noise causing false noise updates
Industrial paging point	Medium, but controlled	Noise masking soft speech starts

Comfort noise should support conversation, not become the main thing people hear. If CNG becomes noticeable, the next step is often codec behavior and how SID is carried.

That leads into codec-specific behavior, because not all codecs handle CNG the same way.

How do codecs G.711, G.729, and Opus implement CNG?

Many VoIP issues happen after a codec change. People blame compression, but the real change was VAD/DTX/CNG behavior and how silence is represented.

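G.729 has a standardized VAD/DTX/CNG scheme, Annex B, built on SID frames. G.711 defines no silence scheme of its own, so its comfort noise is carried in a separate RTP comfort noise (CN) payload (RFC 3389, which is based on G.711 Appendix II). Opus has its own DTX and decoder-side noise handling, and whether a separate comfort noise payload is also used depends on the implementation.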

Figure: G.711 comfort noise signaling with the RFC 3389 CN payload.

G.711: simple waveform, optional CN payload behavior

The ITU-T G.711 codec standard ⁵ is a waveform codec (PCMU/PCMA). It is heavy in bitrate compared to compressed codecs, but it is simple and interoperable. The VAD/DTX/CNG scheme known as Annex B belongs to G.729, not G.711, although many products reuse that label for their optional G.711 silence mode. With G.711, the noise description travels in the RTP comfort noise (CN) payload defined in RFC 3389. In practice, systems either:

Send “silence” as low-level PCM continuously (no DTX)
Or enable VAD/DTX so silence is signaled with CN packets and the receiver generates the comfort noise

Because G.711 is so widely supported, mismatches can happen when one side expects CN payload handling and the other side does not. In those cases, silence can become dead quiet or turn into odd comfort noise artifacts.
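A first check for this kind of mismatch is whether both SDP bodies actually list the CN payload. The sketch below looks for static payload type 13 on the audio line, or a dynamic type mapped to CN through `a=rtpmap`. It is a simplified parser for illustration: it treats all audio sections as one set and does not handle every SDP feature.

```python
import re


def offers_comfort_noise(sdp):
    """Return True if an audio media section lists a comfort noise payload."""
    audio_payload_types = set()
    cn_mapped_types = set()
    in_audio = False
    for raw_line in sdp.splitlines():
        line = raw_line.strip()
        if line.startswith("m="):
            in_audio = line.startswith("m=audio")
            if in_audio:
                # m=audio <port> <proto> <payload types...>
                audio_payload_types.update(line.split()[3:])
        elif in_audio:
            mapped = re.match(r"a=rtpmap:(\d+)\s+CN/\d+", line, re.IGNORECASE)
            if mapped:
                cn_mapped_types.add(mapped.group(1))
    # Payload type 13 is the static assignment for CN at 8 kHz.
    return "13" in audio_payload_types or bool(cn_mapped_types & audio_payload_types)
```

If the offer returns True and the answer returns False, one side may still suppress silence while the other has no way to receive the noise description, which matches the "dead quiet" symptom described above.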

G.729: bandwidth saver with common Annex B usage

The ITU-T G.729 speech coding standard ⁶ is a low-bitrate codec, and real deployments often enable its Annex B VAD/DTX/CNG option. When enabled, silence periods reduce bandwidth further, and SID frames (2 bytes each in RTP, versus 10 bytes for a 10 ms speech frame) update the receiver’s noise generator. The key practical point is that G.729 endpoints can differ in how aggressively they trigger VAD and how often they send SID updates. In mixed-vendor networks, this is a common source of “hiss changes” or clipped word starts.

For SIP intercoms that need reliability, the safest approach is to validate cross-vendor behavior in a short test matrix before large deployment. One bad codec pairing can create hundreds of “audio is weird” tickets.
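Part of that test matrix can be automated by checking how each endpoint advertises G.729 Annex B in SDP. The sketch below relies on the registered `annexb` format parameter, where "yes" is implied when the parameter is missing. The function name and the simplified parsing are assumptions for illustration.

```python
import re

G729_STATIC_PAYLOAD_TYPE = "18"  # static RTP payload type for G729


def g729_annexb_enabled(sdp):
    """Return True/False for the advertised Annex B state, or None without G.729."""
    lines = [line.strip() for line in sdp.splitlines()]
    g729_types = set()
    for line in lines:
        if line.startswith("m=audio") and G729_STATIC_PAYLOAD_TYPE in line.split()[3:]:
            g729_types.add(G729_STATIC_PAYLOAD_TYPE)
        mapped = re.match(r"a=rtpmap:(\d+)\s+G729/8000", line, re.IGNORECASE)
        if mapped:
            g729_types.add(mapped.group(1))
    if not g729_types:
        return None
    for line in lines:
        fmtp = re.match(r"a=fmtp:(\d+)\s+(.*)", line)
        if fmtp and fmtp.group(1) in g729_types:
            for param in fmtp.group(2).split(";"):
                name, _, value = param.strip().partition("=")
                if name.lower() == "annexb":
                    return value.strip().lower() != "no"
    # No annexb parameter present: "yes" is the implied default.
    return True
```

Comparing the result for the offer and the answer of each vendor pairing highlights the cases where one side expects SID frames and the other has Annex B switched off.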

Opus: flexible, modern, and implementation-dependent

The Opus codec specification (RFC 6716) ⁷ supports a wide range of bitrates and bandwidth modes. It also supports DTX-style behavior so the encoder can reduce traffic during silence. In many Opus deployments, the decoder can generate a comfort-noise-like output based on recent signal statistics, and packet loss concealment can also behave like synthetic noise during gaps. Some systems also use a dedicated comfort noise payload type in RTP for explicit CN handling, depending on the stack and negotiation.

The practical takeaway is not the internal math. The takeaway is interoperability:

Keep Opus settings consistent across the call path when possible.
Avoid unnecessary transcoding at the PBX/SBC.
Validate silence behavior on the exact endpoints used in the project.

Codec	Typical CNG style	Strength	Common deployment risk
G.711	Optional DTX/CNG mode	Best compatibility	“One side ignores SID/CN” mismatch
G.729	Often used with VAD/DTX/CNG	Low bandwidth	Aggressive VAD clips word starts
Opus	DTX + decoder noise handling	Best quality per bitrate	Different stacks behave differently, transcoding hurts

Codec choice should be driven by the network and the endpoints, not by a single “best codec” claim. For SIP intercoms, clarity and interoperability usually beat squeezing bandwidth at all costs.

If the symptom is “hiss during silence,” the codec is only one suspect. The next section covers why hiss happens and how to fix it without guessing.

Why do I hear hiss with CNG and how to fix it?

Hiss complaints are common because comfort noise is supposed to be subtle. When users notice it, the level is wrong, the noise model is unstable, or the chain is amplifying noise.

Hiss with CNG usually comes from comfort noise set too high, unstable SID updates, gain staging that boosts noise, or codec/interoperability mismatches. Fix it by lowering CNG level, stabilizing VAD/DTX behavior, and correcting gain and DSP order.

Figure: Comparison of hissy comfort noise and tuned CNG during VoIP calls.

The most common causes in the field

1) CNG level too high
This is the simplest. Many devices ship with aggressive defaults meant to avoid dead silence. On headsets and quiet rooms, that “safe” level becomes obvious hiss.

2) AGC or mic preamp noise being amplified
If the near-end audio chain is noisy, CNG can expose it. Some AGC implementations raise gain during silence, which makes the noise floor audible. The fix is often proper mic gain staging and less aggressive AGC.

3) SID updates too slow or noise floor tracking too reactive
If the environment changes and SID updates lag, the noise bed can jump. People perceive this as pumping or hiss that changes. A smoother update behavior is better than chasing every change.

4) Transcoding or mixed endpoint behavior
When a PBX or SBC transcodes, it can break silence behavior. One side may send SID or CN updates, while the other side expects continuous comfort noise, or vice versa. This mismatch can create hiss bursts, dead silence, or strange tonal noise.

5) Conference mixing summing multiple CNG beds
In multi-party calls, multiple comfort noise sources can add up. The mix becomes a louder hiss floor. A bridge that manages comfort noise centrally is often cleaner.

Fix steps that work in a predictable order

I prefer a simple sequence:

First, lower CNG level (or set it to Auto/Low).
Second, confirm VAD is not clipping speech. If it clips, adjust thresholds and hangover so talk spurts start cleanly.
Third, check gain staging: reduce mic gain if it clips, reduce speaker gain if it distorts, and avoid extreme AGC.
Fourth, avoid transcoding. Keep a consistent codec end-to-end when possible.
Fifth, retest in the real environment, not only in a lab.

Quick troubleshooting table for support teams

Symptom	Likely cause	Fast fix
Constant noticeable hiss in silence	CNG level too high	Set CNG Low/Auto or reduce noise level
Hiss “pumps” up and down	SID updates unstable	Increase smoothing, avoid aggressive noise tracking
Hiss only on conference calls	Summed comfort noise	Reduce per-stream CNG or use bridge-controlled noise
Hiss only after codec change	Interop mismatch or transcoding	Lock codec path, verify VAD/DTX settings match
Hiss plus clipped first syllables	VAD too aggressive	Lower start threshold, add hangover, keep CNG subtle

For SIP intercoms connected to amplifiers or paging systems, the same logic applies, with one extra point: paging amps can amplify noise floors aggressively. If the line input gain is too high, any comfort noise becomes more obvious. A clean gain plan keeps intercom audio near nominal and trims the amplifier input instead of boosting everything.

Conclusion

CNG keeps VoIP calls feeling alive during silence by generating subtle background noise. With VAD and DTX it saves bandwidth, but correct levels, stable SID behavior, and clean gain staging prevent hiss and speech clipping.

Footnotes

1. Defines RTP comfort-noise payload type and CN packet format.
2. Specifies RTP packet format used to carry real-time audio and video.
3. Overview of VAD concepts and common uses in communications systems.
4. Explains DTX and its role in reducing transmissions during silence.
5. ITU-T standard for 64 kbps PCM telephony audio (PCMU/PCMA).
6. ITU-T standard for 8 kbps CS-ACELP speech coding used in VoIP.
7. Defines Opus interactive speech and audio codec used in VoIP/WebRTC.

About The Author

DJSLink R&D Team
