Calls clip or sound robotic during uploads. The real limit is not magic. It is bandwidth and how I protect it.
Network bandwidth is the maximum data rate on my links. VoIP uses small, steady streams. I must size for simultaneous calls, add overhead, leave headroom, and protect voice with QoS.

Bandwidth alone is not the full story. Delay, jitter, and loss turn small gaps into big problems. Good design keeps voice first. The rest fits around it. Now let us break it down and do the math.
How much bandwidth do SIP calls and video need per user?
People ask, “How many calls fit on my line?” The answer depends on codec, packet size, and overhead. The numbers are simple and practical.
Per-call bandwidth = codec payload + IP/UDP/RTP + L2/VPN overhead. With 20 ms packets: G.711 ≈ 80–90 kbps one way, G.729 ≈ 24–32 kbps, Opus wideband ≈ 32–50 kbps.

Dive deeper:
Why payload is not the full rate
A codec’s bitrate (like 64 kbps for G.711) is only the audio payload. Each packet also carries IP, User Datagram Protocol (UDP)[^1], and Real-time Transport Protocol (RTP)[^2] headers (40 bytes total). Layer 2 adds more (Ethernet ~18 bytes). VPNs add 5–50+ bytes. The packetization interval (ptime) changes how often I send those headers. A short ptime sends more packets per second; that raises overhead and bandwidth but lowers mouth-to-ear delay. A long ptime saves bandwidth but raises delay and loss risk.
If you want the canonical specs behind the planning numbers, start with the G.711 PCM recommendation[^3] and the Opus codec RFC[^4].
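Here is a minimal Python sketch of that arithmetic. The header sizes are typical planning assumptions (40 bytes of IP/UDP/RTP plus ~18 bytes of Ethernet framing), not measured values; adjust them for your own encapsulation.

```python
def per_call_kbps(payload_kbps: float, ptime_ms: float = 20.0,
                  overhead_bytes: int = 40 + 18) -> float:
    """One-way per-call rate: codec payload plus per-packet header cost.

    overhead_bytes: IP/UDP/RTP (40 B) plus Ethernet framing (~18 B).
    Add roughly 5-50+ B on top for VPN or tunnel encapsulation.
    """
    packets_per_second = 1000.0 / ptime_ms        # 50 pps at 20 ms
    header_kbps = packets_per_second * overhead_bytes * 8 / 1000.0
    return payload_kbps + header_kbps

print(per_call_kbps(64))                          # G.711: ~87 kbps one way
print(per_call_kbps(8))                           # G.729: ~31 kbps
print(per_call_kbps(24, overhead_bytes=58 + 30))  # Opus 24k over a 30 B tunnel: ~59 kbps
```

The printed values land inside the ranges in the table below; the exact figure shifts with whatever overhead your links add.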
Quick reference table (20 ms ptime, typical LAN/WAN, one way)
| Codec (mode) | Payload kbps | Packets/s | Est. kbps incl. headers | Est. kbps with VPN/overhead |
|---|---|---|---|---|
| G.711 (PCMU/A) | 64 | 50 | ~80–90 | ~90–110 |
| G.722 (WB) | 64 | 50 | ~80–90 | ~90–110 |
| G.729 | 8 | 50 | ~24–32 | ~28–36 |
| Opus (WB 24–32 kbps) | 24–32 | 50 | ~32–50 | ~40–60 |
| Opus (WB 48 kbps) | 48 | 50 | ~60–70 | ~70–85 |
These ranges include common Layer-2 and IP header costs. They vary with VLAN tags, PPPoE, GRE/IPsec, or SRTP.
Packetization and its trade-offs
- 10 ms ptime: Higher bandwidth. Lower latency. Better packet loss concealment (PLC), since each lost packet carries less audio. More packets, more CPU.
- 20 ms ptime: A good balance for most SIP trunks.
- 40 ms ptime: Lower bandwidth. Higher delay. More damage when a packet drops. Use with care. The sketch below puts numbers on the trade-off.
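Reusing the per-call formula from the earlier sketch (~58 B of headers assumed per packet), the ptime effect for G.711 looks like this:

```python
# Same formula as the earlier sketch; ~58 B of IP/UDP/RTP + Ethernet assumed.
def per_call_kbps(payload_kbps: float, ptime_ms: float,
                  overhead_bytes: int = 58) -> float:
    return payload_kbps + (1000.0 / ptime_ms) * overhead_bytes * 8 / 1000.0

for ptime in (10, 20, 40):
    print(f"G.711 at {ptime} ms ptime: {per_call_kbps(64, ptime):.1f} kbps")
# 10 ms -> 110.4, 20 ms -> 87.2, 40 ms -> 75.6 kbps one way
```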
Video and screen share (rough planning, per stream)
| Media | Typical codec | Rough kbps (one way) | Notes |
|---|---|---|---|
| HD Voice (WB) | Opus/G.722 | 40–70 | Per audio stream |
| 720p30 video | H.264 | 1,000–2,500 | Variable with content |
| 1080p30 video | H.264 | 2,500–5,000 | Needs clean uplink |
| Screen share | H.264 | 700–2,000 | Text-heavy scenes compress well |
Capacity formula I use
- Total per direction ≈ concurrent external call legs × per-call kbps
- Add headroom: +20–30% for bursts, forward error correction (FEC), and codec swings
- Remember symmetry: voice is per direction; size uplink and downlink separately
Example: 20 calls on Opus ~45 kbps one way → 20 × 45 = 900 kbps. Add 30% = 1.17 Mbps. I reserve ≥1.2–1.5 Mbps each way for clean margin.
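The same math as a tiny helper, so the headroom never gets skipped. The 30% default is the planning assumption above, not a measured value.

```python
def trunk_reserve_kbps(concurrent_calls: int, per_call_kbps: float,
                       headroom: float = 0.30) -> float:
    """Per-direction voice reserve: calls x per-call rate, plus burst headroom."""
    return concurrent_calls * per_call_kbps * (1.0 + headroom)

print(trunk_reserve_kbps(20, 45))   # 1170.0 kbps -> reserve ~1.2-1.5 Mbps each way
```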
How do I measure my available upload and download bandwidth?
Guessing leads to surprise. Measure real sustained rates, not only speed test peaks. Measure when staff works, not at midnight.
Run multi-minute tests for up and down. Watch sustained throughput, latency under load, and jitter. Verify at peak hours. Then log interface rates on edge devices.

Dive deeper:
What to test, and why it matters
A short speed burst hides bufferbloat. I care about steady throughput with latency under load. So I use tests that saturate the link for 1–3 minutes and record RTT and jitter during the run. I test both directions because uplink is usually the choke point for VoIP. I repeat during busy hours. I save results over time to catch slowdowns or ISP shaping.
Tools and signals to collect
- Interface counters on the router or firewall: bits per second, errors, drops.
- Latency under load (ping during upload): if RTT jumps by 100+ ms, calls may clip.
- Jitter and loss: these reveal queuing stress, not just line rate.
- 95th percentile throughput: good for planning recurring peaks.
- MOS or R-factor from active probes: reflects real call quality, not only bandwidth.
A simple test plan
| Step | Action | Target |
|---|---|---|
| 1 | Baseline idle RTT/jitter | RTT < 30–80 ms to provider POP, jitter < 5 ms |
| 2 | 3-minute upload saturation | Latency increase < 30–50 ms with SQM; no loss |
| 3 | 3-minute download saturation | Same as upload |
| 4 | Dual-direction test | Verify QoS keeps voice priority |
| 5 | Peak-hour repeat | Confirm stability under business load |
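A hedged sketch automating steps 1–2 of this plan on Linux. It assumes `ping` and `iperf3` are installed and that you control an iperf3 server; the target address is a placeholder.

```python
import re
import statistics
import subprocess

TARGET = "192.0.2.10"   # placeholder: a host you control near the provider POP

def rtt_samples_ms(host: str, count: int) -> list[float]:
    """Collect ICMP RTT samples by parsing Linux `ping` output."""
    out = subprocess.run(["ping", "-c", str(count), host],
                         capture_output=True, text=True).stdout
    return [float(m) for m in re.findall(r"time=([\d.]+) ms", out)]

idle = rtt_samples_ms(TARGET, 10)                 # step 1: idle baseline
# Step 2: saturate the uplink for 3 minutes while sampling RTT under load.
load = subprocess.Popen(["iperf3", "-c", TARGET, "-t", "180"],
                        stdout=subprocess.DEVNULL)
loaded = rtt_samples_ms(TARGET, 60)
load.wait()

bloat = statistics.median(loaded) - statistics.median(idle)
print(f"idle {statistics.median(idle):.1f} ms, "
      f"loaded {statistics.median(loaded):.1f} ms, bloat {bloat:.1f} ms")
# With SQM working, the increase should stay under ~30-50 ms.
```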
Where tests go wrong
- Testing from Wi-Fi instead of a wired port.
- Testing during off hours only.
- Ignoring VPN overhead.
- Using a single stream when the ISP favors speed-test CDNs.
- Running tests too short to fill buffers.
When results swing, I check for background syncs, cloud backups, software updates, and camera uploads. Those kill uplinks first.
Will low bandwidth cause jitter, choppy audio, or dropped calls?
Low bandwidth alone is not always the culprit. Unmanaged queues cause delay spikes and loss. That sounds like jitter and chop.
When the uplink saturates, packets queue and drop. Jitter rises. RTP arrives late or not at all. A small call stream suffers, even though it needs only tens of kbps.

Dive deeper:
The real enemy is congestion and bufferbloat
Voice packets are tiny but frequent. Bulk transfers like backups and video chunks fill queues. Without smart queuing, RTP waits behind large packets. That adds variable delay (jitter). Jitter buffers hide small swings, but when spikes exceed the buffer, playout gaps appear. Listeners hear robots, repeats, or silence. If queues overflow, loss climbs. PLC and FEC help a little, but they cannot recover long bursts.
Symptoms and quick checks
| Symptom | Likely cause | Check |
|---|---|---|
| Good off-hours, bad daytime | Uplink saturation | Interface graphs; ping under load |
| Sudden chop during file sync | Single queue, no QoS | QoS policy and queue stats |
| Video OK, calls bad | RTP not prioritized | DSCP marking and queue mapping |
| Calls die on VPN | MTU/MSS wrong | Fragmentation counters, PMTUD logs |
Practical thresholds I aim for
- Uplink reserve: keep 20–30% free headroom.
- Jitter (p95): < 20 ms wired, < 30 ms Wi-Fi/Internet.
- Loss: < 0.3% average, with no loss bursts longer than 100 ms.
- One-way latency: < 150 ms end-to-end for clean conversation.
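These targets fold into a simple pass/fail check. A sketch, with the thresholds hard-coded from the list above:

```python
def voice_path_ok(jitter_p95_ms: float, loss_pct: float,
                  one_way_ms: float, wired: bool = True) -> bool:
    """Pass/fail against the planning thresholds listed above."""
    jitter_limit = 20.0 if wired else 30.0
    return (jitter_p95_ms < jitter_limit
            and loss_pct < 0.3
            and one_way_ms < 150.0)

print(voice_path_ok(12, 0.1, 80))               # True: clean wired path
print(voice_path_ok(35, 0.1, 80, wired=False))  # False: Wi-Fi jitter too high
```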
What not to expect from jitter buffers
A bigger buffer does not create bandwidth. It only smooths arrival times at the cost of extra delay. Beyond ~150–200 ms one way, conversation suffers even with perfect audio frames. If I must increase buffers to survive, I then fix the congestion root cause.
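To make the trade-off concrete, here is a toy sketch: pick a playout buffer depth from observed per-packet jitter, remembering that every millisecond of buffer is a millisecond of added delay. The samples are invented for illustration.

```python
def buffer_depth_ms(jitter_samples_ms: list[float], pct: float = 0.95) -> float:
    """Smallest buffer that absorbs pct of observed arrival-time swings."""
    ordered = sorted(jitter_samples_ms)
    return ordered[int(pct * (len(ordered) - 1))]

samples = [2, 3, 5, 4, 8, 22, 3, 6, 5, 40]   # hypothetical per-packet jitter (ms)
print(buffer_depth_ms(samples))              # 22 ms: absorbs all but the worst spike
# That 22 ms goes straight onto mouth-to-ear delay. If the answer comes back
# as 150+ ms, fix the congestion instead of growing the buffer.
```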
How can I prioritize VoIP bandwidth with QoS, VLANs, and codecs?
Voice wins when I mark it, queue it right, and keep paths short. I also choose codecs that fit my links and users.
Mark RTP EF (DSCP 46). Put voice in a strict low-latency queue. Shape bulk traffic below line rate with FQ-CoDel/PIE. Use a voice VLAN. Pick Opus or G.711 per site.

Dive deeper:
QoS: marking and queuing that actually works
Set phones, PBX, and SBC to mark RTP with DiffServ Expedited Forwarding (EF, DSCP 46)[^5] and SIP signaling with CS3/AF31. On switches and routers, trust markings only on ports that connect to known devices; rewrite at the edge for everything else. Place EF in a strict-priority or low-latency queue with a small buffer. Cap the other classes so EF never starves. On small links, enable smart queue management (SQM) such as FQ-CoDel[^6]. Set an egress shaper a bit below the ISP rate (for example, at 95–98%); this keeps the ISP's big buffer from adding delay.
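For an application that sends its own RTP, the DSCP can be set directly on the socket. A minimal Python sketch (Linux), with a placeholder destination; the network still has to trust or rewrite the marking, as described above.

```python
import socket

EF_DSCP = 46                     # Expedited Forwarding per-hop behavior
TOS_BYTE = EF_DSCP << 2          # DSCP occupies the top 6 bits of the TOS byte

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_BYTE)
# Every datagram on this socket now leaves with DSCP 46 (TOS byte 0xB8).
sock.sendto(b"\x80" + bytes(171), ("192.0.2.20", 4000))  # placeholder RTP-sized payload
```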
VLANs and clean paths
Use a voice VLAN for phones. This reduces broadcast noise and gives phones a separate DHCP scope with options for provisioning. Keep PoE phones wired where possible. If Wi-Fi is required, use 5 or 6 GHz, 20 MHz channels, a -67 dBm minimum RSSI, and enable Wi-Fi Multimedia (WMM)[^7] mapping for voice. Limit clients per AP to protect airtime.
Codec choices for each site
| Site type | Uplink reality | Best codec choice | Notes |
|---|---|---|---|
| Fiber/clean metro | Plenty of headroom | G.711 or Opus 48k | Simpler, high MOS |
| DSL/cable busy uplink | Tight upstream | Opus 24–32k | Strong PLC, lower kbps |
| Mobile/remote users | Variable paths | Opus + ICE/TURN | Traversal + resilience |
| High packet loss | No quick fix | Opus with FEC | Still fix the path |
Avoid frequent transcoding. Keep the codec the same end to end across trunks and phones. Transcoding adds delay and artifacts.
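The table reduces to a simple policy function. A toy sketch; the thresholds are my assumptions, not standards.

```python
def pick_codec(free_uplink_kbps: float, loss_pct: float, mobile: bool) -> str:
    """Toy per-site codec policy mirroring the table above."""
    if loss_pct > 1.0:
        return "opus-wb-fec"      # lossy path: Opus with in-band FEC (and fix the path)
    if mobile:
        return "opus-wb"          # variable paths: Opus, with ICE/TURN for traversal
    if free_uplink_kbps >= 120:   # assumed comfort margin per call
        return "g711"             # clean links: simple, high MOS
    return "opus-wb-32k"          # tight uplinks: lower kbps, strong PLC

print(pick_codec(500, 0.1, mobile=False))   # g711
print(pick_codec(60, 0.1, mobile=False))    # opus-wb-32k
```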
VPNs, tunnels, and MTU
Tunnels add overhead. Budget +5–20% per call. Clamp MSS to avoid fragmentation. Verify DSCP preservation through the tunnel. If the VPN cannot preserve EF, prioritize by port and flow on the outer header.
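The MSS clamp itself is simple arithmetic. A sketch, using a typical IPsec-style overhead figure as the assumption:

```python
def clamped_mss(path_mtu: int = 1500, tunnel_overhead: int = 60,
                ip_header: int = 20, tcp_header: int = 20) -> int:
    """Largest TCP MSS that keeps encapsulated segments under the outer MTU."""
    return path_mtu - tunnel_overhead - ip_header - tcp_header

print(clamped_mss())   # 1400: a common clamp for IPsec-style tunnels
```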
A simple deployment checklist
| Item | Target | Why |
|---|---|---|
| DSCP EF on RTP | Marked and honored | Low latency queueing |
| Egress shaper | 95–98% of line rate | Kill bufferbloat |
| Voice VLAN | Separate scope | Clean L2 domain |
| Wi-Fi voice SSID | 5/6 GHz, WMM Voice | Airtime priority |
| Codec policy | Opus or G.711 | Fit to link reality |
| Monitoring | MOS, jitter, loss | Catch regressions |
With these steps, voice keeps flowing even when someone uploads large files. Calls stay clear. Complaints drop.
Conclusion
Size calls with overhead, keep 20–30% headroom, and protect RTP with QoS and clean VLANs. Pick codecs to fit links. Measure often and adjust.
Footnotes

[^1]: RFC 768. Defines UDP, the common transport for RTP media packets.
[^2]: RFC 3550. Defines the RTP header behavior used for real-time voice transport.
[^3]: ITU-T Recommendation G.711. Defines the PCM encoding behind the G.711 planning numbers.
[^4]: RFC 6716. Defines Opus codec modes, bitrates, and packetization guidance.
[^5]: RFC 3246. Defines the EF PHB used to prioritize low-latency voice traffic.
[^6]: RFC 8290. Describes the FQ-CoDel AQM/SQM algorithm that reduces bufferbloat and latency spikes.
[^7]: IEEE 802.11e / Wi-Fi Alliance WMM. Explains the QoS access categories that prioritize voice on Wi-Fi.








