What is a VoIP phone system for my office?

Most offices still pay for old phone lines, while their data network sits under-used, so they carry two separate infrastructures for one simple thing: talking.

Table of Contents

1 How do IP PBX and cloud PBX compare on cost?

1.1 Look at total cost, not just license price

1.2 Tie cost to control and integration

2 Which codecs should I enable for call quality?

2.1 Understand what codecs actually do

2.2 Practical codec strategy for an office

3 Can I secure VoIP with TLS, SRTP, and VLANs?

3.1 SIP signalling and media encryption basics

3.2 VLANs, QoS, and edge protection

4 Why do my VoIP calls jitter on Wi-Fi?

4.1 Why Wi-Fi is harder for voice than Ethernet

4.2 Practical steps to reduce jitter

5 Conclusion

6 Footnotes

A VoIP phone system runs all office calls over your data network and internet using SIP and RTP, giving you extensions, features, and external calling without old-style phone lines.

[Figure: Hybrid IP desk deployment. Row of SIP desk phones and PCs in an empty modern office.]

With VoIP, phones become just another IP device. An IP PBX ¹ or a cloud solution based on hosted PBX systems ² handles routing, features, and SIP trunks. Desk phones, softphones, and even SIP intercoms share the same platform.

Under the hood, most office VoIP uses Session Initiation Protocol (SIP) ³ to register endpoints and set up calls, and Real-time Transport Protocol (RTP) ⁴ to carry the actual voice media once the call is established. In our own deployments, once voice, intercom, and paging move to IP, projects for buildings and security become much easier to scale and maintain.
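To make the RTP side concrete, here is a minimal sketch that decodes the 12-byte fixed RTP header defined in RFC 3550. The function name `parse_rtp_header` and the sample packet are illustrative assumptions, and the payload type table lists only a few static types from RFC 3551; real captures may use dynamic payload types negotiated in SDP.

```python
import struct

# A few static RTP payload types from RFC 3551 that show up in office VoIP.
STATIC_PAYLOAD_TYPES = {
    0: "PCMU (G.711 u-law)",
    8: "PCMA (G.711 A-law)",
    9: "G722",
    18: "G729",
}


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header from a UDP payload."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    payload_type = second & 0x7F
    return {
        "version": first >> 6,               # should be 2 for RTP
        "marker": bool(second & 0x80),
        "payload_type": payload_type,
        "codec": STATIC_PAYLOAD_TYPES.get(payload_type, "dynamic or unknown"),
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hand-built sample: version 2, payload type 0, sequence 1, timestamp 160,
# followed by 160 bytes of dummy audio (20 ms of G.711).
example = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x1234ABCD) + b"\xff" * 160
print(parse_rtp_header(example))
```

The sequence number and timestamp fields are what phones and PBXs use to detect loss and measure jitter, which matters later when we look at Wi-Fi.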

How do IP PBX and cloud PBX compare on cost?

Teams often start by asking, “which is cheaper?” and then forget to include trunks, support, and growth in the math.

On-prem IP PBX is CapEx-heavy but cheaper per seat at scale. Cloud PBX is OpEx, faster to start, and often cheaper for small teams or very dynamic headcount.

[Figure: On-prem vs cloud PBX. Side-by-side comparison chart of an on-prem IP PBX and a cloud phone solution.]

Look at total cost, not just license price

When I compare IP PBX and cloud PBX, I split the cost into buckets: setup, monthly, and long-term flexibility. It helps to put numbers into a simple model, even if they are rough.

Typical cost factors:

Cost element	IP PBX (on-prem)	Cloud PBX
Core system	One-time license / appliance / VM	Per-user monthly subscription
SIP trunks and minutes	Direct with carriers	Often bundled, sometimes separate
Hardware	IP phones, PoE switches, server, UPS	IP phones, PoE switches (still needed)
Maintenance and upgrades	Your IT or partner	Included in service fee
Scaling up/down	Add licenses and trunks; hardware capacity	Add/remove seats in portal
Redundancy	Extra servers, HA, backup WAN	Built-in DC redundancy (varies by provider)

For small offices (say 5–20 users), cloud PBX is usually easier and often cheaper in the first few years. I pay a per-user fee, get all the features, and do not have to run servers.

For medium or large sites, or where voice and security integrate deeply, an on-prem IP PBX can win over a few years because:

SIP trunks scale by concurrent calls, not by user count.
One PBX can host many extensions, intercoms, and paging endpoints.
I own the platform and can tune SIP, routing, and integrations very precisely.
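The first point, that trunks follow concurrent calls rather than headcount, can be estimated with the classic Erlang B formula. The sketch below is one common way to size trunk channels, not a method taken from any specific PBX. The traffic figures (3 external calls per user in the busy hour, 3 minutes each, 1% blocking target) are assumptions for illustration only.

```python
def erlang_b(traffic_erlangs: float, channels: int) -> float:
    """Probability that a new call is blocked, given offered traffic and channel count."""
    blocking = 1.0  # with zero channels every call is blocked
    for n in range(1, channels + 1):
        blocking = traffic_erlangs * blocking / (n + traffic_erlangs * blocking)
    return blocking


def channels_needed(traffic_erlangs: float, target_blocking: float = 0.01) -> int:
    """Smallest number of trunk channels that keeps blocking at or below the target."""
    channels = 0
    while erlang_b(traffic_erlangs, channels) > target_blocking:
        channels += 1
    return channels


# Assumed busy-hour profile (hypothetical numbers, replace with your own call records).
users = 60
calls_per_user = 3
minutes_per_call = 3
busy_hour_erlangs = users * calls_per_user * minutes_per_call / 60

print(f"Offered traffic: {busy_hour_erlangs:.1f} erlangs")
print(f"Trunk channels for 1% blocking: {channels_needed(busy_hour_erlangs)}")
```

Under these assumed figures the busy hour offers 9 erlangs and the sketch lands on 17 channels for 60 users, which is why trunk cost grows much more slowly than seat count.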

A simple example:

60 users, moderate call volume.
Cloud PBX at $20/user/month → $1,200/month before extra minutes.
IP PBX license and server might cost a few thousand dollars once, plus ongoing SIP trunks, support, and hardware.

Over three to five years, the on-prem investment often becomes cheaper per seat, especially when you add SIP intercoms, SIP speakers, and security devices that do not need full “user” licenses in the cloud.
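A rough model of that comparison might look like the sketch below. Only the 60 users and the $20 per user per month come from the example above. The on-prem upfront cost and monthly running cost are placeholder assumptions, not quotes, and phones and PoE switches are left out because both models need them.

```python
def cloud_total(users: int, fee_per_user_month: float, months: int) -> float:
    """Cumulative cloud PBX subscription cost."""
    return users * fee_per_user_month * months


def onprem_total(upfront: float, monthly_running: float, months: int) -> float:
    """Cumulative on-prem cost: one-time spend plus trunks and support."""
    return upfront + monthly_running * months


def break_even_month(users, fee_per_user_month, upfront, monthly_running, horizon_months=120):
    """First month where on-prem is no more expensive than cloud, or None."""
    for month in range(1, horizon_months + 1):
        if onprem_total(upfront, monthly_running, month) <= cloud_total(users, fee_per_user_month, month):
            return month
    return None


USERS = 60
CLOUD_FEE = 20          # USD per user per month, from the example above
ONPREM_UPFRONT = 8_000  # hypothetical: license, server, installation
ONPREM_MONTHLY = 500    # hypothetical: SIP trunks plus support contract

for years in (1, 3, 5):
    months = years * 12
    cloud = cloud_total(USERS, CLOUD_FEE, months)
    onprem = onprem_total(ONPREM_UPFRONT, ONPREM_MONTHLY, months)
    print(f"{years} yr: cloud ${cloud:,.0f} vs on-prem ${onprem:,.0f}")

print("Break-even month:", break_even_month(USERS, CLOUD_FEE, ONPREM_UPFRONT, ONPREM_MONTHLY))
```

The result is only as good as the inputs. Raising the support cost or adding redundancy hardware moves the break-even point out, so it is worth running the model with pessimistic numbers too.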

Tie cost to control and integration

Cost is not the only axis. I also look at:

Control: need deep SIP tweaking, custom dialplans, or special emergency routing? IP PBX gives more room.
Security and compliance: some sectors want media and call records on-site.
Integration: if I integrate with SIP intercoms, access control, paging, and PAGA (public address and general alarm) systems, owning the PBX often simplifies things.

In many projects we end up with a hybrid: an on-site IP PBX for critical devices and local survivability, and cloud UC for knowledge workers. The “cheapest” solution is the one that matches your risk, features, and growth, not only the monthly number on paper.

Which codecs should I enable for call quality?

If every device has a different codec list, calls will usually still connect as long as the PBX can transcode, but you pay with extra CPU load, added latency, and strange audio problems.

For office VoIP, I enable G.711 as my baseline, add G.722 or Opus where supported, and keep compressed codecs like G.729 only for special low-bandwidth links.

[Figure: VoIP codec feature blocks. Color tiles highlighting G.711 and G.7xx codec feature groups.]

Understand what codecs actually do

Codecs trade bandwidth for quality and CPU. Some common ones:

Codec	Bandwidth (approx)	Quality	Typical use
G.711	80–90 kbps	“PSTN” narrowband	Default for most SIP trunks
G.722	80–90 kbps	Wideband (HD)	Office LAN, internal calls
Opus	24–64 kbps+	Very flexible HD	Softphones, WebRTC, variable links
G.729	~30–40 kbps	Compressed NB	Low bandwidth, older gear

G.711 is simple and compatible. Most carriers and legacy gateways expect it. It uses more bandwidth than compressed codecs, but on modern office links this is usually fine.

G.722 (wideband) sounds much clearer for internal calls. Voices feel more natural, which reduces listener fatigue and improves understanding on long calls. Many IP phones support it.

The Opus audio codec ⁵ is great in softphones and modern systems because it adapts well to changing network conditions and can maintain strong quality at lower bitrates.
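The bandwidth figures in the table include packet overhead, not just the codec bitrate. The sketch below shows one way to work that out, assuming 20 ms packetisation, IPv4, and plain Ethernet framing without a VLAN tag. The 32 kbps Opus bitrate is an assumed example setting, since Opus is variable.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP fixed header
ETHERNET_OVERHEAD_BYTES = 14 + 4       # Ethernet header + frame check sequence


def call_bandwidth_kbps(codec_kbps: float, ptime_ms: int = 20, include_ethernet: bool = True) -> float:
    """One-direction bandwidth of a single RTP stream, in kbps."""
    packets_per_second = 1000 / ptime_ms
    payload_bytes = codec_kbps * 1000 / 8 / packets_per_second
    overhead = IP_UDP_RTP_HEADER_BYTES
    if include_ethernet:
        overhead += ETHERNET_OVERHEAD_BYTES
    return (payload_bytes + overhead) * 8 * packets_per_second / 1000


# Codec payload bitrates in kbps. The Opus value is an assumed setting.
codecs = {"G.711": 64, "G.722": 64, "Opus (assumed 32 kbps)": 32, "G.729": 8}

for name, bitrate in codecs.items():
    ip_level = call_bandwidth_kbps(bitrate, include_ethernet=False)
    on_wire = call_bandwidth_kbps(bitrate)
    print(f"{name}: {ip_level:.1f} kbps at IP level, {on_wire:.1f} kbps on Ethernet")
```

Multiply the per-stream figure by two for a full call (one stream in each direction) and by the number of concurrent calls to size a link.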

Practical codec strategy for an office

A simple, safe approach:

On LAN and internal calls: prefer G.722 (or Opus) first, then G.711.
On SIP trunks: use G.711 as primary, match what the carrier supports.
On constrained links or older hardware: consider G.729 if licenses and support exist.

On internal phones and softphones, I order codecs like this, for example:

G.722
G.711 (A-law or μ-law, depending on region)
Opus (for softphones / WebRTC, if the PBX supports it)

The PBX should then handle transcoding only when needed, not for every call. Less transcoding means less CPU, less latency, and fewer points of failure.
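The idea of "transcode only when needed" can be sketched as a simplified version of SDP offer/answer: pick the first codec in the caller's preference order that the other side also supports. The function names and the device codec lists here are hypothetical, and a real PBX applies more rules than this.

```python
def pick_codec(offered, supported):
    """First codec in the offerer's preference order that the other side supports."""
    supported_set = set(supported)
    for codec in offered:
        if codec in supported_set:
            return codec
    return None


def needs_transcoding(caller_codecs, callee_codecs) -> bool:
    """True when the two legs share no codec, so the PBX must convert audio."""
    return pick_codec(caller_codecs, callee_codecs) is None


# Hypothetical codec lists, using SDP-style names.
desk_phone = ["G722", "PCMA", "opus"]
sip_trunk = ["PCMA", "PCMU"]
old_gateway = ["G729"]

print(pick_codec(desk_phone, sip_trunk))           # shared codec, no transcoding
print(needs_transcoding(desk_phone, old_gateway))  # no shared codec
```

Keeping G.711 on every device guarantees at least one shared codec, which is the practical reason it stays in the list even where wideband is preferred.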

When you test codecs, use real calls:

Internal extension-to-extension calls.
Calls over each SIP trunk.
Calls that include SIP intercoms or emergency phones.

If any device cannot handle wideband correctly, you may limit it to G.711 to avoid strange audio. The target is a small, consistent codec set across devices and trunks, not a long list “just in case”.

Can I secure VoIP with TLS, SRTP, and VLANs?

Many VoIP systems work fine on day one but are wide open: cleartext SIP, no encryption, flat LAN, and default passwords that invite abuse.

Yes. I can secure VoIP by encrypting SIP with TLS, media with SRTP, isolating voice traffic on VLANs, and adding SBCs, strong credentials, and rate limits.

[Figure: Secure VoIP signaling and media. Encrypted SIP desk phone in front of a secure network lock diagram.]

SIP signalling and media encryption basics

Security has two planes:

Signalling (SIP): who calls whom, caller ID, dialed number, registration, and control.
Media (RTP): the audio (and video) stream itself.

I secure them like this:

Use Transport Layer Security (TLS) ⁶ for SIP so registrations and call control are encrypted.
Use Secure Real-time Transport Protocol (SRTP) ⁷ for audio so media packets are encrypted and harder to intercept.

Most modern IP phones, softphones, and PBXs support both. On SIP trunks, it depends on the carrier; many now offer TLS/SRTP options, especially for business and government lines.
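One quick way to check whether a PBX or carrier edge really offers SIP over TLS is to attempt a TLS handshake against it. The sketch below uses only the Python standard library. Port 5061 is the standard SIP-over-TLS port, the TLS 1.2 floor is my own assumption about a sensible minimum, and the host name in the usage comment is a placeholder.

```python
import socket
import ssl


def sip_tls_context(ca_file=None) -> ssl.SSLContext:
    """Client context that verifies the server certificate and refuses old TLS versions."""
    context = ssl.create_default_context(cafile=ca_file)
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # assumed policy, adjust as needed
    return context


def negotiated_tls_version(host: str, port: int = 5061, timeout: float = 5.0) -> str:
    """Connect to a SIP TLS listener and return the negotiated TLS version string."""
    context = sip_tls_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()


# Usage (placeholder host, needs network access):
# print(negotiated_tls_version("sip.example.com"))
```

This only proves that the signalling port accepts TLS with a valid certificate. Whether media is actually encrypted with SRTP still has to be confirmed in the call's SDP or in the PBX call details.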

Even with encryption, I still:

Use strong SIP passwords and random usernames.
Restrict which IPs can send SIP traffic (firewall, SBC, or both).
Limit dialling to allowed countries and number ranges to reduce toll fraud risk.
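These three habits translate into simple checks. The sketch below is illustrative only: the networks, dial prefixes, and function names are hypothetical, and in practice the same rules usually live in the firewall, SBC, or PBX dial plan rather than in a script.

```python
import ipaddress
import secrets

# Hypothetical policy values, replace with your carrier ranges and calling needs.
ALLOWED_SIP_SOURCES = [
    ipaddress.ip_network("203.0.113.0/24"),  # example carrier range
    ipaddress.ip_network("10.20.0.0/16"),    # example voice VLAN
]
ALLOWED_DIAL_PREFIXES = ("+1", "+44", "+49")
BLOCKED_DIAL_PREFIXES = ("+1900",)           # example premium-rate range


def new_sip_password() -> str:
    """Random secret from the OS CSPRNG, suitable for a SIP account."""
    return secrets.token_urlsafe(24)


def source_allowed(ip: str) -> bool:
    """True if the SIP packet's source address is inside an allowed network."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in ALLOWED_SIP_SOURCES)


def destination_allowed(number: str) -> bool:
    """True if the dialled number matches an allowed prefix and no blocked one."""
    normalized = number.replace(" ", "").replace("-", "")
    if normalized.startswith(BLOCKED_DIAL_PREFIXES):
        return False
    return normalized.startswith(ALLOWED_DIAL_PREFIXES)


print(source_allowed("203.0.113.10"), source_allowed("198.51.100.7"))
print(destination_allowed("+44 20 7946 0000"), destination_allowed("+1 900 555 0100"))
```

Checking blocked prefixes before allowed ones matters, because a premium-rate range often sits inside a country code that is otherwise permitted.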

VLANs, QoS, and edge protection

Network separation helps both quality and security:

Voice VLAN: place phones and SIP intercoms on a dedicated VLAN.
QoS: mark voice packets (DSCP) and give them higher priority on switches and routers.
DHCP and provisioning: control where phones get their configs and firmware.

This way:

Data storms or backup jobs on the main LAN are less likely to hurt voice.
A misconfigured PC has less chance of attacking phone infrastructure directly.
It is easier to apply firewall rules at the edge for “voice network” vs “everything else”.
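For the QoS part, DSCP marking is just a 6-bit value in the IP header. The sketch below shows how a softphone-style application could request Expedited Forwarding (DSCP 46), the value conventionally used for voice media. Treat it as an assumption-laden illustration: `IP_TOS` behaves as shown on Linux, some operating systems ignore or restrict it, and switches must be configured to trust the marking for it to have any effect.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding, the usual marking for voice media


def dscp_to_tos(dscp: int) -> int:
    """DSCP occupies the upper six bits of the IP TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be between 0 and 63")
    return dscp << 2


def open_marked_udp_socket(dscp: int = DSCP_EF) -> socket.socket:
    """UDP socket whose outgoing packets request the given DSCP (platform dependent)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


print(f"DSCP {DSCP_EF} -> TOS byte {dscp_to_tos(DSCP_EF)} (0x{dscp_to_tos(DSCP_EF):02X})")
```

Most desk phones set these values from their own QoS menu or provisioning file. The arithmetic is the same, which helps when a packet capture shows a TOS byte rather than a DSCP number.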

An SBC (Session Border Controller) or a hardened SIP edge device sits between your PBX and the outside world. It:

Hides internal IP addresses.
Normalises SIP from different carriers and remote phones.
Applies rate limits, SIP DoS protection, and protocol sanity checks.
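Rate limiting at the edge is often implemented as a token bucket per source address. The sketch below is a minimal version of that idea, not how any particular SBC does it. The class name, the default rate, and the burst size are assumptions chosen for illustration.

```python
import time


class SipRateLimiter:
    """Token bucket per source IP: allows `burst` requests at once, refilled at `rate_per_second`."""

    def __init__(self, rate_per_second: float = 1.0, burst: int = 5, clock=time.monotonic):
        self.rate = rate_per_second
        self.burst = burst
        self.clock = clock      # injectable so the limiter can be tested without sleeping
        self._buckets = {}      # source IP -> (tokens left, time of last request)

    def allow(self, source_ip: str) -> bool:
        """Return True if this request may pass, False if the source is over its limit."""
        now = self.clock()
        tokens, last = self._buckets.get(source_ip, (float(self.burst), now))
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[source_ip] = (tokens - 1.0, now)
            return True
        self._buckets[source_ip] = (tokens, now)
        return False


limiter = SipRateLimiter(rate_per_second=1.0, burst=3)
# A scanner hammering REGISTER from one address gets cut off after the burst.
print([limiter.allow("198.51.100.7") for _ in range(5)])
```

A legitimate phone that registers every few minutes never notices the limit, while a password-guessing script is slowed to one attempt per second per address.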

In our deployments for security and industrial projects, this edge layer is critical. SIP intercoms, emergency phones, and access control cannot be down because someone scanned the SIP port or guessed a weak password.

Why do my VoIP calls jitter on Wi-Fi?

On the LAN everything sounds good, but as soon as someone walks with a Wi-Fi handset or uses a laptop softphone, voices break, words repeat, or delays jump.

VoIP jitter on Wi-Fi comes from interference, contention, roaming, and power-saving. The fix is better Wi-Fi design, QoS, and often “less Wi-Fi, more wire” for phones.

[Figure: Unified voice and desktop workflow. Knowledge worker on a SIP desk phone while using a laptop in an open office.]

Why Wi-Fi is harder for voice than Ethernet

Wi-Fi is shared radio. Everyone takes turns, and the environment changes all the time. Problems that hit VoIP first:

Interference from other Wi-Fi networks, microwaves, Bluetooth, and devices.
Contention when many clients share the same channel.
Roaming delays when devices move between access points.
Power save modes on laptops and phones that pause the radio too aggressively.

Voice is real-time. Small delays and dropped packets that are invisible to web browsing become obvious as choppy audio, echo, or robotic speech.

Ethernet gives each phone a dedicated, full-duplex link with low latency and almost no interference. That is why, for fixed desks and most SIP intercoms, I still prefer wired connections and PoE.

Practical steps to reduce jitter

If Wi-Fi must carry voice, I treat it as a voice network, not just “free internet in the office”:

Use 5 GHz or 6 GHz bands for voice; avoid crowded 2.4 GHz where possible.
Create a separate SSID for VoIP devices with higher QoS priority.
Enable WMM / Wi-Fi QoS, so voice frames get priority over bulk traffic.
Design coverage so roaming is smooth; avoid dead zones and too-few APs.
Avoid overloading a single AP with many roaming voice devices.

On the endpoint side:

Turn off aggressive power save for Wi-Fi handsets if possible.
Prefer wired headsets and wired network connections for heavy softphone users.
Use a codec that tolerates loss and jitter well, such as Opus, if your PBX supports it.

If you still see issues, capture metrics:

Jitter, latency, and packet loss from phones or softphones.
AP load and channel utilisation from your Wi-Fi controller.
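If the phones do not report these numbers directly, they can be estimated from a packet capture. The sketch below applies the running interarrival jitter estimate from RFC 3550 to a list of send and arrival times, and derives loss from RTP sequence numbers. The function names and sample data are hypothetical, and sequence number wrap-around is ignored for brevity.

```python
def interarrival_jitter(packets) -> float:
    """RFC 3550 style running jitter estimate in ms.

    `packets` is a list of (send_time_ms, arrival_time_ms) in arrival order.
    Only differences in transit time are used, so the two clocks need not be synchronised.
    """
    jitter = 0.0
    previous_transit = None
    for sent_ms, arrived_ms in packets:
        transit = arrived_ms - sent_ms
        if previous_transit is not None:
            delta = abs(transit - previous_transit)
            jitter += (delta - jitter) / 16  # smoothing factor from RFC 3550
        previous_transit = transit
    return jitter


def loss_percent(sequence_numbers) -> float:
    """Share of packets missing between the lowest and highest sequence number seen."""
    expected = max(sequence_numbers) - min(sequence_numbers) + 1
    received = len(set(sequence_numbers))
    return 100.0 * (expected - received) / expected


# Hypothetical capture: packets sent every 20 ms, arrival delay varies on Wi-Fi.
capture = [(0, 50), (20, 72), (40, 125), (60, 111), (80, 131)]
print(f"Jitter estimate: {interarrival_jitter(capture):.2f} ms")
print(f"Loss: {loss_percent([100, 101, 102, 104, 105]):.1f} %")
```

Comparing the same measurement on a wired phone and a Wi-Fi handset during busy hours usually shows quickly whether the radio side is the bottleneck.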

Often, once you clean up the channels and apply QoS, jitter drops to acceptable levels. For critical positions (reception, security, dispatch), I still insist on wired IP phones. Wi-Fi becomes an extension and backup, not the only option.

Conclusion

A VoIP phone system turns your data network into a full voice platform; with the right PBX model, codecs, security, and network design, it gives clear, secure calls from desk to door station.

Footnotes

¹ Definition and key characteristics of IP PBX systems used for on-prem VoIP call control.
² Overview of hosted PBX models and how cloud telephony is delivered as a managed service.
³ SIP standard for how phones register, set up calls, and control transfers and features.
⁴ RTP standard describing how real-time voice media packets are carried once a call is established.
⁵ Opus codec spec: adaptive, high-quality audio for softphones, WebRTC, and variable networks like Wi-Fi.
⁶ TLS 1.3 spec for encrypting SIP signalling so registrations and call control aren’t readable on the network.
⁷ SRTP spec for encrypting RTP media to protect voice content from eavesdropping and tampering.

About The Author

DJSLink R&D Team

DJSLink is China's top manufacturer and factory for SIP audio and video communication solutions.
Over the past 15 years, we have not only provided reliable, secure, clear, high-quality audio and video products and services, but we also take care of the delivery of your projects, ensuring your success in the local market and helping you to build a strong reputation.