What is video calling?

When video calls freeze at the gate or in a meeting room, people lose trust in the whole system. Stable video calling makes your intercom, PBX, and apps feel “simple” again.

Video calling is real-time two-way audio and video over IP networks, where cameras and microphones capture media, codecs compress it, and endpoints decode it so people can see and hear each other as if they are face to face.

[Figure: SIP / RTP / SRTP enabled video entry system in office lobby]

In real projects, that means SIP video phones, door stations, indoor monitors, laptops, and mobile phones. They capture audio and video, encode them with H.264 [1] or a similar codec, send them over your LAN or the internet, and play them back after only a short delay (a few tens of milliseconds on a well-run LAN, more across the internet). The quality depends on bandwidth, latency, jitter, and how well the call path is designed.

Next, let’s look at how this works with SIP, WebRTC, SIP door phones, and your IP PBX so you can choose the right design instead of guessing.

How does SIP video calling differ from WebRTC?

To end users, a video tile is a video tile. But for you as the system owner, SIP video and WebRTC behave very differently in real networks.

SIP video calling uses SIP signaling and RTP/SRTP media through a PBX or SBC, while WebRTC lives in browsers and apps with built-in DTLS-SRTP, STUN, and TURN, and usually needs a gateway to talk to SIP devices.

[Figure: SIP VoIP phone system dashboard versus companion tablet app]

Signaling, identity, and call model

SIP follows the classic telephony model. You dial an extension, SIP URI, or phone number. The IP PBX or SBC routes the call using the SIP signaling standard [2]. The same platform already handles your voice, queues, IVR, and intercom.

WebRTC follows an app model. The browser or mobile SDK talks to an application server. Users join with a link, meeting ID, or app account. If you want the formal baseline for what browsers implement, the WebRTC 1.0 specification [3] is the reference point.

A simple comparison:

| Aspect | SIP video calling | WebRTC calling |
| --- | --- | --- |
| Identity | Extension, DID, SIP URI | URL, meeting ID, app account |
| Signaling | SIP over UDP/TCP/TLS | Custom app signaling over WebSocket/HTTPS |
| Media transport | RTP / SRTP | DTLS-SRTP |
| Core element | IP PBX / softswitch / SBC | Web/app server + SFU/MCU |
| PSTN integration | Native (phone numbers, E.164) | Needs separate PSTN gateway or add-on |

So SIP video fits best where you already use extensions and phone numbers. WebRTC fits best where people join meetings from browsers and mobile apps with one click.

Devices, use cases, and intercoms

For a SIP intercom project, SIP video has some clear advantages:

  • Door phones, indoor stations, and SIP video phones speak SIP natively.
  • IP PBX routing and features apply to both audio and video.
  • Existing SIP trunks and SBCs already know how to pass the call.

WebRTC is strong in:

  • Browser meetings without any install.
  • Rich collaboration features like chat, reactions, and whiteboards.
  • B2C support flows where customers join from a link.

In many deployments I see a hybrid. SIP handles intercom + PBX. WebRTC handles browser meetings. A gateway or media server sits in the middle if you need them to talk to each other.

Security, NAT, and QoS

With SIP, you decide:

  • If you use TLS and SRTP or plain SIP and RTP.
  • How you expose services through your SBC and firewall.
  • How you tag audio and video for QoS on the network.

Media in SIP systems typically rides on RTP (Real-time Transport Protocol) [4] and is protected with the Secure Real-time Transport Protocol (SRTP) [5] when you enable encryption.

With WebRTC, encrypted media is mandatory. It commonly uses DTLS-SRTP keying [6] plus ICE (Interactive Connectivity Establishment) [7] to traverse NAT and pick the best media path. This is good for security, but it keeps you inside that app’s world unless you add a SIP–WebRTC bridge.
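As a rough illustration of how ICE ranks possible media paths, the sketch below computes candidate priorities with the formula and recommended type preferences from RFC 8445. The function name and the sample candidate list are hypothetical, and a real ICE agent goes further: it pairs local and remote candidates and runs connectivity checks before it settles on a path.

```python
# Type preferences recommended by RFC 8445 (higher is preferred).
TYPE_PREFERENCE = {
    "host": 126,   # address on a local interface
    "prflx": 110,  # peer-reflexive, learned during connectivity checks
    "srflx": 100,  # server-reflexive, learned from a STUN server
    "relay": 0,    # relayed through a TURN server
}


def candidate_priority(cand_type, local_pref=65535, component_id=1):
    """Return the ICE priority of a single candidate.

    priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)
    """
    type_pref = TYPE_PREFERENCE[cand_type]
    return (type_pref << 24) + (local_pref << 8) + (256 - component_id)


# Hypothetical set of candidates gathered by one endpoint.
candidates = ["relay", "host", "srflx"]
ranked = sorted(candidates, key=candidate_priority, reverse=True)
print(ranked)
```

Direct host paths rank first and TURN relays last, which is why a relayed call usually means the direct and STUN-discovered paths failed their checks.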

So the short rule: use SIP video to integrate with PBXs, SIP door phones, and IP phones. Use WebRTC when you want browser-first meetings. Connect them with a gateway when you need both.

Can my SIP door phone stream to mobile apps?

The visitor presses the button at the entrance, but the receptionist is walking around with a phone. If that video cannot reach mobile, people stop using the intercom.

Yes. A SIP door phone can deliver video to mobile apps as a SIP video call to a softphone/UC client, or as an RTSP/HTTP stream to a CCTV or intercom app, as long as codecs, signaling, and NAT rules match.

[Figure: Mobile video intercom access for office entrance]

Main integration patterns you can use

In most deployments, I see three patterns:

| Pattern | How it works | When to use it |
| --- | --- | --- |
| SIP → mobile softphone | Door phone calls SIP extension on the app | Hosted PBX, UC app with SIP support |
| SIP → PBX → push to mobile client | PBX sends push, app wakes and answers with audio + video | Cloud PBX, UCaaS with own mobile client |
| RTSP/HTTP → viewer app | Mobile app pulls video stream directly from intercom | Guards, security staff, monitoring only |

For two-way talk and video, you want the mobile app to:

  • Register as a SIP extension.
  • Support H.264 video and a shared audio codec (often G.711 or Opus).
  • Receive push notifications so calls ring even when the app sleeps.

For live viewing without answering, RTSP or ONVIF support on the door phone lets CCTV apps show the stream all the time.

NAT, security, and user experience

When you extend video to mobile, you must think about NAT and security:

  • Put the door phone behind your PBX or SBC.
  • Do not open the intercom directly to the internet with port forwards.
  • Use TLS/SRTP when clients support it, or at least protect SIP with an SBC and strong auth.
  • Limit who can trigger door open actions inside the app.

User experience also matters:

  • Use ring groups so both desk phones and mobiles ring together.
  • Set fallback to audio-only on weak mobile networks.
  • Keep door-open DTMF or API actions simple and clear.

When all those pieces are in place, your SIP door phone feels modern. Staff can answer from the front desk, from an indoor station, or from a mobile app in the parking lot.

What bandwidth do 1080p SIP calls require?

High-resolution video looks great in demos. Then one busy day your uplink saturates and suddenly even voice calls start to break.

A typical 1080p SIP video call with H.264 needs about 2–3 Mbps one-way in real use; for planning, it is safer to reserve 3–4 Mbps per direction for each active 1080p call.

[Figure: Team video communication quality statistics]

Rough sizing numbers for your design

Real bitrates vary with scene complexity, codec profile, and frame rate. These are good planning values:

| Resolution | Frame rate | Codec | Expected bitrate (one-way) | Planning budget (one-way) |
| --- | --- | --- | --- | --- |
| 720p | 25–30 fps | H.264 | 1.0–1.5 Mbps | 1.5–2.0 Mbps |
| 1080p | 25–30 fps | H.264 | 2.0–3.0 Mbps | 3.0–4.0 Mbps |
| 1080p | 25–30 fps | H.265 | 1.0–2.0 Mbps | 2.0–3.0 Mbps |

So a two-way 1080p call:

  • Uses around 4–6 Mbps in total: 2–3 Mbps sent plus 2–3 Mbps received at each endpoint.
  • Should have 6–8 Mbps of clean capacity in total (3–4 Mbps in each direction) to avoid quality drops.

If you run multiple concurrent calls or a video wall at the guard desk, you must also look at switch backplanes and uplinks, not just ISP bandwidth.
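The sizing arithmetic can be sketched as a small helper. The per-call figures below are the upper planning budgets from the table above; the function name and the `video_share` parameter (the fraction of a link you are willing to give to video) are assumptions for illustration, not values from any vendor tool.

```python
# Upper one-way planning budgets in Mbps, taken from the sizing table.
PLANNING_MBPS_ONE_WAY = {
    ("720p", "H.264"): 2.0,
    ("1080p", "H.264"): 4.0,
    ("1080p", "H.265"): 3.0,
}


def uplink_budget(calls, link_mbps, video_share=0.75):
    """Check whether concurrent calls fit in one direction of a link.

    calls       -- list of (resolution, codec) tuples, one per active call
    link_mbps   -- capacity of the link in that direction
    video_share -- assumed fraction of the link reserved for video
    Returns (needed_mbps, usable_mbps, fits).
    """
    needed = sum(PLANNING_MBPS_ONE_WAY[call] for call in calls)
    usable = link_mbps * video_share
    return needed, usable, needed <= usable


# Three concurrent 1080p H.264 calls on a 20 Mbps uplink.
three_calls = [("1080p", "H.264")] * 3
print(uplink_budget(three_calls, link_mbps=20))

# Five such calls no longer fit within the assumed 75% share.
five_calls = [("1080p", "H.264")] * 5
print(uplink_budget(five_calls, link_mbps=20))
```

Run the same check for each direction and for every hop the media crosses (access switch uplink, core uplink, WAN link), since the tightest one sets the limit.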

How network quality changes real results

Bandwidth is not the only factor. You also need:

  • One-way latency under ~150 ms (the usual ITU-T G.114 guideline) for natural talk.
  • Jitter under a few tens of milliseconds, with proper jitter buffers.
  • Packet loss under 1%; beyond that, video will freeze or drop frames.
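To make the jitter figure more concrete, here is a minimal sketch of the running interarrival jitter estimate described in RFC 3550, the value RTP receivers report back in RTCP. For readability it works in milliseconds rather than RTP timestamp units, and the function name and sample timings are hypothetical.

```python
def interarrival_jitter(send_ms, recv_ms):
    """Estimate interarrival jitter in the style of RFC 3550.

    send_ms -- sender timestamps of consecutive packets, in milliseconds
    recv_ms -- arrival times of the same packets, in milliseconds
    For each packet pair, D is the change in transit time, and the
    estimate is smoothed with a gain of 1/16: J += (|D| - J) / 16.
    """
    jitter = 0.0
    for i in range(1, len(send_ms)):
        transit_delta = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(transit_delta) - jitter) / 16.0
    return jitter


# Evenly paced packets arriving with constant delay: no jitter.
steady = interarrival_jitter([0, 20, 40, 60], [50, 70, 90, 110])

# Same stream, but the third packet arrives 10 ms late.
late_packet = interarrival_jitter([0, 20, 40], [50, 70, 100])
print(steady, late_packet)
```

Because of the 1/16 smoothing, a single late packet barely moves the estimate; a sustained climb in this value is what signals a congested or badly queued path.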

Good QoS planning helps keep voice safe even when video is heavy:

| Traffic | Priority idea |
| --- | --- |
| SIP signaling | High, low bandwidth |
| Voice RTP | Highest priority, strict real-time queue |
| Video RTP | High, but below voice |
| Data | Normal or low |
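One common way to express these priorities on the wire is DSCP marking. The sketch below maps each traffic class to a DSCP value and sets it on a UDP socket. The specific values (EF for voice, AF41 for video, CS3 for signaling) are widely used conventions rather than anything stated in this article, so treat them as assumptions and follow your own network's QoS policy. The function names are hypothetical, and some operating systems ignore or restrict application-set markings.

```python
import socket

# Assumed DSCP values per traffic class (common convention, not a mandate).
DSCP = {
    "signaling": 24,  # CS3
    "voice": 46,      # EF (Expedited Forwarding)
    "video": 34,      # AF41
    "data": 0,        # default / best effort
}


def tos_byte(traffic_class):
    """Return the IP TOS byte: DSCP sits in the upper six bits."""
    return DSCP[traffic_class] << 2


def open_marked_udp_socket(traffic_class):
    """Create a UDP socket whose outgoing packets carry the class's DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte(traffic_class))
    return sock


for name in DSCP:
    print(name, DSCP[name], hex(tos_byte(name)))
```

Marking only helps if switches and routers along the path trust the marking and map it to queues, so the endpoint setting and the network policy need to be planned together.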

On LTE or small WAN links, it is often better to cap intercom or client apps at 720p and keep 1080p for local indoor stations and NVR recording. That way calls stay smooth and you still get good detail where it matters.

How do I enable video calling on IP PBX?

Many teams assume that installing video phones is enough. Later they find that the PBX or trunk silently strips video, so every call falls back to audio-only.

To enable video calling on an IP PBX, you must turn on video support globally, allow video codecs like H.264 on extensions and trunks, keep video on-net where trunks are audio-only, and make sure NAT, SRTP, and QoS all handle the larger media streams.

[Figure: Operator supervising VoIP and video infrastructure in control room]

Step 1: Check PBX video capabilities

First make sure your platform actually supports video:

  • Global toggle like “Enable video” or “Support video codecs”.
  • Codec list includes H.264 (and maybe H.265, VP8/VP9).
  • Licensing does not restrict video or conferencing to special plans.

Some old or very small PBXs are audio-only. In those cases, you can still use video peer-to-peer between endpoints, or connect the intercom’s video to an NVR/VMS and keep the PBX for audio and door control.

Step 2: Enable codecs on extensions and trunks

On each extension or user profile:

  • Enable video.
  • Allow H.264 and place it at the top of the video codec order.
  • Set a reasonable max resolution (often 720p or 1080p for indoor stations).

On each SIP trunk:

  • Confirm if the carrier supports video. Many PSTN trunks do not.
  • If the trunk is audio-only, leave video enabled between local extensions but expect audio-only when calls go out to the PSTN.
  • Avoid unnecessary transcoding. Align codecs on both sides whenever you can.

A small table helps as a rule of thumb:

| Call type | Recommended approach |
| --- | --- |
| Extension ↔ extension | Audio: Opus/G.722, Video: H.264 (720p/1080p) |
| Extension ↔ door phone | Audio: G.711/Opus, Video: H.264 |
| Extension ↔ PSTN | Audio only, G.711 toward trunk |
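The effect of codec lists on each call type can be sketched as a simple intersection that keeps the caller's preference order. This is a simplified model: real SDP offer/answer also compares parameters such as the H.264 profile and packetization mode. The device profiles and function names below are hypothetical.

```python
def negotiate(offered, supported):
    """Keep codecs both sides support, in the offerer's preference order."""
    return [codec for codec in offered if codec in supported]


def plan_call(caller, callee):
    """Pick the first shared audio and video codec, or None if there is none."""
    audio = negotiate(caller["audio"], callee["audio"])
    video = negotiate(caller.get("video", []), callee.get("video", []))
    return {
        "audio": audio[0] if audio else None,
        "video": video[0] if video else None,
    }


# Hypothetical codec lists for three kinds of endpoint.
indoor_station = {"audio": ["opus", "g722", "pcmu"], "video": ["h264"]}
door_phone = {"audio": ["pcmu", "pcma"], "video": ["h264"]}
pstn_trunk = {"audio": ["pcmu", "pcma"], "video": []}

print(plan_call(indoor_station, door_phone))  # shared audio and video
print(plan_call(indoor_station, pstn_trunk))  # audio only toward the PSTN
```

When either result comes back as `None` for audio, the PBX has to transcode or the call fails, which is the case worth eliminating by aligning codec lists up front.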

Step 3: Fix NAT, SRTP, and QoS for video

Video is more sensitive to design mistakes than audio. Check:

  • SBC or PBX external IP is set correctly so endpoints know where to send media.
  • Firewall allows the full RTP/SRTP port range in and out to the PBX or SBC.
  • TLS and SRTP are enabled and certificates are trusted by phones.
  • QoS markings treat video as important but still lower than voice.

In many projects, once the SBC and firewall rules are fixed, “mystery” one-way video and mobile issues disappear.
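A frequent cause of one-way media is an SDP body that advertises a private address to a peer on the other side of NAT. As a minimal sketch, the check below pulls the IPv4 connection addresses out of an SDP body and flags any that fall in the RFC 1918 ranges. The function names and the sample SDP are hypothetical, and the parser only handles plain `c=IN IP4` lines.

```python
import ipaddress

# Private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def sdp_connection_addresses(sdp_text):
    """Return the IPv4 addresses found in 'c=IN IP4 <addr>' lines."""
    addresses = []
    for line in sdp_text.splitlines():
        line = line.strip()
        if line.startswith("c=IN IP4 "):
            addresses.append(line.split()[2])
    return addresses


def private_media_addresses(sdp_text):
    """Return advertised media addresses that are RFC 1918 private."""
    flagged = []
    for addr in sdp_connection_addresses(sdp_text):
        ip = ipaddress.ip_address(addr)
        if any(ip in net for net in RFC1918_NETWORKS):
            flagged.append(addr)
    return flagged


# Hypothetical SDP seen on the WAN side of the firewall.
sample_sdp = """
v=0
o=- 0 0 IN IP4 192.168.1.20
s=-
c=IN IP4 192.168.1.20
t=0 0
m=video 20000 RTP/AVP 96
"""
print(private_media_addresses(sample_sdp))
```

If a trace taken outside the firewall shows a private address here, the remote side has nowhere routable to send media, which points back at the external IP setting on the PBX or SBC.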

Step 4: Test real-world flows, not just lab pairs

Do test calls that match how the system will be used:

  • Desk video phone ↔ desk video phone.
  • Desk phone ↔ SIP door intercom.
  • Indoor station ↔ mobile softphone.
  • Local extension ↔ external PSTN caller.

During tests, capture:

  • SIP traces to confirm codec negotiation.
  • Switch and router stats to watch bandwidth and errors.
  • PBX CPU and memory to see if transcoding is hurting capacity.
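When reading SIP traces, the quickest check is the SDP answer: which codecs survived, and whether the video stream was accepted at all. The sketch below lists codecs per media line and treats port 0 as a rejected stream, which is how SDP offer/answer signals refusal. The function names and the sample answer are hypothetical, and static payload types without an `a=rtpmap` line are not resolved.

```python
def codecs_by_media(sdp_text):
    """Map each media type in an SDP body to its port and rtpmap codecs."""
    result = {}
    current = None
    for line in sdp_text.splitlines():
        line = line.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <payload types...>
            media, port = line[2:].split()[:2]
            current = media
            result[current] = {"port": int(port.split("/")[0]), "codecs": []}
        elif line.startswith("a=rtpmap:") and current is not None:
            # a=rtpmap:<payload type> <encoding name>/<clock rate>
            encoding = line.split(" ", 1)[1].split("/")[0]
            result[current]["codecs"].append(encoding)
    return result


def video_accepted(sdp_text):
    """True if the SDP has a video line with a non-zero port."""
    media = codecs_by_media(sdp_text)
    return "video" in media and media["video"]["port"] != 0


# Hypothetical answer in which the far end kept audio but rejected video.
answer_sdp = """
v=0
o=- 0 0 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 10000 RTP/AVP 0
a=rtpmap:0 PCMU/8000
m=video 0 RTP/AVP 96
a=rtpmap:96 H264/90000
"""
parsed = codecs_by_media(answer_sdp)
print(parsed)
print(video_accepted(answer_sdp))
```

An answer like this one is the trace-level signature of a PBX or trunk stripping video: the call connects, audio works, and the video line comes back with port 0.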

If the PBX CPU climbs fast with video, reduce transcoding by aligning codec lists or lowering resolutions. When video can flow end-to-end without transcoding, the PBX only relays packets (or, with direct media between endpoints, handles signaling alone), and the system scales much better.

Conclusion

Video calling works best when SIP, WebRTC, bandwidth, and PBX settings are planned together, so your intercoms, indoor stations, and mobile apps all see the same clear, low-latency picture.


Footnotes


  1. Understand H.264 profiles/levels so door stations, phones, and clients decode video reliably.
  2. Reference for SIP dialogs, REGISTER/INVITE behavior, and interoperability fundamentals.
  3. Browser-level WebRTC API and media behavior used by most web-based calling apps.
  4. Defines RTP packetization and timing used by most SIP audio/video media streams.
  5. Explains how SRTP encrypts and authenticates RTP media for secure video calling.
  6. Shows how WebRTC commonly derives SRTP keys using DTLS during call setup.
  7. Details ICE negotiation for NAT traversal and path selection in real-world video calls.

About The Author
DJSLink R&D Team

DJSLink is a leading Chinese manufacturer of SIP audio and video communication solutions.
Over the past 15 years, we have not only provided reliable, secure, clear, high-quality audio and video products and services, but we also take care of the delivery of your projects, ensuring your success in the local market and helping you to build a strong reputation.
