Text messages and video calls feel like they should work the same way under the hood — both are just data moving between two people in real time. They actually rely on genuinely different technology, and the reason comes down to a tradeoff that's worth understanding: perfect delivery versus speed.
Before WebSockets existed, checking for new messages meant the browser repeatedly asking a server "is there anything new yet?" — a clunky, inefficient pattern called polling. WebSockets solved this by keeping a persistent, two-way connection open, so a server can push new data the moment it happens rather than waiting to be asked.
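The difference between polling and pushing can be made concrete with a toy model. The sketch below is only an illustration: the function names, the 2-second poll interval, and the event times are all assumptions, and network transit time is ignored.

```python
def polling_cost(event_times_ms, poll_interval_ms, duration_ms):
    """Return (requests_sent, delays_ms) for a client that polls on a fixed interval."""
    poll_times = list(range(poll_interval_ms, duration_ms + 1, poll_interval_ms))
    delays = []
    for event in event_times_ms:
        # The client only learns about an event at the first poll at or after it.
        seen_at = next(t for t in poll_times if t >= event)
        delays.append(seen_at - event)
    return len(poll_times), delays


def push_cost(event_times_ms):
    """Return (messages_sent, delays_ms) for a server pushing over an open connection."""
    # One message per event; network transit time is ignored in this model.
    return len(event_times_ms), [0 for _ in event_times_ms]


events = [1200, 4700, 9100]  # hypothetical message times within a 10 s window
print(polling_cost(events, 2000, 10_000))  # (5, [800, 1300, 900])
print(push_cost(events))                   # (3, [0, 0, 0])
```

In this model the polling client sends five requests to learn about three events and still sees each one late, while the pushed connection sends exactly one message per event.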
WebSockets run on top of TCP, which guarantees that data arrives complete and in the correct order. For text — a chat message, a notification — that's exactly the right guarantee: you don't want a message arriving scrambled or out of sequence.
That same guarantee becomes a liability for live video. TCP suffers from a problem called head-of-line blocking: if one packet goes missing, every packet that arrived after it is held back until the missing one has been retransmitted and received. A single lost packet in a video stream can therefore introduce a noticeable delay, and under poor network conditions that delay can stretch much further as TCP's retransmission timers back off. The practical result is the freezing and stuttering people associate with bad video calls: not because the network was necessarily terrible, but because TCP's insistence on in-order delivery handles loss badly for something as time-sensitive as video.
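Head-of-line blocking is easier to see with numbers. The following is a minimal sketch, not a model of a real TCP stack: the function name and the arrival times are assumptions, with packet 1 standing in for a packet that was lost and only arrived after a retransmission.

```python
def in_order_release_times(arrivals_ms):
    """Time each packet is handed to the application by an in-order receiver.

    arrivals_ms[i] is when packet i finally reaches the receiver. Packet i can
    only be released once packets 0..i have all arrived.
    """
    released = []
    latest_needed = 0
    for arrival in arrivals_ms:
        latest_needed = max(latest_needed, arrival)
        released.append(latest_needed)
    return released


# Hypothetical timeline: packet 1 was lost and its retransmission lands at 310 ms.
arrivals = [40, 310, 80, 100, 120]
released = in_order_release_times(arrivals)
stall = [r - a for r, a in zip(released, arrivals)]

print(released)  # [40, 310, 310, 310, 310]
print(stall)     # [0, 0, 230, 210, 190]
```

Packets 2 through 4 arrived on time but sat in the receive buffer for roughly 200 ms each, which is the stall a viewer experiences as a freeze.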
WebRTC takes the opposite approach for media. It primarily uses UDP, which guarantees neither delivery nor order: if a packet is lost, WebRTC keeps playing what it has rather than stopping everything to wait. That sounds like it should make things worse, but for video it is the better tradeoff. A single dropped frame shows up as a barely noticeable blip, while waiting for a retransmission the way TCP does is what causes the freeze people find frustrating.
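The "move on" behaviour can be sketched as a receiver that gives every frame a playout deadline and skips anything that misses it. This is a simplified illustration under assumed values (a 100 ms initial buffer and 33 ms between frames); it is not how any particular WebRTC implementation schedules playout.

```python
def playout_decisions(arrivals_ms, first_deadline_ms, frame_interval_ms):
    """Decide, frame by frame, whether to play it or skip it.

    arrivals_ms[i] is the arrival time of frame i, or None if it never arrived.
    A frame is played only if it arrived by its own deadline; later frames are
    never held back because an earlier one is missing.
    """
    decisions = []
    for index, arrival in enumerate(arrivals_ms):
        deadline = first_deadline_ms + index * frame_interval_ms
        on_time = arrival is not None and arrival <= deadline
        decisions.append("play" if on_time else "skip")
    return decisions


# Hypothetical stream: frame 1 is lost, everything else arrives on time.
arrivals = [40, None, 110, 140, 173]
print(playout_decisions(arrivals, 100, 33))
# ['play', 'skip', 'play', 'play', 'play']
```

One frame is skipped and the rest play on schedule, which is the brief blip described above rather than a stall.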
The numbers deserve some care. WebRTC video calls commonly achieve glass-to-glass latency of roughly 20 to 100 milliseconds under good conditions, but real-world conditions vary, and figures of several hundred milliseconds or more aren't unusual on poor networks. That's still meaningfully faster than typical TCP-based, server-relayed streaming approaches, which often run from several hundred milliseconds up to multiple seconds of delay. "Fast" here means fast relative to the alternative, not a guaranteed number under all conditions.
Modern video chat platforms don't pick one technology over the other — they use each for what it's actually good at. WebSockets (or a similar approach) typically handle signaling: the initial handshake that helps two people's browsers find each other, plus things like text chat messages, where perfect ordering matters more than shaving off milliseconds. WebRTC handles the actual video and audio stream, where speed and tolerance for occasional loss matter more than perfect delivery. We cover the signaling side of this in more depth in our guide on how P2P video calling actually works.
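As a rough picture of the signaling role, the sketch below uses an in-memory class as a stand-in for a WebSocket server that forwards messages between two peers. The class name, the message type names, and the placeholder payloads are all hypothetical; a real deployment would carry these messages over actual WebSocket connections.

```python
import json
from collections import defaultdict

# Hypothetical message names for this sketch.
SIGNALING_TYPES = {"offer", "answer", "candidate"}


class SignalingRelay:
    """In-memory stand-in for a WebSocket signaling server."""

    def __init__(self):
        # peer id -> JSON messages in the order they were sent
        self.inboxes = defaultdict(list)

    def send(self, sender, recipient, kind, payload):
        if kind not in SIGNALING_TYPES and kind != "chat":
            raise ValueError(f"unknown message type: {kind}")
        message = json.dumps({"from": sender, "type": kind, "payload": payload})
        # Appending preserves order, mirroring the in-order delivery TCP provides.
        self.inboxes[recipient].append(message)


relay = SignalingRelay()
relay.send("alice", "bob", "offer", "<session description from alice>")
relay.send("bob", "alice", "answer", "<session description from bob>")
relay.send("alice", "bob", "candidate", "<network path alice can be reached on>")
relay.send("alice", "bob", "chat", "Can you hear me?")

print([json.loads(m)["type"] for m in relay.inboxes["bob"]])
# ['offer', 'candidate', 'chat']
```

Only small control and text messages pass through the relay here; the audio and video themselves would travel over the separate WebRTC connection once it is established.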
If WebRTC is the better fit for video, it's worth asking why some applications still route video through a server instead. The short answer: peer-to-peer connections are considerably more complex to engineer well. Most devices sit behind a firewall or router that hides their real network address, so establishing a direct connection requires a process called ICE to find a workable path. ICE uses STUN servers to discover a device's public address and, when a strict firewall blocks every direct option, falls back to a TURN relay server. None of that complexity is visible to the person using the app; it is simply more work to build correctly than routing everything through a central server.
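One small piece of ICE that can be shown directly is how candidate paths are ranked so that direct routes are tried before a relay. The sketch below uses the candidate priority formula and recommended type preferences from the ICE specification (RFC 8445); the function name and the example candidate list are assumptions, and real ICE goes further by ranking pairs of candidates and running connectivity checks on them.

```python
# Recommended type preferences from RFC 8445: direct (host) paths rank highest,
# relayed (TURN) paths lowest.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(kind, local_preference=65535, component_id=1):
    """priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)."""
    return (TYPE_PREFERENCE[kind] << 24) + (local_preference << 8) + (256 - component_id)


# Hypothetical set of candidates gathered for one device.
candidates = ["relay", "srflx", "host"]
ordered = sorted(candidates, key=candidate_priority, reverse=True)

print(ordered)                      # ['host', 'srflx', 'relay']
print(candidate_priority("host"))   # 2130706431
print(candidate_priority("relay"))  # 16777215
```

The relay candidate ends up last, which matches its role as the fallback when no direct path works.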
Infrastructure cost is a factor too. Routing large volumes of high-definition video through central servers is expensive in bandwidth terms. Because WebRTC connects two devices directly most of the time, the platform avoids most of that cost. That is one factor among several in how a free service stays sustainable, not the entire explanation.
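A back-of-the-envelope calculation shows the scale involved. Every number here is an assumption chosen for illustration (2.5 Mbps per HD stream, a 30-minute call, two outgoing streams for a two-person call), not a measurement from any platform.

```python
def relayed_gigabytes(bitrate_mbps, minutes, streams):
    """Data a server must send out when it relays `streams` video streams."""
    bytes_per_stream = bitrate_mbps * 1_000_000 * minutes * 60 / 8
    return streams * bytes_per_stream / 1_000_000_000


# Assumed figures: 2.5 Mbps per stream, 30-minute call, one stream sent to each
# of the two participants.
print(relayed_gigabytes(2.5, 30, 2))  # 1.125
```

Under these assumptions a single relayed call costs the server a little over 1 GB of outbound traffic, and a direct peer-to-peer call costs it none for media.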
Video can technically be streamed over WebSockets, but it tends to perform poorly for the reasons above: TCP's head-of-line blocking makes it prone to stuttering the moment network conditions aren't ideal.
WebRTC calls are encrypted by design: the standard mandates DTLS and SRTP, so an unencrypted WebRTC media stream isn't possible.
A full freeze (rather than a brief blip) usually means the connection itself dropped — a network change, a lost signal — and WebRTC is attempting to find a new path, not that the UDP approach itself failed.
Neither protocol is "better" in the abstract — they're built for genuinely different priorities. TCP and WebSockets prioritize getting every piece of data exactly right, which is the correct priority for text. UDP and WebRTC prioritize keeping things moving, even at the cost of occasional small losses, which is the correct priority for something as time-sensitive as a live conversation. Using both together, for what each is actually good at, is what makes browser-based video chat work as smoothly as it does.
For more on how this connects to the bigger picture, see our guide on how WebRTC actually works and our practical connection troubleshooting guide.