Before Baby Monitor Timmy transmits audio and video, two devices need to find each other and trust each other. This pairing step is the most critical moment in the whole flow. Here I explain how Timmy pairs, which cryptography sits behind it, and why a nearby attacker cannot take over the connection unnoticed.
The problem: How does my device know who it's talking to?
When two devices connect for the first time, the central question is: is Device A really talking to Device B, or is someone sitting in between? In cryptography, that is called a man-in-the-middle attack (MITM).
Timmy solves this through an Elliptic Curve Diffie-Hellman (ECDH) key exchange over Firebase, combined with visual verification by the user.
The following diagram shows the complete pairing flow at a glance:
sequenceDiagram
autonumber
participant A as 📱 Device A
participant F as ☁️ Firebase
participant B as 📱 Device B
Note over A,B: Phase 1: Discovery
A->>A: Generate ECDH key pair (P-256)
B->>B: Generate ECDH key pair (P-256)
alt Auto-Pairing (Nearby BLE)
A-->>B: BLE broadcast: SBM:XKQM
B-->>A: BLE broadcast: SBM:R7NP
Note over A,B: Lower code wins → determines creator/joiner
else Manual Pairing
A->>A: Display 4-char code
Note right of A: User reads code
B->>B: User enters code
end
Note over A,B: Phase 2: ECDH Key Exchange
A->>F: Write public key (PubA) to meeting doc
B->>F: Write public key (PubB) to meeting doc
F-->>B: Read PubA
F-->>A: Read PubB
Note over A,B: Phase 3: Shared Secret
A->>A: sharedSecret = ECDH(privA, PubB)
B->>B: sharedSecret = ECDH(privB, PubA)
Note over A,B: Both compute identical 32-byte secret
A->>A: SAS = SHA-256("sas:" + sort(PubA,PubB) + secret) → 2-digit number
B->>B: SAS = SHA-256("sas:" + sort(PubA,PubB) + secret) → 2-digit number
Note over A,B: Phase 4: Visual Verification
A->>A: Display SAS: 42
B->>B: Display SAS: 42
Note over A,B: 👤 User compares numbers on both screens
A->>A: User confirms ✓
B->>B: User confirms ✓
Note over A,B: Phase 5: Key Derivation
A->>A: pairingKey = SHA-256("pair:" + secret)
B->>B: pairingKey = SHA-256("pair:" + secret)
A->>A: docKey = SHA-256("doc:" + pairingKey)
A->>A: encKey = SHA-256("enc:" + pairingKey)
Note over A,B: ✅ Paired: all future signaling encrypted with AES-256-GCM
Complete pairing protocol sequence (editable source: docs/diagrams/pairing-sequence.mmd)
Step 1: Each device generates a key pair
When opening the pairing screen, each device generates an ephemeral ECDH key pair on the P-256 curve (secp256r1):
- A private key, which stays exclusively on the device
- A public key, which is exchanged over Firebase
The keys are created using a cryptographically secure random number generator (Random.secure()) and are valid only for this single pairing attempt; every new attempt generates a fresh key pair.
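To make Step 1 concrete, here is a minimal pure-Python sketch of P-256 key generation. It is illustrative only: the arithmetic is not constant-time, and a real app should rely on a vetted crypto library. Python's `secrets` module stands in for the secure RNG; the helper names (`generate_key_pair`, `encode_public_key`, `ecdh_shared_secret`) are hypothetical, and the uncompressed 65-byte public-key encoding is an assumption, since the article does not say how keys are serialized. The last helper already computes the Step 3 secret as the 32-byte x-coordinate, following the usual ECDH convention.

```python
import secrets

# NIST P-256 (secp256r1): y^2 = x^3 - 3x + b over GF(P), base point G of order N
P = 0xFFFFFFFF00000001000000000000000000000000FFFFFFFFFFFFFFFFFFFFFFFF
A = P - 3
B = 0x5AC635D8AA3A93E7B3EBBD55769886BC651D06B0CC53B0F63BCE3C3E27D2604B
G = (
    0x6B17D1F2E12C4247F8BCE6E563A440F277037D812DEB33A0F4A13945D898C296,
    0x4FE342E2FE1A7F9B8EE7EB4A7C0F9E162BCE33576B315ECECBB6406837BF51F5,
)
N = 0xFFFFFFFF00000000FFFFFFFFFFFFFFFFBCE6FAADA7179E84F3B9CAC2FC632551


def point_add(p1, p2):
    """Add two affine points; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # Q + (-Q) = infinity
    if p1 == p2:
        slope = (3 * x1 * x1 + A) * pow(2 * y1 % P, -1, P) % P
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    return x3, (slope * (x1 - x3) - y1) % P


def scalar_mult(k, point):
    """Double-and-add k * point (illustrative, NOT constant-time)."""
    result, addend = None, point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def generate_key_pair():
    """Fresh ephemeral pair: private scalar in [1, N-1], public point d*G."""
    private_key = secrets.randbelow(N - 1) + 1
    return private_key, scalar_mult(private_key, G)


def encode_public_key(public_point):
    """Assumed wire format: uncompressed SEC1, 0x04 || X || Y (65 bytes)."""
    x, y = public_point
    return b"\x04" + x.to_bytes(32, "big") + y.to_bytes(32, "big")


def ecdh_shared_secret(private_key, peer_public_point):
    """Step 3: x-coordinate of private_key * peer point, as 32 bytes."""
    x, _ = scalar_mult(private_key, peer_public_point)
    return x.to_bytes(32, "big")


if __name__ == "__main__":
    priv_a, pub_a = generate_key_pair()  # Device A
    priv_b, pub_b = generate_key_pair()  # Device B
    # Only the encoded public keys would ever be written to Firestore.
    assert ecdh_shared_secret(priv_a, pub_b) == ecdh_shared_secret(priv_b, pub_a)
    print(len(encode_public_key(pub_a)), "byte public key; secrets match")
```

Each call to `generate_key_pair()` draws a new random scalar, so two pairing attempts never share key material.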
Step 2: Exchange public keys over Firebase
To let two devices find each other, Timmy uses a 4-character code as a meeting point. This code can be discovered automatically via Nearby Connections (Bluetooth Low Energy) or entered manually. It has no cryptographic value; it only ensures that both devices open the same Firebase Firestore document.
Once both devices know the code, each writes its public ECDH key to a shared Firestore document. Then each device reads the other device's public key from that document.
Crucially: only the public key is sent. The private key never leaves the device. Anyone watching Firebase traffic sees public keys, but cannot compute the shared secret from them. That relies on the difficulty of the Elliptic Curve Discrete Logarithm Problem (ECDLP).
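The meeting-code handling can be sketched in a few lines. Everything here is hypothetical: the character set (uppercase letters and digits, inferred only from the example codes XKQM and R7NP), the function names, and the rules that the lower code becomes the creator and that both devices then meet at that code (the pairing diagram only says that the lower code wins). What the sketch shows is that both devices run the same deterministic comparison, so they agree on roles without extra messages, and that no key material is involved.

```python
import secrets
import string

# Hypothetical character set: the article's examples (XKQM, R7NP) use
# uppercase letters and digits, but the real alphabet is not specified.
ALPHABET = string.ascii_uppercase + string.digits
BROADCAST_PREFIX = "SBM:"  # prefix seen in the BLE broadcasts in the diagram
CODE_LENGTH = 4


def new_meeting_code():
    """Random meeting code; it only selects a Firestore document."""
    return "".join(secrets.choice(ALPHABET) for _ in range(CODE_LENGTH))


def parse_broadcast(payload):
    """Return the code from an 'SBM:XXXX' payload, or None if malformed."""
    if not payload.startswith(BROADCAST_PREFIX):
        return None
    code = payload[len(BROADCAST_PREFIX):]
    if len(code) != CODE_LENGTH or any(c not in ALPHABET for c in code):
        return None
    return code


def resolve_role(my_code, peer_code):
    """Same comparison on both devices, so they always agree on roles.

    Assumption: the lower code becomes the creator.
    """
    if my_code == peer_code:
        raise ValueError("identical codes: regenerate and retry")
    return "creator" if my_code < peer_code else "joiner"


def shared_meeting_code(my_code, peer_code):
    """Assumption: both devices meet at the winning (lower) code."""
    return min(my_code, peer_code)


if __name__ == "__main__":
    mine, theirs = "XKQM", parse_broadcast("SBM:R7NP")
    print(resolve_role(mine, theirs), shared_meeting_code(mine, theirs))  # joiner R7NP
```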
Step 3: Computing the shared secret
Once both devices have discovered each other's public key, they independently compute the same shared secret:
sharedSecret = ECDH(myPrivateKey, remotePublicKey)
→ 32 bytes (identical on both devices)
The mathematics of elliptic curves guarantees that both computations yield the same result, even though each device only knows its own private key and the other's public key.
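Written out with the names from the sequence diagram, where $G$ is the P-256 base point and $n$ its order, and assuming the usual ECDH convention that the secret is the x-coordinate of the shared point (32 bytes on P-256), both computations land on the same point because scalar multiplication commutes:

$$
\begin{aligned}
\mathrm{Pub}_A &= \mathrm{priv}_A \cdot G, \qquad \mathrm{Pub}_B = \mathrm{priv}_B \cdot G \\
\mathrm{priv}_A \cdot \mathrm{Pub}_B &= \mathrm{priv}_A \cdot (\mathrm{priv}_B \cdot G) = (\mathrm{priv}_A\,\mathrm{priv}_B \bmod n) \cdot G = \mathrm{priv}_B \cdot (\mathrm{priv}_A \cdot G) = \mathrm{priv}_B \cdot \mathrm{Pub}_A \\
\mathrm{sharedSecret} &= x\big((\mathrm{priv}_A\,\mathrm{priv}_B \bmod n) \cdot G\big)
\end{aligned}
$$

An observer of Firebase sees only $\mathrm{Pub}_A$ and $\mathrm{Pub}_B$; recovering $\mathrm{priv}_A$ or $\mathrm{priv}_B$ from them is exactly the ECDLP mentioned in Step 2.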
Step 4: The verification number (SAS)
From the shared secret and both public keys, each device derives a Short Authentication String (SAS), a two-digit number shown on its screen:
hash = SHA-256("sas:" + sort(pubkeyA, pubkeyB) + sharedSecret)
number = (hash[0] × 256 + hash[1]) mod 100 → 00 to 99
Both devices display the same number, for example 42. The user visually compares whether the numbers on both screens match, then confirms on each device individually.
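The SAS formula translates directly into code. This is a sketch with a hypothetical function name, under two assumptions the article leaves open: "+" means byte concatenation, and sort() orders the two encoded public keys lexicographically by bytes. Sorting is what makes the hash input identical on both devices, even though each one sees "my key" and "their key" in the opposite order.

```python
import hashlib


def compute_sas(pub_a: bytes, pub_b: bytes, shared_secret: bytes) -> str:
    """Two-digit verification number from both public keys and the secret.

    Assumptions: '+' is byte concatenation and sort() orders the encoded
    public keys lexicographically, so both devices hash identical input.
    """
    low, high = sorted([pub_a, pub_b])
    digest = hashlib.sha256(b"sas:" + low + high + shared_secret).digest()
    number = (digest[0] * 256 + digest[1]) % 100  # 16-bit value reduced to 00..99
    return f"{number:02d}"


if __name__ == "__main__":
    key_a = b"\x04" + bytes(64)      # placeholder public keys, not real curve points
    key_b = b"\x04" + b"\x01" * 64
    secret = bytes(range(32))        # placeholder 32-byte ECDH secret
    # Each device passes (own key, peer key); the result is the same on both.
    print(compute_sas(key_a, key_b, secret), compute_sas(key_b, key_a, secret))
```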
Why an attacker cannot forge this
A man-in-the-middle would need to intercept the key exchange in Firebase. Specifically, they would need to:
- Replace the real public keys stored in the Firestore document with their own
- Establish separate shared secrets with each device
sequenceDiagram
autonumber
participant A as 📱 Device A
participant M as 🕵️ Attacker (MITM)
participant B as 📱 Device B
Note over A,B: Attacker intercepts the Firebase key exchange
A->>A: Generate key pair (privA, PubA)
B->>B: Generate key pair (privB, PubB)
M->>M: Generate TWO key pairs (privM1, PubM1) + (privM2, PubM2)
A->>M: Write PubA to Firebase
M->>M: Replace PubA with PubM1
M->>B: B reads PubM1 (thinks it is PubA)
B->>M: Write PubB to Firebase
M->>M: Replace PubB with PubM2
M->>A: A reads PubM2 (thinks it is PubB)
Note over A,B: Each device computes a DIFFERENT shared secret
A->>A: secret_A = ECDH(privA, PubM2)
M->>M: secret_A = ECDH(privM2, PubA)
M->>M: secret_B = ECDH(privM1, PubB)
B->>B: secret_B = ECDH(privB, PubM1)
Note over A,M: secret_A ≠ secret_B
A->>A: SAS_A = SHA-256("sas:" + sort(PubA,PubM2) + secret_A) → 73
B->>B: SAS_B = SHA-256("sas:" + sort(PubM1,PubB) + secret_B) → 18
rect rgb(255, 230, 230)
Note over A,B: ❌ User sees DIFFERENT numbers!
A->>A: Display: 73
B->>B: Display: 18
Note over A,B: 👤 User notices mismatch → cancels pairing
end
Note over A,B: 🛡️ Attack detected: MITM cannot force SAS match (P = 1/100)
Man-in-the-middle detection via SAS mismatch (editable source: docs/diagrams/mitm-detection.mmd)
In this case, the attacker shares one secret (secret_A) with Device A and a different one (secret_B) with Device B. Since secret_A ≠ secret_B, the two devices almost certainly compute different verification numbers.
The attacker cannot make the numbers match because:
- They don't know the devices' private keys
- SHA-256 is not reversible
- The probability of a random match is only 1 in 100
The user sees different numbers on the screens and cancels pairing. At that point, the attack has become visible.
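The 1-in-100 figure can be checked exactly. The SAS reduces a 16-bit value (hash[0] × 256 + hash[1]) modulo 100, and 65,536 is not a multiple of 100, so 36 of the 100 outcomes are very slightly more likely than the rest. Assuming the two devices' SAS values behave like independent, uniformly distributed hash outputs (which is what two different shared secrets give you), this sketch with a hypothetical helper computes the exact chance that they coincide anyway:

```python
from fractions import Fraction


def sas_match_probability(modulus=100, bits=16):
    """Exact P(two independent uniform `bits`-bit values agree mod `modulus`)."""
    total = 2 ** bits
    base, extra = divmod(total, modulus)  # 65536 = 100 * 655 + 36
    # `extra` residues are hit base + 1 times, the remaining ones base times.
    counts = [base + 1] * extra + [base] * (modulus - extra)
    return sum(Fraction(c, total) ** 2 for c in counts)


if __name__ == "__main__":
    p = sas_match_probability()
    print(p, float(p))  # 671089/67108864, about 0.0100000054
```

The result exceeds 1/100 by only about 5 × 10⁻⁹, so the bias introduced by the mod-100 reduction is negligible.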
Step 5: Completing the pairing
Only after the user has confirmed verification on both devices does pairing complete:
- A 64-character pairing key (256 bits) is derived from the shared secret: SHA-256("pair:" + sharedSecret) → pairingKey
- The document key is derived as SHA-256("doc:" + pairingKey) and serves as the Firestore document key
- The encryption key is derived as SHA-256("enc:" + pairingKey) and provides the AES-256-GCM key for encrypted signaling
- Both devices store the same pairing key and navigate to mode selection
From this point on, the signaling for every further connection attempt (Firestore signaling, WebRTC setup) is encrypted with the shared AES-256-GCM key. The pairing key itself is never sent to the backend; only a hash derived from it, SHA-256("doc:" + pairingKey), is used as the document identifier.
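The three derivations above can be sketched with `hashlib`. Assumptions not stated in the article: pairingKey is kept as its 64-character hex string, the "doc:" and "enc:" prefixes are concatenated with that string before hashing, docKey is hex-encoded for use as a Firestore document ID, and encKey is kept as 32 raw bytes for AES-256-GCM. `derive_keys` is a hypothetical name.

```python
import hashlib


def derive_keys(shared_secret: bytes):
    """Derive (pairingKey, docKey, encKey) from the 32-byte ECDH secret."""
    # 64 hex characters = 256 bits, stored on both devices
    pairing_key = hashlib.sha256(b"pair:" + shared_secret).hexdigest()
    # Public identifier: the only derived value Firebase ever sees
    doc_key = hashlib.sha256(("doc:" + pairing_key).encode()).hexdigest()
    # 32 raw bytes, suitable as an AES-256-GCM key
    enc_key = hashlib.sha256(("enc:" + pairing_key).encode()).digest()
    return pairing_key, doc_key, enc_key


if __name__ == "__main__":
    pairing_key, doc_key, enc_key = derive_keys(bytes(32))  # placeholder secret
    print(len(pairing_key), len(doc_key), len(enc_key))
```

The distinct prefixes ("pair:", "doc:", "enc:") act as domain separation: the same input yields unrelated outputs, so the document ID that Firebase can see reveals neither the pairing key nor the encryption key.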
System architecture
The following diagram shows the components involved in pairing and communication:
flowchart TB
BABY["📱 Baby Phone
Baby Mode"]
PARENT["📱 Parent Phone
Parent Mode"]
BABY <==>|"WebRTC Peer-to-Peer · DTLS-SRTP
Audio · Video · DataChannel"| PARENT
BABY -.-|"🔵 Bluetooth LE · Nearby
Auto-Discovery"| PARENT
subgraph FIREBASE["☁️ Firebase (Google Cloud)"]
direction LR
AUTH["🪪 Anonymous
Authentication"]
FS["Firestore
Pairing + Signaling"]
CF["⚡ Cloud Functions
getTurnCredentials"]
end
BABY <-->|"AES-256-GCM encrypted
SDP · ICE · ECDH keys"| FS
FS <-->|"AES-256-GCM encrypted
SDP · ICE · ECDH keys"| PARENT
BABY -.->|Token| AUTH
PARENT -.->|Token| AUTH
STUN["📡 STUN server
stun.cloudflare.com:3478"]
TURN["TURN relay
local or Cloudflare"]
local or Cloudflare"]
BABY & PARENT -->|Short-lived credentials| CF
CF -->|local first, Cloudflare fallback| TURN
BABY & PARENT -.->|NAT Traversal| STUN
BABY -.->|"Relay Fallback"| TURN
TURN -.->|"Relay Fallback"| PARENT
style BABY fill:#FBF6F0,stroke:#B5734A,stroke-width:2px
style PARENT fill:#FBF6F0,stroke:#B5734A,stroke-width:2px
style FIREBASE fill:#fff5f5,stroke:#E9B44C,stroke-width:2px
style AUTH fill:#E9B44C,stroke:#2B2D42
style FS fill:#E9B44C,stroke:#2B2D42
style CF fill:#E9B44C,stroke:#2B2D42
style STUN fill:#F6E3D2,stroke:#B5734A
style TURN fill:#7BC47F,stroke:#2B2D42
System architecture overview (editable source: docs/diagrams/pairing-architecture.mmd)
Communication paths in detail:
- WebRTC peer-to-peer (thick line): Audio, video and DataChannel flow directly between devices, encrypted with DTLS-SRTP. No server can read this data.
- Firebase Firestore (solid line): Pairing data (the public ECDH keys) and signaling (SDP/ICE) go through Firestore. Signaling is end-to-end encrypted with AES-256-GCM, so Firebase cannot decrypt it; the public keys are visible, but as explained in Step 2 they do not reveal the shared secret.
- STUN server: Both devices discover their public IP address so a direct peer-to-peer connection can be established.
- TURN relay: If a direct connection is not possible (e.g., on mobile data), the selected local or Cloudflare TURN server relays the encrypted media. Short-lived credentials (24h) are fetched via Firebase Cloud Functions.
- Bluetooth LE (dotted line): Nearby Connections discovers nearby devices automatically; only the meeting code is transmitted, no key material.
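Short-lived TURN credentials are commonly minted with the TURN REST API scheme (the shared-secret mode of coturn): the username embeds an expiry timestamp, and the password is an HMAC-SHA1 of that username under a secret shared with the TURN server. Whether getTurnCredentials uses exactly this scheme is an assumption (Cloudflare's TURN service, for example, issues credentials through its own API). The function names and the example TURN URL are hypothetical; only the 24-hour lifetime and the STUN address come from the article.

```python
import base64
import hashlib
import hmac
import time

CREDENTIAL_TTL = 24 * 60 * 60  # 24 hours, the lifetime stated in the article
STUN_URL = "stun:stun.cloudflare.com:3478"


def turn_rest_credentials(shared_secret, user_id, now=None):
    """Time-limited TURN credentials in the TURN REST API style.

    Assumed scheme: username = "<expiry unix time>:<user id>",
    credential = base64(HMAC-SHA1(shared_secret, username)).
    """
    issued_at = int(time.time() if now is None else now)
    username = f"{issued_at + CREDENTIAL_TTL}:{user_id}"
    mac = hmac.new(shared_secret.encode(), username.encode(), hashlib.sha1)
    return {"username": username, "credential": base64.b64encode(mac.digest()).decode()}


def ice_servers(turn_url, credentials):
    """iceServers list in RTCConfiguration shape: public STUN plus the selected TURN relay."""
    return [
        {"urls": STUN_URL},
        {"urls": turn_url, **credentials},
    ]


if __name__ == "__main__":
    creds = turn_rest_credentials("server-side-secret", "anon-uid-123", now=1_700_000_000)
    print(creds["username"])  # 1700086400:anon-uid-123
    print(ice_servers("turn:turn.example.org:3478", creds))
```

In this scheme the shared secret stays server-side (for example inside the Cloud Function), and clients only receive an expiring username/credential pair that the TURN server can verify on its own.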
Fallback: Manual code entry
If Bluetooth is unavailable (e.g., on older devices), the 4-character code can also be typed in manually. Manual entry uses the same ECDH key exchange and the same SAS verification as automatic pairing. The only difference is that the code is read and typed by the user instead of discovered through BLE.
Because the ECDH key exchange happens over Firebase in both cases, security is identical. The 4-character code is only a meeting point; the real encryption is based on the 256-bit key derived from ECDH.
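A quick calculation shows why the code cannot carry the security. Assuming 36 possible characters per position (uppercase letters and digits, as in the example codes), a 4-character code offers at most about 20.7 bits, compared with the 256-bit pairing key:

```python
import math

# Assumption: 36 symbols per position (A-Z, 0-9), as in the codes XKQM and R7NP.
ALPHABET_SIZE = 36
CODE_LENGTH = 4
PAIRING_KEY_BITS = 256  # key derived from the ECDH shared secret

code_space = ALPHABET_SIZE ** CODE_LENGTH  # 1,679,616 possible codes
code_bits = math.log2(code_space)          # about 20.7 bits
print(f"{code_space} codes = {code_bits:.1f} bits vs a {PAIRING_KEY_BITS}-bit pairing key")
```

Anyone who learns the code can at most reach the meeting document, which is the man-in-the-middle situation the SAS comparison is designed to expose.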
Summary
| Security mechanism | Protects against |
|---|---|
| ECDH key exchange (P-256) | Eavesdropping on key exchange traffic |
| Ephemeral key pairs | Forward secrecy: past pairings remain safe |
| Visual verification number (SAS) | Man-in-the-middle (MITM) during key exchange |
| SHA-256 hash as document key | Recovering the pairing key from Firestore |
| AES-256-GCM encryption | Eavesdropping on signaling data |
| Dual-side confirmation | One-sided pairing without user knowledge |
| DTLS-SRTP (WebRTC) | Eavesdropping on audio/video |
These layers fit together: ECDH protects the key exchange, the verification number protects against MITM, AES-256-GCM protects signaling, and WebRTC protects media. An attacker would have to break this chain in several places without devices or parents noticing.