AirPlay 2 is annoying me
So I have an iPhone. And I have a "HomePod Mini". Which should function like the bluetooth speaker it appears to be. But it doesn't, because it doesn't use bluetooth - it uses "AirPlay" (2). Which is mDNS+RTCP+RTSP+RTP over BT/wifi.
While I'm sure people subscribed to Apple Music find it perfectly acceptable, I am not one of them - I just want it to play music from FinAmp on my phone (which is in turn playing music from my DragonflyBSD server running Jellyfin).
Now, with that said, my use case is supported. It works. Occasionally. But more often than not, it will play for a bit, and then stop. At which point I need to "re-airplay" (I don't know if that's the right term) to continue. This is really annoying, because bluetooth speakers are less than half the cost, and less than half the work.
So what's going wrong? I don't know. But it made me look into how AirPlay (2) works.
How my phone finds the homepod
Bonjour! Or Avahi. Or Zeroconf. There's a million different names for it - I briefly covered the topic of multicast DNS for service discovery in my older post on trying to play music through an Apple TV (the box, not the service).
Effectively, my iPhone sends a DNS PTR query for _airplay._tcp.local. It sends that query to 224.0.0.251 or ff02::fb - multicast address space. This happens over UDP, and on port 5353 instead of the usual 53. The query and response are slightly different from typical unicast lookups - but not so much that you wouldn't recognise them.
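If you want to poke at this from a laptop, the third-party python-zeroconf library does the same dance. A quick sketch - the library is my choice of convenient tool, not anything the iPhone actually runs:

```python
from zeroconf import ServiceBrowser, ServiceListener, Zeroconf

# Browse for AirPlay receivers over mDNS (224.0.0.251:5353 under the hood).
class AirPlayListener(ServiceListener):
    def add_service(self, zc: Zeroconf, type_: str, name: str) -> None:
        info = zc.get_service_info(type_, name)
        if info:
            print(f"Found {name} at {info.parsed_addresses()}:{info.port}")

    def remove_service(self, zc: Zeroconf, type_: str, name: str) -> None:
        print(f"Lost {name}")

    def update_service(self, zc: Zeroconf, type_: str, name: str) -> None:
        pass  # TXT record changed; not interesting for this sketch

zc = Zeroconf()
browser = ServiceBrowser(zc, "_airplay._tcp.local.", AirPlayListener())
input("Browsing for AirPlay devices - press enter to stop\n")
zc.close()
```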
In the response my phone receives from the homepod - specifically, in its TXT record - it finds all sorts of information: OS version, serial number, model, and 'features'. Features indicate what the homepod can actually do - for example, it can play audio, but it can't do video (no screen).
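The features value itself is a pair of comma-separated hex strings forming one 64-bit bitmask. Decoding it looks something like this - the bit positions come from unofficial reverse-engineering notes (and the value below is made up), so treat them as assumptions:

```python
# Decode an AirPlay 'features' TXT value, e.g. b"0x5A7FDFD4,0x1E".
# The two halves are the low and high 32 bits of a 64-bit bitmask.
def parse_features(raw: bytes) -> int:
    parts = raw.decode().split(",")
    low = int(parts[0], 16)
    high = int(parts[1], 16) if len(parts) > 1 else 0
    return (high << 32) | low

features = parse_features(b"0x5A7FDFD4,0x1E")  # made-up example value
print(f"video: {bool(features & (1 << 0))}")  # bit 0: video (unofficial docs)
print(f"audio: {bool(features & (1 << 9))}")  # bit 9: audio (unofficial docs)
```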
Sending audio to the homepod
My phone can see the homepod, and has everything it needs. I imagine there's some authentication involved, but that wasn't the issue I was having, so I didn't look too closely into it. So how do we get Kapustin's "Nearly Waltz" from my phone to the homepod?
As it turns out, we use the Real-Time Streaming Protocol. A sort of quasi-HTTP with different verbs. Naturally Apple have mangled things a bit to suit their highly-specific needs.
SETUP requests are used to figure out things like the audio format of the stream, compression details, encryption information, how big the packets are, and so on. My music is stored as ALAC, which should be formally supported (param ct==2). In the course of this request, the homepod tells my phone what port to use for RTP - that being where to send the music.
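To give a feel for how HTTP-ish this is, here's what firing a bare SETUP at a receiver might look like. The address is made up, and a real AirPlay 2 device will reject an unpaired, unencrypted request (the actual exchange carries binary plists over an encrypted channel), so this is purely illustrative:

```python
import socket

# Hypothetical address - on a real network you'd take the IP and port
# from the mDNS response. An unpaired request like this will be refused
# by a real homepod; the point is the HTTP-like shape of RTSP.
HOMEPOD = ("192.168.1.50", 7000)

request = (
    "SETUP rtsp://192.168.1.50/4242 RTSP/1.0\r\n"
    "CSeq: 1\r\n"
    "User-Agent: AirPlay/example\r\n"
    "\r\n"
)

with socket.create_connection(HOMEPOD, timeout=5) as sock:
    sock.sendall(request.encode("ascii"))
    print(sock.recv(4096).decode("ascii", errors="replace"))
```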
There are other verbs available that are used for various purposes. SET_PARAMETER is apparently used for volume control. RECORD plays a part in streaming. TEARDOWN is (counter-intuitively) used for both pausing and disconnecting.
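For instance, a volume change via SET_PARAMETER, per unofficial AirPlay 1 / RAOP notes (AirPlay 2 wraps the same idea in its encrypted channel), is roughly:

```python
# Volume is a float from -30.0 (quietest) to 0.0 (full); -144.0 means
# mute - per unofficial RAOP documentation, so take it as an assumption.
body = "volume: -15.0\r\n"
request = (
    "SET_PARAMETER rtsp://192.168.1.50/4242 RTSP/1.0\r\n"
    "CSeq: 2\r\n"
    "Content-Type: text/parameters\r\n"
    f"Content-Length: {len(body)}\r\n"
    "\r\n"
    f"{body}"
)
print(request)
```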
So what's happening?
I don't know. I'll try to find out, mind you. My best guess is that the music is stopping because the buffered audio I'm streaming is running dry. My phone can tell the music has stopped on the homepod - so presumably I should be seeing a TEARDOWN from the homepod to the phone.
I suspect what I'll have to do is a bit of packet capture and wire-sharking - try to align what was actually sent with a timeline of RTCP messages to paint a picture. It could be that re-streaming audio is hitting an odd edge case: the phone tries to send the entire file to the homepod so it can buffer the track, but doesn't have the entire file because it's itself streaming, so the "entire track" is whatever FinAmp had fetched at the point it started air-playing. Perhaps?
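As a starting point, a scapy sketch like this could grab that timeline (scapy is just my tool of choice here; it needs root, and port 7000 for RTSP is a guess - the real port comes from the mDNS response):

```python
from scapy.all import sniff  # third-party 'scapy'; capturing needs root

# Watch mDNS (5353) plus the assumed RTSP control port, then eyeball the
# output for a TEARDOWN around the moment playback dies.
def show(pkt):
    print(pkt.summary())

sniff(filter="udp port 5353 or tcp port 7000", prn=show, store=False)
```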