Session Initiation Protocol – Dual-tone multi-frequency (DTMF)

The Session Initiation Protocol (SIP) lets devices and applications exchange user-input DTMF information end-to-end over IP networks.

There are three ways to carry DTMF information:

  • DTMF audio tones:

DTMF digit waveforms are encoded inline with the voice packets. This method works reliably only with uncompressed audio codecs such as G.711. Compressed audio codecs such as G.729 and G.723 distort the tones, so they are not suitable for DTMF audio. DTMF audio is also referred to as in-band tones.

  • Out-of-band signaling events:

SIP INFO messages with Content-Type: application/dtmf-relay carry DTMF information as out-of-band signaling events. The DTMF digits are separated from the voice stream and sent in their own signaling messages.

  • RTP named telephony events (NTE):

RFC 2833 defines a standard way to transport DTMF digits in RTP packets as named telephone-events (see section 3 of RFC 2833).

With these definitions in place, let’s look at some captures.

DTMF Audio tones

When DTMF is sent as audio tones, the digits travel inside the regular RTP audio packets, and the Wireshark capture shows them as audio “waves”, something like this:
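Each of those waves is just the sum of two sine tones, one row frequency and one column frequency from the standard DTMF keypad. The following is a minimal sketch of how such a tone could be generated. The function names, the 100 ms duration, the 8000 Hz sample rate and the amplitude are illustrative assumptions, not values taken from the capture.

```python
import math

# Standard DTMF keypad: every key combines one row (low) and one
# column (high) frequency, both in Hz.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(digit):
    """Return the (low, high) frequency pair in Hz for one keypad symbol."""
    if len(digit) != 1:
        raise ValueError(f"expected a single symbol, got {digit!r}")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(digit)
        if col != -1:
            return ROW_HZ[row], COL_HZ[col]
    raise ValueError(f"not a DTMF symbol: {digit!r}")


def dtmf_samples(digit, duration_ms=100, sample_rate=8000, amplitude=0.5):
    """Generate float samples for one digit.

    duration_ms, sample_rate and amplitude are assumed example values.
    The two sines are averaged so the result stays within +/- amplitude.
    """
    low, high = dtmf_frequencies(digit)
    count = sample_rate * duration_ms // 1000
    return [
        amplitude * 0.5 * (
            math.sin(2 * math.pi * low * n / sample_rate)
            + math.sin(2 * math.pi * high * n / sample_rate)
        )
        for n in range(count)
    ]
```

For example, dtmf_frequencies("9") returns (852, 1477). A codec that does not preserve both frequencies accurately makes the digit hard to detect at the far end, which is why this method depends on the codec in use.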

For the SIP INFO (out-of-band signaling events) and RFC 2833 (RTP named telephony events) captures, Linphone is used. The DTMF options can be set under the Network tab, as shown in the image below:

Out-of-band signaling events

For the first test, the DTMF method is set to SIP INFO in both Linphone softphones (using our well-known lab environment), and a call is placed from External to Internal.

Opening VoIP Calls and then Flow Sequence in Wireshark shows the following for the digits pressed (all digits were pressed):

Inside the SIP INFO message packets, two settings are important:

  • Content-Type set to application/dtmf-relay
  • In the message body, Signal equals the digit pressed (9 in the image below)
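To make those two settings concrete, here is a small sketch of how an application/dtmf-relay body could be built and parsed. The helper names are hypothetical, and the Duration field with its default of 160 is an assumption based on commonly seen bodies of this type, not something read from this capture.

```python
CONTENT_TYPE = "application/dtmf-relay"


def build_dtmf_relay_body(digit, duration_ms=160):
    """Build the body of a SIP INFO message for one digit.

    The Duration line and its default value are assumptions; only the
    Signal field is confirmed by the capture discussed above.
    """
    return f"Signal={digit}\r\nDuration={duration_ms}\r\n"


def parse_dtmf_relay_body(body):
    """Parse 'Key=Value' lines into a dict with lower-cased keys."""
    fields = {}
    for line in body.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            # Strip whitespace so 'Signal= 9' and 'Signal=9' both parse.
            fields[key.strip().lower()] = value.strip()
    return fields
```

Calling parse_dtmf_relay_body(build_dtmf_relay_body("9")) gives a dict whose "signal" entry is "9", which matches the Signal value seen in the INFO message.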

RTP named telephony events

With DTMF now set to RFC 2833, this is the analysis of the capture (again under VoIP Calls and then Flow Sequence, with all digits pressed):

Finally, this is what we can find in the RTP packets associated with the pressed digits:

  • These RTP packets are marked as RTP Event and differ from the ones that carry audio payload
  • Multiple RTP Events are generated for each digit
  • The digit pressed is encoded in the Event ID parameter (in the example below, digit 2 was pressed)
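The fields Wireshark decodes here come from the 4-byte telephone-event payload defined in RFC 2833: an 8-bit event code, an End (E) bit, a reserved bit, a 6-bit volume and a 16-bit duration in RTP timestamp units. The sketch below decodes such a payload. The function name and the sample bytes are made up for illustration and are not taken from the capture.

```python
import struct

# Event codes 0-15 map to the DTMF symbols in this order
# (0-9 are the digits, 10 is '*', 11 is '#', 12-15 are A-D).
EVENT_SYMBOLS = "0123456789*#ABCD"


def parse_telephone_event(payload):
    """Decode the first 4 bytes of an RTP telephone-event payload."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload needs at least 4 bytes")
    # Network byte order: event (1 byte), E/R/volume (1 byte), duration (2 bytes).
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    return {
        "event": event,
        "symbol": EVENT_SYMBOLS[event] if event < len(EVENT_SYMBOLS) else None,
        "end": bool(flags & 0x80),   # E bit: set on the packet(s) that end the event
        "volume": flags & 0x3F,      # tone level, expressed as 0-63
        "duration": duration,        # in RTP timestamp units
    }
```

With the hypothetical payload bytes([2, 0x8A, 0x03, 0x20]), the result has event 2, symbol "2", end True, volume 10 and duration 800, which is 100 ms at an 8000 Hz clock. This also helps explain the multiple RTP Events per digit: the sender typically emits several packets while the key is held, with a growing duration, and marks the last ones with the E bit.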

This concludes this entry.