Calculating the Length of an Audio File

When working with large collections of audio files, one interesting dimension of the dataset is the length of each file in seconds. Here we explore ways of calculating audio lengths using Python. We start off by creating an audio file.
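As a minimal sketch of that first step, the following writes a short sine tone to a WAVE file using only the standard library. The function name `write_sine_wav`, the 440 Hz tone, and the default sample rate are illustrative assumptions, not taken from the original text.

```python
import math
import struct
import wave


def write_sine_wav(path, seconds=2.0, sample_rate=16000, freq=440.0):
    """Write a mono 16-bit PCM sine tone and return its length in seconds.

    Hypothetical helper for illustration; all parameters are assumptions.
    """
    n_frames = int(seconds * sample_rate)
    # Pack each sample as a little-endian signed 16-bit integer at half amplitude.
    frames = b"".join(
        struct.pack(
            "<h",
            int(0.5 * 32767 * math.sin(2 * math.pi * freq * i / sample_rate)),
        )
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 2 bytes = 16 bits per sample
        w.setframerate(sample_rate)
        w.writeframes(frames)
    return n_frames / sample_rate
```

Calling `write_sine_wav("tone.wav")` should produce a two-second file that the methods below can be tried on.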

Analyse Audio Using Sox Subprocess

The simplest way of analysing an audio file is to run an existing program, such as soxi, as a subprocess.
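A rough sketch of this approach is shown here. It assumes that the `soxi` executable from SoX is installed and on the `PATH`, and it relies on the `-D` option, which prints the duration in seconds. The function name `soxi_duration` is an illustrative choice.

```python
import subprocess


def soxi_duration(path):
    """Return the duration of an audio file in seconds, as reported by soxi.

    Assumes the soxi executable is available on the PATH.
    """
    result = subprocess.run(
        ["soxi", "-D", path],   # -D: print the duration in seconds
        capture_output=True,
        text=True,
        check=True,             # raise CalledProcessError if soxi fails
    )
    return float(result.stdout.strip())
```

Starting a new process for every file is simple but adds overhead, which matters once the collection is large.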

WAVE header

If we are working with a WAVE file, we can decode its header and use the information stored there. Other audio formats, such as FLAC, also provide the necessary values in their headers.

The audio file is stored in a WAVE container. A canonical PCM WAVE file begins with a 44-byte header:

|bytes|value            |description                        |
|1-4  |"RIFF"           |Marks the file as a riff file      |
|5-8  |file_size        |overall size of the file - 8 bytes |
|9-12 |"WAVE"           |Marks the file as WAVE             |
|13-16|"fmt "           |Format chunk marker                |
|17-20|fmt_length       |Length of format data              |
|21-22|type             |Type of format (1=PCM)             |
|23-24|channels         |Number of channels                 |
|25-28|sample_rate      |Number of samples per second       |
|29-32|bytes_per_second |Number of bytes per second         |
|33-34|bytes_per_sample |Bytes per sample, all channels     |
|35-36|bits_per_sample  |Sample bit size                    |
|37-40|"data"           |Start of data section              |
|41-44|data_length      |Size of data section in bytes      |
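The layout above maps directly onto a `struct` format string, and the duration follows as data_length / bytes_per_second. The sketch below assumes the canonical 44-byte layout exactly as tabulated. Files with extra chunks before the data section (for example a LIST chunk) or a longer format chunk would need a proper chunk walker. The names `parse_wave_header` and `wave_duration` are illustrative.

```python
import struct
from collections import namedtuple

# Little-endian layout of the 44-byte header, field by field as in the table.
WAVE_HEADER = struct.Struct("<4sI4s4sIHHIIHH4sI")

WaveHeader = namedtuple(
    "WaveHeader",
    "riff file_size wave fmt fmt_length type channels sample_rate "
    "bytes_per_second bytes_per_sample bits_per_sample data data_length",
)


def parse_wave_header(raw):
    """Decode the first 44 bytes of a canonical PCM WAVE file."""
    header = WaveHeader(*WAVE_HEADER.unpack(raw[:WAVE_HEADER.size]))
    markers = (header.riff, header.wave, header.fmt, header.data)
    if markers != (b"RIFF", b"WAVE", b"fmt ", b"data"):
        raise ValueError("not a canonical 44-byte WAVE header")
    return header


def wave_duration(path):
    """Return the length in seconds, computed from header fields only."""
    with open(path, "rb") as f:
        header = parse_wave_header(f.read(WAVE_HEADER.size))
    return header.data_length / header.bytes_per_second
```

Only 44 bytes are read per file, so no audio data has to be decoded to obtain the length.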

Call the libsox library from Python

A third attractive method is to call the libsox library from Python. This provides a nice middle ground between implementation detail, code maintenance and speed. In order to use libsox, we need to describe the data structures used in the library to Python. We do this with ctypes.
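The following is a sketch of what such a ctypes description might look like, not a definitive binding. It assumes a libsox 14.4.x build in which `sox_signalinfo_t` holds `rate`, `channels`, `precision`, a 64-bit `length` and `mult` in that order, and in which `sox_format_t` starts with `filename` followed by `signal`. The layout should be checked against the installed `sox.h`. The class names `SignalInfo` and `FormatHead` and the function `libsox_duration` are illustrative, and only the leading fields of `sox_format_t` are described because libsox hands back a pointer.

```python
import ctypes
import ctypes.util


class SignalInfo(ctypes.Structure):
    """Assumed layout of sox_signalinfo_t (verify against your sox.h)."""
    _fields_ = [
        ("rate", ctypes.c_double),        # samples per second
        ("channels", ctypes.c_uint),      # number of channels
        ("precision", ctypes.c_uint),     # bits per sample
        ("length", ctypes.c_uint64),      # samples * channels, 0 if unknown
        ("mult", ctypes.POINTER(ctypes.c_double)),
    ]


class FormatHead(ctypes.Structure):
    """Leading fields of sox_format_t; the rest is never accessed here."""
    _fields_ = [
        ("filename", ctypes.c_char_p),
        ("signal", SignalInfo),
    ]


def libsox_duration(path):
    """Return the length in seconds using libsox (assumed to be installed)."""
    libname = ctypes.util.find_library("sox")
    if libname is None:
        raise OSError("libsox not found")
    sox = ctypes.CDLL(libname)
    sox.sox_open_read.restype = ctypes.POINTER(FormatHead)
    sox.sox_open_read.argtypes = [
        ctypes.c_char_p, ctypes.c_void_p, ctypes.c_void_p, ctypes.c_char_p,
    ]
    sox.sox_close.argtypes = [ctypes.POINTER(FormatHead)]
    if sox.sox_init() != 0:               # 0 is SOX_SUCCESS
        raise RuntimeError("sox_init failed")
    try:
        handle = sox.sox_open_read(path.encode(), None, None, None)
        if not handle:
            raise OSError("libsox could not open " + path)
        try:
            signal = handle.contents.signal
            if signal.channels == 0 or signal.rate == 0:
                raise ValueError("channel count or sample rate unknown")
            # length counts samples across all channels.
            return signal.length / signal.channels / signal.rate
        finally:
            sox.sox_close(handle)
    finally:
        sox.sox_quit()
```

Loading the library inside the function keeps the sketch importable on machines without libsox. A real implementation would load and initialise it once and reuse it across files.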

How fast are these methods?