
BPM Analyzer Accuracy

A comparative study was conducted to validate the accuracy of several BPM (Beats Per Minute) detection implementations and to evaluate their architectural differences. Dozens of other proprietary and open-source algorithms in use today were not tested.

1. Algorithms Tested

The Gold Standard: Queen Mary (QM) Algorithms

Some modern high-accuracy analyzers rely on beat-tracking logic from Queen Mary University of London. Rather than looking for simple volume spikes, it analyzes frequency-domain data.
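A minimal sketch of what "frequency-domain data" means here: slice the PCM stream into frames and take the magnitude spectrum of each. This is a naive DFT for illustration (real analyzers use an FFT), and the frame size and names below are illustrative, not the QM code.

```python
import cmath
import math

def stft_magnitudes(samples, frame_size, hop):
    """Slice a PCM signal into frames and return the magnitude
    spectrum of each frame via a naive DFT (illustrative only)."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        mags = []
        for k in range(frame_size // 2 + 1):  # non-negative frequencies
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n, x in enumerate(frame))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

# A sine completing exactly one cycle per frame should put
# nearly all of its energy in frequency bin 1.
frame_size = 8
signal = [math.sin(2 * math.pi * n / frame_size) for n in range(frame_size)]
spectrum = stft_magnitudes(signal, frame_size, hop=frame_size)[0]
peak_bin = max(range(len(spectrum)), key=lambda k: spectrum[k])  # -> 1
```

Working on these per-frame spectra, rather than raw amplitude, is what lets an analyzer follow beats that are tonal rather than percussive.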

The Legacy Algorithms

The Permissive Alternative

2. Technical Methodology

To ensure a fair comparison, the analyzers were tested against a diverse library of over 7,000 MP3 files. The focus was on how each engine processes the signal:

| Method | Technical Approach | Strengths | Weaknesses |
| --- | --- | --- | --- |
| Peak Detection | Time-domain amplitude analysis | Low CPU overhead | Fails on tracks without "sharp" beats |
| STFT Analysis | Frequency-domain (Fourier) analysis | High precision; genre-agnostic | Computationally expensive |
| Spectral Flux | Measuring rate of energy change | Great balance of speed/accuracy | Slight "jitter" in tempo drift |
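To illustrate the spectral-flux row above, here is a hedged sketch: half-wave-rectified flux between successive magnitude spectra spikes where new energy appears and ignores decay. The toy spectra are invented values, not measurements from this study.

```python
def spectral_flux(prev_mags, cur_mags):
    """Half-wave rectified spectral flux: sum only the increases in
    per-bin energy, so onsets register but decaying notes do not."""
    return sum(max(c - p, 0.0) for p, c in zip(prev_mags, cur_mags))

# Toy magnitude spectra for four consecutive frames (assumed values).
frames = [
    [0.1, 0.1, 0.1],  # quiet
    [0.9, 0.8, 0.7],  # sudden energy: an onset
    [0.8, 0.7, 0.6],  # decaying
    [0.8, 0.7, 0.6],  # steady
]
flux = [spectral_flux(frames[i - 1], frames[i]) for i in range(1, len(frames))]
onset_frame = max(range(len(flux)), key=lambda i: flux[i]) + 1  # frame 1
```

The "jitter" weakness noted in the table comes from peak-picking this flux curve: small frame-to-frame fluctuations can nudge where a peak lands.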

3. Key Findings & Observations

The most interesting outcome of this study was a "test playlist" of problem tracks

Some algorithms work well in most situations, while others do better on specific material. One objective of this study was to confirm that an algorithm was implemented correctly by comparing its results against a reference implementation. That comparison led to a key observation: a correct port and its reference are "right" together and "wrong" together. In other words, a correct implementation shows up as predictability (agreement with the reference), not as accuracy against the true tempo. Once that is established, you can run a test population of thousands of songs and focus future testing on the songs that produced incorrect estimates.
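That "right together, wrong together" check can be sketched as a simple per-track comparison. The track names, BPM values, and tolerance below are hypothetical.

```python
def agreement_report(est_a, est_b, tolerance=1.0):
    """Compare two analyzers' BPM estimates track by track. Estimates
    that match within tolerance suggest a faithful port, even when both
    are wrong against the true tempo; disagreements flag tracks for
    follow-up in a "test playlist"."""
    disagreements = []
    for track, bpm_a in est_a.items():
        bpm_b = est_b.get(track)
        if bpm_b is None or abs(bpm_a - bpm_b) > tolerance:
            disagreements.append(track)
    return disagreements

# Hypothetical results: reference and port agree on two tracks (one of
# them "wrong together" at half-tempo) and disagree on a third.
reference = {"track_01": 128.0, "track_02": 64.0, "track_03": 174.0}
port      = {"track_01": 128.2, "track_02": 64.1, "track_03": 87.0}
suspects = agreement_report(reference, port)  # -> ["track_03"]
```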

Many songs have different tempos in different sections

These tests assumed static, pre-processed analysis of a complete file, not real-time signal analysis over the duration of a playback stream.

Most of the implementations analyze one chunk of time (roughly 60 seconds) and categorize the whole song from that window. This makes sense if you are using BPM to plan transitions: the tempo during the intro matters most if that is what you are cross-fading into. Analyzing other chunks of the same song yielded different BPM estimates.
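One way to picture that chunked behavior (a sketch, not any of the tested engines): estimate one BPM per ~60-second window from inter-onset intervals, so a mid-song tempo change surfaces as differing window estimates.

```python
def bpm_from_onsets(onset_times):
    """Estimate BPM inside one analysis window from the median
    inter-onset interval (seconds)."""
    intervals = sorted(b - a for a, b in zip(onset_times, onset_times[1:]))
    median = intervals[len(intervals) // 2]
    return 60.0 / median

def bpm_per_window(onsets, window=60.0):
    """Bucket onsets into fixed ~60 s windows (as most analyzers do)
    and report one BPM per window; a tempo change between sections
    shows up as differing estimates across windows."""
    buckets = {}
    for t in onsets:
        buckets.setdefault(int(t // window), []).append(t)
    return {w: round(bpm_from_onsets(ts))
            for w, ts in buckets.items() if len(ts) > 1}

# Synthetic track: 120 BPM (0.5 s beats) for the first minute,
# then 140 BPM beats for the second minute.
onsets = ([i * 0.5 for i in range(120)] +
          [60.0 + i * (60.0 / 140.0) for i in range(140)])
windows = bpm_per_window(onsets)  # -> {0: 120, 1: 140}
```

An engine that only ever looks at window 0 would file this synthetic track as a 120 BPM song.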

The "Decoding Jitter" Phenomenon

During the evaluation of the C# Queen Mary (Q) implementation, the results were statistically close, but not identical, to the C++ Mixxx (MIX-Q) implementation. This discrepancy is likely attributable to the decoding pipeline: Mixxx uses libmad, while the test app used FFmpeg. Minor variations in PCM reconstruction can lead to slight shifts in energy frame calculations.
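A small simulation of the suspected mechanism, assuming the decoder differences amount to tiny rounding noise in the reconstructed PCM. The ±1e-4 noise level below is an arbitrary stand-in for libmad vs FFmpeg, not a measured figure.

```python
import random

def frame_energies(pcm, frame_size=512):
    """Sum-of-squares energy per analysis frame."""
    return [sum(s * s for s in pcm[i:i + frame_size])
            for i in range(0, len(pcm) - frame_size + 1, frame_size)]

# Simulate two decoders of the same MP3: identical audio, but one
# pipeline reconstructs PCM with tiny rounding differences.
random.seed(7)
decoder_a = [random.uniform(-1, 1) for _ in range(4096)]
decoder_b = [s + random.uniform(-1e-4, 1e-4) for s in decoder_a]

energy_a = frame_energies(decoder_a)
energy_b = frame_energies(decoder_b)
max_drift = max(abs(a - b) for a, b in zip(energy_a, energy_b))
```

The per-frame energies drift apart by a small but nonzero amount, which is enough to shift where a downstream peak-picker or beat-tracker lands, producing "statistically close but not identical" tempo estimates.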

Implementation Warnings

Using the same algorithm on different computers produces slightly different results

The results were not substantially different, but slight variations were identified. Running the same software on the same computer with the same media file yielded almost identical results across multiple test runs. However, with every other variable held constant, the same test on a different computer could occasionally produce BPM estimates a few points higher or lower.
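One plausible (unconfirmed) mechanism for this cross-machine drift is floating-point summation order: different builds may vectorize or thread the same energy loop differently, and float addition is not associative. A minimal demonstration:

```python
def accumulate(values):
    """Naive left-to-right float summation, as in a scalar energy loop."""
    total = 0.0
    for v in values:
        total += v
    return total

# The same numbers in two orders: a different build (SIMD width, thread
# count, compiler reassociation) effectively changes summation order.
values = [1e16, 1.0, -1e16]
reordered = [1e16, -1e16, 1.0]

left_to_right = accumulate(values)      # the 1.0 is absorbed by 1e16 -> 0.0
rearranged = accumulate(reordered)      # ...but survives here -> 1.0
```

Real audio samples never differ this dramatically, but the same effect in the last bits of intermediate sums can nudge onset positions, and occasionally the final BPM estimate, between machines.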

Always test with a diverse collection of music

Initial test groups contained dozens or hundreds of tracks from similar genres, which produced a non-representative accuracy estimate. To get a real understanding of accuracy, the test must span thousands of music files with different tempos and audio characteristics.
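A sketch of why genre diversity matters: scoring accuracy per genre rather than overall makes a systematic failure (such as half-tempo estimates on fast genres) visible. The genres, tempos, and tolerance below are hypothetical.

```python
def accuracy_by_genre(results, tolerance=2.0):
    """Per-genre accuracy: the fraction of tracks whose estimate falls
    within `tolerance` BPM of the reference tempo. A single-genre test
    set hides weaknesses that only other genres expose."""
    buckets = {}
    for genre, true_bpm, est_bpm in results:
        hits, total = buckets.get(genre, (0, 0))
        buckets[genre] = (hits + (abs(true_bpm - est_bpm) <= tolerance),
                          total + 1)
    return {g: hits / total for g, (hits, total) in buckets.items()}

# Hypothetical results: the analyzer looks perfect on house music but
# halves the tempo on most drum & bass tracks.
results = [
    ("house", 124, 124), ("house", 128, 128), ("house", 126, 125),
    ("dnb", 174, 87), ("dnb", 172, 86), ("dnb", 170, 170),
]
per_genre = accuracy_by_genre(results)  # house: 1.0, dnb: 1/3
```

Testing only on the "house" rows would report 100% accuracy and completely miss the half-tempo failure mode.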

4. The Data

This is the anonymized test data with result comparisons. "Interesting tracks" have their title and artist exposed so they can be included in the "test playlist" for future examination and accuracy improvements.

...