How We Fixed AI Key Detection (And Why Other Tools Still Get It Wrong)
Commercial AI key detection tools routinely confuse A Minor with G Major. We tested this on Sabrina Carpenter's "Espresso," a textbook A Minor song. Most tools said G Major. So did ours. Then we fixed it.
If you've ever uploaded a song to an online key analyzer and gotten back a key that felt slightly off (usually the song's relative major or another closely related key), you've experienced this problem firsthand. It's not a bug; it's a known weakness in the industry-standard algorithm. And almost nobody talks about it.
Here's why it happens, and what we did about it.
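/ 01 The Industry Standard: Krumhansl-Schmuckler
Most key detection tools build on the Krumhansl-Schmuckler key-finding algorithm or a close variant of it, published in 1990 by music cognition researchers Carol Krumhansl and Mark Schmuckler. Commercial products like Spotify's "Song Stats" and the DJ software Mixed In Key don't publish their internals, but chroma-template matching of this kind is the standard baseline. The analysis library librosa supplies chroma features but no key detector of its own, so code built on it usually adds the Krumhansl correlation step itself.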
The math is elegant: build a chromagram (a 12-bin histogram of which pitches are most prominent in the song), then correlate that histogram against 24 reference profiles — one for each major and minor key. The reference profiles are based on perceptual studies of how listeners rate the "fittingness" of each scale degree.
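To make the correlation step concrete, here is a minimal Python sketch of the Krumhansl-Schmuckler procedure. It assumes you already have a 12-bin chroma vector ordered C through B (for example, chroma frames averaged over the whole song) and uses the published Krumhansl-Kessler probe-tone profiles; the 24 key templates are just those two profiles rotated to each of the 12 tonics. The function names are illustrative, not taken from any particular library.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Krumhansl-Kessler probe-tone profiles, indexed by semitones above the tonic.
MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR_PROFILE = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (x must not be constant)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def rotate(profile, tonic):
    """Re-index a tonic-relative profile by absolute pitch class (0 = C)."""
    return [profile[(pc - tonic) % 12] for pc in range(12)]


def key_scores(chroma):
    """Correlate a 12-bin chroma vector (C..B) with all 24 key templates."""
    scores = {}
    for tonic in range(12):
        name = NOTE_NAMES[tonic]
        scores[f"{name} major"] = pearson(chroma, rotate(MAJOR_PROFILE, tonic))
        scores[f"{name} minor"] = pearson(chroma, rotate(MINOR_PROFILE, tonic))
    return scores


def krumhansl_schmuckler(chroma):
    """Return the key whose template correlates best with the chroma vector."""
    scores = key_scores(chroma)
    return max(scores, key=scores.get)


# Sanity check: a chroma vector shaped exactly like the G major template.
print(krumhansl_schmuckler(rotate(MAJOR_PROFILE, 7)))  # G major
```

A real pipeline would feed in a measured chromagram and might swap in a different profile set; the shape of the computation stays the same.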
It works great. Mostly.
/ 02 Why It Fails on Relative Keys
Here's the dirty secret: A Minor and C Major share every single note. They use the same seven pitches — A, B, C, D, E, F, G — just with a different "home base" (tonic).
The Krumhansl profiles are different for each key (minor profiles weight the minor third heavily, major profiles weight the major third), but when your song moves through enough chords, the chromagram averages out to something close to "all seven scale notes equally." Then A Minor and C Major score nearly the same, and the algorithm coin-flips.
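The near-tie is easy to reproduce. This sketch assumes an idealized chromagram in which the seven notes shared by A Minor and C Major are equally loud and every other pitch class is silent, then scores that vector against just those two keys with the same Krumhansl-Kessler profiles (restated so the snippet runs on its own). Both correlations come out high and close together, so in a real recording a little extra energy on A or on C can be enough to tip the decision.

```python
import math

# Krumhansl-Kessler probe-tone profiles, indexed by semitones above the tonic.
MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR_PROFILE = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def key_score(chroma, profile, tonic):
    """Correlation between a chroma vector (C..B) and one rotated key template."""
    template = [profile[(pc - tonic) % 12] for pc in range(12)]
    return pearson(chroma, template)


# Idealized "averaged out" chromagram: C D E F G A B equally loud,
# the five remaining pitch classes silent.
#           C  C# D  D# E  F  F# G  G# A  A# B
diatonic = [1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]

c_major = key_score(diatonic, MAJOR_PROFILE, tonic=0)
a_minor = key_score(diatonic, MINOR_PROFILE, tonic=9)
print(f"C major: {c_major:.3f}  A minor: {a_minor:.3f}")
```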
This weakness is well documented in the academic literature. Krumhansl herself reported about 60-75% accuracy on tonal music. Most modern systems claim higher because they layer additional heuristics on top, but the underlying confusion still happens often.
/ 03 The Espresso Case
We tested with "Espresso" by Sabrina Carpenter. The actual key is A Minor. The chord progression is roughly Am9 → Em7 → Dm9 repeating throughout, with occasional G chords in the pre-chorus.
Our V5 chord detector (powered by Chordino) returned the chords correctly. But our key detector — pure Krumhansl — confidently reported G Major.
Why? The chromagram showed heavy emphasis on G, A, B, D, and E. All five of those pitches belong to both G Major and A Minor (the two scales differ only in F# versus F), so the decision came down to small differences in profile weighting, and G Major's reference profile fit slightly better than A Minor's. The algorithm picked G.
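One way to see what the histogram glossed over is to spell out the chord tones of the progression described above and compare them with each key's scale. This sketch uses standard textbook chord spellings (an assumption, not Chordino's actual output) and plain Python sets.

```python
# Standard spellings of the chords in the progression (assumed voicings).
CHORD_TONES = {
    "Am9": {"A", "C", "E", "G", "B"},
    "Em7": {"E", "G", "B", "D"},
    "Dm9": {"D", "F", "A", "C", "E"},
    "G": {"G", "B", "D"},
}

G_MAJOR = {"G", "A", "B", "C", "D", "E", "F#"}
A_MINOR = {"A", "B", "C", "D", "E", "F", "G"}  # natural minor

# Every note the progression uses, then the ones each key's scale lacks.
used = set().union(*CHORD_TONES.values())
outside_g_major = sorted(used - G_MAJOR)
outside_a_minor = sorted(used - A_MINOR)
print(outside_g_major)  # ['F']
print(outside_a_minor)  # []
```

Every chord tone fits A Minor, while Dm9's F has no place in G Major. That is chord-level evidence a pitch histogram can easily blur.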
/ 04 The Fix: Don't Trust the Notes — Listen to the Music
Krumhansl looks at the chromagram. We look at the music.
The chromagram tells you which pitches are loudest. That's not music — that's a histogram. Music is structure. It's where chords resolve, where progressions land, which chord the song actually centers on. A human listener doesn't count semitones; they listen for home.
So after Chordino returns the chord sequence, our detector evaluates every one of the 24 keys against the structural evidence of the song itself. We weigh multiple musical signals — the ones a music theory student would actually use to identify a key by ear. Each signal contributes, none of them dominates. The scoring is tuned against a reference set of songs whose keys we know cold.
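The post doesn't disclose which signals the detector weighs or how, so what follows is a hypothetical sketch of the general idea only. It scores all 24 keys from a chord sequence using three illustrative signals (how diatonic the chords are, how often the key's tonic chord appears, and whether the progression comes to rest on it) with made-up weights. The chord spellings and the toy progression, loosely modeled on the Espresso description, are assumptions too.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
SCALE_STEPS = {"major": [0, 2, 4, 5, 7, 9, 11], "minor": [0, 2, 3, 5, 7, 8, 10]}
TONIC_QUALITY = {"major": "maj", "minor": "min"}

# Hypothetical signal weights, for illustration only.
WEIGHTS = {"diatonic": 0.5, "tonic": 0.3, "cadence": 0.2}

# Chord name -> (root pitch class, quality, chord-tone pitch classes). Assumed spellings.
CHORDS = {
    "Am9": (9, "min", {9, 0, 4, 7, 11}),
    "Em7": (4, "min", {4, 7, 11, 2}),
    "Dm9": (2, "min", {2, 5, 9, 0, 4}),
    "G": (7, "maj", {7, 11, 2}),
}


def is_tonic_chord(chord, tonic, mode):
    """True when the chord is the key's own tonic triad (matching root and quality)."""
    root, quality, _ = CHORDS[chord]
    return root == tonic and quality == TONIC_QUALITY[mode]


def score_key(progression, tonic, mode):
    scale = {(tonic + step) % 12 for step in SCALE_STEPS[mode]}
    n = len(progression)
    # Signal 1: average share of each chord's tones that lie in the key's scale.
    diatonic = sum(len(CHORDS[c][2] & scale) / len(CHORDS[c][2]) for c in progression) / n
    # Signal 2: how often the key's tonic chord appears.
    tonic_share = sum(is_tonic_chord(c, tonic, mode) for c in progression) / n
    # Signal 3: does the progression end on that tonic chord?
    cadence = 1.0 if is_tonic_chord(progression[-1], tonic, mode) else 0.0
    return (WEIGHTS["diatonic"] * diatonic
            + WEIGHTS["tonic"] * tonic_share
            + WEIGHTS["cadence"] * cadence)


def rank_keys(progression):
    """Score all 24 keys and return (key, score) pairs, best first."""
    scores = {
        f"{NOTE_NAMES[tonic]} {mode}": score_key(progression, tonic, mode)
        for tonic in range(12)
        for mode in SCALE_STEPS
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy progression loosely modeled on the Espresso description.
progression = ["Am9", "Em7", "Dm9", "Am9", "Em7", "Dm9", "G", "Am9"]
for key, score in rank_keys(progression)[:3]:
    print(f"{key}: {score:.3f}")
```

No single signal decides the result here: G Major and C Major both look fully or nearly diatonic, but only A Minor also has its tonic chord recurring and closing the phrase.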
If a candidate key clearly beats the Krumhansl pick, we switch. If it's close, we trust the chromagram. That threshold matters — it protects songs where Krumhansl was right and prevents the scorer from second-guessing itself.
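The override rule itself fits in a few lines. This sketch assumes the chord-aware scores for both the Krumhansl pick and its rivals are on the same scale; the `margin` default is a hypothetical placeholder, not the tuned threshold.

```python
def choose_key(krumhansl_key, chord_scores, margin=0.15):
    """Override the Krumhansl pick only when a rival key wins by at least `margin`.

    `chord_scores` maps key names to chord-aware scores and must include
    `krumhansl_key`; the default margin is hypothetical.
    """
    best = max(chord_scores, key=chord_scores.get)
    if best != krumhansl_key and chord_scores[best] - chord_scores[krumhansl_key] >= margin:
        return best
    return krumhansl_key


# Clear win for the rival: switch.
print(choose_key("G major", {"G major": 0.51, "A minor": 0.81}))  # A minor
# Narrow win: keep the Krumhansl answer.
print(choose_key("C major", {"C major": 0.70, "A minor": 0.75}))  # C major
```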
The result for Espresso: the correct answer (A Minor) wins by a wide margin. No coin flip.
/ 05 Why This Took A While
The hard part wasn't writing a scorer. The hard part was tuning it so it didn't over-correct.
Our first version was too aggressive: it would override Krumhansl on any margin, which broke a bunch of songs where the algorithm was actually right. Our second version was too cautious: it almost never switched. Tuning the threshold (and the weights behind each signal) took weeks of testing against a curated reference set in which every song's key was validated by hand against sheet music and checked by music-theory students.
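Mechanically, tuning that threshold is a sweep: try candidate margins, measure accuracy against the hand-validated reference set, keep the best. In this minimal sketch the four cases and their scores are made-up toy numbers chosen only to exercise the code; they are not results from the real reference set.

```python
from dataclasses import dataclass


@dataclass
class Case:
    truth: str           # hand-validated key
    krumhansl_key: str   # what the chromagram correlation picked
    chord_scores: dict   # chord-aware score per candidate key


def choose_key(krumhansl_key, chord_scores, margin):
    """Switch to the best chord-aware key only if it wins by at least `margin`."""
    best = max(chord_scores, key=chord_scores.get)
    if best != krumhansl_key and chord_scores[best] - chord_scores[krumhansl_key] >= margin:
        return best
    return krumhansl_key


def accuracy(cases, margin):
    hits = sum(choose_key(c.krumhansl_key, c.chord_scores, margin) == c.truth for c in cases)
    return hits / len(cases)


def sweep(cases, margins):
    """Return (margin, accuracy) for the best margin; ties go to the earliest one."""
    return max(((m, accuracy(cases, m)) for m in margins), key=lambda pair: pair[1])


# Toy reference cases, invented purely to exercise the code.
CASES = [
    Case("A minor", "G major", {"G major": 0.50, "A minor": 0.80}),
    Case("C major", "C major", {"C major": 0.70, "A minor": 0.75}),
    Case("E minor", "G major", {"G major": 0.60, "E minor": 0.72}),
    Case("D major", "D major", {"D major": 0.55, "B minor": 0.65}),
]
MARGINS = [0.0, 0.08, 0.11, 0.2, 0.35]

print(sweep(CASES, MARGINS))  # (0.11, 1.0)
```

Even the toy set shows the failure modes from the paragraph above: a margin of 0.0 over-corrects the two songs Krumhansl got right, while 0.35 refuses to fix the two it got wrong.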
The reference set is the moat. Anyone can write a chord-aware scorer in an afternoon. Building one that's robust across pop, hip-hop, country, EDM, jazz, and acoustic — that's months of corner cases. Songs that modulate. Songs with deceptive cadences. Songs where the most-played chord isn't the tonic. Songs in modes (Dorian, Mixolydian) where neither pure major nor pure minor profiles fit cleanly.
Most "key detection" tools you see online run a chromagram (often computed with librosa) through the Krumhansl correlation and stop there: pure Krumhansl, no post-processing. They get 60-75% accuracy and move on. We obsessed over the 25-40% they're getting wrong.
/ 06 Bonus: Smarter Chord Names
Once we know the key, we go back through the chord progression and clean it up. Chordino sometimes confuses extended chords: Am9 (A-C-E-G-B) and G6 (G-B-D-E) share three notes (G, B, and E), so the detector occasionally picks the wrong root.
Our post-detection pass uses the key as context. If a detected chord doesn't fit the song's key but a closely related chord does, we evaluate the swap. This catches the most common misidentifications (the Am9-as-G6 issue, the Cmaj7-as-Am9 issue, the Fmaj7-as-Dm9 issue) and leaves correctly detected chords untouched. Borrowed chords (a non-diatonic chord with no in-key alternative) stay as-is, because they're often intentional and musically meaningful.
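Here is a minimal sketch of the swap rule as described, assuming a small hypothetical chord vocabulary stored as sets of note names and a "closely related" test of at least three shared notes. It illustrates the stated rule; it is not the production rescorer.

```python
# Hypothetical chord vocabulary: chord name -> set of note names.
VOCAB = {
    "Am9": {"A", "C", "E", "G", "B"},
    "Cmaj7": {"C", "E", "G", "B"},
    "Dm9": {"D", "F", "A", "C", "E"},
    "Em7": {"E", "G", "B", "D"},
    "Fmaj7": {"F", "A", "C", "E"},
    "G6": {"G", "B", "D", "E"},
    "Bbmaj7": {"Bb", "D", "F", "A"},
    "E": {"E", "G#", "B"},
}

A_MINOR = {"A", "B", "C", "D", "E", "F", "G"}  # natural minor scale


def fits(chord, scale):
    """True when every chord tone belongs to the key's scale."""
    return VOCAB[chord] <= scale


def shared(a, b):
    """Number of notes two chords have in common."""
    return len(VOCAB[a] & VOCAB[b])


def correct_chord(detected, scale, min_shared=3):
    """Swap an out-of-key chord for an in-key chord sharing at least `min_shared` notes."""
    if fits(detected, scale):
        return detected  # already in key: leave it alone
    candidates = [
        c for c in VOCAB
        if c != detected and fits(c, scale) and shared(c, detected) >= min_shared
    ]
    if not candidates:
        return detected  # borrowed chord with no in-key alternative: keep it
    return max(candidates, key=lambda c: shared(c, detected))


print(correct_chord("Am9", A_MINOR))     # Am9 (in key, untouched)
print(correct_chord("Bbmaj7", A_MINOR))  # Dm9 (shares D, F, A)
print(correct_chord("E", A_MINOR))       # E (no close in-key match)
```

The last case is the borrowed-chord guard at work: E major's G# falls outside A natural minor, but no in-key chord shares three notes with it, so it stays as detected.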
Result: fewer wrong chord labels in your chord chart. We still tag chord detection as BETA in the UI because no system is perfect — but ours is meaningfully better than the alternatives, and getting better every week.
/ 07 How To Try It
Open the GoatWave console, click the Tools tab, then Key Analyzer. Drop any audio file. You'll get:
- Key — including the mode (Major or Minor)
- BPM
- Duration
- Chord Progression — with roman numeral analysis and timestamps. Chord detection is still labeled BETA because extended chord disambiguation isn't perfect, but the rescorer fixes most common cases.
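The roman numeral column can be derived from the key plus each chord's root and quality. The post doesn't say how GoatWave generates it, so this is a simplified Python sketch under stated assumptions: chord names like `Am9` or `Cmaj7`, natural-minor degrees for minor keys, flats preferred for chromatic roots, and no handling of diminished or augmented chords. The function names are hypothetical.

```python
LETTER_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
SCALE_STEPS = {"major": [0, 2, 4, 5, 7, 9, 11], "minor": [0, 2, 3, 5, 7, 8, 10]}
NUMERALS = ["I", "II", "III", "IV", "V", "VI", "VII"]


def split_chord(name):
    """Split a chord or note name such as 'Bbmaj7' into (root pitch class, suffix)."""
    pc = LETTER_PC[name[0]]
    rest = name[1:]
    if rest[:1] == "#":
        pc, rest = pc + 1, rest[1:]
    elif rest[:1] == "b":
        pc, rest = pc - 1, rest[1:]
    return pc % 12, rest


def roman_numeral(chord, key_tonic, mode):
    """Simplified roman numeral for a chord in a key (no dim/aug handling)."""
    root, suffix = split_chord(chord)
    tonic, _ = split_chord(key_tonic)
    steps = SCALE_STEPS[mode]
    interval = (root - tonic) % 12
    if interval in steps:
        numeral = NUMERALS[steps.index(interval)]
    elif (interval + 1) in steps:
        numeral = "b" + NUMERALS[steps.index(interval + 1)]  # flattened degree above
    else:
        numeral = "#" + NUMERALS[steps.index(interval - 1)]  # raised degree below
    # Minor-quality chords ("m", "min") get lowercase numerals; "maj" stays major.
    if suffix.startswith("min"):
        return numeral.lower() + suffix[3:]
    if suffix.startswith("m") and not suffix.startswith("maj"):
        return numeral.lower() + suffix[1:]
    return numeral + suffix


for chord in ["Am9", "Em7", "Dm9", "G"]:
    print(chord, roman_numeral(chord, "A", "minor"))
# prints: Am9 i9 / Em7 v7 / Dm9 iv9 / G VII
```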
The same algorithm powers our FretLab chord study tool. Both use the same v5 detector under the hood.
Try the Key Analyzer
Free. Browser-based. No signup. Drop any song and get the key, BPM, and chords in seconds.
/ 08 What's Next
The improved detector is live in production now. No neural net, no inference latency, no training data required — just better music theory applied at the right moment in the pipeline.
For chord extensions (Am9 / Cmaj7 / Dm11 type confusion), we're working on a few approaches we won't detail here — but the underlying philosophy is the same. Use more information, applied smarter. Most failures in audio AI aren't a lack of compute; they're a lack of musical context.
If you've tried other key-detection tools and gotten frustrated with the results — drop your test songs in our Key Analyzer. We'd love to hear when we get it wrong. That feedback shapes the next round.
And if you're a competitor reading this trying to figure out how we did it: good luck. The signals matter less than the tuning. The tuning matters less than the reference set. The reference set takes time and ears, not algorithms.