I've done some controlled spectral analysis of the RMNoise "Phone web client version 824e959" model. I wanted to quantify a subjective impression that the denoised audio sounds "dull" compared to the original, and the data confirms it. The same behavior shows up both through my ka9q-web SDR integration and through the official RMNoise web client, so this looks like a model characteristic, not an integration issue.
Test setup:
- RX888 MkII → ka9q-radio (radiod) → ka9q-web (HB9VQQ fork)
- 80m SSB signals, strong stations (S7-S8)
- Recordings taken back-to-back on same signal: RMNoise Off, 60% mix, 90% mix
- Compressor and all EQ OFF for all recordings
- Audio resampled to 8 kHz before sending to RMNoise (matching the documented spec)
- Confirmed same behavior on official RMNoise web client
- Analysis: Welch PSD (4096-point FFT, Hann window, 50% overlap)
- Multiple sample sets recorded to confirm repeatability
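For anyone who wants to repeat the measurement, the analysis step can be sketched roughly as below. This is a minimal sketch, not my exact script: it assumes NumPy and SciPy, mono recordings already loaded as float arrays at 8 kHz, and the function names and the `BANDS_HZ` dictionary are made up for illustration. It uses the Welch settings listed above and reports the level change per band between a reference recording and a processed one.

```python
import numpy as np
from scipy.signal import welch

# Band edges in Hz; the last band runs up to Nyquist (4 kHz at fs = 8 kHz).
BANDS_HZ = {
    "0-300 Hz (sub-voice)": (0.0, 300.0),
    "300-1000 Hz (low voice)": (300.0, 1000.0),
    "1000-2000 Hz (mid voice)": (1000.0, 2000.0),
    "2000-3000 Hz (presence)": (2000.0, 3000.0),
    "3000+ Hz (brilliance)": (3000.0, np.inf),
}

def band_power_db(x, fs=8000, nfft=4096):
    """Welch PSD (Hann window, 50% overlap) summed per band, in dB."""
    f, pxx = welch(x, fs=fs, window="hann", nperseg=nfft, noverlap=nfft // 2)
    out = {}
    for name, (lo, hi) in BANDS_HZ.items():
        mask = (f >= lo) & (f < hi)
        # Small floor avoids log10(0) on silent bands.
        out[name] = 10.0 * np.log10(np.sum(pxx[mask]) + 1e-20)
    return out

def band_delta_db(reference, processed, fs=8000):
    """Per-band level change of `processed` relative to `reference`, in dB."""
    ref = band_power_db(reference, fs)
    proc = band_power_db(processed, fs)
    return {name: proc[name] - ref[name] for name in ref}
```

Because the recordings are taken back-to-back rather than simultaneously, a single comparison also contains normal speech and fading variation, which is why averaging over several sample sets matters.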
Results (band level change relative to RMNoise Off, averaged across multiple samples):
Band                     │ 60% mix        │ 90% mix
─────────────────────────┼────────────────┼──────────────
0-300 Hz (sub-voice)     │ ~0 dB          │ ~0 dB
300-1000 Hz (low voice)  │ -5 to -7 dB    │ -1 to -3 dB
1000-2000 Hz (mid voice) │ -3 to -6 dB    │ -2 to -3 dB
2000-3000 Hz (presence)  │ -4 to -7 dB    │ -5 to -7 dB
3000+ Hz (brilliance)    │ -13 to -17 dB  │ -15 to -17 dB
I also tested whether my send-path LPF cutoff was contributing. Lowering it from 3000 Hz to 2800 Hz (matching the documented bandwidth spec) made the presence loss at 90% mix slightly worse (-6.9 dB vs -5.7 dB), so matching the spec does not recover the band. That points to the attenuation being inherent to the AI model rather than to my input filtering.
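To put the LPF comparison above in context, one can estimate how much a low-pass filter by itself removes from the 2-3 kHz band at each cutoff. The sketch below is only an illustration: it assumes SciPy, a flat input spectrum across the band, and a 6th-order Butterworth filter, which is a stand-in and not necessarily the filter ka9q-web uses. The function name is hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfreqz

def lpf_band_gain_db(cutoff_hz, band=(2000.0, 3000.0), fs=8000, order=6):
    """Mean power gain of a Butterworth low-pass over `band`, in dB.

    Assumption: the order and filter family are placeholders, and the
    input is treated as spectrally flat across the band.
    """
    sos = butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    freqs = np.linspace(band[0], band[1], 501)
    # With fs given, worN may be an array of frequencies in Hz.
    _, h = sosfreqz(sos, worN=freqs, fs=fs)
    return 10.0 * np.log10(np.mean(np.abs(h) ** 2))

for fc in (3000, 2800):
    print(f"cutoff {fc} Hz: {lpf_band_gain_db(fc):+.2f} dB in 2-3 kHz")
```

Whatever share of the measured presence loss the filter accounts for, the remainder has to come from the model, so this gives a rough bound on the filter's contribution.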
Additional observations:
- The mix slider doesn't scale the effect linearly across all bands. At 90% the low-voice band (300-1000 Hz) is barely touched while presence is heavily cut. At 60% the lows are cut more but presence stays similar. The model's spectral shaping appears signal-dependent.
- The model appears to treat upper voice harmonics in the 2-3 kHz range as noise. This is where much of the consonant energy (s, t, f, th) sits within the SSB passband and where the ear is most sensitive, so losses here hurt intelligibility.
- Sub-voice (0-300 Hz) is essentially untouched at any mix level.
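The non-linear slider behavior noted above can be checked against a simple reference model. The sketch below assumes the slider is a plain wet/dry mix, out = (1 - mix) * dry + mix * wet, with the denoised signal in phase with the original. I do not know how RMNoise actually implements the mix, so this is a hypothesis to test against, not a description of the product. The band values are midpoints I took from the ranges in the results table.

```python
import math

def mixed_gain_db(mix, model_gain_db):
    """Net band gain for an assumed in-phase linear wet/dry mix."""
    g = 10 ** (model_gain_db / 20)          # model's amplitude gain in the band
    return 20 * math.log10((1 - mix) + mix * g)

def implied_model_gain_db(mix, observed_db):
    """Invert the mix: model gain needed to explain an observed net change."""
    net = 10 ** (observed_db / 20)
    g = (net - (1 - mix)) / mix
    if g <= 0:
        return float("-inf")                # would need more than full removal
    return 20 * math.log10(g)

# Midpoints of the table ranges (dB): (band, 60% mix, 90% mix)
observed = [
    ("300-1000 Hz (low voice)", -6.0, -2.0),
    ("2000-3000 Hz (presence)", -5.5, -6.0),
]
for band, at60, at90 in observed:
    g60 = implied_model_gain_db(0.6, at60)
    g90 = implied_model_gain_db(0.9, at90)
    print(f"{band}: implied model gain {g60:+.1f} dB at 60%, {g90:+.1f} dB at 90%")
```

Under this assumption a single model gain per band should explain both columns, and more mix can only mean more attenuation. The low-voice band shows the opposite (more cut at 60% than at 90%), so either the mix is not a simple linear blend or the model's shaping changes with the signal between recordings.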
Would it be possible to train a model variant that preserves more energy in the 2-3 kHz presence band? Even 2-3 dB less attenuation there would significantly improve perceived voice clarity for HF SSB use.
Spectral comparison plots attached.
73, Roland HB9VQQ
https://rx888.hb9vqq.ch:8081