- Updated: June 30, 2026
Towards Reliable Audio Deepfake Attribution and Model Recognition: A Multi-Level Autoencoder-Based Framework
Reliable Audio Deepfake Attribution with the LAVA Framework
The rapid rise of audio deepfakes threatens the integrity of digital communication. While detection techniques have matured, pinpointing the source model behind a fabricated audio clip remains a critical, under‑explored challenge. In this article we present LAVA (Layered Architecture for Voice Attribution), a hierarchical, attention‑enhanced autoencoder framework that not only detects deepfakes but also attributes them to their generating technology and specific model instance.

LAVA leverages a convolutional autoencoder trained exclusively on synthetic audio to extract robust latent representations. Two specialized classifiers operate on these features:
- Audio Deepfake Attribution (ADA) – identifies the generation technology (e.g., GAN, diffusion, TTS).
- Audio Deepfake Model Recognition (ADMR) – recognizes the exact model instance.
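The arrangement described above, one shared encoder whose latent features feed two separate classifiers, can be sketched in a few lines. This is a toy illustration, not the released LAVA code: the class `TwoLevelAttributor`, the stand-in functions `toy_encode`, `toy_ada`, and `toy_admr`, and the labels `model_A` and `model_B` are all hypothetical, and the sketch assumes the two classifiers read the same latent vector independently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Latent = List[float]


@dataclass
class TwoLevelAttributor:
    """Hypothetical wrapper: one shared encoder, two classifier heads."""
    encode: Callable[[List[float]], Latent]  # stand-in for the trained encoder
    ada_head: Callable[[Latent], str]        # latent -> generation technology
    admr_head: Callable[[Latent], str]       # latent -> specific model instance

    def predict(self, waveform: List[float]) -> Dict[str, str]:
        latent = self.encode(waveform)       # computed once, reused by both heads
        return {
            "technology": self.ada_head(latent),
            "model": self.admr_head(latent),
        }


# Toy stand-ins so the sketch runs; a real system would use trained networks.
def toy_encode(waveform: List[float]) -> Latent:
    mean = sum(waveform) / len(waveform)
    energy = sum(x * x for x in waveform) / len(waveform)
    return [mean, energy]


def toy_ada(latent: Latent) -> str:
    return "TTS" if latent[1] < 0.5 else "GAN"


def toy_admr(latent: Latent) -> str:
    return "model_A" if latent[0] >= 0.0 else "model_B"


attributor = TwoLevelAttributor(toy_encode, toy_ada, toy_admr)
result = attributor.predict([0.1, -0.2, 0.3, 0.2])
```

Computing the latent vector once and passing it to both heads reflects the idea that attribution and model recognition share a single learned representation.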
To remain reliable under open‑set conditions, LAVA incorporates confidence‑based rejection thresholds, so that low‑confidence predictions can be rejected instead of being forced into a known class. Extensive experiments on ASVspoof2021, FakeOrReal, and CodecFake demonstrate state‑of‑the‑art performance: ADA achieves F1 scores above 95% on all three datasets, while ADMR reaches a macro F1 of 96.31% across six model classes. Additional tests on unseen attacks from ASVspoof2019 LA confirm LAVA’s robustness.
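A common way to implement confidence‑based rejection is to threshold the maximum softmax probability. The sketch below shows that idea under stated assumptions: the function names, the example labels, and the threshold value of 0.9 are illustrative choices, not values taken from the LAVA paper or repository.

```python
import math
from typing import List, Optional, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_rejection(
    logits: List[float],
    labels: List[str],
    threshold: float = 0.9,  # assumed value; would be tuned on validation data
) -> Tuple[Optional[str], float]:
    """Return (label, confidence), or (None, confidence) if below threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None, probs[best]  # rejected: treated as an unknown source
    return labels[best], probs[best]


labels = ["GAN", "diffusion", "TTS"]
confident = predict_with_rejection([5.0, 0.0, 0.0], labels)
uncertain = predict_with_rejection([1.0, 0.9, 0.8], labels)
```

Here `confident` keeps its label because one class dominates, while `uncertain` is rejected because the probabilities are nearly uniform. In practice the threshold trades off coverage against the risk of misattributing audio from an unseen generator.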
All code and pretrained models are publicly released, enabling the research community to build upon this work. Explore the repository at github.com/adipiz99/lava-framework.