- Updated: June 10, 2026
RULER: Representation-Level Verification of Machine Unlearning
Direct Answer
The paper introduces RULER (Representation‑Level Verification of Machine Unlearning), a suite of metrics that assess whether a model has truly erased the influence of specific training records from its internal representations, not just from its outputs. This matters because existing verification methods can be fooled: a model may appear to “forget” at the prediction level while still retaining hidden traces of the removed data, posing privacy and compliance risks.
Background: Why This Problem Is Hard
Machine unlearning promises to let organizations delete user data from deployed models without the cost of full retraining. In practice, three output‑level checks dominate the field:
- Membership inference attacks – test whether an adversary can tell if a record was in the training set.
- Retained accuracy – ensure the model’s performance on the remaining data does not degrade.
- Forget‑set accuracy – verify that the model’s predictions on the “forgotten” records are no better than random.
These checks are necessary but not sufficient. A model’s hidden layers can still encode subtle statistical fingerprints of the removed records. Because most modern models (deep CNNs, transformers, large language models) rely on high‑dimensional embeddings, a small amount of residual information can survive even when the final logits appear clean. Detecting such latent remnants is difficult for three reasons:
- Black‑box access: Practitioners often only have API‑level access to a model, limiting inspection to inputs and outputs.
- Scale of representations: Intermediate activations can span millions of values across layers and records; exhaustive comparison is computationally prohibitive.
- Lack of ground truth: Without a “gold‑standard” model trained from scratch without the forgotten data, it is hard to know what a fully erased representation should look like.
Consequently, organizations risk appearing to comply with “right‑to‑be‑forgotten” regulations while still leaking private information through hidden layers, a risk that becomes acute in high‑stakes domains such as healthcare, finance, and facial‑recognition security.
What the Researchers Propose
RULER tackles the verification gap by introducing two complementary metrics that operate directly on model representations:
- M2 (oracle‑comparative): Measures the distance between the representation of a forget‑set record in the unlearned model and the representation of the same record in an “oracle” model that was retrained from scratch without that record. If the two embeddings align, the model has effectively erased the record.
- M4 (oracle‑free): Detects anomalous similarity patterns among forget‑set records using only the unlearned model’s internal similarity matrix. It flags clusters that remain unusually tight, indicating residual memorisation without needing a retrained baseline.
The two metrics are complementary: M2 provides a gold‑standard comparison when resources allow full retraining, while M4 offers a lightweight diagnostic that needs no retrained baseline and can be run on any deployed model after unlearning.
Key components of the RULER framework include:
- Representation extractor: A hook into the target layer(s) of the model (e.g., the penultimate embedding layer) that outputs a fixed‑size vector for each input.
- Oracle trainer (optional): A separate training pipeline that builds the reference model without the forget‑set data.
- Similarity analyzer: Computes pairwise cosine similarities (or other distance measures) among the extracted vectors.
- Statistical tester: Applies linear mixed‑effects modeling to determine whether observed similarity patterns differ significantly from a null distribution.
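The similarity analyzer can be pictured with a minimal, dependency‑free sketch. It assumes embeddings arrive as plain lists of floats; the function names `cosine_similarity` and `similarity_matrix` are illustrative and not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors (an assumption)
    return dot / (norm_u * norm_v)


def similarity_matrix(embeddings):
    """Symmetric matrix of pairwise cosine similarities.

    The diagonal is set to 1.0 by convention.
    """
    n = len(embeddings)
    S = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine_similarity(embeddings[i], embeddings[j])
            S[i][j] = S[j][i] = s
    return S


# Toy embeddings standing in for vectors captured by the representation extractor.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
S = similarity_matrix(toy_embeddings)
for row in S:
    print([round(x, 3) for x in row])
```

The nested loop is quadratic in the number of records, which is fine for small forget‑sets but becomes the bottleneck at scale.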
The researchers also provide a practical workflow for integrating RULER into existing ML pipelines, allowing teams to flag unlearning failures before they become compliance liabilities.
How It Works in Practice
Implementing RULER follows a three‑stage pipeline that can be inserted into any model‑as‑a‑service (MaaS) deployment:
Stage 1 – Data Partitioning
- Keep‑set: Records that remain in the training distribution.
- Forget‑set: Records that must be removed per user request or regulation.
Both sets are passed through the same preprocessing pipeline to ensure comparable embeddings.
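A small sketch of this partitioning step follows. The record layout (a dict with `id` and `features` keys) and the `preprocess` placeholder are hypothetical, chosen only to show both sets flowing through one shared preprocessing function.

```python
def partition_records(records, forget_ids):
    """Split records into (keep_set, forget_set) by record ID."""
    forget_ids = set(forget_ids)
    keep_set = [r for r in records if r["id"] not in forget_ids]
    forget_set = [r for r in records if r["id"] in forget_ids]
    return keep_set, forget_set


def preprocess(record):
    """Hypothetical stand-in for the shared preprocessing pipeline."""
    return [float(x) for x in record["features"]]


records = [
    {"id": "u1", "features": [1, 2]},
    {"id": "u2", "features": [3, 4]},
    {"id": "u3", "features": [5, 6]},
]

keep_set, forget_set = partition_records(records, forget_ids={"u2"})

# The same function is applied to both sets so the embeddings stay comparable.
keep_inputs = [preprocess(r) for r in keep_set]
forget_inputs = [preprocess(r) for r in forget_set]
```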
Stage 2 – Representation Extraction
The unlearned model (after applying any unlearning algorithm such as gradient scrubbing, Fisher‑based pruning, or the “Bad Teacher” approach) processes each record. A hook captures the activation vector from a designated layer, producing two matrices:
- R_unlearned_keep – embeddings for the keep‑set.
- R_unlearned_forget – embeddings for the forget‑set.
Stage 3 – Metric Computation
- M2 (if oracle available):
  - Train the oracle model on keep‑set only.
  - Extract R_oracle_forget for the same forget‑set inputs.
  - Compute the average Euclidean (or cosine) distance between R_unlearned_forget and R_oracle_forget. Small distances indicate successful erasure.
- M4 (oracle‑free):
  - Calculate the pairwise similarity matrix S_unlearned_forget for R_unlearned_forget.
  - Fit a linear mixed‑effects model that predicts similarity as a function of record identity, controlling for batch effects.
  - Significant positive intercepts or low‑variance clusters reveal lingering memorisation.
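The two computations can be sketched in plain Python under stated assumptions. `m2_score` follows the description above (mean cosine distance between paired embeddings). For M4, the sketch does not implement the linear mixed‑effects model; it swaps in a simpler permutation test that asks whether forget‑set embeddings are more tightly clustered than keep‑set embeddings from the same unlearned model. That substitution, the function names, and the toy vectors are illustrative assumptions, not the paper's procedure.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def m2_score(R_unlearned_forget, R_oracle_forget):
    """Mean cosine distance between paired unlearned and oracle embeddings."""
    if len(R_unlearned_forget) != len(R_oracle_forget):
        raise ValueError("both matrices must cover the same forget-set inputs")
    distances = [1.0 - cosine_similarity(u, o)
                 for u, o in zip(R_unlearned_forget, R_oracle_forget)]
    return sum(distances) / len(distances)


def mean_within(S, idx):
    """Mean similarity over all unordered pairs of the given indices."""
    pairs = [(i, j) for a, i in enumerate(idx) for j in idx[a + 1:]]
    return sum(S[i][j] for i, j in pairs) / len(pairs)


def m4_permutation_p_value(R_forget, R_keep, n_permutations=2000, seed=0):
    """Permutation-test stand-in for M4 (NOT the mixed-effects model).

    Statistic: mean within-forget similarity minus mean within-keep similarity.
    Null hypothesis: forget and keep labels are exchangeable.
    Requires at least two records in each set.
    """
    pooled = R_forget + R_keep
    n, n_f = len(pooled), len(R_forget)
    S = [[cosine_similarity(pooled[i], pooled[j]) for j in range(n)]
         for i in range(n)]

    def statistic(order):
        return mean_within(S, order[:n_f]) - mean_within(S, order[n_f:])

    order = list(range(n))
    observed = statistic(order)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(order)
        if statistic(order) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (1 + hits) / (1 + n_permutations)


# Toy data: forget-set embeddings form a tight cluster, keep-set ones are spread out.
R_unlearned_forget = [[1.0, 0.0, 0.0], [0.99, 0.05, 0.0],
                      [0.98, 0.0, 0.05], [0.99, 0.03, 0.03]]
R_oracle_forget = [[0.9, 0.1, 0.0], [0.95, 0.1, 0.0],
                   [0.97, 0.0, 0.1], [0.98, 0.05, 0.05]]
R_unlearned_keep = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [-1.0, 0.0, 0.0],
                    [0.0, -1.0, 0.0], [0.0, 0.0, -1.0], [0.0, 0.7, 0.7]]

m2 = m2_score(R_unlearned_forget, R_oracle_forget)
p_value = m4_permutation_p_value(R_unlearned_forget, R_unlearned_keep)
print(f"M2 mean cosine distance: {m2:.4f}")
print(f"M4 stand-in p-value:     {p_value:.4f}")
```

A permutation test avoids distributional assumptions but ignores the batch effects that the mixed‑effects model controls for, so it should be read as a rough screen rather than a replacement.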
The output is a concise report: a numeric M2 score, an M4 p‑value, and a visual heatmap of similarity clusters. Teams can set thresholds to automate the decision, for example accepting the unlearned model when its M2 cosine similarity to the oracle exceeds 0.8, and triggering further sanitisation when the M4 test is significant (p < 0.05).
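A threshold gate of this kind could look like the sketch below. It assumes one particular reading of the example thresholds: M2 is reported as a cosine similarity to the oracle (higher is better), and a significant M4 result signals residual memorisation. `RulerReport` and `accept_unlearning` are hypothetical names, and the default values simply reuse the example numbers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RulerReport:
    m4_p_value: float                      # low value: residual clustering detected
    m2_similarity: Optional[float] = None  # None when no oracle model was trained


def accept_unlearning(report, m2_min_similarity=0.8, m4_alpha=0.05):
    """Return True when the report passes both checks that are available."""
    if report.m4_p_value < m4_alpha:
        return False  # significant residual structure: trigger sanitisation
    if report.m2_similarity is not None:
        return report.m2_similarity > m2_min_similarity
    return True  # oracle-free run: only the M4 check applies


print(accept_unlearning(RulerReport(m4_p_value=0.40, m2_similarity=0.92)))
print(accept_unlearning(RulerReport(m4_p_value=0.01, m2_similarity=0.92)))
```

In a CI/CD pipeline, a `False` result would typically block promotion of the unlearned model and open a ticket for further sanitisation or retraining.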
Evaluation & Results
The authors evaluated RULER across five unlearning strategies and four data domains, totaling twelve experimental conditions. The key dimensions were:
| Domain | Model Type | Unlearning Method | Forget Fraction |
|---|---|---|---|
| Tabular (UCI Adult) | Feed‑forward NN | Gradient Scrubbing | 10 % / 30 % / 50 % |
| Image (CIFAR‑10) | ResNet‑18 | Fisher Pruning | 10 % / 30 % / 50 % |
| Clinical Text (MIMIC‑III) | Transformer encoder | Knowledge Distillation | 10 % / 30 % / 50 % |
| Face Identity (VGGFace2) | ArcFace | Bad Teacher | 10 % / 30 % / 50 % |
All methods passed the traditional output‑level suite (membership inference, retained accuracy, forget‑set accuracy). However, RULER’s M2 metric flagged significant residuals in 10 of the 12 conditions (p < 0.05). The effect size grew proportionally with the forget fraction, confirming that larger deletions leave more detectable traces.
Even the “Bad Teacher” approach, which overwrites forgotten knowledge by distilling from a deliberately incompetent teacher on the forget‑set, showed persistent clusters in the similarity heatmaps, a finding that M4 highlighted without any oracle model. In the face‑recognition scenario, M4 identified identity‑level memorisation that none of the tested unlearning techniques fully removed, underscoring a privacy risk for biometric systems.
These results demonstrate that output‑only verification can be misleading. RULER provides a more stringent, representation‑aware lens that reveals hidden leakage, especially as the proportion of data to be forgotten increases.
Why This Matters for AI Systems and Agents
For practitioners building AI‑driven products, RULER offers three concrete benefits:
- Regulatory compliance assurance: GDPR, CCPA, and emerging AI‑specific statutes require demonstrable erasure. RULER’s metrics give auditors quantifiable evidence that a model’s internal state no longer contains the requested data.
- Risk mitigation for downstream agents: Autonomous agents that query a shared model (e.g., recommendation bots, conversational assistants) inherit any latent privacy leaks. By verifying representation cleanliness, teams can prevent inadvertent exposure through agent‑to‑agent communication.
- Operational efficiency: The oracle‑free M4 metric can be run as a lightweight check right after a cheap unlearning pass, allowing data engineers to decide whether that method suffices or whether a full retraining pass is warranted.
Integrating RULER into a model‑orchestration platform such as UBOS’s unified AI platform enables automated compliance checks as part of the CI/CD pipeline for ML models. Teams can set policy thresholds, generate compliance reports, and trigger alerts when residual memorisation is detected.
What Comes Next
While RULER marks a significant step forward, several open challenges remain:
- Scalability to billion‑parameter models: Computing pairwise similarities for massive embedding spaces can be prohibitive. Approximate nearest‑neighbor techniques or sketching algorithms may be needed.
- Cross‑modal verification: Current experiments focus on a single representation layer. Future work could explore multi‑layer or cross‑modal consistency checks for multimodal models.
- Adversarial unlearning: An attacker could deliberately manipulate the unlearning process to hide residuals from RULER. Robustness against such attacks is an open research direction.
- Standardisation of thresholds: Industry‑wide benchmarks for what constitutes “sufficient” M2 or M4 scores are still missing. Collaborative efforts could produce certification suites.
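On the scalability point, one simple way to sidestep the full pairwise matrix is to estimate its mean from randomly sampled pairs. The sketch below shows that Monte Carlo idea only; it is not an approximate nearest‑neighbour or sketching method, and `sampled_mean_similarity` is a hypothetical helper, not part of RULER.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def sampled_mean_similarity(embeddings, n_pairs=5000, seed=0):
    """Estimate the mean pairwise cosine similarity from random pairs.

    Cost grows with n_pairs rather than with the square of the record count.
    """
    n = len(embeddings)
    if n < 2:
        raise ValueError("need at least two embeddings")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        i, j = rng.sample(range(n), 2)  # two distinct record indices
        total += cosine_similarity(embeddings[i], embeddings[j])
    return total / n_pairs


# Synthetic embeddings: 200 random 8-dimensional Gaussian vectors.
data_rng = random.Random(42)
embeddings = [[data_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
estimate = sampled_mean_similarity(embeddings)
print(f"estimated mean pairwise similarity: {estimate:.4f}")
```

Sampling only recovers aggregate statistics such as the mean; locating individual tight clusters would still need a neighbour‑search or sketching approach.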
Potential applications extend beyond privacy. For example, UBOS’s unlearning toolkit could leverage RULER to prune outdated bias‑inducing data from fairness‑critical models, ensuring that corrective updates truly remove harmful representations.
In the longer term, representation‑level verification may become a core component of AI governance frameworks, complementing model cards, data sheets, and audit logs. By shining a light on the hidden layers where most of a model’s “knowledge” resides, RULER helps bridge the gap between theoretical privacy guarantees and practical, provable erasure.
For a deeper dive into the methodology and experimental details, see the original RULER paper on arXiv.