From RTBF to Delta Tables: A Modern Approach to PII in Data Engineering

Rakesh Gupta
Jan 27, 2025

Handling PII (Personally Identifiable Information) is a critical responsibility in data engineering, especially to comply with GDPR and other regulatory requirements. One key aspect is the Right to Be Forgotten (RTBF), which ensures users’ data can be securely erased when needed.

Over the years, I’ve explored various approaches to storing and managing PII, from SQL Server and file formats (an approach I’ve since outgrown) to Delta tables. My experience with Delta tables, particularly for handling RTBF requests, has been exceptional.

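If PII isn’t required for downstream use cases, it’s better not to ingest it at all. This minimizes risk and reduces the compliance burden. For analytics or data science models, teams can hash PII consistently, so the same input always maps to the same token: records still join and aggregate correctly, but the dataset carries no direct identifier. The hash should be keyed or salted, because a plain hash of a low-entropy value such as an email address or phone number can be reversed by a dictionary attack. This aligns with best practices and helps organizations avoid costly regulatory penalties.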
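
To make the idea of consistent hashing concrete, here is a minimal sketch using Python's standard library. It assumes a keyed hash (HMAC-SHA256); the function name `pseudonymize`, the normalization step, and the example key are illustrative assumptions, not something prescribed by Delta, dltHub, or Fabric. In practice the key would come from a secret store rather than source code.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest of a normalized PII value.

    Normalizing first (trim + lowercase) means trivially different
    spellings of the same value map to the same token.
    """
    normalized = value.strip().lower()
    digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical key for illustration only; load a real one from a vault.
key = b"replace-with-a-secret-from-a-vault"

token_a = pseudonymize("Alice@Example.com ", key)
token_b = pseudonymize("alice@example.com", key)

# Same person, same token: joins and group-bys still work downstream.
print(token_a == token_b)
```

Because the digest depends on the key, someone holding only the hashed column cannot rebuild tokens from a list of candidate emails without also obtaining the key.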

For those using dltHub with Microsoft Fabric and a Lakehouse architecture, splitting PII and non-PII data into separate Delta tables is key. This approach, depicted by the solution diagram and sample code snippet shared in the comment, streamlines data governance and makes managing access much easier. Teams can work with the non-PII data while sensitive information stays secured in its own tables.
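
The separation described above can be sketched in plain Python, independent of any particular loader. This is an illustration under stated assumptions, not the code from the comment: the `PII_FIELDS` set, the `customer_id` key, and both function names are hypothetical. Each list stands in for what would be a Delta table in the Lakehouse.

```python
from typing import Any

# Hypothetical list of columns treated as PII in this example.
PII_FIELDS = {"email", "full_name", "phone"}


def split_record(
    record: dict[str, Any], key_field: str = "customer_id"
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split one record into a PII row and a non-PII row sharing a key."""
    key = record[key_field]
    pii_row: dict[str, Any] = {key_field: key}
    non_pii_row: dict[str, Any] = {key_field: key}
    for field, value in record.items():
        if field == key_field:
            continue
        target = pii_row if field in PII_FIELDS else non_pii_row
        target[field] = value
    return pii_row, non_pii_row


def forget_subject(
    pii_rows: list[dict[str, Any]], key: Any, key_field: str = "customer_id"
) -> list[dict[str, Any]]:
    """Apply an RTBF request: drop the subject's row from the PII set only."""
    return [row for row in pii_rows if row[key_field] != key]


records = [
    {"customer_id": 1, "email": "a@example.com", "full_name": "A", "country": "DE", "orders": 3},
    {"customer_id": 2, "email": "b@example.com", "full_name": "B", "country": "FR", "orders": 5},
]

pairs = [split_record(r) for r in records]
pii_table = [pii for pii, _ in pairs]
non_pii_table = [non_pii for _, non_pii in pairs]

# An erasure request for customer 1 touches only the PII table.
pii_table = forget_subject(pii_table, key=1)
print(pii_table)
print(non_pii_table)
```

With this layout, an erasure request touches only the PII table, and the analytical table stays intact. On real Delta tables the equivalent step would be a delete on the PII table; as far as I know, the underlying data files are only physically removed once the table is vacuumed, so that step belongs in the erasure process too.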

Written by Rakesh Gupta

Founder and IT Consultant, SketchMyView (www.sketchmyview.com). Reach me here: linkedin.com/in/grakeshk
