
PII

Personally Identifiable Information (PII) is any data that can be used to identify a specific individual. This includes direct identifiers (name, address, social security number) and indirect identifiers (date of birth, employer, job title) that could identify someone when combined.

PII encompasses a wide range of data types with varying sensitivity levels. Direct identifiers include full names, AHV/social security numbers, passport numbers, postal addresses, phone numbers, and email addresses. Any one of these can be enough on its own to identify an individual. Indirect identifiers include dates of birth, gender, nationality, employer, job title, and geolocation data. Individually these may seem harmless, but in combination they can narrow a population down to a single person and enable re-identification (the so-called mosaic effect).
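The mosaic effect can be illustrated with a small sketch. The records, field names, and the `unique_fraction` helper below are hypothetical and chosen only for illustration; the idea is to measure how many records become unique once several indirect identifiers are combined.

```python
from collections import Counter

# Hypothetical records: no names, only indirect identifiers.
records = [
    {"birth_year": 1984, "employer": "Canton Hospital", "job_title": "Nurse"},
    {"birth_year": 1984, "employer": "Canton Hospital", "job_title": "Nurse"},
    {"birth_year": 1984, "employer": "Canton Hospital", "job_title": "Chief Surgeon"},
    {"birth_year": 1991, "employer": "Canton Hospital", "job_title": "Nurse"},
]


def unique_fraction(rows, fields):
    """Return the share of rows whose values for `fields` occur exactly once."""
    keys = [tuple(row[f] for f in fields) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(rows)


# One attribute alone singles out nobody in this sample ...
employer_only = unique_fraction(records, ["employer"])
# ... but the combination of three attributes isolates half of the records.
combined = unique_fraction(records, ["birth_year", "employer", "job_title"])

print(employer_only, combined)
```

In this toy sample the employer alone identifies no one, while the combination of birth year, employer, and job title makes two of the four records unique.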

Under the GDPR (Art. 4(1)) and the Swiss FADP (Art. 5(a)), personal data (the legal equivalent of PII) is legally protected, and organizations must implement appropriate technical and organizational safeguards. The FADP additionally classifies certain categories as sensitive personal data (Art. 5(c)), including genetic data, biometric data, and data on administrative or criminal proceedings, which demand heightened protection. Document anonymization (the removal or replacement of PII) is a key technique for enabling document sharing, legal research, and publication while protecting individuals.
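As a minimal sketch of the "replacement" variant of anonymization, the following function swaps already-detected spans for placeholders. The function name `replace_pii`, the `(start, end, label)` span format, and the placeholder style are assumptions made for this example, not a description of any particular product.

```python
def replace_pii(text, spans):
    """Replace each (start, end, label) span with a placeholder such as [NAME_1].

    The same surface text always maps to the same placeholder, so the
    document stays readable after the identifiers are gone.
    """
    placeholders = {}  # (label, surface text) -> placeholder
    per_label = {}     # label -> running counter

    # Assign placeholders in reading order so the numbering is stable.
    for start, end, label in sorted(spans):
        key = (label, text[start:end])
        if key not in placeholders:
            per_label[label] = per_label.get(label, 0) + 1
            placeholders[key] = f"[{label}_{per_label[label]}]"

    # Substitute from the end of the text so earlier offsets stay valid.
    out = text
    for start, end, label in sorted(spans, reverse=True):
        out = out[:start] + placeholders[(label, text[start:end])] + out[end:]
    return out


text = "Anna Muster sued Peter Beispiel. Anna Muster won."
spans = [(0, 11, "NAME"), (17, 31, "NAME"), (33, 44, "NAME")]
redacted = replace_pii(text, spans)
print(redacted)
```

Keeping one placeholder per person preserves who did what in the text. Whether that level of linkability is acceptable depends on the use case and is a design decision, not something this sketch settles.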

PII detection in unstructured documents (contracts, court decisions, correspondence) is significantly harder than in structured databases. Names may appear in different formats, addresses span multiple lines, and contextual identifiers ("the tenant at Bahnhofstrasse 42") require semantic understanding rather than pattern matching.
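The limits of pattern matching can be seen in a short sketch. It assumes the 13-digit AHV format that starts with 756 and ends with an EAN-13 check digit; the names `AHV_PATTERN`, `ean13_valid`, and `find_ahv_numbers` are hypothetical, and the sample number is a made-up value whose check digit happens to satisfy the rule.

```python
import re

# AHV numbers: 13 digits starting with 756, usually written 756.XXXX.XXXX.XX.
AHV_PATTERN = re.compile(r"\b756\.?\d{4}\.?\d{4}\.?\d{2}\b")


def ean13_valid(digits):
    """Check the 13th digit against the EAN-13 rule (weights 1, 3, 1, 3, ...)."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == int(digits[12])


def find_ahv_numbers(text):
    """Return all substrings that look like AHV numbers and pass the checksum."""
    hits = []
    for match in AHV_PATTERN.finditer(text):
        digits = match.group().replace(".", "")
        if ean13_valid(digits):
            hits.append(match.group())
    return hits


sample = "AHV no. 756.9217.0769.85 belongs to the tenant at Bahnhofstrasse 42."
found = find_ahv_numbers(sample)
print(found)
```

The pattern finds the structured identifier, but nothing in it flags "the tenant at Bahnhofstrasse 42" as a reference to a person. That gap is why contextual identifiers call for semantic models in addition to rules.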

DocIQ Shield detects PII with a fine-tuned pipeline that combines named entity recognition (NER) with a large language model (LLM) and is optimized for Swiss legal documents. Detected entities are presented with confidence scores and categorized by type (name, address, date, AHV number, legal entity). Smart preservation rules distinguish between public figures (judges, government officials, corporate officers in public filings) and private individuals requiring protection. Shield processes documents with zero data persistence: PII detection runs entirely in memory, and the original content is never stored.
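To make the idea of typed entities, confidence scores, and preservation rules concrete, here is a hypothetical sketch. It is not DocIQ Shield's actual data model or API: the `DetectedEntity` class, the `role` field, the `classify` function, and the 0.8 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical roles whose names stay visible (public figures).
PRESERVED_ROLES = {"judge", "government_official", "corporate_officer"}


@dataclass
class DetectedEntity:
    text: str
    entity_type: str       # e.g. "name", "address", "date", "ahv_number", "legal_entity"
    confidence: float      # detector score between 0.0 and 1.0
    role: str = "private"  # assumed role tag supplied by an upstream step


def classify(entity, threshold=0.8):
    """Return 'preserve', 'redact', or 'review' for one detected entity."""
    if entity.entity_type == "name" and entity.role in PRESERVED_ROLES:
        return "preserve"  # public figure: keep the name
    if entity.confidence >= threshold:
        return "redact"    # confident detection of protected data
    return "review"        # uncertain detection: leave it to a human


entities = [
    DetectedEntity("Hans Richter", "name", 0.97, role="judge"),
    DetectedEntity("Anna Muster", "name", 0.93),
    DetectedEntity("Bahnhofstrasse 42", "address", 0.55),
]
decisions = [classify(e) for e in entities]
print(decisions)
```

Routing low-confidence detections to manual review rather than silently dropping them is one possible design choice; a real pipeline may handle uncertainty differently.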
