Access to high-quality data is vital to knowledge-based decision making, but data in its raw form often contains sensitive information about individuals. The methods and tools of privacy-preserving data publishing address this problem by enabling the publication of useful information while protecting data privacy. Introduction to Privacy-Preserving Data Publishing: Concepts and Techniques presents state-of-the-art information sharing and data integration methods that take into account privacy and data mining requirements.
The first part of the book discusses the fundamentals of the field. In the second part, the authors present anonymization methods for preserving information utility for specific data mining tasks. The third part examines the privacy issues, privacy models, and anonymization methods for realistic and challenging data publishing scenarios. While the first three parts focus on anonymizing relational data, the last part studies the privacy threats, privacy models, and anonymization methods for complex data, including transaction, trajectory, social network, and textual data.
This book explores not only privacy and information utility issues but also efficiency and scalability challenges. In many chapters, the authors highlight efficient and scalable methods and provide an analytical discussion to compare the strengths and weaknesses of different solutions.
Benjamin C. M. Fung is an assistant professor in the Concordia Institute for Information Systems Engineering at Concordia University in Montreal, Quebec. Dr. Fung is also a research scientist and the treasurer of the National Cyber-Forensics and Training Alliance Canada (NCFTA Canada). Ke Wang is a professor in the School of Computing Science at Simon Fraser University in Burnaby, British Columbia. Ada Wai-Chee Fu is an associate professor in the Department of Computer Science and Engineering at the Chinese University of Hong Kong. Philip S. Yu is a professor in the Department of Computer Science and the Wexler Chair in Information and Technology at the University of Illinois at Chicago.
THE FUNDAMENTALS
Introduction: Data Collection and Data Publishing; What Is Privacy-Preserving Data Publishing?; Related Research Areas
Attack Models and Privacy Models: Record Linkage Model; Attribute Linkage Model; Table Linkage Model; Probabilistic Model; Modeling Adversary's Background Knowledge
Anonymization Operations: Generalization and Suppression; Anatomization and Permutation; Random Perturbation
Information Metrics: General Purpose Metrics; Special Purpose Metrics; Trade-Off Metrics
Anonymization Algorithms: Algorithms for the Record Linkage Model; Algorithms for the Attribute Linkage Model; Algorithms for the Table Linkage Model; Algorithms for the Probabilistic Attack; Attacks on Anonymous Data

ANONYMIZATION FOR DATA MINING
Anonymization for Classification Analysis: Introduction; Anonymization Problems for Red Cross BTS; High-Dimensional Top-Down Specialization (HDTDS); Workload-Aware Mondrian; Bottom-Up Generalization; Genetic Algorithm; Evaluation Methodology; Summary and Lesson Learned
Anonymization for Cluster Analysis: Introduction; Anonymization Framework for Cluster Analysis; Dimensionality Reduction-Based Transformation; Related Topics; Summary

EXTENDED DATA PUBLISHING SCENARIOS
Multiple Views Publishing: Introduction; Checking Violations of k-Anonymity on Multiple Views; Checking Violations with Marginals; Multi-Relational k-Anonymity; Multi-Level Perturbation; Summary
Anonymizing Sequential Releases with New Attributes: Introduction; Monotonicity of Privacy; Anonymization Algorithm for Sequential Releases; Extensions; Summary
Anonymizing Incrementally Updated Data Records: Introduction; Continuous Data Publishing; Dynamic Data Republishing; HD-Composition; Summary
Collaborative Anonymization for Vertically Partitioned Data: Introduction; Privacy-Preserving Data Mashup; Cryptographic Approach; Summary and Lesson Learned
Collaborative Anonymization for Horizontally Partitioned Data: Introduction; Privacy Model; Overview of the Solution; Discussion

ANONYMIZING COMPLEX DATA
Anonymizing Transaction Data: Introduction; Cohesion Approach; Band Matrix Method; k^m-Anonymization; Transactional k-Anonymity; Anonymizing Query Logs; Summary
Anonymizing Trajectory Data: Introduction; LKC-Privacy; (k, δ)-Anonymity; MOB k-Anonymity; Other Spatio-Temporal Anonymization Methods; Summary
Anonymizing Social Networks: Introduction; General Privacy-Preserving Strategies; Anonymization Methods for Social Networks; Data Sets; Summary
Sanitizing Textual Data: Introduction; ERASE; Health Information DE-identification (HIDE); Summary
Other Privacy-Preserving Techniques and Future Trends: Interactive Query Model; Privacy Threats Caused by Data Mining Results; Privacy-Preserving Distributed Data Mining; Future Directions

References