World Bank and International Household Survey Network (IHSN) programs have provided considerable guidance to countries on creating enabling legislation and dissemination policies for microdata. This has addressed part of the problem. However, producers often cite a lack of capacity and of knowledge of Statistical Disclosure Control (SDC) methods as a significant barrier to releasing more microdata.
To help address this problem, Statistics Austria, the Vienna University of Technology, the International Household Survey Network (IHSN), PARIS21 (OECD) and the World Bank have contributed to the development of an open-source software package for SDC called sdcMicro, which is maintained by a team at Statistics Austria.
This software application:
- is open source;
- implements a large collection of algorithms discussed and developed in the academic literature (including methods for global recoding, local suppression, post-randomization, noise addition, micro-aggregation and shuffling);
- is optimized for large datasets;
- provides a user-friendly GUI (developed with financial support from the World Bank and Google).
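Of the methods listed above, micro-aggregation is among the easiest to illustrate. The Python sketch below shows the basic univariate idea under simple assumptions: sort one numeric variable, form groups of at least k consecutive records, and replace each value with its group mean. It is a hypothetical illustration, not sdcMicro's implementation (sdcMicro is an R package and offers more elaborate, multivariate variants); the function name `microaggregate` and the group size `k` are our own choices.

```python
def microaggregate(values, k=3):
    """Univariate micro-aggregation sketch (hypothetical, not sdcMicro's code).

    Sort the values, split them into groups of k consecutive records and
    replace every value with the mean of its group. A trailing group with
    fewer than k records is merged into the preceding group, so each
    published mean covers at least k records (when len(values) >= k).
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    n = len(values)
    if n == 0:
        return []
    # Record indices ordered by their value.
    order = sorted(range(n), key=lambda i: values[i])
    groups = [order[i:i + k] for i in range(0, n, k)]
    # Merge an undersized tail group into the previous group.
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    result = [0.0] * n
    for group in groups:
        group_mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = group_mean  # keep the original record order
    return result
```

Because every value is replaced by the mean of its group, the column total is preserved, while no released value describes fewer than k records.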
A Graphical User Interface (GUI)
Applying many anonymization methods is complex: it requires knowledge of the methods and access to suitable tools for implementing them. For users comfortable with R, the sdcMicro package provides a comprehensive suite of the methods commonly used and described in the disclosure control literature. Users who are not familiar with R but have an immediate need to anonymize data would benefit from a user-friendly Graphical User Interface (GUI) for sdcMicro. To provide a GUI environment for non-R users, the World Bank Microdata Library Team facilitated and funded the development of a Shiny application called sdcApp, which is included in the sdcMicro package.
Users of the GUI can apply the most widely used anonymization methods in sdcMicro without in-depth knowledge of R. In addition to these methods, the GUI offers a comprehensive set of risk and utility measures, including functions to measure, visualize and compare risk and utility throughout the anonymization process. The GUI also helps agencies prepare reports on the process that are suitable for internal and external audiences. To guarantee reproducibility, the underlying code can also be saved. For users of other statistical packages, the GUI supports importing and exporting microdata in several formats (Stata, SAS, SPSS, R). Like R, sdcMicro is open source and is available from CRAN and on GitHub.
By making common anonymization methods available through a GUI, sdcMicro has the potential to lower barriers for a greater number of users, both in agencies with lower capacity and in more advanced agencies that want to use the power of sdcMicro without investing in learning R.
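To give a concrete sense of what a disclosure-risk measure computes, the Python sketch below implements one of the simplest ideas: counting how many sample records share each combination of key variables (quasi-identifiers) and flagging records whose combination occurs fewer than k times, that is, records that violate k-anonymity. It is a hypothetical illustration under simple assumptions (categorical keys, records stored as dictionaries), not the code behind sdcMicro or sdcApp; the function names, the toy records and the threshold `k` are our own.

```python
from collections import Counter


def sample_frequencies(records, keys):
    """For each record, count how many records in the sample share its
    combination of values on the key (quasi-identifier) variables."""
    combos = [tuple(rec[key] for key in keys) for rec in records]
    counts = Counter(combos)
    return [counts[combo] for combo in combos]


def k_anonymity_violations(records, keys, k=2):
    """Return the indices of records whose key combination occurs fewer
    than k times in the sample, i.e. records that violate k-anonymity."""
    freqs = sample_frequencies(records, keys)
    return [i for i, freq in enumerate(freqs) if freq < k]


# Hypothetical toy sample with three categorical key variables.
sample = [
    {"region": "North", "sex": "F", "age_group": "30-39"},
    {"region": "North", "sex": "F", "age_group": "30-39"},
    {"region": "South", "sex": "M", "age_group": "40-49"},
    {"region": "South", "sex": "F", "age_group": "40-49"},
    {"region": "South", "sex": "M", "age_group": "40-49"},
]
KEYS = ["region", "sex", "age_group"]
```

With k = 2, only the fourth record (index 3) is flagged, because it is the sole woman aged 40-49 in the South. Global recoding or local suppression of one of its key values, both among the methods sdcMicro implements, would be the usual next step.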