University of Maryland

Data Literacy & Evidence Building

University of Maryland | Coleridge Initiative




Background: The legal framework

“Prominent demographers are asking the U.S. Census Bureau to abandon a controversial method for protecting survey and census participants’ confidentiality, saying it is jeopardizing the usability of numbers that are the foundation of the nation’s data infrastructure.”

https://apnews.com/article/census-2020-us-bureau-government-and-politics-20e683c71eeb62ee4b7792d7d8530419

Goals

  1. Characterize the historical and current legal framework for data access
  2. Learn about the risk-utility framework for data access
  3. Describe the Five Safes approach to balancing the risk-utility tradeoff
  4. Understand how synthetic data are constructed – and the pros and cons of their use

Historical context

Agencies have a clear mission to produce data and evidence; that mission, or at least the enumeration part of it, is enshrined in the first article of the US Constitution. In the 20th century, Congress set up a broad, workable framework...

Risk-utility framework

Much policy work involves using data on human beings or organizations, and there are legal restrictions on access to and use of such data. In the case of education-to-workforce transitions, data are generated about three entities.

Current legal framework

Agencies have a clear mission to produce data and evidence; that mission, or at least the enumeration part of it, is enshrined in the first article of the US Constitution. In the 20th century, Congress set up a broad, workable framework of statistical agencies that largely remains in place today to address specific, practical issues (1). The primary responsibilities of these agencies are to:

  1. Produce and disseminate relevant and timely information that has value to the public.
  2. Conduct credible, accurate, and objective statistical activities.
  3. Protect the trust of information providers by minimizing disclosure risk.

The Five Safes framework

One concrete approach is the Five Safes framework (12). It was developed in the early 2000s and has been broadly adopted by various countries and academic institutions, including the UK’s Office for National Statistics, Stats New Zealand, the Australian Bureau of Statistics, and Eurostat. The framework provides a standardized approach to mitigating disclosure risk by grouping all aspects of data access along five dimensions.
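As a rough illustration, a Five Safes review can be sketched as a checklist with one entry per dimension. The teaser text here does not list the dimensions, so the names below (safe projects, safe people, safe settings, safe data, safe outputs) follow the framework as it is commonly stated and should be treated as an assumption. The class `FiveSafesReview` and the function `unmet_dimensions` are hypothetical names invented for this sketch, not part of any agency's tooling.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FiveSafesReview:
    """Hypothetical checklist: one yes/no judgment per Five Safes dimension."""
    safe_projects: bool   # Is the proposed use of the data appropriate?
    safe_people: bool     # Can the researchers be trusted to use it properly?
    safe_settings: bool   # Does the access environment limit unauthorized use?
    safe_data: bool       # Is disclosure risk in the data itself acceptable?
    safe_outputs: bool    # Are released results checked for disclosure?


def unmet_dimensions(review: FiveSafesReview) -> list[str]:
    """Return the names of the dimensions that still need attention."""
    return [f.name for f in fields(review) if not getattr(review, f.name)]


# Illustrative request: the setting and the output checks are not yet in place.
request = FiveSafesReview(
    safe_projects=True,
    safe_people=True,
    safe_settings=False,
    safe_data=True,
    safe_outputs=False,
)
print(unmet_dimensions(request))
```

In practice the dimensions are usually treated as adjustable controls rather than yes/no switches, so a real review would record more than a boolean per dimension.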

How synthetic data fits in

Previous classes have been taught inside a secure environment (8, 9), which has the advantage of training agency staff directly on actual data. However, it introduces additional burdens in terms of legal requirements, particularly the complexity of non-disclosure agreements, and screen-sharing restrictions. The advantage of tiered access is that it is now possible to have the best of both worlds. Initial work can be done using synthetic data, and subsequent work on data that will be used in practice can be done in the secure environment.
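To make the idea of synthetic data concrete, here is a minimal sketch of one of the simplest possible constructions: sampling each column independently from its observed frequencies. It is not the method used for the course data, and the toy records, column names, and the functions `fit_marginals` and `sample_synthetic` are all hypothetical. The approach also shows a typical tradeoff: no synthetic row is copied from a real record, but relationships between columns are lost, which reduces analytical utility.

```python
import random
from collections import Counter


def fit_marginals(records, columns):
    """Count how often each value appears in each column, one column at a time."""
    return {col: Counter(row[col] for row in records) for col in columns}


def sample_synthetic(marginals, n, seed=0):
    """Draw n synthetic rows, sampling every column from its own marginal."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    rows = []
    for _ in range(n):
        row = {}
        for col, counts in marginals.items():
            values = list(counts.keys())
            weights = list(counts.values())
            row[col] = rng.choices(values, weights=weights, k=1)[0]
        rows.append(row)
    return rows


# Hypothetical toy records standing in for confidential data.
confidential = [
    {"degree": "BA", "sector": "health"},
    {"degree": "BA", "sector": "retail"},
    {"degree": "AA", "sector": "retail"},
    {"degree": "MS", "sector": "health"},
]
columns = ["degree", "sector"]

marginals = fit_marginals(confidential, columns)
synthetic = sample_synthetic(marginals, n=100, seed=42)
print(synthetic[:3])
```

Code developed against `synthetic` can later be rerun on the real records inside the secure environment, which is the workflow tiered access is meant to support.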

Exercise


Readings

References

  1. Norwood JL. Organizing to count: Change in the federal statistical system: The Urban Institute; 1995.
  2. United Nations Economic Commission for Europe, editor. Statistical confidentiality and access to microdata. Proceedings of the seminar session of the 2003 Conference of European Statisticians. Statistics Sweden, Stockholm; 2003.
  3. National Academies of Sciences, Engineering, and Medicine. Federal statistics, multiple data sources, and privacy protection: next steps. 2018.
  4. Duncan GT, Fienberg SE, Krishnan R, Padman R, Roehrig SF. Disclosure limitation methods and information loss for tabular data. In: Doyle P, Lane J, Theeuwes J, Zayatz L, editors. Confidentiality, disclosure and data access: theory and practical applications for statistical agencies: North-Holland; 2001. p. 135-66.
  5. Lane JI. Optimizing Access to Micro Data. Journal of Official Statistics. 2007;23:299-317.
  6. Advisory Committee on Data for Evidence Building. Advisory Committee on Data for Evidence Building: Year 2 Report. Washington DC. 2022.
  7. Foster I, Ghani R, Jarmin RS, Kreuter F, Lane J. Big data and social science: data science methods and tools for research and practice: CRC Press; 2020.
  8. Kuehn D. Better Data for Better Policy: The Coleridge Initiative in Ohio. https://www.urban.org/sites/default/files/2022-05/Better%20Data%20for%20Better%20Policy%20-%20the%20Coleridge%20Initiative%20in%20Ohio.pdf. 2022.
  9. Kuehn D. Better Data for Better Policy: Lessons Learned from Across the Coleridge Initiative’s Partnerships. https://www.urban.org/research/publication/better-data-better-policy. 2022.
  10. Burman L. Privacy & Confidentiality Technologies. ACDEB Presentation. 2021.
  11. Taub J, Elliot M. The synthetic data challenge. Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality, The Hague, The Netherlands. 2019.
  12. Giles O, Hosseini K, Mingas G, Strickson O, Bowler L, Smith CR, et al. Faking feature importance: A cautionary tale on the use of differentially-private synthetic data. arXiv preprint arXiv:2203.01363. 2022.
  13. Drechsler J, Haensch A-C. 30 Years of Synthetic Data. arXiv preprint arXiv:2304.02107. 2023.
  14. Advisory Committee on Data for Evidence Building. Year 2 Report, Supplementary Materials. In: Office of Management and Budget, editor. 2022.