Barnardisation

From Wikipedia, the free encyclopedia

Barnardisation is a method of statistical disclosure control for tables of counts. It involves adding +1, 0 or -1 to some or all of the internal non-zero cells in a table in a pseudo-random fashion. The probability of adjustment for each internal cell is calculated as p/2 (add 1), 1-p (leave as is), p/2 (subtract 1). The table totals are then calculated as the sum of the post-adjustment internal counts.[1][2]
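
The procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than any agency's actual implementation: the function name `barnardise`, the list-of-rows table layout and the parameter `p` are assumptions made for the example, and every non-zero internal cell is treated as eligible for adjustment.

```python
import random


def barnardise(cells, p, rng=None):
    """Return (adjusted_cells, total) for a table given as a list of rows.

    Each non-zero cell receives +1 with probability p/2, -1 with
    probability p/2, and is left unchanged with probability 1 - p.
    Zero cells are left as they are. The total is recomputed from the
    adjusted cells so that it stays consistent with them.
    """
    rng = rng or random.Random()
    adjusted = []
    for row in cells:
        new_row = []
        for count in row:
            if count == 0:
                new_row.append(0)  # zero cells are not adjusted
                continue
            u = rng.random()
            if u < p / 2:
                delta = 1
            elif u < p:
                delta = -1
            else:
                delta = 0
            new_row.append(count + delta)
        adjusted.append(new_row)
    total = sum(sum(row) for row in adjusted)
    return adjusted, total
```

Calling `barnardise([[3, 0, 5], [1, 2, 0]], p=0.2)` returns a table in which each non-zero cell differs from its original by at most 1, together with a total that matches the adjusted cells rather than the original ones. The value 0.2 is purely illustrative.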

Etymology

The technique of Barnardisation appears to have been named after Professor George Alfred Barnard (1915–2002), a Professor of Mathematics at the University of Essex. Barnard, at that time President of the Royal Statistical Society, was one of three Fellows appointed by the Council of the Royal Statistical Society to help provide a government-commissioned review of data security for the 1971 UK Census.[3] The resulting report questioned whether rounding small numbers to the nearest five was the best approach to preserving respondent confidentiality.[3]: para 3.3.8  The formal government response to the report noted that an additional safeguard of small random adjustments had been introduced for the 1971 Census, the suggestion for which they explicitly attributed to Professor Barnard,[3]: para 4.20 and footnote  as did a New Scientist article dated July 1973.[4] Muddying the waters slightly, a 1973 paper in the Journal of the Royal Statistical Society discussing this new safeguard reported that "after much discussion, a variant of a procedure suggested in Canada was adopted."[5]: p.520  Presumably Professor Barnard was involved in these discussions, and was the inventor of the variant. In any case, no evidence can be found of any such safeguard being applied in Canada, with Statistics Canada seeming to stick instead to the use of random rounding of all counts to the nearest 0 or 5.[6]: p.13 

Despite originating from Prof Barnard, in documentation surrounding the 1971 Census the method of adjustment now known as Barnardisation was simply described as a 'procedure';[5] an 'adjustment of values';[7] a 'special procedure';[1] a 'process of random error injection';[8] or a 'modification' or 'adjustment'.[9][10]

The earliest use of the term 'Barnardisation' found in print so far dates to an Office of Population Censuses and Surveys working paper written by Hakim in 1979, where the term is mentioned without citation, and without ascribing it to Prof G A Barnard.[11] At the time, however, Hakim's coinage of this term appears to have been either widely overlooked or widely ignored, at least in print, as demonstrated by the wide range of later publications already cited above.

The term 'Barnardisation' does not appear to have reemerged in print until the 1995 publication of Stan Openshaw's Census Users' Handbook,[12] where it is used by two separate chapter authors and by the index compiler. However, by at least the late 1980s the term was already in widespread conversational usage during UK academic conferences and meetings.[13] More recently the term 'Barnardisation' has also become firmly ensconced in the lexicon of reports produced by official UK statistical agencies and others.[2][14]

Operational details

As originally conceived and implemented in the 1971 UK Census, Barnardisation had the added characteristic of pairing tables from separate areas, and applying equal and opposite adjustments to the two areas. For example, if a given table cell in Area A had its value increased by 1, then in paired Area B the equivalent table cell would have its value reduced by 1 (subject to not making the value negative). The purpose of this pairing was to cancel out, as much as possible, the amount of noise introduced via the Barnardisation process at a more aggregate level.[1]
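
The pairing idea can be illustrated with a short Python sketch. The function name `barnardise_pair` and the flat list representation of each area's table are hypothetical. The source states only that adjustments were equal and opposite and must not produce a negative value; the rule used here, of leaving both cells unchanged when either would go negative, is an assumption made for the example.

```python
import random


def barnardise_pair(area_a, area_b, p, rng=None):
    """Apply equal and opposite adjustments to two paired areas.

    area_a and area_b are equal-length lists holding the same table
    cells for two areas. For each cell position one adjustment is drawn
    (+1 or -1 with probability p/2 each, otherwise 0), added to Area A
    and subtracted from Area B. If either result would be negative, both
    cells are left unchanged (an assumption of this sketch).
    """
    rng = rng or random.Random()
    out_a, out_b = [], []
    for a, b in zip(area_a, area_b):
        u = rng.random()
        delta = 1 if u < p / 2 else (-1 if u < p else 0)
        new_a, new_b = a + delta, b - delta
        if new_a < 0 or new_b < 0:
            new_a, new_b = a, b  # skip adjustments that would go negative
        out_a.append(new_a)
        out_b.append(new_b)
    return out_a, out_b
```

Because each adjustment to Area A is offset in Area B, the sum of the two areas is unchanged for every cell, which is the cancellation of noise at a more aggregate level described above.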

For the 1991 UK Census the pairing of areas prior to the application of Barnardisation was dropped; and for the more detailed Local Base Statistics, its scope was extended to include adjustments of -2, -1, 0, +1 or +2, achieved by applying the -1, 0 or +1 adjustment twice.[10]
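
As a worked illustration of the double application, suppose the two passes are independent, each uses the same probability p, and boundary effects at zero are ignored (all three are assumptions made here, not details given in the source). Writing D for the combined adjustment, its distribution would be:

```latex
\begin{aligned}
P(D = \pm 2) &= \left(\tfrac{p}{2}\right)^{2} = \tfrac{p^{2}}{4} \quad \text{(each sign)},\\
P(D = \pm 1) &= 2 \cdot \tfrac{p}{2}\,(1 - p) = p\,(1 - p) \quad \text{(each sign)},\\
P(D = 0) &= (1 - p)^{2} + 2\left(\tfrac{p}{2}\right)^{2} = (1 - p)^{2} + \tfrac{p^{2}}{2}.
\end{aligned}
```

These five probabilities sum to 1. Under the same assumptions the combined adjustment still has mean 0, and its variance is 2p, twice that of a single pass.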

In the United Kingdom, Barnardisation became increasingly employed by public agencies to enable them to provide information for statistical purposes without infringing the information privacy rights of the individuals to whom the information relates (e.g.[2][15]). In some cases this has involved further modifications to the Barnardisation procedure. For example, as implemented by the Common Services Agency, adjustments of -1, 0 or +1 were only applied to counts of 1 to 4, whilst counts of 0, instead of being left unchanged, were adjusted by the addition of 0 or +1.[15]: para 16 
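
A sketch of the modified rule for small counts might look like the following. The function name `barnardise_small_counts` and the parameters `p` and `p_zero` are hypothetical; in particular, the source does not state the probabilities that were used, so the separate probability for adjusting a zero is an assumption.

```python
import random


def barnardise_small_counts(count, p, p_zero, rng=None):
    """Adjust a single count using a rule in the style described above.

    - Counts of 1 to 4 receive +1 or -1 with probability p/2 each,
      and are otherwise left unchanged.
    - Counts of 0 receive +1 with probability p_zero (an assumed
      parameter), and are otherwise left unchanged.
    - Counts of 5 or more are never adjusted.
    """
    rng = rng or random.Random()
    if count == 0:
        return 1 if rng.random() < p_zero else 0
    if 1 <= count <= 4:
        u = rng.random()
        if u < p / 2:
            return count + 1
        if u < p:
            return count - 1
    return count
```

Under this rule a published value of 0 or 1 no longer reveals whether the true count was 0, 1 or 2, while larger counts are published exactly.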

Pros and cons

A review of Statistical Disclosure Control methods in the run-up to the 2011 UK Census[14] identified the following list of pros and cons of Barnardisation from the point of view of the data provider:

Advantages

  • Easy to understand
  • Easy to implement
  • Table totals are consistent with internal cell values
  • The adjustment is unbiased

Disadvantages

  • Leads to inconsistent values for the same cell counts and table totals if they are present in two or more separately barnardised tables
  • The adjustment can be unpicked via differencing if other tables are available that share the same counts or totals, or that provide an unadjusted total for a larger spatial area within which the barnardised tables nest
  • The probability of adjustment used is typically small, meaning that many cell values are left unadjusted
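
Two of the points listed above follow directly from the adjustment probabilities. Writing D for the adjustment applied to a single non-zero cell, with probabilities p/2, 1-p and p/2 for +1, 0 and -1 respectively:

```latex
\begin{aligned}
E[D] &= \tfrac{p}{2}(+1) + (1 - p)(0) + \tfrac{p}{2}(-1) = 0,\\
\operatorname{Var}[D] &= E[D^{2}] - E[D]^{2} = \tfrac{p}{2} + \tfrac{p}{2} = p,\\
P(D = 0) &= 1 - p.
\end{aligned}
```

The zero mean is the sense in which the adjustment is unbiased. The last line shows why a small p leaves most cells untouched: with an illustrative value of p = 0.2 (chosen here only as an example, not taken from any source), 80% of eligible cells would be published unadjusted.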

From a user point of view, another advantage of Barnardisation is that it has been shown to have a smaller impact on typical user analyses than the following Statistical Disclosure Control measures: random rounding to base 5, as used by Statistics Canada; random rounding to base 3, as used by Statistics New Zealand; and Small Cell Adjustment, as used at various points in time by the Office for National Statistics and the Australian Bureau of Statistics.[16]

Efficacy reappraised

Since the late 1990s concerns over the efficacy of Barnardisation in protecting confidentiality have increased to the point where it is now no longer recommended as a 'go to' tool, but rather as a technique only to be used in special circumstances. This change in attitudes appears to centre on the relatively high probability that Barnardisation will leave a small count (in particular a 1) unadjusted[2][15] and, secondarily, on the dangers of reverse engineering the original value if sufficient overlapping barnardised tables are released.[14] For these and other reasons UK Censuses from 2001 onwards have abandoned the use of Barnardisation. See Spicer for a good review of the 2001, 2011 and 2021 alternatives to Barnardisation that have been adopted, and the rationale for them.[17]

The question of whether barnardisation may fall short of the complete anonymisation of data, and the status of barnardised data under the complex provisions of the Data Protection Act 1998, were considered by the Scottish Information Commissioner. Some aspects of an initial decision by the Commissioner were overturned on appeal to the House of Lords, and the Commissioner was invited to revisit his original decision. The Commissioner's final decision ruled that barnardisation provided insufficient disclosure protection for rare events (in this case, Childhood Leukaemia), reversing in part his original decision: "the barnardised data, by themselves, can lead to identification, and [...] the effect of barnardisation on the actual figures, at least as deployed by the CSA, does not have the effect of concealing or disguising the data which he [the Commissioner] had originally considered that it would."[15]: para 20  However, in his written decision the Commissioner offered no statistical justification for this assertion. Instead the Commissioner's decision centred mainly around addressing points of law relating to the nature of the original and barnardised data, and how this related to legal definitions of (sensitive) personal data.

References

  1. ^ a b c Newman, Dennis (1978). Techniques for ensuring the confidentiality of census information in Great Britain (Occasional Paper 4 ed.). Census Division, OPCS.
  2. ^ a b c d ONS (2006). Review of the dissemination of health statistics: confidentiality guidance (PDF) (Working Paper 3: Risk Management ed.). Office for National Statistics.
  3. ^ a b c Moore, P G (1973). "Security of the Census of Population". Journal of the Royal Statistical Society. Series A (General). 136 (4): 583–596. doi:10.2307/2344751. JSTOR 2344751.
  4. ^ New Scientist (1973). "Census data not so secret". New Scientist (19th July): 142.
  5. ^ a b Jones, H. J. M.; Lawson, H. B.; Newman, D. (1973). "Population census: recent British developments in methodology". Journal of the Royal Statistical Society. Series A (General). 136 (4): 505–538. doi:10.2307/2344749. JSTOR 2344749. S2CID 133740484. Retrieved 16 May 2022.
  6. ^ Statistics Canada (1974). 1971 Census of Canada : population : vol. I - part 1 (PDF) (Introduction to volume I (part 1) ed.). Ottawa: Statistics Canada. Retrieved 16 May 2022.
  7. ^ Rhind, D W (1975). Geographical analysis and mapping of the 1971 UK Census data, Working Paper 3. Dept of Geography, University of Durham: Census Research Unit.
  8. ^ Hakim, Catherine (1978). Census confidentiality, microdata and census analysis (Occasional Paper 3 ed.). Census Division, OPCS.
  9. ^ J. C. Dewdney (1983). "Censuses past and present". In Rhind, D W (ed.). A Census User's Handbook. London: Methuen. pp. 1–16.
  10. ^ a b Marsh (1993). "Privacy, confidentiality and anonymity in the 1991 Census". In Dale, A; Marsh, C (eds.). The 1991 Census User's Guide. London: HMSO. pp. 129–154. ISBN 0-11-691527-7.
  11. ^ Hakim, Catherine (1979). "Census confidentiality in Britain". In Bulmer, M (ed.). Censuses, Surveys and Privacy. London: Palgrave. pp. 132–157. doi:10.1007/978-1-349-16184-3_10. ISBN 978-0-333-26223-8.
  12. ^ Openshaw, Stan (1995). Census Users' Handbook. Cambridge: Pearson. ISBN 1-899761-06-3.
  13. ^ Williamson, Paul (2022). "Personal communication". Dept. of Geography and Planning, University of Liverpool.
  14. ^ a b c SDC UKCDMAC Subgroup. "Statistical Disclosure Control (SDC) methods short-listed for 2011 UK Census tabular outputs, Paper 1" (PDF). Office for National Statistics. Retrieved 16 May 2022.
  15. ^ a b c d Scottish Information Commissioner (2010). "Decision 021/2005 Mr Michael Collie and the Common Services Agency for the Scottish Health Service: Childhood leukaemia statistics in Dumfries and Galloway" (PDF). Retrieved 16 May 2022.
  16. ^ Williamson, Paul (2007). "The impact of cell adjustment on the analysis of aggregate census data". Environment and Planning A. 39 (5): 1058–1078. doi:10.1068/a38142. S2CID 154653446.
  17. ^ Spicer, K. EAP125 on Statistical disclosure control (SDC) for Census 2021. Titchfield: Office for National Statistics. Retrieved 16 May 2022.