So if you’ve been an Australian on the internet in the past month, you will know there is a furore in progress about changes to the way identifying information (names and birth dates) will be used and stored after the 2016 census.
This post is an attempt to explain what’s going on in layman’s terms, based on my understanding as a researcher who uses this kind of linked data from the Australian federal government. This post reflects my understanding; as far as I know it is free of errors, but please let me know if you believe I’m mistaken about any of the facts. It’s also not exhaustive; I’m focusing on the parts I feel strongly about as a public health researcher.
What the ABS appears to be proposing is that, as part of completing the census this year, people will be expected to give their names, dates of birth, genders, and addresses. This data will be stored for up to four years, an increase on the previous time period, and will be used to create a linkage key to allow the ABS to do additional data analysis with other datasets (pdf link, see page 7).
What’s a linkage key?
A linkage key is an ID number that allows the linking together of multiple datasets, via identifying information. For example, one of the data sources that the ABS has mentioned is health data, i.e. the Medicare database. Medicare knows your name, your date of birth, and your current address. Once the ABS has this information on your census record, computer folk can use the identifying information to match the two records together.
This is done by assigning an ID number (the linkage key) in one dataset, and then matching the same number to the name, date of birth and address in the second dataset, in order to subsequently link the two datasets together. So Jennifer Smith, born 5th April 1979, living on Canterbury Rd in Malvern, gets ID number 300002 assigned to her in her Medicare record, and then this number gets matched to the census record with the same name, date of birth, and address. Then Jennifer’s name, date of birth and address can be deleted from both datasets, and the datasets can be linked together just using her ID number. This process is repeated for everybody. Eventually a researcher or a statistician ends up with a dataset containing everybody’s Medicare data, everybody’s Census data, and everybody’s ID numbers, but nobody’s name, address, or date of birth. The names, dates of birth, and addresses can then be discarded or destroyed. (This is what I assume the ABS is referring to when they say they will never share identified information about you – at the point it’s shared, your name has been removed from it).
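For the programmers in the audience, here is a toy sketch of the process described above. It is purely illustrative and is not the ABS's actual method: the field names (`gp_visits`, `occupation`), the second person, and the exact-match rule are all my own assumptions. Real linkage generally has to cope with typos, name changes and people moving house, which this sketch ignores.

```python
from itertools import count

# Fields treated as identifying information (an assumption for this sketch).
IDENTIFYING_FIELDS = ("name", "dob", "address")

def identity(record):
    """Return a comparable tuple of the identifying fields (exact match only)."""
    return tuple(str(record[f]).strip().lower() for f in IDENTIFYING_FIELDS)

def deidentify(record, key):
    """Copy the record without identifying fields, adding the linkage key."""
    stripped = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    stripped["linkage_key"] = key
    return stripped

def link(medicare, census, first_key=300002):
    """Assign keys in the first dataset, match them into the second, then join."""
    next_key = count(first_key)
    keys = {}  # identity tuple -> linkage key; discarded once linking is done
    medicare_out = []
    for rec in medicare:
        ident = identity(rec)
        if ident not in keys:
            keys[ident] = next(next_key)
        medicare_out.append(deidentify(rec, keys[ident]))
    # Keep only census records whose identity matched a Medicare record.
    census_by_key = {
        keys[identity(rec)]: deidentify(rec, keys[identity(rec)])
        for rec in census
        if identity(rec) in keys
    }
    # Join the two de-identified datasets on the linkage key alone.
    return [
        {**m, **census_by_key[m["linkage_key"]]}
        for m in medicare_out
        if m["linkage_key"] in census_by_key
    ]

# Made-up example data; the non-identifying fields are hypothetical.
medicare = [
    {"name": "Jennifer Smith", "dob": "1979-04-05",
     "address": "Canterbury Rd, Malvern", "gp_visits": 4},
    {"name": "Alex Nguyen", "dob": "1985-11-20",
     "address": "High St, Kew", "gp_visits": 1},
]
census = [
    {"name": "Jennifer Smith", "dob": "1979-04-05",
     "address": "Canterbury Rd, Malvern", "occupation": "teacher"},
]

linked = link(medicare, census)
print(linked)
```

The linked record that comes out the other end carries Jennifer's ID number and her Medicare and census details, but not her name, date of birth or address.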
Why do this kind of research?
There is a lot of information about people that is relevant to their health, but isn’t routinely recorded in their medical records. A key example in Australia is whether somebody identifies as Aboriginal and Torres Strait Islander, which is normally not recorded in their Medicare data, but is recorded in the census. Linking Medicare and census therefore allows detailed research into issues affecting the health of Aboriginal and Torres Strait Islander people, at the population level. This kind of research is immensely valuable, and whole-of-population linked datasets are the holy grail in terms of data sources. This is just one example; you can do data linkage research on anything if the information you need is contained in two or more separate databases; the ABS has also mentioned employment and unemployment as a topic of interest.
So what’s the problem?
The problem, at least for me as a researcher, is that people aren’t being given the option to withhold consent and opt out of this research. I don’t think people are morally obligated to allow their personal data to be used for research if they don’t want it to be. I certainly don’t think they should be legally obligated to provide their names and dates of birth so that this kind of research can take place, if they would rather complete the census anonymously. The primary purpose of the census is to count and describe the Australian population, and names and dates of birth are not necessary for that; the data linkage research opportunity is a bonus, and I don’t believe it should be compulsory for people to participate in that.
Other people seem to object to the data being collected at all, regardless of what it’s used for later, and to my mind that’s potentially reasonable as well. But I’ll leave the debate about that for data security people, since it’s not really my area.
Don’t they already know all this stuff anyway?
My first reaction when I saw people starting to get upset about the census was, “Oh, wow, people have no idea how much of their personal data is already stored and can potentially be linked”. The government does indeed already have a great deal of data about us; they have our tax records, our Medicare and prescription medication data, our driver’s licences and traffic infringement histories, our criminal records, and on and on. But different government departments hold each of these data sets, and they’re not routinely allowed to link your data together without your consent.
Research involving data linkage has been going on in health for quite a long time, but unconsented linkage projects are very rare, and require a very compelling case for public benefit before they can proceed. Whether they should ever be approved at all is an important ethical question, and in my opinion, one which deserves a public debate. We’re starting to have that debate now. Is it strange that the census is what prompted it, given that our Medicare or Centrelink data might be much more sensitive? Perhaps, but that doesn’t mean the debate isn’t worth having.
People are entitled to know how their data are collected, stored, and used. They’re entitled to have opinions about that, and to have strong feelings about which data they’re happy to share and what they prefer to withhold. If the research community want the public to support data linkage research, then we have to convince people that our work is sufficiently valuable that they should donate their data to assist us. I don’t think we’re entitled to people’s medical records and employment histories just because we want them; I think it should be up to individuals whether to participate in the kind of research we conduct. And if people don’t want to participate, I think they should be able to opt out.