Dirt mover

3/23/2023

There’s an old saying that if you want to hide a tree, put it in a forest. An analogous principle in privacy is that a record preserves privacy if it’s like a lot of other records.

The idea of k-anonymity is that every database record appears at least k times when you restrict your attention to quasi-identifiers. If you have a lot of records and few fields, your value of k could be high. But as you get more fields, it becomes more likely that a combination of fields is unique. If k = 1, then k-anonymity offers no anonymity.

Another problem with k-anonymity is that it doesn’t offer group privacy. Even when k is large, k-anonymity might prevent re-identification but still suffer from attribute disclosure. A database could be k-anonymous but reveal information about a group if that group is homogeneous with respect to some field. That is, the method is subject to a homogeneity attack.

Or going the other way around, if you already know something that stands out about a group, this could help you identify the record belonging to an individual. That is, the method is subject to a background knowledge attack.

One way to address this shortcoming is l-diversity. This post won’t go into l-diversity because it’s an intermediate step to where we want to go, which is t-closeness.

The idea of t-closeness is that the distribution of sensitive data in every group is not too far from the distribution in the full population. The “t” comes from requiring that the distributions be no more than a distance t apart in a sense that we’ll define below. If the sensitive data in a group doesn’t stand out, this thwarts the homogeneity attack and the background knowledge attack.

When we say that the distribution on sensitive data within a group is not far from the distribution in the full data, how do we quantify what “far” means? That is, how do we measure the distance between two distributions?

There are a lot of ways to measure the similarity of two probability distributions. A common choice is the Kullback-Leibler divergence, though that’s not what we’ll use here. Instead, t-closeness uses the so-called earth mover distance (EMD), also known as the Wasserstein metric.

The idea of EMD is to imagine both probability distributions as piles of dirt and calculate the minimum amount of work needed to reshape the first pile so that it has the same shape as the second.

The key attribute of EMD is that it takes distance into account. Suppose your data is some ordered response, 1 through 5. Suppose distribution X has probability 0.8 at 1 and 0.05 for each of the other responses. Distributions Y and Z are the same except they have 80% of their probability mass at 2 and at 5 respectively. By some measures, X is equally far from Y and Z, but the earth mover distance would say that X is closer to Y than to Z, which is more appropriate in our setting.

We can calculate the EMD for the example above. To transform X into Y, we need to move a probability mass of 0.75 from 1 to 2, a distance of 1, and so the EMD is 0.75. To transform X into Z we need to move the same amount of mass, but we need to move it four times as far, from 1 to 5, and so the EMD is 0.75 × 4 = 3.

What exactly is a quasi-identifier? That’s hard to say precisely. An identifier is something that clearly identifies an individual. A phone number, for example, is an identifier. A quasi-identifier is something that helps narrow down who a person is, but isn’t an identifier. For example, someone’s birthday would be a quasi-identifier. But in general “quasi-identifier” is not a precisely defined concept.

It’s common in math to use a variable name as an adjective, as with k-anonymity, l-diversity, and t-closeness. This is unfortunate because it isn’t descriptive and locks in a variable naming convention. Someone used the variable k to count the number of redundant database records, and the name stuck. Similarly the l in l-diversity counts something.

As a sidenote, the variable names here follow the old FORTRAN convention that variables i through n represent integers by default, and that all other variables represent real (floating point) numbers by default.
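To make the definition of k-anonymity concrete, here is a minimal Python sketch that finds the largest k for which a small table is k-anonymous. The function name `k_anonymity`, the field names, and the toy records are all hypothetical, made up for illustration. Which fields count as quasi-identifiers is an assumption the caller has to supply.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the largest k for which the records are k-anonymous.

    This is the size of the smallest group of records that share the same
    values on every quasi-identifier field.
    """
    if not records:
        raise ValueError("need at least one record")
    # Count how many records share each combination of quasi-identifier values.
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return min(groups.values())


# Hypothetical toy table; "diagnosis" plays the role of the sensitive field.
records = [
    {"zip": "77001", "birth_year": 1980, "diagnosis": "flu"},
    {"zip": "77001", "birth_year": 1980, "diagnosis": "asthma"},
    {"zip": "77002", "birth_year": 1975, "diagnosis": "flu"},
    {"zip": "77002", "birth_year": 1975, "diagnosis": "flu"},
    {"zip": "77002", "birth_year": 1975, "diagnosis": "diabetes"},
]

k_two_fields = k_anonymity(records, ["zip", "birth_year"])
k_three_fields = k_anonymity(records, ["zip", "birth_year", "diagnosis"])
print(k_two_fields, k_three_fields)
```

With only two fields the smallest group has two records. Treating a third field as a quasi-identifier makes some combinations unique, so k drops to 1, which is the effect of adding fields described above.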
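The EMD values for the X, Y, Z example can be checked numerically. The sketch below assumes the responses 1 through 5 are equally spaced one unit apart. In that one-dimensional case the minimum work equals the sum of absolute differences between the two cumulative distributions. The function name `earth_mover_distance` is hypothetical, and the result is not rescaled in any way.

```python
from itertools import accumulate


def earth_mover_distance(p, q):
    """EMD between two distributions on the same ordered, unit-spaced values.

    Assumes p and q list probabilities for the same responses in the same
    order. The work needed is the sum of the absolute differences between
    the running totals (cumulative distributions) of p and q.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same responses")
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


# Distributions from the example, over the ordered responses 1 through 5.
X = [0.80, 0.05, 0.05, 0.05, 0.05]
Y = [0.05, 0.80, 0.05, 0.05, 0.05]
Z = [0.05, 0.05, 0.05, 0.05, 0.80]

emd_xy = earth_mover_distance(X, Y)
emd_xz = earth_mover_distance(X, Z)
print(round(emd_xy, 2), round(emd_xz, 2))
```

Up to floating point rounding, the two results should agree with the hand calculation: 0.75 for X to Y and 3 for X to Z. For sample data rather than explicit probabilities, SciPy's `scipy.stats.wasserstein_distance` is one existing implementation worth considering.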
