To clarify: our goal was to map the nodes in the training dataset to the real identities in the social network that was used to create the data. […]
We were able to deanonymize about 80% of the nodes, including the vast majority of the high-degree nodes (both in- and out-degree). We’re not sure what the overall error rate is, but for the high-degree nodes it is essentially zero.
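To get a feel for why high-degree nodes are so exposed, here is a toy sketch. It is not the method the quoted authors used (the excerpt doesn't describe one); it only assumes that the anonymized graph keeps the original edge structure and that a public crawl of the same network is available. The function names and the sample edges are made up for illustration.

```python
from collections import Counter


def degree_signatures(edges):
    """Map each node to its (in_degree, out_degree) pair."""
    in_deg, out_deg = Counter(), Counter()
    nodes = set()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
        nodes.update((src, dst))
    return {n: (in_deg[n], out_deg[n]) for n in nodes}


def match_unique_signatures(anon_edges, public_edges):
    """Pair anonymized nodes with public identities when the degree
    signature is unique in both graphs (hypothetical helper)."""
    anon = degree_signatures(anon_edges)
    public = degree_signatures(public_edges)
    anon_counts = Counter(anon.values())
    public_counts = Counter(public.values())
    # Public identities whose signature is shared with nobody else.
    by_sig = {sig: name for name, sig in public.items()
              if public_counts[sig] == 1}
    return {node: by_sig[sig] for node, sig in anon.items()
            if anon_counts[sig] == 1 and sig in by_sig}


# Invented sample data: the same directed graph, once with names
# and once with opaque integer ids.
public_edges = [("alice", "bob"), ("alice", "carol"), ("alice", "dave"),
                ("bob", "alice"), ("carol", "alice"), ("dave", "bob")]
anon_edges = [(1, 2), (1, 3), (1, 4), (2, 1), (3, 1), (4, 2)]

matches = match_unique_signatures(anon_edges, public_edges)
print(matches)
```

In this toy graph the two best-connected nodes have degree signatures nobody else shares, so they are matched immediately, while the two low-degree nodes look identical and stay ambiguous. Real releases are usually noisier than this, so treat it as intuition rather than a working attack.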
This brings me back to my questions: who will decide, regulate, and guarantee the level of privacy for data sets traded on the big data market?
Original title and link: Big Data Marketplaces and Data Privacy (NoSQL databases © myNoSQL)