1. Start Early! Determine the data privacy protection strategy during the planning phase of a deployment, preferably before moving any data into Hadoop. This prevents damaging compliance exposure for the company and avoids unpredictability in the rollout schedule.
2. Identify which data elements are defined as sensitive within your organization. Consider company privacy policies, pertinent industry regulations, and governmental regulations.
3. Discover whether sensitive data is embedded in the environment, is already assembled in Hadoop, or will be assembled there (a simple pattern-based scan, sketched after this list, is one starting point).
4. Determine the compliance exposure risk based on the information collected in the previous steps.
5. Determine whether business analytic needs require access to real data or whether desensitized data can be used. Then choose the right remediation technique: masking or encryption. If in doubt, remember that masking provides the most secure remediation, while encryption provides the most flexibility should future needs evolve. (The sketch after this list contrasts the two.)
6. Ensure the data protection solutions under consideration support both masking and encryption remediation techniques, especially if the goal is to keep both masked and unmasked versions of sensitive data in separate Hadoop directories.
7. Ensure the data protection technology used implements consistent masking across all data files (Joe becomes Dave in every file) to preserve the accuracy of data analysis across every data aggregation dimension; the sketch after this list shows one way to make masking deterministic.
8. Determine whether tailored protection for specific data sets is required, and consider dividing Hadoop directories into smaller groups where security can be managed as a unit (see the directory-zoning sketch after this list).
9. Ensure the selected encryption solution interoperates with the company’s access control technology, and that both allow users with different credentials to have the appropriate, selective access to data in the Hadoop cluster.
10. Ensure that when encryption is required, the proper technology (Java, Pig, etc.) is deployed to allow for seamless decryption and expedited access to data; a Pig UDF skeleton follows this list.
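To make point 3 concrete, here is a minimal sketch of pattern-based discovery in Java: it flags lines matching a couple of illustrative regular expressions (US SSNs, email addresses). Real discovery tools, Dataguise's included, use far richer detection, and a Hadoop deployment would scan files through the HDFS API rather than the local filesystem.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Flag lines that look like they contain sensitive elements.
public class SensitiveDataScan {
    // Illustrative patterns only; production scanners rely on context,
    // checksums, and dictionaries, not bare regexes.
    private static final Map<String, Pattern> PATTERNS = Map.of(
            "US_SSN", Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b"),
            "EMAIL", Pattern.compile("\\b[\\w.+-]+@[\\w.-]+\\.[A-Za-z]{2,}\\b"));

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        for (int i = 0; i < lines.size(); i++) {
            for (Map.Entry<String, Pattern> e : PATTERNS.entrySet()) {
                if (e.getValue().matcher(lines.get(i)).find()) {
                    System.out.printf("line %d: possible %s%n", i + 1, e.getKey());
                }
            }
        }
    }
}
```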
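For points 5 and 7, a sketch of the trade-off. A keyed HMAC stands in for deterministic masking: it is one-way (secure, unrecoverable) and consistent, so the same input yields the same token in every file, which is what keeps aggregations accurate. Real masking tools emit realistic substitutes (Joe becomes Dave) rather than opaque tokens. AES-GCM stands in for encryption: reversible for anyone holding the key, hence the flexibility. Key handling is deliberately simplified here.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.Mac;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class MaskVsEncrypt {
    // Masking stand-in: one-way and deterministic. The original value can
    // never be recovered, even by whoever holds the key.
    static String mask(byte[] key, String value) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        byte[] tag = mac.doFinal(value.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(tag);
    }

    // Encryption: reversible with the key, preserving flexibility should
    // future needs require the real data again.
    static byte[] encrypt(SecretKey key, byte[] iv, String value) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        return cipher.doFinal(value.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        byte[] maskKey = new byte[32];
        new SecureRandom().nextBytes(maskKey);
        // Same input and same key => same token, across all files.
        System.out.println(mask(maskKey, "Joe"));
        System.out.println(mask(maskKey, "Joe")); // identical to the line above

        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        SecretKey aesKey = kg.generateKey();
        byte[] iv = new byte[12]; // per-value IV; never reuse with one key
        new SecureRandom().nextBytes(iv);
        byte[] ct = encrypt(aesKey, iv, "Joe");
        System.out.println(ct.length + " ciphertext bytes, recoverable with the key");
    }
}
```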
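For point 8, a sketch of carving HDFS into zones that can be secured as a unit, using the standard Hadoop FileSystem API. The path, user, and group names are made up, and setOwner typically requires HDFS superuser privileges.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;

public class SecureZones {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Hypothetical layout: one directory per sensitivity zone, each
        // owned by the group that is allowed to read it.
        Path piiZone = new Path("/data/pii"); // illustrative path
        fs.mkdirs(piiZone);
        fs.setOwner(piiZone, "etl", "pii-readers"); // assumed user and group
        // rwx for the owner, r-x for the group, nothing for anyone else.
        fs.setPermission(piiZone,
                new FsPermission(FsAction.ALL, FsAction.READ_EXECUTE, FsAction.NONE));
        fs.close();
    }
}
```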
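And for point 10, a skeleton of the kind of Pig UDF that makes decryption seamless inside a script; the key-manager hookup is a placeholder, since it depends entirely on the deployment.

```java
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical UDF that decrypts a protected field inline, so Pig scripts
// can read encrypted columns without a separate decryption pass.
public class Decrypt extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return decryptWithKeyManager((String) input.get(0));
    }

    // Placeholder: a real implementation would fetch the key from the
    // organization's key manager and apply the matching cipher.
    private String decryptWithKeyManager(String ciphertext) {
        throw new UnsupportedOperationException("wire up to your key manager");
    }
}
```

A script would then REGISTER the UDF jar and read a protected column like any other expression, e.g. `B = FOREACH A GENERATE Decrypt(ssn);`.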
Wait… where’s point 11, buy Dataguise?
Original title and link: Dataguise Presents 10 Best Practices for Securing Sensitive Data in Hadoop ( ©myNoSQL)
Dataguise says the latest version of its data-protection product enables users to encrypt sensitive data right down to specific fields within an open source Apache Hadoop data store.
DG for Hadoop 4.3 also makes use of the traditional Dataguise “masking” capability across single or multiple Hadoop clusters to camouflage sensitive data, at $25,000 a piece (hopefully not a piece of encrypted data, though).
✚ Apache Accumulo offers a BigTable-inspired open source implementation with cell-based access control.
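A minimal sketch of what that looks like with Accumulo's Java API, writing one cell whose visibility expression is evaluated against the scanner's authorizations at read time (row id and labels are made up):

```java
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.ColumnVisibility;

public class CellLevelAccess {
    public static Mutation sensitiveRow() {
        Mutation m = new Mutation("patient-42"); // illustrative row id
        // Only scanners holding both the "medical" and "billing" labels
        // in their authorizations can read this cell.
        m.put("record", "ssn", new ColumnVisibility("medical&billing"),
                new Value("123-45-6789".getBytes()));
        return m;
    }
}
```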
Original title and link: Field-Level Encryption for Apache Hadoop From Dataguise ( ©myNoSQL)