
Encrypt data while in-transit for data migration

New Contributor

Hi Team,

I am trying to implement a strategy to migrate data from Teradata to Amazon Redshift using NiFi.

Here is the configuration I am using for the NiFi cluster:

  • I am using a 3-node NiFi cluster.
  • To read data from Teradata, I use an ExecuteScript processor (a Python script that builds the SELECT query; see the sketch after this list) followed by an ExecuteSQL processor that executes the query built in ExecuteScript.
  • After that, I convert the resulting tables into individual CSV text files with a ConvertRecord processor, using AvroReader as the Record Reader and CSVRecordSetWriter as the Record Writer.
  • This is followed by PutS3Object, which uploads the table files to an S3 bucket. Authentication uses an IAM role through the default credentials of AWSCredentialsProviderControllerService.
  • Finally, I load the data into Redshift by running a COPY command through another ExecuteSQL processor.
  • The overall migration works, and the data is encrypted at rest because the S3 bucket is encrypted with a KMS key. My concern, however, is encrypting the data while it is in transit.
  • I have looked at different options in NiFi, but nothing has helped so far.
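
For reference, the ExecuteScript step is roughly the following Jython script; the attribute name and fallback table are placeholders rather than my real values. In this sketch, ExecuteSQL would pick the query up from the flow file content (its 'SQL select query' property left blank).

    # Rough Jython sketch of the query-building ExecuteScript step.
    # The attribute name and fallback table are placeholders, not real values.
    from org.apache.nifi.processor.io import OutputStreamCallback

    class WriteQuery(OutputStreamCallback):
        def __init__(self, query):
            self.query = query
        def process(self, outputStream):
            # Write the generated SELECT statement as the flow file content
            outputStream.write(bytearray(self.query.encode('utf-8')))

    flowFile = session.get()
    if flowFile is None:
        flowFile = session.create()

    # Placeholder: the real script derives schema/table from an upstream attribute
    table = flowFile.getAttribute('td.table.name') or 'MY_SCHEMA.MY_TABLE'
    query = 'SELECT * FROM %s' % table

    flowFile = session.write(flowFile, WriteQuery(query))
    flowFile = session.putAttribute(flowFile, 'generated.sql', query)
    session.transfer(flowFile, REL_SUCCESS)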

 

Approaches tried:

1) Encryption before PutS3Object:

Here I use the EncryptContent processor (Encrypt mode, NiFi Legacy KDF, MD5_128AES algorithm) before PutS3Object, and decrypt after PutS3Object with the same settings except the mode set to 'Decrypt'. The flow fails at Redshift when the COPY command tries to load the data from S3, because it is reading the encrypted file; the error returned is 'Invalid operation'.
When I run the same COPY directly on AWS Redshift, it does execute, but no records are inserted into the target table.
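
For context, this is roughly the COPY I issue, reproduced outside NiFi with plain Python and psycopg2; the cluster endpoint, credentials, bucket, table, and IAM role ARN are placeholders:

    # Plain-Python reproduction (outside NiFi, via psycopg2) of the COPY that
    # ExecuteSQL issues. Endpoint, credentials, bucket, table, and IAM role ARN
    # are placeholders.
    import psycopg2

    COPY_SQL = """
        COPY target_schema.my_table
        FROM 's3://my-migration-bucket/teradata/my_table.csv'
        IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-copy-role'
        FORMAT AS CSV;
    """

    conn = psycopg2.connect(
        host='my-cluster.xxxxxxxxxxxx.us-east-1.redshift.amazonaws.com',
        port=5439, dbname='dev', user='awsuser', password='placeholder')
    try:
        with conn, conn.cursor() as cur:
            # This is the statement that returns 'Invalid operation' once the
            # object in S3 is the EncryptContent output rather than plain CSV.
            cur.execute(COPY_SQL)
    finally:
        conn.close()

The same statement works when the object in S3 is plain CSV (bucket-level SSE-KMS does not get in the way), which is why I suspect Redshift is simply receiving the NiFi-encrypted payload instead of CSV.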

 

2) Encryption just after reading data from the source (in my case, Teradata):

The ExecuteSQL processor reads data from the source, and the ConvertRecord processor then converts the source tables into CSV before those files are uploaded to the S3 bucket.
If I place an EncryptContent processor right after ExecuteSQL, the ConvertRecord processor fails when it tries to read those files, with the error 'The incoming file is not a data file'.
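
From what I can tell, this error comes from the Avro header check: ExecuteSQL writes Avro data files, which must start with the 4-byte magic b'Obj\x01', and after EncryptContent the content no longer starts with that header. A quick local check (plain Python; the file names are placeholders for content I exported from the two queues):

    # Why AvroReader in ConvertRecord rejects the encrypted flow file: an Avro
    # data file must begin with the 4-byte magic header b'Obj\x01'.
    AVRO_MAGIC = b'Obj\x01'

    def looks_like_avro_data_file(path):
        """Return True if the file starts with the Avro container magic bytes."""
        with open(path, 'rb') as f:
            return f.read(4) == AVRO_MAGIC

    # File names are placeholders for flow file content exported from NiFi.
    print(looks_like_avro_data_file('executesql_output.avro'))    # True
    print(looks_like_avro_data_file('after_encryptcontent.bin'))  # False -> "not a data file"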

 

My intent is to encrypt the data end to end. I have tried different options, but so far none has given me the desired result; maybe I am missing a processor or overlooking some properties that would help me achieve this, I am not sure. Any guidance toward a solution, or a suggestion for a better pipeline to implement this type of data migration, would be very helpful.
