Securely bulk ingest data

New Contributor

I have been looking for a secure way to ingest data from relational databases into an HDP 2.5 cluster. As far as I can see, Sqoop in HDP 2.5 has still not been updated to Sqoop 2, so (as far as I understand) there is no authentication or authorization. What alternative services provide security and still make it easy to ingest data from relational databases into the cluster?

2 REPLIES

Re: Securely bulk ingest data

Super Guru

HDF (Hortonworks DataFlow) can ingest data securely, with full data provenance and authenticated user logins.

http://hortonworks.com/hadoop-tutorial/realtime-event-processing-nifi-kafka-storm/

Re: Securely bulk ingest data

Rising Star

Sqoop 2 is not production ready; it is still a work in progress. That said, Sqoop 1 does allow database authentication and authorization to be used, in some cases through database-specific functionality (for example, Oracle Wallet for Oracle). On the Hadoop side, it works with Kerberos-enabled clusters.
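To make the Sqoop 1 point concrete, here is a minimal sketch in Python that assembles a `sqoop import` command line using a password file instead of a plain-text password argument. The function name `build_sqoop_import_args` and the JDBC URL, user, paths, and table name in the usage example are all hypothetical placeholders, not values from this thread. It also assumes the calling user already holds a valid Kerberos ticket (for example, obtained with `kinit`) before the command is run on a Kerberos-enabled cluster.

```python
import shlex
from typing import List


def build_sqoop_import_args(
    jdbc_url: str,
    username: str,
    password_file: str,
    table: str,
    target_dir: str,
    num_mappers: int = 4,
) -> List[str]:
    """Assemble an argument list for a Sqoop 1 import.

    The database password is read by Sqoop from password_file, so it
    never appears on the command line or in the process list.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC connection string of the source database
        "--username", username,            # database user (hypothetical in the example below)
        "--password-file", password_file,  # file holding the password, readable only by the owner
        "--table", table,                  # source table to import
        "--target-dir", target_dir,        # HDFS directory that receives the data
        "--num-mappers", str(num_mappers), # number of parallel map tasks
    ]


if __name__ == "__main__":
    # All values below are placeholders chosen for illustration only.
    args = build_sqoop_import_args(
        jdbc_url="jdbc:oracle:thin:@//db.example.com:1521/ORCL",
        username="etl_user",
        password_file="/user/etl_user/.db.password",
        table="SALES.ORDERS",
        target_dir="/data/raw/orders",
    )
    # Print a shell-safe command line rather than executing it.
    print(" ".join(shlex.quote(a) for a in args))
```

The sketch only prints the command; passing the same list to `subprocess.run` on a node with the Sqoop client installed would be one way to execute it, running as whichever user invokes the script.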

Can you expand on the authentication and authorization aspects of your question? Are you talking about running Sqoop as different users, each with specific access restrictions on the Hadoop datasets? Sqoop is a tool that runs with the identity of the current user, but you can use an Oozie Sqoop action (either as part of a workflow or as a script action) to schedule Sqoop jobs for different users.
