I have been looking for a secure way to ingest data from relational databases into an HDP 2.5 cluster. As far as I can see, Sqoop in HDP 2.5 has still not been updated to Sqoop 2, so there is no authentication/authorization (as far as I understand). What alternative services provide security and still make it easy to ingest data from relational databases into the cluster?
Sqoop 2 is not production ready; it is a work in progress. That said, Sqoop 1 does allow DB authentication and authorization to be used (in some cases through DB-specific functionality, for example Oracle wallets for Oracle). On the Hadoop side, it works with Kerberos-enabled clusters.
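To make that concrete, here is a rough dry-run sketch of what a Sqoop 1 import with DB credentials kept off the command line might look like. It only builds and prints the command rather than running it. The JDBC URL, user, password file path, table, target directory, and keytab/principal are all hypothetical placeholders, and the helper function `build_sqoop_import` is made up for illustration; the Sqoop options themselves (`--connect`, `--username`, `--password-file`, `--table`, `--target-dir`, `--num-mappers`) are standard Sqoop 1 import arguments.

```shell
#!/bin/sh
# Dry-run sketch: prints a Sqoop 1 import command instead of executing it.
# Every host, user, path, and table name below is a hypothetical placeholder.

# build_sqoop_import JDBC_URL DB_USER PASSWORD_FILE TABLE TARGET_DIR
# (hypothetical helper, not part of Sqoop)
build_sqoop_import() {
  # --password-file makes Sqoop read the DB password from a file with
  # restricted permissions, so it does not show up in the process list
  # or in shell history.
  printf '%s ' sqoop import \
    --connect "$1" \
    --username "$2" \
    --password-file "$3" \
    --table "$4" \
    --target-dir "$5" \
    --num-mappers 4
  printf '\n'
}

# On a Kerberos-enabled cluster, the submitting user would first need a
# ticket, for example (assumed keytab and principal):
#   kinit -kt /etc/security/keytabs/etl.keytab etl@EXAMPLE.COM

SQOOP_CMD="$(build_sqoop_import \
  'jdbc:oracle:thin:@//db.example.com:1521/ORCL' \
  etl_user \
  /user/etl/.db_password \
  SALES.ORDERS \
  /data/raw/orders)"

echo "$SQOOP_CMD"
```

What the DB user is then allowed to read is controlled on the database side through its grants, and the HDFS target directory is written with the identity of whoever submits the job.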
Can you expand on the authentication and authorization aspects of your question? Are you talking about running a Sqoop action as different users with specific access restrictions on the Hadoop datasets? Sqoop, being a tool, runs with the identity of the current user, but you can use an Oozie Sqoop action (either as part of a workflow or as a script action) to schedule Sqoop jobs for different users.