
From SQLServer (SSIS) to HDP 2.6.5, via WebHDFS

Hello!

My team and I tried to send data from our data warehouse (SQL Server) to our data lake (HDP 2.6.5) via the Apache Hive endpoint (through Apache Knox), but the performance is very poor: it takes more than 8 minutes to push 100 rows into Hive. We also tried to send the same data via the WebHDFS endpoint (again through Apache Knox), but in this second scenario we were unable to make it work.

The simple workflow is as follows:
[Screenshot: 2.png]
It uses this "easy" configuration (HTTPS, with WebHDFS host = knox.xxxxx.domain.com/gateway/default/webhdfs/v1):
[Screenshot: 1.png]
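
For anyone who wants to test the same endpoint outside SSIS, here is a minimal sketch of the WebHDFS URLs such a test would target through Knox. The base URL is the one from the configuration above (the host is left elided as in the post); the HDFS paths `/tmp` and `/tmp/ssis_test.csv` and the helper name `webhdfs_url` are hypothetical, chosen only for illustration. Authentication is not shown, since it depends on how Knox is set up.

```python
from urllib.parse import quote, urlencode

# Base taken from the post; the host is elided there and is left as-is.
KNOX_WEBHDFS_BASE = "https://knox.xxxxx.domain.com/gateway/default/webhdfs/v1"


def webhdfs_url(base, hdfs_path, op, **params):
    """Build a WebHDFS REST URL of the form <base>/<path>?op=<OP>&key=value."""
    path = quote("/" + hdfs_path.strip("/"))
    query = urlencode({"op": op, **params})
    return f"{base.rstrip('/')}{path}?{query}"


# Listing a directory is a plain GET and a useful first connectivity check.
# "/tmp" is a hypothetical path.
list_url = webhdfs_url(KNOX_WEBHDFS_BASE, "/tmp", "LISTSTATUS")

# Writing a file is a two-step PUT: the first request (sent without a body)
# is answered with a 307 redirect whose Location header is the URL that the
# file data must then be sent to. "/tmp/ssis_test.csv" is hypothetical.
create_url = webhdfs_url(
    KNOX_WEBHDFS_BASE, "/tmp/ssis_test.csv", "CREATE", overwrite="true"
)

print(list_url)
print(create_url)
```

If a GET on the first URL already fails from the SSIS machine with a generic HTTP client, the problem is probably on the Knox, TLS or authentication side rather than in the Hadoop connector itself.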

Do you have any idea why this is not working?
We have been in contact with Microsoft about this for weeks, without any result.
They asked us to contact you (Cloudera support).
Microsoft case: "can't connect to HDP HDFS thru Hadoop connector" (TrackingID#2202230030001219)

We also can't use PolyBase (in SQL Server) because Knox is enabled 😉
See the documentation: https://docs.microsoft.com/en-us/sql/relational-databases/polybase/polybase-versioned-feature-summar...

In the meantime, we may try bypassing Knox to see whether "Option 2: Enable mutual trust between the Windows domain and the Kerberos realm" works 🙂
https://docs.microsoft.com/en-us/sql/integration-services/connection-manager/hadoop-connection-manag...

Kind regards,

PS: For your information, we benchmarked WebHDFS from another environment. It took 2h18 to send 774 GB (552 MB/min, i.e. roughly 10 MB/s) [Snappy-compressed Parquet, 600 files, 128 MB per file].
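
As a rough cross-check of the benchmark figures in the PS, here is a small sketch of the arithmetic. It assumes decimal units (1 GB = 1000 MB, reading Go/Mo as GB/MB) and uses only the numbers quoted above; the variable names are just for illustration.

```python
# Figures quoted in the PS above.
duration_min = 2 * 60 + 18         # 2h18 expressed in minutes
files, mb_per_file = 600, 128      # 600 files of 128 MB each
quoted_rate_mb_per_min = 552       # throughput quoted in MB/min

# Total volume implied by the file count (decimal GB assumed).
total_from_files_gb = files * mb_per_file / 1000

# Total volume implied by the quoted rate over the quoted duration.
total_from_rate_gb = quoted_rate_mb_per_min * duration_min / 1000

# Quoted rate converted to MB/s.
rate_mb_per_s = quoted_rate_mb_per_min / 60

print(duration_min, total_from_files_gb, total_from_rate_gb, rate_mb_per_s)
```

Both the file count and the quoted rate point to a total of roughly 76 to 77 GB over the 2h18 run, at about 9.2 MB/s.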
