Member since
03-15-2022
03-18-2022
09:36 AM
Hello!

My team and I tried to send data from our data warehouse (SQL Server) to our data lake (HDP 2.6.5) via the Apache Hive endpoint (through Apache Knox), but the performance is very poor: it takes more than 8 minutes to push 100 rows into Hive.

We also tried to send the same data via the WebHDFS endpoint (through Apache Knox), but in this second scenario we were unable to make it work.

The simple workflow is the following:

With this "easy" configuration (HTTPS, and WebHDFS host = knox.xxxxx.domain.com/gateway/default/webhdfs/v1)

Do you have any idea why this is not working?

We have been in contact with Microsoft about this for weeks, without any result. They asked us to contact you (Cloudera support). Microsoft case: can't connect to HDP HDFS thru Hadoop connector - TrackingID#2202230030001219

We also can't use PolyBase (in SQL Server) because Knox is enabled 😉 See the documentation: https://docs.microsoft.com/en-us/sql/relational-databases/polybase/polybase-versioned-feature-summary?view=sql-server-ver15#known-limitations

In the meantime, we may test bypassing Knox to see whether "Option 2: Enable mutual trust between the Windows domain and the Kerberos realm" works 🙂 https://docs.microsoft.com/en-us/sql/integration-services/connection-manager/hadoop-connection-manager?view=sql-server-ver15#connect-with-kerberos-authentication

Kind regards,

PS: For your information, we benchmarked WebHDFS from another environment. It took 2 h 18 min to send 774 GB (552 MB/min, i.e. about 10 MB/s) [Snappy Parquet, 600 files, 128 MB/file].
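To help separate a connector problem from a gateway problem, here is a minimal sketch of a direct WebHDFS call through Knox, made outside of SQL Server. It only assumes the base URL quoted above (the host stays elided, as in the post). The helper names `webhdfs_url` and `list_status`, the `/tmp` path, and the use of HTTP Basic authentication are assumptions for illustration; the actual authentication method depends on how the Knox topology is configured.

```python
import base64
import json
import ssl
import urllib.request
from urllib.parse import quote, urlencode

# Base URL taken from the post; the host is elided there and stays elided here.
KNOX_WEBHDFS = "https://knox.xxxxx.domain.com/gateway/default/webhdfs/v1"


def webhdfs_url(base, hdfs_path, op, **params):
    """Build a WebHDFS REST URL such as <base>/tmp?op=LISTSTATUS."""
    query = urlencode({"op": op, **params})
    return f"{base.rstrip('/')}/{quote(hdfs_path.lstrip('/'))}?{query}"


def list_status(base, hdfs_path, user, password, ca_file=None):
    """List a directory through Knox (hypothetical helper).

    Assumes the Knox topology accepts HTTP Basic credentials; ca_file can
    point at the CA bundle that signed the Knox certificate.
    """
    request = urllib.request.Request(webhdfs_url(base, hdfs_path, "LISTSTATUS"))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    context = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(request, context=context, timeout=30) as response:
        return json.load(response)
```

Calling `list_status(KNOX_WEBHDFS, "/tmp", user, password)` from the SQL Server machine would show whether the gateway itself answers over HTTPS. If it does, the issue is more likely on the Hadoop connector side; if it fails, the HTTP status code or TLS error should point to the Knox side.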