Created 12-13-2016 02:09 PM
When we use PolyBase (a SQL Server 2016 feature) to create an external table mapped to a Hive table, and we insert data into that external table so that it lands in the associated Hive table, my question is: is there a limit on the maximum number of records that can be inserted into an external table? When I insert data into the external table from another SQL Server table that has more than 30,000 records, I encounter this error:
Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11" for linked server "SQLNCLI11". 110802; An internal DMS error occurred that caused this operation to fail. Details: Exception: Microsoft.SqlServer.DataWarehouse.DataMovement.Common.ExternalAccess.HdfsAccessException, Message: Java exception raised on call to HdfsBridge_DestroyRecordWriter: Error [0
    at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeStorageInfos(DatanodeManager.java:513)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updatePipelineInternal(FSNamesystem.java:6379)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updatePipeline(FSNamesystem.java:6344)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updatePipeline(NameNodeRpcServer.java:822)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updatePipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:971)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
] occurred while accessing external file.
Inserting fewer than 30,000 records works fine and the data is inserted into Hive. Is this error caused by one of the following? (A sketch of my setup follows the list.)
1- A limit on the number of records that can be inserted through an external table
2- A limit in the PolyBase configuration
3- Some other problem in Hive
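For reference, here is a minimal sketch of my setup. All object names, the NameNode address, and the column list are placeholders, not my actual environment:

```sql
-- Hypothetical PolyBase setup; names and connection details are placeholders.
-- External data source pointing at the Hadoop cluster's NameNode.
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://namenode:8020'
);

-- Delimited-text format matching the Hive table's storage format.
CREATE EXTERNAL FILE FORMAT TextFileFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',')
);

-- External table over the HDFS directory that backs the Hive table.
CREATE EXTERNAL TABLE dbo.HiveCustomers (
    CustomerId INT,
    CustomerName NVARCHAR(100)
)
WITH (
    LOCATION = '/apps/hive/warehouse/customers',
    DATA_SOURCE = HadoopCluster,
    FILE_FORMAT = TextFileFormat
);

-- The failing step: exporting rows from a local SQL Server table.
-- (Export requires: EXEC sp_configure 'allow polybase export', 1; RECONFIGURE;)
INSERT INTO dbo.HiveCustomers (CustomerId, CustomerName)
SELECT CustomerId, CustomerName
FROM dbo.Customers;  -- works with fewer than ~30,000 rows, fails above that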
Created 12-13-2016 03:34 PM
It seems to be a PolyBase configuration issue.
https://msdn.microsoft.com/en-us/library/dn935026.aspx
The maximum number of concurrent PolyBase queries is 32. When 32 concurrent queries are running, each query can read a maximum of 33,000 files from the external file location. The root folder and each subfolder also count as a file. If the degree of concurrency is less than 32, the external file location can contain more than 33,000 files.
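If it helps to see how much PolyBase work is in flight when the failure happens, the PolyBase feature in SQL Server 2016 ships DMVs you can query. A rough sketch (the column selection is illustrative; check the DMV documentation for your build):

```sql
-- Recent and active distributed (PolyBase) requests handled by the DMS.
SELECT execution_id, status, start_time, total_elapsed_time
FROM sys.dm_exec_distributed_requests
ORDER BY start_time DESC;

-- The external work items (e.g., file splits) those requests spawned.
SELECT execution_id, step_index, input_name, bytes_processed, status
FROM sys.dm_exec_external_work;
```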
Are you accessing a single file or multiple files? What is the Hive DDL?
I would recommend Hive with ORC format and fewer files.
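For example, on the SQL Server side an ORC external file format is a short DDL statement; a minimal sketch (the format name is a placeholder):

```sql
-- ORC file format for PolyBase; pair it with an ORC-backed Hive table.
CREATE EXTERNAL FILE FORMAT OrcFileFormat
WITH (
    FORMAT_TYPE = ORC,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);
```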
See: https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-service-capacity-limits
| Category | Description | Maximum |
|---|---|---|
| PolyBase loads | Bytes per row | 32,768 |

PolyBase loads are limited to rows smaller than 32 KB and cannot load to VARCHAR(MAX), NVARCHAR(MAX), or VARBINARY(MAX). While this limit exists today, it will be removed fairly soon.
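As a quick sanity check against that row-size limit, you can approximate the declared maximum row width of your source table; a sketch, with dbo.Customers standing in for your actual table:

```sql
-- Approximate the declared maximum row width of the source table.
-- max_length is -1 for VARCHAR(MAX)/NVARCHAR(MAX)/VARBINARY(MAX), which
-- PolyBase loads cannot handle, so count those columns separately.
SELECT
    SUM(CASE WHEN c.max_length > 0 THEN c.max_length ELSE 0 END) AS approx_max_row_bytes,
    SUM(CASE WHEN c.max_length = -1 THEN 1 ELSE 0 END) AS max_typed_columns
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID('dbo.Customers');
```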
See: https://blogs.msdn.microsoft.com/sqlcat/2016/06/21/polybase-setup-errors-and-possible-solutions/
Check for errors on the Hadoop server.
Which version of HDP or HDInsight? Which Hive version? Are there any memory issues?
Check the logs and Ambari.
Created 12-13-2016 03:32 PM
When you say that you are able to insert fewer than 30K records, does that mean the same source and destination? I ask because your error points towards a permission/access issue.
Created 12-14-2016 09:48 AM
The source is a SQL Server table and the destination is a Hive table.
I haven't configured any permissions in Hadoop yet.
So is my problem caused by a PolyBase limit on the number of inserted rows?
Thank you.
Created 12-14-2016 09:49 AM
Thanks for your brilliant, detailed answer.
Created 01-06-2017 07:15 PM
Some people, instead of helping, only confuse things. This is a straight, to-the-point answer. Congratulations and thank you.