
Inserting from an External Data Table into a Hive Table

Rising Star

When we use PolyBase (a SQL Server 2016 feature) and map an external table to a table in Hive, we want to insert data into this external table so that the data lands in the associated Hive table. My question is: is there any limit on the maximum number of records that can be inserted into an external table? When I insert data into the external table from another SQL Server table that has more than 30,000 records, I encounter the error below (a sketch of the setup is at the end of this post):

Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11" for linked server "SQLNCLI11". 110802;An internal DMS error occurred that caused this operation to fail. Details: Exception: Microsoft.SqlServer.DataWarehouse.DataMovement.Common.ExternalAccess.HdfsAccessException, Message: Java exception raised on call to HdfsBridge_DestroyRecordWriter: Error [0 at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeStorageInfos(DatanodeManager.java:513) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updatePipelineInternal(FSNamesystem.java:6379) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updatePipeline(FSNamesystem.java:6344) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updatePipeline(NameNodeRpcServer.java:822) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updatePipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:971) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043) ] occurred while accessing external file.

When I insert fewer than 30,000 records, everything works and the data is inserted into Hive. Could this error be caused by one of the following?

1- A limit on the number of records that can be inserted into an external table

2- A limit in the PolyBase configuration

3- Some other problem in Hive
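For reference, here is a minimal sketch of the kind of setup described above. All object names, the namenode address, and the HDFS path are hypothetical, and the file format is assumed to be delimited text; this is not the actual DDL from this thread.

```sql
-- Enable PolyBase export so INSERT into external tables is allowed
EXEC sp_configure 'allow polybase export', 1;
RECONFIGURE;

-- External data source pointing at the Hadoop cluster (hypothetical address)
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://namenode:8020'
);

-- File format matching the Hive table's storage (delimited text assumed)
CREATE EXTERNAL FILE FORMAT TextFileFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',')
);

-- External table mapped to the HDFS directory backing the Hive table
CREATE EXTERNAL TABLE dbo.ExternalCustomers
(
    CustomerId   INT,
    CustomerName NVARCHAR(200)
)
WITH (
    LOCATION = '/apps/hive/warehouse/customers',
    DATA_SOURCE = HadoopCluster,
    FILE_FORMAT = TextFileFormat
);

-- The step that fails for more than ~30,000 rows: pushing rows from a
-- local SQL Server table through the external table into HDFS/Hive
INSERT INTO dbo.ExternalCustomers (CustomerId, CustomerName)
SELECT CustomerId, CustomerName
FROM dbo.LocalCustomers;
```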


5 REPLIES

Super Guru
@oula.alshiekh@gmail.com alshiekh

When you say that you are able to insert fewer than 30K records, does that mean the same source and destination? The reason I ask is that your error points towards a permission/access issue.

Rising Star

The source is a SQL Server table and the destination is a Hive table.

I haven't configured any permissions in Hadoop yet.

So is my problem due to a PolyBase limit on the number of inserted rows?

Thank you.

Master Guru

It seems to be a PolyBase configuration issue.

https://msdn.microsoft.com/en-us/library/dn935026.aspx

The maximum number of concurrent PolyBase queries is 32. When 32 concurrent queries are running, each query can read a maximum of 33,000 files from the external file location. The root folder and each subfolder also count as a file. If the degree of concurrency is less than 32, the external file location can contain more than 33,000 files.

Is this a single file or multiple files you are accessing? What is the Hive DDL?

I would recommend Hive with ORC format and fewer files.
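For illustration, a minimal sketch of an ORC-backed Hive table and a one-off consolidation of existing data into it (table and column names are hypothetical, not from this thread):

```sql
-- Hypothetical ORC-backed Hive table: ORC keeps the data in fewer,
-- more compact files than many small delimited text files
CREATE TABLE customers_orc (
    customer_id   INT,
    customer_name STRING
)
STORED AS ORC;

-- Consolidate existing data from a text-based table into the ORC table
INSERT OVERWRITE TABLE customers_orc
SELECT customer_id, customer_name
FROM customers_text;
```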

See:

https://sqlwithmanoj.com/2016/06/09/polybase-error-in-sql-server-2016-row-size-exceeds-the-defined-m...

https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-service-capacity-limits

Loads

Category: PolyBase loads
Description: Bytes per row
Maximum: 32,768

PolyBase loads are limited to rows smaller than 32 KB and cannot load VARCHAR(MAX), NVARCHAR(MAX), or VARBINARY(MAX) columns. While this limit exists today, it will be removed fairly soon.
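As a rough, hypothetical sanity check against that 32,768-byte row limit, something like this can approximate the widest row in the source SQL Server table (the table and column names are made up for illustration):

```sql
-- Approximate the widest row in the hypothetical source table and
-- compare it with PolyBase's 32,768-byte per-row load limit
SELECT MAX(
      4                                      -- CustomerId INT (4 bytes)
    + ISNULL(DATALENGTH(CustomerName), 0)    -- bytes actually stored
    + ISNULL(DATALENGTH(Notes), 0)
) AS max_row_bytes
FROM dbo.LocalCustomers;

-- Rows approaching 32,768 bytes, or columns declared as VARCHAR(MAX),
-- NVARCHAR(MAX), or VARBINARY(MAX), will make the PolyBase load fail.
```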

see: https://blogs.msdn.microsoft.com/sqlcat/2016/06/21/polybase-setup-errors-and-possible-solutions/

Check for errors on the Hadoop server.

Which version of HDP or HDInsight? Which Hive version? Are there any memory issues?

Check the logs and Ambari.

Rising Star

Thanks for your brilliant, detailed answer.

Contributor

Some people, instead of helping, only add confusion. This is a straight, to-the-point answer. Congratulations and thank you.