Problem inserting in Hive using Pentaho Data Integration


New Contributor

Hello everybody!

I am trying to populate a date dimension in Hive using PDI Spoon transformations. My environment is the HDP Sandbox 2.6.4.

I have already filled some small dimension tables using PDI (v8.1), but for some reason no more than 57 rows get inserted. After that, the job begins to throw errors like the following:

2018/09/29 14:41:36 - D_Date.0 - ERROR (version 8.1.0.0-365, build 8.1.0.0-365 from 2018-04-30 09.42.24 by buildguy) : Because of an error, this step can't continue:
2018/09/29 14:41:36 - D_Date.0 - ERROR (version 8.1.0.0-365, build 8.1.0.0-365 from 2018-04-30 09.42.24 by buildguy) : org.pentaho.di.core.exception.KettleException: 
2018/09/29 14:41:36 - D_Date.0 - Error inserting row into table [dim_fecha] with values: [2015/01/01 00:00:00], [54], [2015/02/23 00:00:00.000], [20150223,0], [2015,0], [1,0], [2,0], [23,0], [1,0], [February], [Feb], [Monday], [Mon], [1,0], [1,0], [2,0], [54,0]
2018/09/29 14:41:36 - D_Date.0 - 
2018/09/29 14:41:36 - D_Date.0 - Error inserting/updating row
2018/09/29 14:41:36 - D_Date.0 - Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1538242990698_0009_54_00, diagnostics=[Task failed, taskId=task_1538242990698_0009_54_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:370)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:164)
	... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /apps/hive/warehouse/awv_almacen.db/dim_fecha/.hive-staging_hive_2018-09-29_18-41-23_686_5852176182341767434-4/_task_tmp.-ext-10000/_tmp.000000_0 could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and no node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1719)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3368)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3292)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:850)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:504)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)


	at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:202)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1046)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:620)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:634)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:634)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:634)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:346)
	... 15 more

Spoon's log output is longer than that, but it keeps repeating more or less the same error.

Does anyone have an idea of what the error might be? I would really appreciate some help :) I am still very new to the Hadoop world.

Regards.

4 REPLIES

Re: Problem inserting in Hive using Pentaho Data Integration

New Contributor

Hi Harry Li, I am using the 'Hadoop Copy Files' step in Pentaho version 8.x. Its GUI has changed completely. I don't see any option to provide the Hadoop server host and port number for connecting to HDFS. Where can I find a comprehensive guide?

Re: Problem inserting in Hive using Pentaho Data Integration

Contributor

Have you tried executing the same insert using beeline with the same credentials? It looks like the Kettle engine is invoking doAs (executing a command on behalf of user x while logged in as some superuser). Confirm that doAs is in fact enabled and possible for that admin user.

You can invoke the same doAs in your beeline connection when testing.

In doing this you should see the actual Hive error (if any) that's happening.


Re: Problem inserting in Hive using Pentaho Data Integration

New Contributor

Hello @rtheron, today I tried inserting rows through the Hive Web UI that comes with Ambari (logged in as admin) and they were inserted without problems. I also tried changing Hive's execution engine from Tez to MapReduce and it worked: the date table in Hive was filled.
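For anyone who wants to try the same engine switch for a single connection rather than cluster-wide, here is a minimal sketch that builds a HiveServer2 JDBC URL carrying `hive.execution.engine=mr` in its Hive configuration list (the part after `?`). The helper name `hive_jdbc_url` and the host `sandbox-host` are hypothetical placeholders; port 10000 is the usual HiveServer2 binary-mode default, and `awv_almacen` is the database that appears in the error log above. The resulting string could be pasted into a JDBC connection definition, assuming the client accepts a full custom URL.

```python
def hive_jdbc_url(host, port=10000, database="default",
                  session_vars=None, hive_conf=None):
    """Build a HiveServer2 JDBC URL (hypothetical helper, illustration only).

    Layout: jdbc:hive2://host:port/db;session_vars?hive_conf
    """
    url = f"jdbc:hive2://{host}:{port}/{database}"
    # Session variables are appended after the database, each prefixed by ';'.
    if session_vars:
        url += "".join(f";{key}={value}" for key, value in session_vars.items())
    # Hive configuration overrides follow a '?', separated by ';'.
    if hive_conf:
        url += "?" + ";".join(f"{key}={value}" for key, value in hive_conf.items())
    return url


# "sandbox-host" is a placeholder; replace it with the real HiveServer2 host.
url_mr = hive_jdbc_url(
    "sandbox-host",
    database="awv_almacen",
    hive_conf={"hive.execution.engine": "mr"},
)
print(url_mr)
```

The same override can also be issued inside an open session with `SET hive.execution.engine=mr;`. Either way this only sidesteps the Tez failure; it does not explain it.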

Coincidentally I saw this other question today very similar to my case:

https://community.hortonworks.com/questions/222722/hive-query-fails-in-tez-runs-in-mr-mode.html

Do you have another idea about the root of the problem?

Re: Problem inserting in Hive using Pentaho Data Integration

Contributor

Using the Hive Web UI (Hive View) does not mimic the Pentaho doAs command correctly. Hive View executes the doAs as the "admin" user while impersonating the end user (the user logged in to Ambari), and "admin" has the privilege to do this by default.

You need to test this on the command line using the beeline utility, specifically with a JDBC connection that invokes the impersonation command on behalf of the user that Pentaho is configured to connect as. If the Pentaho step has a specific connection string, you can use that as your JDBC connection string in beeline as well.

The exercise here is to connect to Hive exactly the same way that Pentaho would; using Hive View does not (necessarily) do that.

An example of a Kerberos-authenticated user "hive" impersonating user "testuser":

jdbc:hive2://HiveHost:10001/default;principal=hive/_host@HOST1.COM;hive.server2.proxy.user=testuser
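As a rough sketch of how that test could be scripted, the snippet below assembles the same kind of impersonation URL and the argument list for a beeline call. The helper names `proxy_user_url` and `beeline_argv` are hypothetical, the host, principal, and user are copied from the example URL above, and the query against `dim_fecha` is only an illustrative statement. It assumes `beeline` is on the PATH and that a valid Kerberos ticket already exists; nothing is executed here, the command is only printed.

```python
import shlex


def proxy_user_url(host, port, database, principal, proxy_user):
    """Build a HiveServer2 JDBC URL that asks Hive to impersonate proxy_user."""
    return (
        f"jdbc:hive2://{host}:{port}/{database}"
        f";principal={principal}"
        f";hive.server2.proxy.user={proxy_user}"
    )


def beeline_argv(jdbc_url, statement):
    """Argument list for beeline: -u takes the JDBC URL, -e a statement to run."""
    return ["beeline", "-u", jdbc_url, "-e", statement]


# Values taken from the example above; adjust them to the real cluster.
url = proxy_user_url("HiveHost", 10001, "default",
                     "hive/_host@HOST1.COM", "testuser")
argv = beeline_argv(url, "SELECT COUNT(*) FROM dim_fecha")

# Print a shell-safe command line; the URL contains ';' and must be quoted.
print(" ".join(shlex.quote(arg) for arg in argv))
```

Running the printed command with the user that Pentaho connects as should surface the underlying Hive error directly, if impersonation is the cause.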

For more information, see the article below on impersonation in the Zeppelin notebook interface:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_zeppelin-component-guide/content/config-...