Created 10-01-2018 04:40 PM
Hello everybody!
I am trying to fill a Date Dimension in Hive using PDI Spoon transformations. My environment is the HDP Sandbox 2.6.4.
I have already filled some small dimension tables using PDI (v8.1), but for some reason no more than 57 rows get inserted. After that, the jobs start throwing errors like the following:
2018/09/29 14:41:36 - D_Date.0 - ERROR (version 8.1.0.0-365, build 8.1.0.0-365 from 2018-04-30 09.42.24 by buildguy) : Because of an error, this step can't continue:
2018/09/29 14:41:36 - D_Date.0 - ERROR (version 8.1.0.0-365, build 8.1.0.0-365 from 2018-04-30 09.42.24 by buildguy) : org.pentaho.di.core.exception.KettleException:
2018/09/29 14:41:36 - D_Date.0 - Error inserting row into table [dim_fecha] with values: [2015/01/01 00:00:00], [54], [2015/02/23 00:00:00.000], [20150223,0], [2015,0], [1,0], [2,0], [23,0], [1,0], [February], [Feb], [Monday], [Mon], [1,0], [1,0], [2,0], [54,0]
2018/09/29 14:41:36 - D_Date.0 -
2018/09/29 14:41:36 - D_Date.0 - Error inserting/updating row
2018/09/29 14:41:36 - D_Date.0 - Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1538242990698_0009_54_00, diagnostics=[Task failed, taskId=task_1538242990698_0009_54_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:370)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:164)
    ... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /apps/hive/warehouse/awv_almacen.db/dim_fecha/.hive-staging_hive_2018-09-29_18-41-23_686_5852176182341767434-4/_task_tmp.-ext-10000/_tmp.000000_0 could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1719)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3368)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3292)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:850)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:504)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:202)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1046)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:620)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:634)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:634)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:634)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:346)
    ... 15 more
Spoon's log output is longer than that, but it keeps saying more or less the same thing.
Does anyone have an idea of what the error might be? I would really appreciate some help 🙂 I am still very new to the Hadoop world.
Regards.
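When a trace is as long as the one in this question, it can help to pull out just the "Caused by:" entries, since the innermost one usually names the failing component. In the trace above, that is the HDFS message "could only be replicated to 0 nodes instead of minReplication (=1)". The following is only a minimal sketch: the function name `caused_by_messages` and the regular expression are assumptions made for illustration, not part of PDI or Hive, and it relies on stack frames having the usual `at package.Class.method(` shape.

```python
import re

# A stack frame looks like " at pkg.Class.method(File.java:123)",
# whether the trace is on one line or split across many.
_FRAME = re.compile(r"\s+at\s+[\w.$<>]+\(")


def caused_by_messages(log_text):
    """Return the text of each 'Caused by:' entry, outermost first."""
    messages = []
    # Everything before the first 'Caused by:' is the top-level error, so skip it.
    for chunk in log_text.split("Caused by:")[1:]:
        # Keep only the text that precedes the first stack frame of this entry.
        head = _FRAME.split(chunk, maxsplit=1)[0]
        # Collapse runs of whitespace so multi-line messages become one line.
        messages.append(" ".join(head.split()))
    return messages
```

Passing the full Spoon log as a string returns one short line per cause; the last element is the innermost cause.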
Created 10-01-2018 04:40 PM
Hi Harry Li, I am using the 'Hadoop Copy Files' step of Pentaho version 8.x. Its GUI has completely changed. I don't see a connection string option anywhere to provide the Hadoop server information and port number needed to connect to HDFS. Where can I find a comprehensive guide?
Created 10-01-2018 10:39 PM
Have you tried executing the same insert using beeline with the same credentials? It looks like the Kettle engine is invoking doAs (executing a command on behalf of user x while logged in as some superuser). Confirm that doAs is in fact enabled and possible for that admin user.
You can invoke the same doAs in your beeline connection when testing.
In doing this you should see the actual Hive error (if any) that is happening.
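As a rough illustration of the beeline test suggested above, the sketch below only assembles the command line; it does not run anything. The host, port, user name and statement are placeholders (assumptions), and `beeline_argv` is a hypothetical helper name. The `-u`, `-n` and `-e` options are beeline's flags for the JDBC URL, the user name and the statement to execute.

```python
import shlex


def beeline_argv(jdbc_url, username, statement):
    """Build the argument list for a one-off beeline run."""
    # -u: JDBC URL, -n: user name to connect as, -e: statement to execute.
    return ["beeline", "-u", jdbc_url, "-n", username, "-e", statement]


# Placeholder values: substitute the URL, user and INSERT that PDI actually uses.
argv = beeline_argv(
    "jdbc:hive2://HiveHost:10000/default",
    "pdi_user",
    "SELECT 1",
)
print(shlex.join(argv))  # a copy-pasteable shell command (Python 3.8+)
```

Replacing the placeholder statement with the failing INSERT should surface the same Hive error outside of Spoon, if the problem is on the Hive side.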
Created 10-02-2018 10:43 PM
Hello @rtheron, today I tried to insert rows through the Hive Web UI that comes with Ambari (logged in as admin) and they were inserted without problems. I also tried changing the Hive execution engine from Tez to MapReduce and it worked: it filled the Date table in Hive.
Coincidentally I saw this other question today very similar to my case:
https://community.hortonworks.com/questions/222722/hive-query-fails-in-tez-runs-in-mr-mode.html
Do you have another idea about the root of the problem?
Created 10-02-2018 10:58 PM
Using the Hive Web UI (Hive View) does not mimic the Pentaho doAs command correctly. Hive View executes the doAs as the "admin" user while impersonating the end user (the user logged in to Ambari), and "admin" would by default have the privilege to do this.
You need to test this on the command line using the beeline utility, specifically with a JDBC connection that invokes the impersonation command on behalf of the user that Pentaho is configured to connect as (if the Pentaho processor has a specific connection string, you can use that as your JDBC connection string in beeline as well).
The exercise here is to connect to Hive exactly the same way that Pentaho would. Using Hive View does not (necessarily) do that.
An example of the Kerberos-authenticated user hive impersonating user "testuser":
jdbc:hive2://HiveHost:10001/default;principal=hive/_host@HOST1.COM;hive.server2.proxy.user=testuser
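To make the structure of that connection string explicit, here is a small sketch that assembles it from its parts. The helper name `hive_jdbc_url` and its parameters are assumptions made for illustration; only the `principal` and `hive.server2.proxy.user` session parameters come from the example above.

```python
def hive_jdbc_url(host, port, database="default", principal=None, proxy_user=None):
    """Assemble a HiveServer2 JDBC URL with optional session parameters."""
    url = f"jdbc:hive2://{host}:{port}/{database}"
    params = []
    if principal:
        # Kerberos principal of the HiveServer2 service.
        params.append(f"principal={principal}")
    if proxy_user:
        # User to impersonate (the doAs target).
        params.append(f"hive.server2.proxy.user={proxy_user}")
    # Session parameters are appended after the database, separated by ';'.
    return ";".join([url] + params)


url = hive_jdbc_url(
    "HiveHost", 10001,
    principal="hive/_host@HOST1.COM",
    proxy_user="testuser",
)
```

Dropping `proxy_user` gives the plain connection, which makes it easy to compare behaviour with and without impersonation.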
For more information, see the article below on impersonation in the Zeppelin notebook interface: