Hive query fails - Requested replication 3 exceeds maximum 1

I'm setting up a tiny pseudo-distributed cluster for testing, with only one datanode. It works, and I can load and query data in Hive. Great. Then I changed the HDFS replication factor from 3 to 1 and also set the max replication factor to 1. I restarted all HDFS, YARN, and MR processes (that's all that Ambari indicated needed to be restarted). Now, when I run a Hive query, I see this in the hadoop-hdfs-namenode log:

2016-05-19 15:26:03,322 INFO  ipc.Server (Server.java:run(2172)) - IPC Server handler 103 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.create from 172.19.64.3:57450 Call#27745 Retry#0
java.io.IOException: file /tmp/hive/mpetronic/_tez_session_dir/5e16aba9-d9d3-4138-afda-58c1d0027e4d/hive-hcatalog-core.jar on client 172.19.64.3.
Requested replication 3 exceeds maximum 1
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.verifyReplication(BlockManager.java:988)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2374)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2335)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:688)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:397)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)

And my query fails with this...

0: jdbc:hive2://mpws:10000/default> select count(device_id) from vsat_modc_batch where device_id like 'DSS%' limit 10;
INFO  : Tez session hasn't been created yet. Opening session
ERROR : Failed to execute tez graph.
java.io.IOException: Previous writer likely failed to write hdfs://mpws:8020/tmp/hive/mpetronic/_tez_session_dir/9216a3de-e9fc-4940-a4b3-21e49bcca4b5/hive-hcatalog-core.jar. Failing because I am unlikely to write too.
    at org.apache.hadoop.hive.ql.exec.tez.DagUtils.localizeResource(DagUtils.java:982)
    at org.apache.hadoop.hive.ql.exec.tez.DagUtils.addTempResources(DagUtils.java:862)
    at org.apache.hadoop.hive.ql.exec.tez.DagUtils.localizeTempFilesFromConf(DagUtils.java:805)
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.refreshLocalResourcesFromConf(TezSessionState.java:233)
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:158)
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.updateSession(TezTask.java:271)
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:151)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1720)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1477)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1254)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1118)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1113)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
    at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
    at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask (state=08S01,code=1)

So, it seems that something did not get the memo about the reduced replication factor. Did I miss some additional configuration needed to run at replication factor = 1? I reconfigured back to 3 and everything works again. I could not find anything indicating I need to configure something in the hive or tez clients that is related to replication.
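To make the failure mode concrete, here is a minimal Python sketch of the kind of check the NameNode log points at (BlockManager.verifyReplication). It is a simplified model written for illustration, not Hadoop's actual code: the function name verify_replication, the ReplicationError class, and the example path are hypothetical. The point it illustrates is that the replication value is sent by the client in its create call, so a client that still asks for 3 is rejected once the maximum is 1, regardless of what the NameNode's own default is.

```python
class ReplicationError(IOError):
    """Hypothetical stand-in for the java.io.IOException seen in the log."""


def verify_replication(path, requested, min_replication=1,
                       max_replication=1, client="unknown"):
    """Simplified model of a NameNode-side replication bounds check.

    `requested` is the value the client puts in its create() call.
    `max_replication` plays the role of dfs.replication.max.
    """
    if requested > max_replication:
        raise ReplicationError(
            f"file {path} on client {client}.\n"
            f"Requested replication {requested} exceeds maximum {max_replication}"
        )
    if requested < min_replication:
        raise ReplicationError(
            f"file {path} on client {client}.\n"
            f"Requested replication {requested} is below minimum {min_replication}"
        )
    return requested


def try_create(path, client_replication, max_replication):
    """Return (ok, message) for a simulated create request."""
    try:
        verify_replication(path, client_replication,
                           max_replication=max_replication)
        return True, "created"
    except ReplicationError as err:
        return False, str(err)


if __name__ == "__main__":
    jar = "/tmp/example/hive-hcatalog-core.jar"  # hypothetical path
    # A client whose configuration was updated to replication 1 succeeds.
    print(try_create(jar, client_replication=1, max_replication=1))
    # A client still carrying replication 3 is rejected.
    print(try_create(jar, client_replication=3, max_replication=1))
```

Under this model, the question becomes which component in the Hive/Tez path is still sending 3 when it uploads hive-hcatalog-core.jar.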

Re: Hive query fails - Requested replication 3 exceeds maximum 1

The issue might be related to missing blocks. Check the block report, and for any files with missing blocks, either delete them or upload them again.
