Hive query fails - Requested replication 3 exceeds maximum 1

I'm setting up a tiny pseudo-distributed cluster for testing, with only one datanode. It works, and I can load and query data in Hive. Great. Then I changed the HDFS replication factor from 3 to 1 and also set the max replication factor to 1. I restarted all HDFS, YARN, and MR processes (that's all Ambari indicated needed restarting). Now, when I run a Hive query, I see this in the hadoop-hdfs-namenode-log:

2016-05-19 15:26:03,322 INFO  ipc.Server (Server.java:run(2172)) - IPC Server handler 103 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.create from 172.19.64.3:57450 Call#27745 Retry#0
java.io.IOException: file /tmp/hive/mpetronic/_tez_session_dir/5e16aba9-d9d3-4138-afda-58c1d0027e4d/hive-hcatalog-core.jar on client 172.19.64.3.
Requested replication 3 exceeds maximum 1
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.verifyReplication(BlockManager.java:988)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2374)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2335)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:688)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:397)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)

And my query fails with this...

0: jdbc:hive2://mpws:10000/default> select count(device_id) from vsat_modc_batch where device_id like 'DSS%' limit 10;
INFO  : Tez session hasn't been created yet. Opening session
ERROR : Failed to execute tez graph.
java.io.IOException: Previous writer likely failed to write hdfs://mpws:8020/tmp/hive/mpetronic/_tez_session_dir/9216a3de-e9fc-4940-a4b3-21e49bcca4b5/hive-hcatalog-core.jar. Failing because I am unlikely to write too.
    at org.apache.hadoop.hive.ql.exec.tez.DagUtils.localizeResource(DagUtils.java:982)
    at org.apache.hadoop.hive.ql.exec.tez.DagUtils.addTempResources(DagUtils.java:862)
    at org.apache.hadoop.hive.ql.exec.tez.DagUtils.localizeTempFilesFromConf(DagUtils.java:805)
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.refreshLocalResourcesFromConf(TezSessionState.java:233)
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:158)
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.updateSession(TezTask.java:271)
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:151)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1720)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1477)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1254)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1118)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1113)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
    at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
    at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask (state=08S01,code=1)

So it seems that something did not get the memo about the reduced replication factor. Did I miss some additional configuration needed to run with replication factor = 1? I reconfigured back to 3 and everything works again. I could not find anything indicating that I need to configure something replication-related in the Hive or Tez clients.
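
For context, the NameNode trace above points at a bounds check in BlockManager.verifyReplication: the replication factor arrives with each client's create call and is compared against the maximum the NameNode allows. The Java sketch below is a simplified model of that comparison, not the actual Hadoop source. The class name, method signature, and sample path are made up for illustration; only the message format is taken from the log. If this reading is right, a client that still carries a replication of 3 in its own configuration would be rejected even after the NameNode's limit is lowered to 1.

```java
import java.io.IOException;

// Hypothetical, simplified model of the NameNode-side check seen in the trace.
// This is NOT the Hadoop implementation; names and signature are assumptions.
class ReplicationCheckSketch {

    // Rejects a create request whose requested replication exceeds the
    // maximum configured on the NameNode (1 in the setup described above).
    static void verifyReplication(String src, short requested, String clientName,
                                  short maxReplication) throws IOException {
        if (requested > maxReplication) {
            // Message format copied from the NameNode log in this thread.
            throw new IOException("file " + src + " on client " + clientName + ".\n"
                    + "Requested replication " + requested
                    + " exceeds maximum " + maxReplication);
        }
    }

    public static void main(String[] args) {
        String src = "/tmp/example/hive-hcatalog-core.jar"; // hypothetical path
        String client = "172.19.64.3";
        short namenodeMax = 1;

        // Two clients: one already using replication 1, one still requesting 3.
        short[] clientSettings = {1, 3};
        for (short requested : clientSettings) {
            try {
                verifyReplication(src, requested, client, namenodeMax);
                System.out.println("replication " + requested + ": accepted");
            } catch (IOException e) {
                System.out.println("replication " + requested + ": rejected: " + e.getMessage());
            }
        }
    }
}
```

The point of the sketch is that the value being rejected is the one the writing client asks for, so the setting to look at may be the one the Hive/Tez side actually picks up rather than the NameNode's own default.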

1 REPLY

Re: Hive query fails - Requested replication 3 exceeds maximum 1

The issue might be related to missing blocks. Check the block report and, for any files with missing blocks, either delete them or upload them again.