Created 11-06-2019 09:46 AM
I'm running a from-scratch cluster on AWS EC2. I have a partitioned external table defined with its data on S3. I can query this table and get results in the console with a simple select * statement:
hive> set hive.execution.engine=tez;
hive> select * from external_table where partition_1='1' and partition_2='2';
[correct results returned]
However, a query that actually launches a Tez job reports OK but doesn't return any results to the console:
hive> set hive.execution.engine=tez;
hive> select count(*) from external_table where partition_1='1' and partition_2='2';
Status: Running (Executing on YARN cluster with App id application_1572972524483_0012)
OK
+------+
| _c0 |
+------+
+------+
No rows selected (8.902 seconds)
However, if I dig into the logs and look on the local filesystem, I can find the results from that query:
(yarn.resourcemanager.log) org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1572972524483_0022 CONTAINERID=container_1572972524483_0022_01_000002 RESOURCE=<memory:1024, vCores:1> QUEUENAME=default
(container_folder/syslog_attempt) [TezChild] |exec.FileSinkOperator|: New Final Path: FS file:/tmp/[REALLY LONG FILE PATH]/000000_0
[root #] cat /tmp/[REALLY LONG FILE PATH]/000000_0
SEQ"org.apache.hadoop.io.BytesWritableorg.apache.hadoop.io.Textl▒ꩇ1som}▒▒j¹▒ 2060
2060 is the correct count for the partition.
Oddly enough, I can get the results from the application if I use INSERT OVERWRITE DIRECTORY to write to HDFS:
hive> set hive.execution.engine=tez;
hive> INSERT OVERWRITE DIRECTORY '/tmp/local_out' select count(*) from external_table where partition_1='1' and partition_2='2';
[root #] hdfs dfs -cat /tmp/local_out/000000_0
2060
However, INSERT OVERWRITE LOCAL DIRECTORY fails to produce the output file:
hive> set hive.execution.engine=tez;
hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/local_out' select count(*) from external_table where partition_1='1' and partition_2='2';
[root #] cat /tmp/local_out/000000_0
cat: /tmp/local_out/000000_0: No such file or directory
If I cat the container result file for this query, it contains only the number, without the SequenceFile header (class names and binary characters) seen earlier:
[root #] cat /tmp/[REALLY LONG FILE PATH]/000000_0
2060
The only out-of-place log message I can find comes from the YARN ResourceManager log:
(yarn.resourcemanager.log) INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1572972524483_0023 CONTAINERID=container_1572972524483_0023_01_000004 RESOURCE=<memory:1024, vCores:1> QUEUENAME=default
(yarn.resourcemanager.log) WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root IP=NMIP OPERATION=AM Released Container TARGET=Scheduler RESULT=FAILURE DESCRIPTION=Trying to release container not owned by app or with invalid id. PERMISSIONS=Unauthorized access or invalid container APPID=application_1572972524483_0023 CONTAINERID=container_1572972524483_0023_01_000004
I've also tried creating a table and inserting data into it. The table is created without issue, but inserting data throws an error:
hive> set hive.execution.engine=tez;
hive> insert into test_table (test_col) values ('blah'), ('blahblah');
Query ID = root_20191106172949_5301b127-7219-46d1-8fd2-dc80ca7e96ee
Total jobs = 1
Launching Job 1 out of 1
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1573060958692_0001_1_00, diagnostics=[Vertex vertex_1573060958692_0001_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: _dummy_table initializer failed, vertex=vertex_1573060958692_0001_1_00 [Map 1], org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/tmp/root/a9b76683-8e19-446a-be74-7a5daedf70e5/hive_2019-11-06_17-29-49_820_224977921325223208-2/dummy_path
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:332)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:274)
at org.apache.hadoop.hive.shims.Hadoop23Shims$1.listStatus(Hadoop23Shims.java:134)
at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:217)
at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:76)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:321)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:444)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:564)
at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateOldSplits(MRInputHelpers.java:488)
at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(MRInputHelpers.java:337)
at org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.initialize(MRInputAMSplitGenerator.java:122)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
My versions are as follows:
Hadoop 3.2.1
Hive 3.1.2
Tez 0.9.2
Any help is much appreciated!
Created 11-08-2019 06:17 AM
This problem is caused by "mapreduce.framework.name=local", which is the default in Hadoop 3.2.1. It was solved by running "set mapreduce.framework.name=yarn" in the Hive session.
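For anyone hitting the same symptoms, here is a minimal sketch of applying the fix in a session, reusing the table and partition names from the question. It assumes the property is overridden per session; the first statement only prints the current value so you can confirm it is "local" before changing it.

```sql
-- Print the current value (expected to show "local" if the default is in effect)
set mapreduce.framework.name;

-- Override it for this session, as described in the answer above
set mapreduce.framework.name=yarn;
set hive.execution.engine=tez;

-- Re-run the query that previously returned no rows
select count(*) from external_table where partition_1='1' and partition_2='2';
```

A session-level set only lasts for that session. To make the change permanent, the same property can be set to yarn in mapred-site.xml on the cluster, though I have only described the session-level fix here.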