Member since: 12-21-2020
Posts: 91
Kudos Received: 8
Solutions: 13
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 1832 | 08-12-2021 05:16 AM |
| | 2073 | 06-29-2021 06:21 AM |
| | 2509 | 06-16-2021 07:15 AM |
| | 1742 | 06-14-2021 12:08 AM |
| | 5835 | 05-14-2021 06:03 AM |
07-08-2021
10:23 PM
Hi @asish, apologies for the late reply. I tried the query you suggested, but I still don't observe any difference in the runtime. Thanks, Megh
07-01-2021
07:33 AM
Hello All,
A Hive query just ran for almost 7.5 hours, and I'm trying to understand what caused such a long runtime.
These are the query stats:
 VERTICES         MODE       STATUS    TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container  SUCCEEDED   3348       3348        0        0       0       0
Map 4 .......... container  SUCCEEDED     27         27        0        0       0       0
Reducer 2 ...... container  SUCCEEDED   1009       1009        0        0       0       0
Reducer 3 ...... container  SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 04/04  [==========================>>] 100%  ELAPSED TIME: 27285.00 s
----------------------------------------------------------------------------------------------
INFO : Query Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : OPERATION DURATION
INFO : ----------------------------------------------------------------------------------------------
INFO : Compile Query 0.26s
INFO : Prepare Plan 0.05s
INFO : Get Query Coordinator (AM) 0.00s
INFO : Submit Plan 0.13s
INFO : Start DAG 0.03s
INFO : Run DAG 27270.01s
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : Task Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms) INPUT_RECORDS OUTPUT_RECORDS
INFO : ----------------------------------------------------------------------------------------------
INFO : Map 1 20831855.00 326,923,890 1,208,403 9,569,213,817 1,112,670,816,499
INFO : Map 4 38042.00 403,640 2,166 47,259,215 73,854,016
INFO : Reducer 2 6524435.00 108,305,790 1,367,042 9,617,725,549 1,009
INFO : Reducer 3 73228.00 2,860 0 1,009 0
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : org.apache.tez.common.counters.DAGCounter:
INFO : NUM_SUCCEEDED_TASKS: 4385
INFO : TOTAL_LAUNCHED_TASKS: 4385
INFO : DATA_LOCAL_TASKS: 554
INFO : RACK_LOCAL_TASKS: 2821
INFO : AM_CPU_MILLISECONDS: 1643080
INFO : AM_GC_TIME_MILLIS: 6737
INFO : File System Counters:
INFO : FILE_BYTES_READ: 1860884020897
INFO : FILE_BYTES_WRITTEN: 3773542394532
INFO : HDFS_BYTES_READ: 2949584168417
INFO : HDFS_BYTES_WRITTEN: 74639411482
INFO : HDFS_READ_OPS: 8926
INFO : HDFS_WRITE_OPS: 3029
INFO : HDFS_OP_CREATE: 2019
INFO : HDFS_OP_GET_FILE_STATUS: 3029
INFO : HDFS_OP_OPEN: 5897
INFO : HDFS_OP_RENAME: 1010
INFO : org.apache.tez.common.counters.TaskCounter:
INFO : REDUCE_INPUT_GROUPS: 410577279
INFO : REDUCE_INPUT_RECORDS: 9617725549
INFO : COMBINE_INPUT_RECORDS: 0
INFO : SPILLED_RECORDS: 20722623211
INFO : NUM_SHUFFLED_INPUTS: 3406384
INFO : NUM_SKIPPED_INPUTS: 0
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : MERGED_MAP_OUTPUTS: 71643172699
INFO : GC_TIME_MILLIS: 2577611
INFO : TASK_DURATION_MILLIS: 406524896
INFO : CPU_MILLISECONDS: 435636180
INFO : PHYSICAL_MEMORY_BYTES: 75333726371840
INFO : VIRTUAL_MEMORY_BYTES: 88797821091840
INFO : COMMITTED_HEAP_BYTES: 75333726371840
INFO : INPUT_RECORDS_PROCESSED: 9570513520
INFO : INPUT_SPLIT_LENGTH_BYTES: 2949584055487
INFO : OUTPUT_RECORDS: 9617726558
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_BYTES: 2238503743864
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 2139826941103
INFO : OUTPUT_BYTES_PHYSICAL: 1211054027209
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 425866143406
INFO : ADDITIONAL_SPILLS_BYTES_READ: 1747889484679
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : SHUFFLE_CHUNK_COUNT: 3375
INFO : SHUFFLE_BYTES: 1211054002993
INFO : SHUFFLE_BYTES_DECOMPRESSED: 2139826941103
INFO : SHUFFLE_BYTES_TO_MEM: 1003782114883
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_DISK_DIRECT: 207271888110
INFO : NUM_MEM_TO_DISK_MERGES: 180
INFO : NUM_DISK_TO_DISK_MERGES: 1116
INFO : SHUFFLE_PHASE_TIME: 14878024
INFO : MERGE_PHASE_TIME: 25783550
INFO : FIRST_EVENT_RECEIVED: 7091
INFO : LAST_EVENT_RECEIVED: 749095
INFO : DATA_BYTES_VIA_EVENT: 0
INFO : HIVE:
INFO : CREATED_FILES: 1010
INFO : DESERIALIZE_ERRORS: 0
INFO : RECORDS_IN_Map_1: 9569213817
INFO : RECORDS_IN_Map_4: 47259215
INFO : RECORDS_OUT_0: 1
INFO : RECORDS_OUT_1_upifrm2020.upi_ft_selected_py_510: 2196207622
INFO : RECORDS_OUT_INTERMEDIATE_Map_1: 1112670816499
INFO : RECORDS_OUT_INTERMEDIATE_Map_4: 73854016
INFO : RECORDS_OUT_INTERMEDIATE_Reducer_2: 1009
INFO : RECORDS_OUT_INTERMEDIATE_Reducer_3: 0
INFO : RECORDS_OUT_OPERATOR_FIL_32: 9570466334
INFO : RECORDS_OUT_OPERATOR_FIL_35: 47259215
INFO : RECORDS_OUT_OPERATOR_FS_10: 2196207622
INFO : RECORDS_OUT_OPERATOR_FS_20: 1
INFO : RECORDS_OUT_OPERATOR_GBY_16: 1009
INFO : RECORDS_OUT_OPERATOR_GBY_18: 1
INFO : RECORDS_OUT_OPERATOR_MAP_0: 0
INFO : RECORDS_OUT_OPERATOR_MERGEJOIN_31: 2196207622
INFO : RECORDS_OUT_OPERATOR_RS_17: 1009
INFO : RECORDS_OUT_OPERATOR_RS_34: 1112670816499
INFO : RECORDS_OUT_OPERATOR_RS_37: 73854016
INFO : RECORDS_OUT_OPERATOR_SEL_15: 2196207622
INFO : RECORDS_OUT_OPERATOR_SEL_33: 9570466334
INFO : RECORDS_OUT_OPERATOR_SEL_36: 47259215
INFO : RECORDS_OUT_OPERATOR_SEL_9: 2196207622
INFO : RECORDS_OUT_OPERATOR_TS_0: 9570466334
INFO : RECORDS_OUT_OPERATOR_TS_3: 47259215
INFO : TOTAL_TABLE_ROWS_WRITTEN: 2196207622
INFO : Shuffle Errors:
INFO : BAD_ID: 0
INFO : CONNECTION: 0
INFO : IO_ERROR: 0
INFO : WRONG_LENGTH: 0
INFO : WRONG_MAP: 0
INFO : WRONG_REDUCE: 0
INFO : Shuffle Errors_Reducer_2_INPUT_Map_1:
INFO : BAD_ID: 0
INFO : CONNECTION: 0
INFO : IO_ERROR: 0
INFO : WRONG_LENGTH: 0
INFO : WRONG_MAP: 0
INFO : WRONG_REDUCE: 0
INFO : Shuffle Errors_Reducer_2_INPUT_Map_4:
INFO : BAD_ID: 0
INFO : CONNECTION: 0
INFO : IO_ERROR: 0
INFO : WRONG_LENGTH: 0
INFO : WRONG_MAP: 0
INFO : WRONG_REDUCE: 0
INFO : TaskCounter_Map_1_INPUT_a:
INFO : INPUT_RECORDS_PROCESSED: 9570466334
INFO : INPUT_SPLIT_LENGTH_BYTES: 2949222655519
INFO : TaskCounter_Map_1_OUTPUT_Reducer_2:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : OUTPUT_BYTES: 2237144940183
INFO : OUTPUT_BYTES_PHYSICAL: 1210237277829
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 2138373445444
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_RECORDS: 9570466334
INFO : SHUFFLE_CHUNK_COUNT: 3348
INFO : SPILLED_RECORDS: 9570466334
INFO : TaskCounter_Map_4_INPUT_b:
INFO : INPUT_RECORDS_PROCESSED: 46177
INFO : INPUT_SPLIT_LENGTH_BYTES: 361399968
INFO : TaskCounter_Map_4_OUTPUT_Reducer_2:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : OUTPUT_BYTES: 1348619833
INFO : OUTPUT_BYTES_PHYSICAL: 808977386
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 1443301721
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_RECORDS: 47259215
INFO : SHUFFLE_CHUNK_COUNT: 27
INFO : SPILLED_RECORDS: 47259215
INFO : TaskCounter_Reducer_2_INPUT_Map_1:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 1747171072110
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 425288767549
INFO : COMBINE_INPUT_RECORDS: 0
INFO : FIRST_EVENT_RECEIVED: 3494
INFO : LAST_EVENT_RECEIVED: 668391
INFO : MERGED_MAP_OUTPUTS: 71643145456
INFO : MERGE_PHASE_TIME: 25289522
INFO : NUM_DISK_TO_DISK_MERGES: 1116
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : NUM_MEM_TO_DISK_MERGES: 180
INFO : NUM_SHUFFLED_INPUTS: 3378132
INFO : NUM_SKIPPED_INPUTS: 0
INFO : REDUCE_INPUT_GROUPS: 363318064
INFO : REDUCE_INPUT_RECORDS: 9570466334
INFO : SHUFFLE_BYTES: 1210237277829
INFO : SHUFFLE_BYTES_DECOMPRESSED: 2138373445444
INFO : SHUFFLE_BYTES_DISK_DIRECT: 207129855550
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_TO_MEM: 1003107422279
INFO : SHUFFLE_PHASE_TIME: 14385309
INFO : SPILLED_RECORDS: 11057638447
INFO : TaskCounter_Reducer_2_INPUT_Map_4:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 718412569
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 577375857
INFO : COMBINE_INPUT_RECORDS: 0
INFO : FIRST_EVENT_RECEIVED: 3584
INFO : LAST_EVENT_RECEIVED: 7551
INFO : MERGED_MAP_OUTPUTS: 27243
INFO : MERGE_PHASE_TIME: 494028
INFO : NUM_DISK_TO_DISK_MERGES: 0
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : NUM_MEM_TO_DISK_MERGES: 0
INFO : NUM_SHUFFLED_INPUTS: 27243
INFO : NUM_SKIPPED_INPUTS: 0
INFO : REDUCE_INPUT_GROUPS: 47259215
INFO : REDUCE_INPUT_RECORDS: 47259215
INFO : SHUFFLE_BYTES: 808977386
INFO : SHUFFLE_BYTES_DECOMPRESSED: 1443301721
INFO : SHUFFLE_BYTES_DISK_DIRECT: 141036712
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_TO_MEM: 667940674
INFO : SHUFFLE_PHASE_TIME: 419560
INFO : SPILLED_RECORDS: 47259215
INFO : TaskCounter_Reducer_2_OUTPUT_Reducer_3:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : DATA_BYTES_VIA_EVENT: 0
INFO : OUTPUT_BYTES: 10183848
INFO : OUTPUT_BYTES_PHYSICAL: 7771994
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 10193938
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_RECORDS: 1009
INFO : SPILLED_RECORDS: 0
INFO : TaskCounter_Reducer_3_INPUT_Reducer_2:
INFO : FIRST_EVENT_RECEIVED: 13
INFO : INPUT_RECORDS_PROCESSED: 1009
INFO : LAST_EVENT_RECEIVED: 73153
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : NUM_SHUFFLED_INPUTS: 1009
INFO : SHUFFLE_BYTES: 7747778
INFO : SHUFFLE_BYTES_DECOMPRESSED: 10193938
INFO : SHUFFLE_BYTES_DISK_DIRECT: 995848
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_TO_MEM: 6751930
INFO : SHUFFLE_PHASE_TIME: 73155
INFO : TaskCounter_Reducer_3_OUTPUT_out_Reducer_3:
INFO : OUTPUT_RECORDS: 0
INFO : org.apache.hadoop.hive.ql.exec.tez.HiveInputCounters:
INFO : GROUPED_INPUT_SPLITS_Map_1: 3348
INFO : GROUPED_INPUT_SPLITS_Map_4: 27
INFO : INPUT_DIRECTORIES_Map_1: 96
INFO : INPUT_DIRECTORIES_Map_4: 1
INFO : INPUT_FILES_Map_1: 5843
INFO : INPUT_FILES_Map_4: 27
INFO : RAW_INPUT_SPLITS_Map_1: 5843
INFO : RAW_INPUT_SPLITS_Map_4: 27
This is the query:
create table mydb.mytbl
as select col1,col2,...col18
from mydb.tbl1 a
inner join mydb.tbl2 b on concat(trim(col1),trim(col2))=colb where date_partition_col between date_sub('2021-05-10',95) and '2021-05-10';
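For readability only, here is an equivalent way to write the same join with the computed key made explicit in a subquery; the table and column names are the placeholders used above, and the elided column list is kept as a comment:
-- Same query, with the concatenated join key projected once in a subquery.
create table mydb.mytbl as
select a.col1, a.col2, /* ... */ a.col18
from (
  select col1, col2, /* ... */ col18,
         concat(trim(col1), trim(col2)) as join_key
  from mydb.tbl1
  where date_partition_col between date_sub('2021-05-10', 95) and '2021-05-10'
) a
inner join mydb.tbl2 b
  on a.join_key = b.colb;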
This is the explain plan:
+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| Plan optimized by CBO. |
| |
| Vertex dependency in root stage |
| Map 1 <- Map 2 (BROADCAST_EDGE) |
| |
| Stage-0 |
| Fetch Operator |
| limit:-1 |
| Stage-1 |
| Map 1 vectorized |
| File Output Operator [FS_30] |
| Select Operator [SEL_29] (rows=122690953 width=12844) |
| Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16","_col17"] |
| Map Join Operator [MAPJOIN_28] (rows=122690953 width=12844) |
| Conds:SEL_27._col18=RS_25._col0(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16","_col17"] |
| <-Map 2 [BROADCAST_EDGE] vectorized |
| BROADCAST [RS_25] |
| PartitionCols:_col0 |
| Select Operator [SEL_24] (rows=47259215 width=110) |
| Output:["_col0"] |
| Filter Operator [FIL_23] (rows=47259215 width=110) |
| predicate:colb is not null |
| TableScan [TS_3] (rows=47259215 width=110) |
| mydb@tbl2,b, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["colb"] |
| <-Select Operator [SEL_27] (rows=111537228 width=12844) |
| Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16","_col17","_col18"] |
| Filter Operator [FIL_26] (rows=111537228 width=12844) |
| predicate:(concat(trim(col1), trim(col2)) is not null and CAST( date_partition_col AS DATE) BETWEEN DATE'2021-02-04' AND DATE'2021-05-10') |
| TableScan [TS_0] (rows=1056668483 width=12844) |
| mydb@tbl1,a,Tbl:PARTIAL,Col:PARTIAL,Output:["col1","col2",...,"col18"] |
Can someone help me figure out the bottlenecks here?
Thanks,
Megh
Labels:
- Apache Hive
- Apache Tez
06-29-2021
06:21 AM
Hi @Shifu, supplying these config properties at runtime didn't work, but changing the service configuration to modify the parameters below did the job for me:
set tez.runtime.io.sort.mb=3072;
set tez.task.resource.memory.mb=8192;
set tez.am.resource.memory.mb=8192;
set tez.am.launch.cmd-opts=-Xmx6553m;
I'm not sure why that is the case, but the issue seems to be fixed. Thanks, Megh
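To confirm whether the service-level change is actually in effect, the current value of each property can be printed from a fresh Hive/Beeline session; in Hive, SET with no value only echoes the current setting. A minimal sketch, using the same property names as above:
-- Print the effective value of each property in the current session.
-- "set <property>;" without "=value" only displays the setting; it does not change it.
set tez.runtime.io.sort.mb;
set tez.task.resource.memory.mb;
set tez.am.resource.memory.mb;
set tez.am.launch.cmd-opts;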
06-28-2021
03:46 AM
Hi @Shifu, I tried the configuration parameters you suggested and am still facing the same error. Thanks, Megh
06-28-2021
02:59 AM
Hi @Fawze, this is happening for ANALYZE TABLE commands, which I think are map-only. I've tried the heap space options you mentioned, and they're not helping. Thanks, Megh
06-28-2021
02:19 AM
Hi @Fawze, did you manage to solve this? We're hitting the same issue. Thanks, Megh
06-18-2021
06:09 AM
Hello everyone, in my CDP setup I'm not able to run ANALYZE TABLE commands on external tables. The query doesn't start a YARN application immediately; after 5-10 minutes, when the application does start, it gets killed within about 10 seconds with the following error stack trace.
ERROR : Status: Failed
ERROR : Application application_1623850591633_0042 failed 2 times due to AM Container for appattempt_1623850591633_0042_000002 exited with exitCode: -104
Failing this attempt.Diagnostics: [2021-06-18 18:32:52.722]Container [pid=32822,containerID=container_e49_1623850591633_0042_02_000001] is running 34230272B beyond the 'PHYSICAL' memory limit. Current usage: 2.0 GB of 2 GB physical memory used; 3.9 GB of 4.2 GB virtual memory used. Killing container.
Dump of the process-tree for container_e49_1623850591633_0042_02_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 32829 32822 32822 32822 (java) 3070 3010 4117856256 532273 /usr/java/default/bin/java -Xmx1638m -Djava.io.tmpdir=/data8/yarn/nm/usercache/hive/appcache/application_1623850591633_0042/container_e49_1623850591633_0042_02_000001/tmp -server -Djava.net.preferIPv4Stack=true -XX:+PrintGCDetails -verbose:gc -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/data1/yarn/container-logs/application_1623850591633_0042/container_e49_1623850591633_0042_02_000001 -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel= org.apache.tez.dag.app.DAGAppMaster --session
|- 32822 32819 32822 32822 (bash) 0 0 118185984 372 /bin/bash -c /usr/java/default/bin/java -Xmx1638m -Djava.io.tmpdir=/data8/yarn/nm/usercache/hive/appcache/application_1623850591633_0042/container_e49_1623850591633_0042_02_000001/tmp -server -Djava.net.preferIPv4Stack=true -XX:+PrintGCDetails -verbose:gc -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/data1/yarn/container-logs/application_1623850591633_0042/container_e49_1623850591633_0042_02_000001 -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel='' org.apache.tez.dag.app.DAGAppMaster --session 1>/data1/yarn/container-logs/application_1623850591633_0042/container_e49_1623850591633_0042_02_000001/stdout 2>/data1/yarn/container-logs/application_1623850591633_0042/container_e49_1623850591633_0042_02_000001/stderr
[2021-06-18 18:32:52.731]Container killed on request. Exit code is 143
[2021-06-18 18:32:52.731]Container exited with a non-zero exit code 143.
For more detailed output, check the application tracking page: http://<yarn-hostname>:8088/cluster/app/application_1623850591633_0042 Then click on links to logs of each attempt.
. Failing the application.
ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Application application_1623850591633_0042 failed 2 times due to AM Container for appattempt_1623850591633_0042_000002 exited with exitCode: -104
Failing this attempt.Diagnostics: [2021-06-18 18:32:52.722]Container [pid=32822,containerID=container_e49_1623850591633_0042_02_000001] is running 34230272B beyond the 'PHYSICAL' memory limit. Current usage: 2.0 GB of 2 GB physical memory used; 3.9 GB of 4.2 GB virtual memory used. Killing container.
Dump of the process-tree for container_e49_1623850591633_0042_02_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 32829 32822 32822 32822 (java) 3070 3010 4117856256 532273 /usr/java/default/bin/java -Xmx1638m -Djava.io.tmpdir=/data8/yarn/nm/usercache/hive/appcache/application_1623850591633_0042/container_e49_1623850591633_0042_02_000001/tmp -server -Djava.net.preferIPv4Stack=true -XX:+PrintGCDetails -verbose:gc -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/data1/yarn/container-logs/application_1623850591633_0042/container_e49_1623850591633_0042_02_000001 -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel= org.apache.tez.dag.app.DAGAppMaster --session
|- 32822 32819 32822 32822 (bash) 0 0 118185984 372 /bin/bash -c /usr/java/default/bin/java -Xmx1638m -Djava.io.tmpdir=/data8/yarn/nm/usercache/hive/appcache/application_1623850591633_0042/container_e49_1623850591633_0042_02_000001/tmp -server -Djava.net.preferIPv4Stack=true -XX:+PrintGCDetails -verbose:gc -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/data1/yarn/container-logs/application_1623850591633_0042/container_e49_1623850591633_0042_02_000001 -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel='' org.apache.tez.dag.app.DAGAppMaster --session 1>/data1/yarn/container-logs/application_1623850591633_0042/container_e49_1623850591633_0042_02_000001/stdout 2>/data1/yarn/container-logs/application_1623850591633_0042/container_e49_1623850591633_0042_02_000001/stderr
[2021-06-18 18:32:52.731]Container killed on request. Exit code is 143
[2021-06-18 18:32:52.731]Container exited with a non-zero exit code 143.
For more detailed output, check the application tracking page: http://<yarn-hostname>:8088/cluster/app/application_1623850591633_0042 Then click on links to logs of each attempt.
. Failing the application.
INFO : Completed executing command(queryId=hive_20210618183003_95b31532-fda8-4d10-bb14-bfcb1e833ca7); Time taken: 17.503 seconds
INFO : OK
Error: Error while compiling statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Application application_1623850591633_0042 failed 2 times due to AM Container for appattempt_1623850591633_0042_000002 exited with exitCode: -104
Failing this attempt.Diagnostics: [2021-06-18 18:32:52.722]Container [pid=32822,containerID=container_e49_1623850591633_0042_02_000001] is running 34230272B beyond the 'PHYSICAL' memory limit. Current usage: 2.0 GB of 2 GB physical memory used; 3.9 GB of 4.2 GB virtual memory used. Killing container.
Dump of the process-tree for container_e49_1623850591633_0042_02_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 32829 32822 32822 32822 (java) 3070 3010 4117856256 532273 /usr/java/default/bin/java -Xmx1638m -Djava.io.tmpdir=/data8/yarn/nm/usercache/hive/appcache/application_1623850591633_0042/container_e49_1623850591633_0042_02_000001/tmp -server -Djava.net.preferIPv4Stack=true -XX:+PrintGCDetails -verbose:gc -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/data1/yarn/container-logs/application_1623850591633_0042/container_e49_1623850591633_0042_02_000001 -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel= org.apache.tez.dag.app.DAGAppMaster --session
|- 32822 32819 32822 32822 (bash) 0 0 118185984 372 /bin/bash -c /usr/java/default/bin/java -Xmx1638m -Djava.io.tmpdir=/data8/yarn/nm/usercache/hive/appcache/application_1623850591633_0042/container_e49_1623850591633_0042_02_000001/tmp -server -Djava.net.preferIPv4Stack=true -XX:+PrintGCDetails -verbose:gc -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/data1/yarn/container-logs/application_1623850591633_0042/container_e49_1623850591633_0042_02_000001 -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel='' org.apache.tez.dag.app.DAGAppMaster --session 1>/data1/yarn/container-logs/application_1623850591633_0042/container_e49_1623850591633_0042_02_000001/stdout 2>/data1/yarn/container-logs/application_1623850591633_0042/container_e49_1623850591633_0042_02_000001/stderr
[2021-06-18 18:32:52.731]Container killed on request. Exit code is 143
[2021-06-18 18:32:52.731]Container exited with a non-zero exit code 143.
For more detailed output, check the application tracking page: http://<yarn-hostname>:8088/cluster/app/application_1623850591633_0042 Then click on links to logs of each attempt.
. Failing the application. (state=08S01,code=2)
This happens irrespective of the size of the table. Any ideas about this? Thanks, Megh
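For reference, a minimal sketch (with illustrative values) of the session-level overrides one might try for the Tez AM and task memory before rerunning the statistics collection; the table name below is a placeholder. As noted in the 06-29-2021 reply above, in this thread the same properties ultimately had to be changed in the service configuration rather than at runtime.
-- Illustrative values only; size them to the cluster's container limits.
set tez.am.resource.memory.mb=8192;
set tez.am.launch.cmd-opts=-Xmx6553m;
set tez.task.resource.memory.mb=8192;
-- Placeholder table name; rerun the statistics collection afterwards.
analyze table mydb.my_external_table compute statistics;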
Labels:
06-17-2021
11:38 PM
@dmharshit, when you run this query, does a YARN application ID get generated, or does the query fail before triggering a YARN application? If an application is triggered, please collect the logs of that particular application and check for errors:
yarn logs -applicationId your_application_id > your_application_id.log 2>&1
Check whether this log file contains any detailed errors and share them. Thanks, Megh
06-16-2021
07:15 AM
Managed to get it working. Since my environment is Kerberized, I need to supply a jaas.conf file when running any command that potentially modifies a collection's configuration. This jaas.conf file should include the Solr principal configuration. In my case, the jaas.conf file looks like this:
Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="/solr/keytab/path/solr.keytab"
storeKey=true
useTicketCache=false
principal="solr/<fully.qualified.domain.name>@<REALM_NAME>";
};
I then ran the commands like this:
solrctl --jaas /jaas/file/path/jaas.conf instancedir --update ranger_audits /tmp/ranger_audits
solrctl --jaas /jaas/file/path/jaas.conf collection --reload ranger_audits
And it was successful! This documentation article also helped. Thanks, Megh
06-16-2021
05:01 AM
I found this article in the documentation, but while executing this command:
solrctl instancedir --update ranger_audits /tmp/ranger_audits
I'm getting this error:
org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /configs/ranger_audits/managed-schema
at org.apache.zookeeper.KeeperException.create(KeeperException.java:120)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:1793)
at org.apache.solr.common.cloud.SolrZkClient.lambda$delete$1(SolrZkClient.java:263)
at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:71)
at org.apache.solr.common.cloud.SolrZkClient.delete(SolrZkClient.java:262)
at org.apache.solr.common.cloud.ZkMaintenanceUtils.lambda$clean$1(ZkMaintenanceUtils.java:240)
at org.apache.solr.common.cloud.ZkMaintenanceUtils.traverseZkTree(ZkMaintenanceUtils.java:417)
at org.apache.solr.common.cloud.ZkMaintenanceUtils.traverseZkTree(ZkMaintenanceUtils.java:413)
at org.apache.solr.common.cloud.ZkMaintenanceUtils.clean(ZkMaintenanceUtils.java:236)
at org.apache.solr.common.cloud.SolrZkClient.clean(SolrZkClient.java:791)
at org.apache.solr.cloud.ZkCLI.main(ZkCLI.java:279)
Error: can't delete configuration
I have done kinit with the solr principal and am still facing this error. This is the klist output:
[root@<hostname> ~]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: solr/<fully.qualified.domain.name>@<REALM.NAME>
Valid starting Expires Service principal
2021-06-16T15:50:04 2021-06-17T01:50:04 krbtgt/<REALM.NAME>@<REALM.NAME>
        renew until 2021-06-23T15:50:04
Any ideas? Thanks, Megh