Member since: 09-16-2021
Posts: 386
Kudos Received: 53
Solutions: 32

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 96 | 03-28-2025 03:47 AM
 | 689 | 02-10-2025 08:58 AM
 | 322 | 02-04-2025 06:33 AM
 | 419 | 02-04-2025 05:55 AM
 | 968 | 11-22-2024 05:29 AM
04-21-2025
09:35 AM
Did you mean that the query below works as expected?

select b.proc,b.yr,b.RCNT from table2 b
WHERE length(b.PROC)=5 AND (b.ytype='1' or b.PROC IN (SELECT c.tet FROM table1 c));

If so, please share the complete query that is not working, so we can understand the issue better.
04-21-2025
12:08 AM
Yes, it works in the latest versions. I validated this on 7.1.9 SP1 (exact version: 3.1.3000.7.1.9.1000-103).

Test results:

Beeline version 3.1.3000.7.1.9.1000-103 by Apache Hive
0: jdbc:hive2://ccycloud-3.backlinehive.root.> CREATE TABLE table1 (
. . . . . . . . . . . . . . . . . . . . . . .> tet VARCHAR(10)
. . . . . . . . . . . . . . . . . . . . . . .> );
INFO : Compiling command(queryId=hive_20250421065209_f6d6c2a9-3746-4b99-9f89-3f00c62d2622): CREATE TABLE table1 (
tet VARCHAR(10)
)
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20250421065209_f6d6c2a9-3746-4b99-9f89-3f00c62d2622); Time taken: 0.505 seconds
INFO : Executing command(queryId=hive_20250421065209_f6d6c2a9-3746-4b99-9f89-3f00c62d2622): CREATE TABLE table1 (
tet VARCHAR(10)
)
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20250421065209_f6d6c2a9-3746-4b99-9f89-3f00c62d2622); Time taken: 1.623 seconds
INFO : OK
No rows affected (3.052 seconds)
0: jdbc:hive2://ccycloud-3.backlinehive.root.> INSERT INTO table1 (tet) VALUES
. . . . . . . . . . . . . . . . . . . . . . .> ('A1234'),
. . . . . . . . . . . . . . . . . . . . . . .> ('B5678'),
. . . . . . . . . . . . . . . . . . . . . . .> ('C9012');
INFO : Compiling command(queryId=hive_20250421065218_43806b25-c4d1-41ae-8a6e-2d020a7fa076): INSERT INTO table1 (tet) VALUES
('A1234'),
('B5678'),
('C9012')
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_col0, type:varchar(10), comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20250421065218_43806b25-c4d1-41ae-8a6e-2d020a7fa076); Time taken: 0.785 seconds
INFO : Executing command(queryId=hive_20250421065218_43806b25-c4d1-41ae-8a6e-2d020a7fa076): INSERT INTO table1 (tet) VALUES
('A1234'),
('B5678'),
('C9012')
INFO : Query ID = hive_20250421065218_43806b25-c4d1-41ae-8a6e-2d020a7fa076
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Subscribed to counters: [] for queryId: hive_20250421065218_43806b25-c4d1-41ae-8a6e-2d020a7fa076
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: INSERT INTO table1 (t...('B5678'),
('C9012') (Stage-1)
INFO : HS2 Host: [ccycloud-3.backlinehive.root.comops.site], Query ID: [hive_20250421065218_43806b25-c4d1-41ae-8a6e-2d020a7fa076], Dag ID: [null], DAG Session ID: [null]
INFO : Status: Running (Executing on YARN cluster with App id application_1744700109711_0004)
INFO : Status: DAG finished successfully in 6.03 seconds
INFO : DAG ID: null
INFO :
INFO : Query Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : OPERATION DURATION
INFO : ----------------------------------------------------------------------------------------------
INFO : Compile Query 0.79s
INFO : Prepare Plan 0.47s
INFO : Get Query Coordinator (AM) 8.69s
INFO : Submit Plan 0.33s
INFO : Start DAG 0.21s
INFO : Run DAG 6.03s
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : Task Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms) INPUT_RECORDS OUTPUT_RECORDS
INFO : ----------------------------------------------------------------------------------------------
INFO : Map 1 2396.00 4,370 65 3 1
INFO : Reducer 2 0.00 540 0 1 0
INFO : ----------------------------------------------------------------------------------------------
INFO : FileSystem Counters Summary
INFO :
INFO : Scheme: HDFS
INFO : ----------------------------------------------------------------------------------------------
INFO : VERTICES BYTES_READ READ_OPS LARGE_READ_OPS BYTES_WRITTEN WRITE_OPS
INFO : ----------------------------------------------------------------------------------------------
INFO : Map 1 0B 2 0 914B 4
INFO : Reducer 2 0B 2 0 145B 2
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : Scheme: FILE
INFO : ----------------------------------------------------------------------------------------------
INFO : VERTICES BYTES_READ READ_OPS LARGE_READ_OPS BYTES_WRITTEN WRITE_OPS
INFO : ----------------------------------------------------------------------------------------------
INFO : Map 1 0B 0 0 93B 0
INFO : Reducer 2 93B 0 0 0B 0
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : org.apache.tez.common.counters.DAGCounter:
INFO : NUM_SUCCEEDED_TASKS: 2
INFO : TOTAL_LAUNCHED_TASKS: 2
INFO : RACK_LOCAL_TASKS: 1
INFO : AM_CPU_MILLISECONDS: 4380
INFO : AM_GC_TIME_MILLIS: 80
INFO : INITIAL_HELD_CONTAINERS: 0
INFO : TOTAL_CONTAINERS_USED: 1
INFO : TOTAL_CONTAINER_ALLOCATION_COUNT: 1
INFO : TOTAL_CONTAINER_LAUNCH_COUNT: 1
INFO : TOTAL_CONTAINER_REUSE_COUNT: 1
INFO : File System Counters:
INFO : FILE_BYTES_READ: 93
INFO : FILE_BYTES_WRITTEN: 93
INFO : HDFS_BYTES_WRITTEN: 1059
INFO : HDFS_READ_OPS: 4
INFO : HDFS_WRITE_OPS: 6
INFO : HDFS_OP_CREATE: 4
INFO : HDFS_OP_GET_FILE_STATUS: 4
INFO : HDFS_OP_MKDIRS: 1
INFO : HDFS_OP_RENAME: 1
INFO : org.apache.tez.common.counters.TaskCounter:
INFO : SPILLED_RECORDS: 0
INFO : NUM_SHUFFLED_INPUTS: 1
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : GC_TIME_MILLIS: 65
INFO : TASK_DURATION_MILLIS: 2365
INFO : CPU_MILLISECONDS: 4910
INFO : PHYSICAL_MEMORY_BYTES: 4294967296
INFO : VIRTUAL_MEMORY_BYTES: 10710315008
INFO : COMMITTED_HEAP_BYTES: 4294967296
INFO : INPUT_RECORDS_PROCESSED: 5
INFO : INPUT_SPLIT_LENGTH_BYTES: 1
INFO : OUTPUT_RECORDS: 1
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_BYTES: 39
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 47
INFO : OUTPUT_BYTES_PHYSICAL: 85
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : SHUFFLE_BYTES: 61
INFO : SHUFFLE_BYTES_DECOMPRESSED: 47
INFO : SHUFFLE_BYTES_TO_MEM: 0
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_DISK_DIRECT: 61
INFO : SHUFFLE_PHASE_TIME: 39
INFO : FIRST_EVENT_RECEIVED: 20
INFO : LAST_EVENT_RECEIVED: 20
INFO : DATA_BYTES_VIA_EVENT: 0
INFO : HIVE:
INFO : CREATED_FILES: 2
INFO : DESERIALIZE_ERRORS: 0
INFO : RECORDS_IN_Map_1: 3
INFO : RECORDS_OUT_0: 1
INFO : RECORDS_OUT_1_default.table1: 3
INFO : RECORDS_OUT_INTERMEDIATE_Map_1: 1
INFO : RECORDS_OUT_INTERMEDIATE_Reducer_2: 0
INFO : RECORDS_OUT_OPERATOR_FS_12: 1
INFO : RECORDS_OUT_OPERATOR_FS_5: 3
INFO : RECORDS_OUT_OPERATOR_GBY_10: 1
INFO : RECORDS_OUT_OPERATOR_GBY_8: 1
INFO : RECORDS_OUT_OPERATOR_MAP_0: 0
INFO : RECORDS_OUT_OPERATOR_RS_9: 1
INFO : RECORDS_OUT_OPERATOR_SEL_1: 1
INFO : RECORDS_OUT_OPERATOR_SEL_3: 3
INFO : RECORDS_OUT_OPERATOR_SEL_7: 3
INFO : RECORDS_OUT_OPERATOR_TS_0: 1
INFO : RECORDS_OUT_OPERATOR_UDTF_2: 3
INFO : TOTAL_TABLE_ROWS_WRITTEN: 3
INFO : TaskCounter_Map_1_INPUT__dummy_table:
INFO : INPUT_RECORDS_PROCESSED: 4
INFO : INPUT_SPLIT_LENGTH_BYTES: 1
INFO : TaskCounter_Map_1_OUTPUT_Reducer_2:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : DATA_BYTES_VIA_EVENT: 0
INFO : OUTPUT_BYTES: 39
INFO : OUTPUT_BYTES_PHYSICAL: 85
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 47
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_RECORDS: 1
INFO : SPILLED_RECORDS: 0
INFO : TaskCounter_Reducer_2_INPUT_Map_1:
INFO : FIRST_EVENT_RECEIVED: 20
INFO : INPUT_RECORDS_PROCESSED: 1
INFO : LAST_EVENT_RECEIVED: 20
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : NUM_SHUFFLED_INPUTS: 1
INFO : SHUFFLE_BYTES: 61
INFO : SHUFFLE_BYTES_DECOMPRESSED: 47
INFO : SHUFFLE_BYTES_DISK_DIRECT: 61
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_TO_MEM: 0
INFO : SHUFFLE_PHASE_TIME: 39
INFO : TaskCounter_Reducer_2_OUTPUT_out_Reducer_2:
INFO : OUTPUT_RECORDS: 0
INFO : Starting task [Stage-2:DEPENDENCY_COLLECTION] in serial mode
INFO : Starting task [Stage-0:MOVE] in serial mode
INFO : Loading data to table default.table1 from hdfs://ccycloud-3.backlinehive.root.comops.site:8020/warehouse/tablespace/managed/hive/table1
INFO : Starting task [Stage-3:STATS] in serial mode
INFO : Executing stats task
INFO : Table default.table1 stats: [numFiles=1, numRows=3, totalSize=703, rawDataSize=0, numFilesErasureCoded=0]
INFO : Table default.table1 stats: [numFiles=1, numRows=3, totalSize=703, rawDataSize=0, numFilesErasureCoded=0]
INFO : Completed executing command(queryId=hive_20250421065218_43806b25-c4d1-41ae-8a6e-2d020a7fa076); Time taken: 16.333 seconds
INFO : OK
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 6.79 s
----------------------------------------------------------------------------------------------
3 rows affected (17.313 seconds)
0: jdbc:hive2://ccycloud-3.backlinehive.root.> CREATE TABLE table2 (
. . . . . . . . . . . . . . . . . . . . . . .> proc VARCHAR(10),
. . . . . . . . . . . . . . . . . . . . . . .> yr INT,
. . . . . . . . . . . . . . . . . . . . . . .> RCNT INT,
. . . . . . . . . . . . . . . . . . . . . . .> ytype CHAR(1)
. . . . . . . . . . . . . . . . . . . . . . .> );
INFO : Compiling command(queryId=hive_20250421065325_a9958421-5cc4-44af-9106-fa75cc6658ee): CREATE TABLE table2 (
proc VARCHAR(10),
yr INT,
RCNT INT,
ytype CHAR(1)
)
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20250421065325_a9958421-5cc4-44af-9106-fa75cc6658ee); Time taken: 0.059 seconds
INFO : Executing command(queryId=hive_20250421065325_a9958421-5cc4-44af-9106-fa75cc6658ee): CREATE TABLE table2 (
proc VARCHAR(10),
yr INT,
RCNT INT,
ytype CHAR(1)
)
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20250421065325_a9958421-5cc4-44af-9106-fa75cc6658ee); Time taken: 0.307 seconds
INFO : OK
No rows affected (0.517 seconds)
0: jdbc:hive2://ccycloud-3.backlinehive.root.> INSERT INTO table2 (proc, yr, RCNT, ytype) VALUES
. . . . . . . . . . . . . . . . . . . . . . .> ('A1234', 2022, 5, '1'),
. . . . . . . . . . . . . . . . . . . . . . .> ('B5678', 2023, 8, '0'),
. . . . . . . . . . . . . . . . . . . . . . .> ('C9012', 2021, 3, '1'),
. . . . . . . . . . . . . . . . . . . . . . .> ('D3456', 2020, 2, '0'),
. . . . . . . . . . . . . . . . . . . . . . .> ('E7890', 2022, 6, '2'),
. . . . . . . . . . . . . . . . . . . . . . .> ('X123', 2023, 1, '1');
INFO : Compiling command(queryId=hive_20250421065337_589f06f8-3506-4120-9b74-6e6dfff4f344): INSERT INTO table2 (proc, yr, RCNT, ytype) VALUES
('A1234', 2022, 5, '1'),
('B5678', 2023, 8, '0'),
('C9012', 2021, 3, '1'),
('D3456', 2020, 2, '0'),
('E7890', 2022, 6, '2'),
('X123', 2023, 1, '1')
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_col0, type:varchar(10), comment:null), FieldSchema(name:_col1, type:int, comment:null), FieldSchema(name:_col2, type:int, comment:null), FieldSchema(name:_col3, type:char(1), comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20250421065337_589f06f8-3506-4120-9b74-6e6dfff4f344); Time taken: 0.428 seconds
INFO : Executing command(queryId=hive_20250421065337_589f06f8-3506-4120-9b74-6e6dfff4f344): INSERT INTO table2 (proc, yr, RCNT, ytype) VALUES
('A1234', 2022, 5, '1'),
('B5678', 2023, 8, '0'),
('C9012', 2021, 3, '1'),
('D3456', 2020, 2, '0'),
('E7890', 2022, 6, '2'),
('X123', 2023, 1, '1')
INFO : Query ID = hive_20250421065337_589f06f8-3506-4120-9b74-6e6dfff4f344
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Subscribed to counters: [] for queryId: hive_20250421065337_589f06f8-3506-4120-9b74-6e6dfff4f344
INFO : Session is already open
INFO : Dag name: INSERT INTO table2 (proc, yr, RCNT, y...'1') (Stage-1)
INFO : HS2 Host: [ccycloud-3.backlinehive.root.comops.site], Query ID: [hive_20250421065337_589f06f8-3506-4120-9b74-6e6dfff4f344], Dag ID: [null], DAG Session ID: [null]
INFO : Status: Running (Executing on YARN cluster with App id application_1744700109711_0004)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 10.06 s
----------------------------------------------------------------------------------------------
INFO : Status: DAG finished successfully in 9.23 seconds
INFO : DAG ID: null
INFO :
INFO : Query Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : OPERATION DURATION
INFO : ----------------------------------------------------------------------------------------------
INFO : Compile Query 0.43s
INFO : Prepare Plan 0.65s
INFO : Get Query Coordinator (AM) 0.01s
INFO : Submit Plan 0.11s
INFO : Start DAG 0.06s
INFO : Run DAG 9.23s
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : Task Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms) INPUT_RECORDS OUTPUT_RECORDS
INFO : ----------------------------------------------------------------------------------------------
INFO : Map 1 4895.00 5,570 58 3 1
INFO : Reducer 2 277.00 580 0 1 0
INFO : ----------------------------------------------------------------------------------------------
INFO : FileSystem Counters Summary
INFO :
INFO : Scheme: HDFS
INFO : ----------------------------------------------------------------------------------------------
INFO : VERTICES BYTES_READ READ_OPS LARGE_READ_OPS BYTES_WRITTEN WRITE_OPS
INFO : ----------------------------------------------------------------------------------------------
INFO : Map 1 0B 2 0 1.18KB 4
INFO : Reducer 2 0B 2 0 332B 2
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : Scheme: FILE
INFO : ----------------------------------------------------------------------------------------------
INFO : VERTICES BYTES_READ READ_OPS LARGE_READ_OPS BYTES_WRITTEN WRITE_OPS
INFO : ----------------------------------------------------------------------------------------------
INFO : Map 1 0B 0 0 226B 0
INFO : Reducer 2 226B 0 0 0B 0
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : org.apache.tez.common.counters.DAGCounter:
INFO : NUM_SUCCEEDED_TASKS: 2
INFO : TOTAL_LAUNCHED_TASKS: 2
INFO : RACK_LOCAL_TASKS: 1
INFO : AM_CPU_MILLISECONDS: 900
INFO : AM_GC_TIME_MILLIS: 0
INFO : INITIAL_HELD_CONTAINERS: 0
INFO : TOTAL_CONTAINERS_USED: 1
INFO : TOTAL_CONTAINER_ALLOCATION_COUNT: 1
INFO : TOTAL_CONTAINER_LAUNCH_COUNT: 1
INFO : TOTAL_CONTAINER_REUSE_COUNT: 1
INFO : File System Counters:
INFO : FILE_BYTES_READ: 226
INFO : FILE_BYTES_WRITTEN: 226
INFO : HDFS_BYTES_WRITTEN: 1510
INFO : HDFS_READ_OPS: 4
INFO : HDFS_WRITE_OPS: 6
INFO : HDFS_OP_CREATE: 4
INFO : HDFS_OP_GET_FILE_STATUS: 4
INFO : HDFS_OP_MKDIRS: 1
INFO : HDFS_OP_RENAME: 1
INFO : org.apache.tez.common.counters.TaskCounter:
INFO : SPILLED_RECORDS: 0
INFO : NUM_SHUFFLED_INPUTS: 1
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : GC_TIME_MILLIS: 58
INFO : TASK_DURATION_MILLIS: 5430
INFO : CPU_MILLISECONDS: 6150
INFO : PHYSICAL_MEMORY_BYTES: 4294967296
INFO : VIRTUAL_MEMORY_BYTES: 10715058176
INFO : COMMITTED_HEAP_BYTES: 4294967296
INFO : INPUT_RECORDS_PROCESSED: 5
INFO : INPUT_SPLIT_LENGTH_BYTES: 1
INFO : OUTPUT_RECORDS: 1
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_BYTES: 174
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 183
INFO : OUTPUT_BYTES_PHYSICAL: 218
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : SHUFFLE_BYTES: 194
INFO : SHUFFLE_BYTES_DECOMPRESSED: 183
INFO : SHUFFLE_BYTES_TO_MEM: 0
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_DISK_DIRECT: 194
INFO : SHUFFLE_PHASE_TIME: 106
INFO : FIRST_EVENT_RECEIVED: 91
INFO : LAST_EVENT_RECEIVED: 91
INFO : DATA_BYTES_VIA_EVENT: 0
INFO : HIVE:
INFO : CREATED_FILES: 2
INFO : DESERIALIZE_ERRORS: 0
INFO : RECORDS_IN_Map_1: 3
INFO : RECORDS_OUT_0: 1
INFO : RECORDS_OUT_1_default.table2: 6
INFO : RECORDS_OUT_INTERMEDIATE_Map_1: 1
INFO : RECORDS_OUT_INTERMEDIATE_Reducer_2: 0
INFO : RECORDS_OUT_OPERATOR_FS_12: 1
INFO : RECORDS_OUT_OPERATOR_FS_5: 6
INFO : RECORDS_OUT_OPERATOR_GBY_10: 1
INFO : RECORDS_OUT_OPERATOR_GBY_8: 1
INFO : RECORDS_OUT_OPERATOR_MAP_0: 0
INFO : RECORDS_OUT_OPERATOR_RS_9: 1
INFO : RECORDS_OUT_OPERATOR_SEL_1: 1
INFO : RECORDS_OUT_OPERATOR_SEL_3: 6
INFO : RECORDS_OUT_OPERATOR_SEL_7: 6
INFO : RECORDS_OUT_OPERATOR_TS_0: 1
INFO : RECORDS_OUT_OPERATOR_UDTF_2: 6
INFO : TOTAL_TABLE_ROWS_WRITTEN: 6
INFO : TaskCounter_Map_1_INPUT__dummy_table:
INFO : INPUT_RECORDS_PROCESSED: 4
INFO : INPUT_SPLIT_LENGTH_BYTES: 1
INFO : TaskCounter_Map_1_OUTPUT_Reducer_2:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : DATA_BYTES_VIA_EVENT: 0
INFO : OUTPUT_BYTES: 174
INFO : OUTPUT_BYTES_PHYSICAL: 218
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 183
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_RECORDS: 1
INFO : SPILLED_RECORDS: 0
INFO : TaskCounter_Reducer_2_INPUT_Map_1:
INFO : FIRST_EVENT_RECEIVED: 91
INFO : INPUT_RECORDS_PROCESSED: 1
INFO : LAST_EVENT_RECEIVED: 91
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : NUM_SHUFFLED_INPUTS: 1
INFO : SHUFFLE_BYTES: 194
INFO : SHUFFLE_BYTES_DECOMPRESSED: 183
INFO : SHUFFLE_BYTES_DISK_DIRECT: 194
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_TO_MEM: 0
INFO : SHUFFLE_PHASE_TIME: 106
INFO : TaskCounter_Reducer_2_OUTPUT_out_Reducer_2:
INFO : OUTPUT_RECORDS: 0
INFO : Starting task [Stage-2:DEPENDENCY_COLLECTION] in serial mode
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 10.08 s
----------------------------------------------------------------------------------------------
6 rows affected (11.374 seconds)
0: jdbc:hive2://ccycloud-3.backlinehive.root.> select b.proc,b.yr,b.RCNT from table2 b
. . . . . . . . . . . . . . . . . . . . . . .> WHERE length(b.PROC)=5 AND (b.ytype='1' or b.PROC IN (SELECT c.tet FROM table1 c));
INFO : Compiling command(queryId=hive_20250421065359_2400b8d8-d1ff-47cb-b352-c993d8543abe): select b.proc,b.yr,b.RCNT from table2 b
WHERE length(b.PROC)=5 AND (b.ytype='1' or b.PROC IN (SELECT c.tet FROM table1 c))
INFO : Warning: Map Join MAPJOIN[33][bigTable=?] in task 'Map 1' is a cross product
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:b.proc, type:varchar(10), comment:null), FieldSchema(name:b.yr, type:int, comment:null), FieldSchema(name:b.rcnt, type:int, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20250421065359_2400b8d8-d1ff-47cb-b352-c993d8543abe); Time taken: 6.068 seconds
INFO : Executing command(queryId=hive_20250421065359_2400b8d8-d1ff-47cb-b352-c993d8543abe): select b.proc,b.yr,b.RCNT from table2 b
WHERE length(b.PROC)=5 AND (b.ytype='1' or b.PROC IN (SELECT c.tet FROM table1 c))
INFO : Query ID = hive_20250421065359_2400b8d8-d1ff-47cb-b352-c993d8543abe
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Subscribed to counters: [] for queryId: hive_20250421065359_2400b8d8-d1ff-47cb-b352-c993d8543abe
INFO : Session is already open
INFO : Dag name: select b.proc,b.yr,b.RCNT from table2 ...c)) (Stage-1)
INFO : Setting tez.task.scale.memory.reserve-fraction to 0.30000001192092896
INFO : HS2 Host: [ccycloud-3.backlinehive.root.comops.site], Query ID: [hive_20250421065359_2400b8d8-d1ff-47cb-b352-c993d8543abe], Dag ID: [null], DAG Session ID: [null]
INFO : Status: Running (Executing on YARN cluster with App id application_1744700109711_0004)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 2 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 3 ...... container SUCCEEDED 1 1 0 0 0 0
Map 4 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 5 ...... container SUCCEEDED 2 2 0 0 0 0
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 05/05 [==========================>>] 100% ELAPSED TIME: 5.11 s
----------------------------------------------------------------------------------------------
INFO : Status: DAG finished successfully in 4.85 seconds
INFO : DAG ID: null
INFO :
INFO : Query Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : OPERATION DURATION
INFO : ----------------------------------------------------------------------------------------------
INFO : Compile Query 6.07s
INFO : Prepare Plan 0.43s
INFO : Get Query Coordinator (AM) 0.01s
INFO : Submit Plan 0.05s
INFO : Start DAG 0.58s
INFO : Run DAG 4.86s
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : Task Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms) INPUT_RECORDS OUTPUT_RECORDS
INFO : ----------------------------------------------------------------------------------------------
INFO : Map 1 2784.00 5,680 53 10 0
INFO : Map 2 2058.00 3,120 67 3 1
INFO : Map 4 2057.00 4,550 57 3 3
INFO : Reducer 3 1.00 300 0 1 1
INFO : Reducer 5 515.00 750 0 3 3
INFO : ----------------------------------------------------------------------------------------------
INFO : FileSystem Counters Summary
INFO :
INFO : Scheme: HDFS
INFO : ----------------------------------------------------------------------------------------------
INFO : VERTICES BYTES_READ READ_OPS LARGE_READ_OPS BYTES_WRITTEN WRITE_OPS
INFO : ----------------------------------------------------------------------------------------------
INFO : Map 1 1.29KB 4 0 162B 2
INFO : Map 2 775B 2 0 0B 0
INFO : Map 4 855B 2 0 0B 0
INFO : Reducer 3 0B 0 0 0B 0
INFO : Reducer 5 0B 0 0 0B 0
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : Scheme: FILE
INFO : ----------------------------------------------------------------------------------------------
INFO : VERTICES BYTES_READ READ_OPS LARGE_READ_OPS BYTES_WRITTEN WRITE_OPS
INFO : ----------------------------------------------------------------------------------------------
INFO : Map 1 0B 0 0 0B 0
INFO : Map 2 0B 0 0 57B 0
INFO : Map 4 112B 0 0 106B 0
INFO : Reducer 3 57B 0 0 56B 0
INFO : Reducer 5 50B 0 0 184B 0
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : org.apache.tez.common.counters.DAGCounter:
INFO : NUM_SUCCEEDED_TASKS: 6
INFO : TOTAL_LAUNCHED_TASKS: 6
INFO : DATA_LOCAL_TASKS: 3
INFO : AM_CPU_MILLISECONDS: 2650
INFO : AM_GC_TIME_MILLIS: 38
INFO : INITIAL_HELD_CONTAINERS: 0
INFO : TOTAL_CONTAINERS_USED: 3
INFO : TOTAL_CONTAINER_ALLOCATION_COUNT: 3
INFO : TOTAL_CONTAINER_LAUNCH_COUNT: 3
INFO : TOTAL_CONTAINER_REUSE_COUNT: 3
INFO : File System Counters:
INFO : FILE_BYTES_READ: 219
INFO : FILE_BYTES_WRITTEN: 403
INFO : HDFS_BYTES_READ: 2915
INFO : HDFS_BYTES_WRITTEN: 162
INFO : HDFS_READ_OPS: 8
INFO : HDFS_WRITE_OPS: 2
INFO : HDFS_OP_CREATE: 1
INFO : HDFS_OP_GET_FILE_STATUS: 2
INFO : HDFS_OP_OPEN: 6
INFO : HDFS_OP_RENAME: 1
INFO : org.apache.tez.common.counters.TaskCounter:
INFO : REDUCE_INPUT_GROUPS: 3
INFO : REDUCE_INPUT_RECORDS: 3
INFO : COMBINE_INPUT_RECORDS: 0
INFO : SPILLED_RECORDS: 6
INFO : NUM_SHUFFLED_INPUTS: 4
INFO : NUM_SKIPPED_INPUTS: 1
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : MERGED_MAP_OUTPUTS: 1
INFO : GC_TIME_MILLIS: 177
INFO : TASK_DURATION_MILLIS: 7326
INFO : CPU_MILLISECONDS: 14400
INFO : PHYSICAL_MEMORY_BYTES: 12884901888
INFO : VIRTUAL_MEMORY_BYTES: 32153378816
INFO : COMMITTED_HEAP_BYTES: 12884901888
INFO : INPUT_RECORDS_PROCESSED: 8
INFO : INPUT_SPLIT_LENGTH_BYTES: 1090
INFO : OUTPUT_RECORDS: 8
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_BYTES: 56
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 96
INFO : OUTPUT_BYTES_PHYSICAL: 221
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 50
INFO : ADDITIONAL_SPILLS_BYTES_READ: 50
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : SHUFFLE_CHUNK_COUNT: 1
INFO : SHUFFLE_BYTES: 149
INFO : SHUFFLE_BYTES_DECOMPRESSED: 96
INFO : SHUFFLE_BYTES_TO_MEM: 124
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_DISK_DIRECT: 25
INFO : NUM_MEM_TO_DISK_MERGES: 0
INFO : NUM_DISK_TO_DISK_MERGES: 0
INFO : SHUFFLE_PHASE_TIME: 3543
INFO : MERGE_PHASE_TIME: 271
INFO : FIRST_EVENT_RECEIVED: 2837
INFO : LAST_EVENT_RECEIVED: 3139
INFO : DATA_BYTES_VIA_EVENT: 0
INFO : HIVE:
INFO : CREATED_FILES: 1
INFO : DESERIALIZE_ERRORS: 0
INFO : RECORDS_IN_Map_1: 6
INFO : RECORDS_IN_Map_2: 3
INFO : RECORDS_IN_Map_4: 3
INFO : RECORDS_OUT_0: 3
INFO : RECORDS_OUT_INTERMEDIATE_Map_1: 0
INFO : RECORDS_OUT_INTERMEDIATE_Map_2: 1
INFO : RECORDS_OUT_INTERMEDIATE_Map_4: 3
INFO : RECORDS_OUT_INTERMEDIATE_Reducer_3: 1
INFO : RECORDS_OUT_INTERMEDIATE_Reducer_5: 3
INFO : RECORDS_OUT_OPERATOR_FIL_42: 3
INFO : RECORDS_OUT_OPERATOR_FIL_48: 5
INFO : RECORDS_OUT_OPERATOR_FIL_52: 3
INFO : RECORDS_OUT_OPERATOR_FS_54: 3
INFO : RECORDS_OUT_OPERATOR_GBY_38: 1
INFO : RECORDS_OUT_OPERATOR_GBY_40: 1
INFO : RECORDS_OUT_OPERATOR_GBY_43: 3
INFO : RECORDS_OUT_OPERATOR_GBY_45: 3
INFO : RECORDS_OUT_OPERATOR_MAPJOIN_50: 5
INFO : RECORDS_OUT_OPERATOR_MAPJOIN_51: 5
INFO : RECORDS_OUT_OPERATOR_MAP_0: 0
INFO : RECORDS_OUT_OPERATOR_RS_39: 1
INFO : RECORDS_OUT_OPERATOR_RS_41: 1
INFO : RECORDS_OUT_OPERATOR_RS_44: 3
INFO : RECORDS_OUT_OPERATOR_RS_47: 3
INFO : RECORDS_OUT_OPERATOR_SEL_37: 3
INFO : RECORDS_OUT_OPERATOR_SEL_46: 3
INFO : RECORDS_OUT_OPERATOR_SEL_49: 5
INFO : RECORDS_OUT_OPERATOR_SEL_53: 3
INFO : RECORDS_OUT_OPERATOR_TS_0: 6
INFO : RECORDS_OUT_OPERATOR_TS_3: 3
INFO : RECORDS_OUT_OPERATOR_TS_9: 3
INFO : Shuffle Errors:
INFO : BAD_ID: 0
INFO : CONNECTION: 0
INFO : IO_ERROR: 0
INFO : WRONG_LENGTH: 0
INFO : WRONG_MAP: 0
INFO : WRONG_REDUCE: 0
INFO : Shuffle Errors_Reducer_5_INPUT_Map_4:
INFO : BAD_ID: 0
INFO : CONNECTION: 0
INFO : IO_ERROR: 0
INFO : WRONG_LENGTH: 0
INFO : WRONG_MAP: 0
INFO : WRONG_REDUCE: 0
INFO : TaskCounter_Map_1_INPUT_Reducer_3:
INFO : FIRST_EVENT_RECEIVED: 1170
INFO : INPUT_RECORDS_PROCESSED: 1
INFO : LAST_EVENT_RECEIVED: 1170
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : NUM_SHUFFLED_INPUTS: 1
INFO : SHUFFLE_BYTES: 24
INFO : SHUFFLE_BYTES_DECOMPRESSED: 10
INFO : SHUFFLE_BYTES_DISK_DIRECT: 0
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_TO_MEM: 24
INFO : SHUFFLE_PHASE_TIME: 1340
INFO : TaskCounter_Map_1_INPUT_Reducer_5:
INFO : FIRST_EVENT_RECEIVED: 1576
INFO : INPUT_RECORDS_PROCESSED: 3
INFO : LAST_EVENT_RECEIVED: 1878
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : NUM_SHUFFLED_INPUTS: 1
INFO : SHUFFLE_BYTES: 50
INFO : SHUFFLE_BYTES_DECOMPRESSED: 39
INFO : SHUFFLE_BYTES_DISK_DIRECT: 0
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_TO_MEM: 50
INFO : SHUFFLE_PHASE_TIME: 1913
INFO : TaskCounter_Map_1_INPUT_b:
INFO : INPUT_RECORDS_PROCESSED: 1
INFO : INPUT_SPLIT_LENGTH_BYTES: 474
INFO : TaskCounter_Map_1_OUTPUT_out_Map_1:
INFO : OUTPUT_RECORDS: 0
INFO : TaskCounter_Map_2_INPUT_c:
INFO : INPUT_RECORDS_PROCESSED: 1
INFO : INPUT_SPLIT_LENGTH_BYTES: 308
INFO : TaskCounter_Map_2_OUTPUT_Reducer_3:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : DATA_BYTES_VIA_EVENT: 0
INFO : OUTPUT_BYTES: 3
INFO : OUTPUT_BYTES_PHYSICAL: 49
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 11
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_RECORDS: 1
INFO : SPILLED_RECORDS: 0
INFO : TaskCounter_Map_4_INPUT_c:
INFO : INPUT_RECORDS_PROCESSED: 1
INFO : INPUT_SPLIT_LENGTH_BYTES: 308
INFO : TaskCounter_Map_4_OUTPUT_Reducer_5:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : OUTPUT_BYTES: 24
INFO : OUTPUT_BYTES_PHYSICAL: 50
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 36
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_RECORDS: 3
INFO : SHUFFLE_CHUNK_COUNT: 1
INFO : SPILLED_RECORDS: 3
INFO : TaskCounter_Reducer_3_INPUT_Map_2:
INFO : FIRST_EVENT_RECEIVED: 23
INFO : INPUT_RECORDS_PROCESSED: 1
INFO : LAST_EVENT_RECEIVED: 23
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : NUM_SHUFFLED_INPUTS: 1
INFO : SHUFFLE_BYTES: 25
INFO : SHUFFLE_BYTES_DECOMPRESSED: 11
INFO : SHUFFLE_BYTES_DISK_DIRECT: 25
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_TO_MEM: 0
INFO : SHUFFLE_PHASE_TIME: 42
INFO : TaskCounter_Reducer_3_OUTPUT_Map_1:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : DATA_BYTES_VIA_EVENT: 0
INFO : OUTPUT_BYTES: 2
INFO : OUTPUT_BYTES_PHYSICAL: 48
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 10
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_RECORDS: 1
INFO : SPILLED_RECORDS: 0
INFO : TaskCounter_Reducer_5_INPUT_Map_4:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 50
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 50
INFO : COMBINE_INPUT_RECORDS: 0
INFO : FIRST_EVENT_RECEIVED: 68
INFO : LAST_EVENT_RECEIVED: 68
INFO : MERGED_MAP_OUTPUTS: 1
INFO : MERGE_PHASE_TIME: 271
INFO : NUM_DISK_TO_DISK_MERGES: 0
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : NUM_MEM_TO_DISK_MERGES: 0
INFO : NUM_SHUFFLED_INPUTS: 1
INFO : NUM_SKIPPED_INPUTS: 1
INFO : REDUCE_INPUT_GROUPS: 3
INFO : REDUCE_INPUT_RECORDS: 3
INFO : SHUFFLE_BYTES: 50
INFO : SHUFFLE_BYTES_DECOMPRESSED: 36
INFO : SHUFFLE_BYTES_DISK_DIRECT: 0
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_TO_MEM: 50
INFO : SHUFFLE_PHASE_TIME: 248
INFO : SPILLED_RECORDS: 3
INFO : TaskCounter_Reducer_5_OUTPUT_Map_1:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : DATA_BYTES_VIA_EVENT: 0
INFO : OUTPUT_BYTES: 27
INFO : OUTPUT_BYTES_PHYSICAL: 74
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 39
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_RECORDS: 3
INFO : SPILLED_RECORDS: 0
INFO : org.apache.hadoop.hive.ql.exec.tez.HiveInputCounters:
INFO : GROUPED_INPUT_SPLITS_Map_1: 1
INFO : GROUPED_INPUT_SPLITS_Map_2: 1
INFO : GROUPED_INPUT_SPLITS_Map_4: 1
INFO : INPUT_DIRECTORIES_Map_1: 1
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 2 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 3 ...... container SUCCEEDED 1 1 0 0 0 0
Map 4 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 5 ...... container SUCCEEDED 2 2 0 0 0 0
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 05/05 [==========================>>] 100% ELAPSED TIME: 5.72 s
----------------------------------------------------------------------------------------------
+---------+-------+---------+
| b.proc | b.yr | b.rcnt |
+---------+-------+---------+
| A1234 | 2022 | 5 |
| B5678 | 2023 | 8 |
| C9012 | 2021 | 3 |
+---------+-------+---------+
3 rows selected (13.47 seconds)
0: jdbc:hive2://ccycloud-3.backlinehive.root.>
04-10-2025
10:44 PM
To provide the exact HQL query, please share the following:
* DDL for both tables
* Sample records from each table
* The expected output based on the sample data

This information will help us understand the problem statement better and validate the solution. The DDL and sample rows can be pulled straight from Beeline, as in the sketch below.
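A minimal sketch (the HS2 host is a placeholder, and the table names table1/table2 are only taken from this thread for illustration):

beeline -u "jdbc:hive2://<hs2-host>:10000/default" \
  -e "SHOW CREATE TABLE table1; SHOW CREATE TABLE table2; SELECT * FROM table2 LIMIT 5;"

Paste that output along with the result you expect for those rows.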
03-28-2025
03:47 AM
* This has been addressed as part of a support case.
* The Tez job failed with the error below.

Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 104
Serialization trace:
columnTypeResolvers (org.apache.hadoop.hive.ql.exec.UnionOperator)
tableDesc (org.apache.hadoop.hive.ql.plan.PartitionDesc)
aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:693)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:186)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:118)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:219)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$PartitionDescSerializer.read(SerializationUtilities.java:580)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$PartitionDescSerializer.read(SerializationUtilities.java:572)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:181)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:161)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:39)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:219)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:709)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectByKryo(SerializationUtilities.java:755)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializePlan(SerializationUtilities.java:661)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializePlan(SerializationUtilities.java:638)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:492)
... 22 more

* The serialization-related classes were loaded from a different version of hive-exec (hive-exec-<version>.jar).
* Remove the older jars from the HS2 classpath and aux jars to overcome the problem.
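If it helps, a minimal way to spot the offending jars on the HS2 host (paths and the property name are assumptions; adjust to your environment):

# List every hive-exec jar visible on the host and compare versions
find / -name 'hive-exec-*.jar' 2>/dev/null
# Also review the aux jar locations configured for HS2 (e.g. hive.aux.jars.path)
# and drop any hive-exec jar whose version does not match the cluster's Hive version.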
03-25-2025
04:12 AM
If the Beeline shell gets stuck, first validate whether HS2 is up and running. Then, check if the query reaches HS2. If the query reaches HS2 but gets stuck, analyze the HS2 JSTACK and HS2 logs to identify the issue. If the query does not reach HS2, validate the Beeline JSTACK, HS2 JSTACK, and HS2 logs. If you are unable to determine the root cause with this information, I recommend raising a support ticket for further investigation.
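A minimal sketch of those checks (host, port, and PIDs are placeholders; the default HS2 binary port 10000 is assumed):

# 1. Confirm HS2 is reachable
nc -vz <hs2-host> 10000
# 2. Thread dump of HS2 (run on the HS2 host, typically as the hive user)
jstack $(pgrep -f HiveServer2) > /tmp/hs2_jstack.txt
# 3. Thread dump of the stuck Beeline client
jstack <beeline-pid> > /tmp/beeline_jstack.txt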
03-21-2025
06:29 AM
To identify which user is writing the files, use HDFS CLI commands such as hdfs dfs -ls or hdfs dfs -getfacl.
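A minimal sketch (the warehouse path is only an example; point it at the directory in question):

# The owner column of each file shows the user that wrote it
hdfs dfs -ls /warehouse/tablespace/managed/hive/<db>.db/<table>
# ACL entries, including named users, if HDFS ACLs are enabled
hdfs dfs -getfacl /warehouse/tablespace/managed/hive/<db>.db/<table>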
03-06-2025
07:42 AM
Assuming it's a MapReduce job, since you're looking for information related to MapReduce I/O counters, here is a script to calculate the counter info.

[hive@node4 ~]$ cat get_io_counters.sh
#!/bin/bash
# Ensure a job ID is provided
if [ "$#" -ne 1 ]; then
echo "Usage: $0 <job_id>"
exit 1
fi
JOB_ID=$1
# Extract I/O counters from the MapReduce job status
mapred job -status "$JOB_ID" | egrep -A 1 'File Input Format Counters|File Output Format Counters' | awk -F'=' '
/File Input Format Counters/ {getline; bytes_read=$2}
/File Output Format Counters/ {getline; bytes_written=$2}
END {
total_io_mb = (bytes_read + bytes_written) / (1024 * 1024)
printf "BYTES_READ=%d\nBYTES_WRITTEN=%d\nTOTAL_IO_MB=%.2f\n", bytes_read, bytes_written, total_io_mb
}'
[hive@node4 ~]$

Sample output:

[hive@node4 ~]$ ./get_io_counters.sh job_1741272271547_0007
25/03/06 15:38:34 INFO client.RMProxy: Connecting to ResourceManager at node3.playground-ggangadharan.coelab.cloudera.com/10.129.117.75:8032
25/03/06 15:38:35 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
BYTES_READ=288894
BYTES_WRITTEN=348894
TOTAL_IO_MB=0.61
[hive@node4 ~]$
02-10-2025
08:58 AM
We can use a UDF to solve this problem. Sharing a sample as an example.

from pyspark.sql import SparkSession
from pyspark.sql.types import *
from pyspark.sql.functions import *
from pyspark.sql import Row
# ... (schema definition remains the same)
# Define the schema
schema = StructType(
[
StructField(
"meta",
StructType(
[
StructField("id", StringType(), False),
StructField("timestamp", TimestampType(), False),
StructField("version", IntegerType(), False),
]
),
False,
),
StructField(
"tasks",
ArrayType(
StructType(
[
StructField("data_col_1", IntegerType(), False),
StructField("data_col_2", StringType(), False),
StructField("data_col_3", IntegerType(), True),
StructField("data_col_4", IntegerType(), True),
]
)
),
False,
),
]
)
def validate_record(row):
try:
# More efficient way to check against schema
row_dict = row.asDict() # Convert Row to dictionary
Row(**row_dict) #Create Row object from Dictionary. This will validate against schema.
# Additional checks within tasks array (same as before)
for task in row.tasks:
if task.data_col_1 is None or not isinstance(task.data_col_1, int):
return False, "data_col_1 is missing or invalid in a task"
if task.data_col_2 is None or not isinstance(task.data_col_2, str):
return False, "data_col_2 is missing or invalid in a task"
if task.data_col_3 is not None and not isinstance(task.data_col_3, int):
return False, "data_col_3 is invalid in a task"
if task.data_col_4 is not None and not isinstance(task.data_col_4, int):
return False, "data_col_4 is invalid in a task"
return True, None # All checks passed
except Exception as e:
return False, str(e) # Mark as invalid with the exception message
# Read JSON files with PERMISSIVE mode to capture corrupt records
raw_df = (
spark.read.schema(schema)
.option("mode", "PERMISSIVE")
.option("multiline", "true")
.json("/tmp/json_data")
.withColumn("src_filename", input_file_name())
)
# Apply validation using a UDF for better performance
validate_udf = udf(validate_record, StructType([StructField("is_valid", BooleanType()), StructField("error_message", StringType())]))
validated_df = raw_df.withColumn("validation_results", validate_udf(struct(*raw_df.columns))).select("*", "validation_results.*").drop("validation_results")
# Separate valid and invalid records (same as before)
valid_df = validated_df.filter("is_valid == true").drop("is_valid", "error_message")
invalid_df = validated_df.filter("is_valid == false").drop("is_valid")
# Show the results
valid_df.show(truncate=False)
invalid_df.show(truncate=False)

Attaching sample cluster console output for reference.

>>> from pyspark.sql import SparkSession
>>> from pyspark.sql.types import *
>>> from pyspark.sql.functions import *
>>> from pyspark.sql import Row
>>>
>>> # Define the schema
... schema = StructType(
... [
... StructField(
... "meta",
... StructType(
... [
... StructField("id", StringType(), False),
... StructField("timestamp", TimestampType(), False),
... StructField("version", IntegerType(), False),
... ]
... ),
... False,
... ),
... StructField(
... "tasks",
... ArrayType(
... StructType(
... [
... StructField("data_col_1", IntegerType(), False),
... StructField("data_col_2", StringType(), False),
... StructField("data_col_3", IntegerType(), True),
... StructField("data_col_4", IntegerType(), True),
... ]
... )
... ),
... False,
... ),
... ]
... )
>>>
>>>
>>> def validate_record(row):
... try:
... # Check meta (top level)
... meta_dict = row.meta.asDict() if row.meta else {} # Handle potential null meta
... Row(**meta_dict) #Validate meta against schema.
... validated_tasks = []
... for task in row.tasks:
... try:
... task_dict = task.asDict()
... Row(**task_dict) #Validate each task against the schema.
... validated_tasks.append(task) #If valid add it to validated task list
... except:
... validated_tasks.append(Row(data_col_1=None, data_col_2=None, data_col_3=None, data_col_4=None, _corrupt_record="Invalid Task")) # Append a "corrupted" task with nulls.
... return True, None, validated_tasks # Return validated tasks
... except Exception as e:
... return False, str(e), [] # Mark as invalid with the exception message and empty tasks
...
>>>
>>>
>>> raw_df = (
... spark.read.schema(schema)
... .option("mode", "PERMISSIVE")
... .option("multiline", "true")
... .json("/tmp/json_data")
... .withColumn("src_filename", input_file_name())
... )
>>> validate_udf = udf(validate_record, StructType([StructField("is_valid", BooleanType()), StructField("error_message", StringType()), StructField("validated_tasks", ArrayType(schema.fields[1].dataType.elementType))]))
>>>
>>> validated_df = raw_df.withColumn("validation_results", validate_udf(struct(*raw_df.columns))).select("*", "validation_results.*").drop("validation_results")
>>> valid_df = validated_df.filter("is_valid == true").drop("is_valid", "error_message").withColumn("tasks", col("validated_tasks")).drop("validated_tasks")
>>> invalid_df = validated_df.filter("is_valid == false").drop("is_valid", "validated_tasks")
>>> valid_df.show(truncate=False)
+--------------------------------------+---------------------------------------------+--------------------------------------------------------------------------------+
|meta |tasks |src_filename |
+--------------------------------------+---------------------------------------------+--------------------------------------------------------------------------------+
|[efgh5678, 2025-02-07 07:59:12.123, 1]|[[0, Required, 9, 7], [22, Required, 10, 11]]|hdfs://ccycloud-2.nightly-71x-ms.root.comops.site:8020/tmp/json_data/json_2.json|
+--------------------------------------+---------------------------------------------+--------------------------------------------------------------------------------+
>>> invalid_df.show(truncate=False)
+----+-----+--------------------------------------------------------------------------------+---------------------------------+
|meta|tasks|src_filename |error_message |
+----+-----+--------------------------------------------------------------------------------+---------------------------------+
|null|null |hdfs://ccycloud-2.nightly-71x-ms.root.comops.site:8020/tmp/json_data/json_1.json|'NoneType' object is not iterable|
+----+-----+--------------------------------------------------------------------------------+---------------------------------+
>>>

Depending on the use case, feel free to edit the UDF.
02-05-2025
12:48 AM
If the environment allows, use SSSD with LDAP integration to avoid manually creating users. If that's not possible, use Ansible to automate user creation across all nodes.
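A minimal ad-hoc Ansible sketch (the inventory group "cluster" and the user name "svc_etl" are placeholders):

# Creates the same local user on every node in one shot
ansible cluster -b -m user -a "name=svc_etl state=present"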
02-04-2025
06:35 AM
Check the Beeline console output and HS2 logs to identify where it gets stuck, and act accordingly.