Member since 04-15-2020 | Posts: 3 | Kudos Received: 0 | Solutions: 0
02-17-2021 09:27 AM
Thanks @Prakashcit. The issue looks to be with the timestamp column. We have an INT96 timestamp column with millisecond precision, and queries on it are about 10 times slower than on a Parquet file with the same column stored as a string, or even as a timestamp with the milliseconds stripped. We are still investigating what is causing this behavior.

For example, for a Parquet file with 2 columns, date_time and value:

Query with the timestamp column with milliseconds:
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........      llap     SUCCEEDED     10         10        0        0       0       0
Reducer 2 ......      llap     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 179.45 s

Query with the timestamp column value as a string, or with the timestamp column without milliseconds:
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........      llap     SUCCEEDED      9          9        0        0       0       0
Reducer 2 ......      llap     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 13.77 s
----------------------------------------------------------------------------------------------
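
For what it's worth, the two elapsed times above work out to a slowdown of roughly 13x, a bit more than the "about 10 times" estimate. A minimal sketch of the arithmetic, using only the two ELAPSED TIME values from the runs above (the function name `slowdown` is just illustrative):

```python
def slowdown(slow_seconds: float, fast_seconds: float) -> float:
    """Return how many times longer the slow run took than the fast run."""
    if fast_seconds <= 0:
        raise ValueError("fast_seconds must be positive")
    return slow_seconds / fast_seconds


# ELAPSED TIME values copied from the two Tez summaries in the post.
INT96_MILLIS_RUN_S = 179.45      # timestamp column with milliseconds
STRING_OR_NO_MILLIS_RUN_S = 13.77  # string column, or timestamp without milliseconds

if __name__ == "__main__":
    ratio = slowdown(INT96_MILLIS_RUN_S, STRING_OR_NO_MILLIS_RUN_S)
    print(f"The millisecond-timestamp query took {ratio:.1f}x as long")
```

This is a single pair of runs, so the ratio is only indicative; caching in LLAP and differences in the number of map tasks (10 vs 9) could shift it.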
01-29-2021 08:59 AM
2021-01-29T07:44:33,325 INFO [TezTR-849561_1763_2_0_0_0 (1611687849561_1763_2_00_000000_0)] exec.MapOperator: MAP[0]: records read - 100000
2021-01-29T07:44:35,116 INFO [TezTR-849561_1763_2_0_0_0 (1611687849561_1763_2_00_000000_0)] exec.MapOperator: MAP[0]: records read - 1000000
2021-01-29T07:46:52,194 INFO [TezTR-849561_1763_2_0_0_0 (1611687849561_1763_2_00_000000_0)] exec.MapOperator: MAP[0]: records read - 10000000

The Tez mapper takes more than 2 minutes to read the data when the Parquet file in S3 has more than 10 million records. Any suggestions on how to make it faster?
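
To put numbers on the log lines above, here is a rough sketch that parses the three `records read` entries and computes the read rate for each interval. It assumes the timestamp format and message layout shown in the log; the helper names (`parse_line`, `interval_rates`) are hypothetical and not part of Hive or Tez.

```python
import re
from datetime import datetime

# The three MapOperator log lines from the post, copied verbatim.
LOG_LINES = [
    "2021-01-29T07:44:33,325 INFO [TezTR-849561_1763_2_0_0_0 (1611687849561_1763_2_00_000000_0)] exec.MapOperator: MAP[0]: records read - 100000",
    "2021-01-29T07:44:35,116 INFO [TezTR-849561_1763_2_0_0_0 (1611687849561_1763_2_00_000000_0)] exec.MapOperator: MAP[0]: records read - 1000000",
    "2021-01-29T07:46:52,194 INFO [TezTR-849561_1763_2_0_0_0 (1611687849561_1763_2_00_000000_0)] exec.MapOperator: MAP[0]: records read - 10000000",
]

# Leading timestamp, then the record counter at the end of the message.
LINE_RE = re.compile(r"^(\S+) INFO .* records read - (\d+)")


def parse_line(line):
    """Return (timestamp, cumulative_records) for one 'records read' line."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    timestamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S,%f")
    return timestamp, int(match.group(2))


def interval_rates(lines):
    """Return (records_in_interval, seconds, records_per_second) per interval."""
    points = [parse_line(line) for line in lines]
    rates = []
    for (t0, n0), (t1, n1) in zip(points, points[1:]):
        seconds = (t1 - t0).total_seconds()
        rates.append((n1 - n0, seconds, (n1 - n0) / seconds))
    return rates


if __name__ == "__main__":
    for records, seconds, rate in interval_rates(LOG_LINES):
        print(f"{records:>9} records in {seconds:8.3f} s -> {rate:,.0f} records/s")
```

On these three lines the rate from 1 million to 10 million records comes out roughly 7 to 8 times lower than from 100 thousand to 1 million, so the slowdown seems to set in after the first million records rather than being uniform across the file. With only three data points this is a hint about where to look, not a diagnosis.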
Labels:
- Apache Hive
- Apache Tez