Member since: 03-22-2019
Posts: 24
Kudos Received: 4
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1504 | 02-24-2020 10:37 AM |
| | 2716 | 03-25-2019 09:20 AM |
02-24-2020
10:37 AM
Something like this should work. It should just be a matter of using the correct string manipulation functions: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_string_functions.html

    create table test1 (col1 string);

    insert into table test1 values
      ("IT Strategy& Architecture BDC India [MITX 999]"),
      ("Corporate & IC Solution Delivery [SVII]"),
      ("Operations Solution Delivery [SVIA]"),
      ("Mainframe Service [MLEM]"),
      ("Strategy & Architecture [MLEL]");

    select * from test1;
    +------------------------------------------------+
    | col1                                           |
    +------------------------------------------------+
    | IT Strategy& Architecture BDC India [MITX 999] |
    | Corporate & IC Solution Delivery [SVII]        |
    | Operations Solution Delivery [SVIA]            |
    | Mainframe Service [MLEM]                       |
    | Strategy & Architecture [MLEL]                 |
    +------------------------------------------------+

    create table test2 as
    select
      trim(split_part(col1, ' [', 1)),
      trim(concat(' [', split_part(col1, ' [', 2)))
    from test1;

    select * from test2;
    +-------------------------------------+------------+
    | _c0                                 | _c1        |
    +-------------------------------------+------------+
    | IT Strategy& Architecture BDC India | [MITX 999] |
    | Corporate & IC Solution Delivery    | [SVII]     |
    | Operations Solution Delivery        | [SVIA]     |
    | Mainframe Service                   | [MLEM]     |
    | Strategy & Architecture             | [MLEL]     |
    +-------------------------------------+------------+
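If it helps to sanity-check the splitting logic before running it on the cluster, here is a rough Python sketch of the same idea. It is not Impala code: `split_part` below is a hypothetical stand-in that only mimics the positive, 1-based index case, and it assumes an out-of-range index yields an empty string. `split_name_and_code` is likewise just an illustrative name.

```python
from typing import Tuple


def split_part(source: str, delimiter: str, index: int) -> str:
    """Stand-in for a 1-based split_part (positive indexes only).

    Assumption: an index past the last field returns an empty string.
    """
    if index < 1:
        raise ValueError("this sketch only handles positive, 1-based indexes")
    parts = source.split(delimiter)
    return parts[index - 1] if index <= len(parts) else ""


def split_name_and_code(col1: str) -> Tuple[str, str]:
    """Mirror the two expressions in the select: name part and bracketed code."""
    name = split_part(col1, " [", 1).strip()
    # Re-attach the delimiter that the split consumed, then trim the space.
    code = (" [" + split_part(col1, " [", 2)).strip()
    return name, code


rows = [
    "IT Strategy& Architecture BDC India [MITX 999]",
    "Corporate & IC Solution Delivery [SVII]",
    "Operations Solution Delivery [SVIA]",
    "Mainframe Service [MLEM]",
    "Strategy & Architecture [MLEL]",
]

for row in rows:
    print(split_name_and_code(row))
```

One thing the sketch makes visible: under the empty-string assumption, a row with no " [" in it would end up with a bare "[" in the second column, so it may be worth checking whether your data has such rows.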
03-25-2019
09:20 AM
1 Kudo
What version of Impala are you using? I suspect the meaning of "duration" might have changed in IMPALA-1575 / IMPALA-5397. In general, it's possible that your definition of duration is different from Impala's. Depending on the version, Impala might include the time taken until the query has actually been closed (which would include fetching rows and releasing all resources). I *think* the waiting time is the difference between the current time and the time the query was last actively being processed. So this value could be high if the query has completed but the client has not closed it (which is why "waiting time" shows up in the section "waiting to be closed").
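To make the arithmetic concrete, here is a small Python sketch of that reading. It only illustrates my guess above, not Impala's actual implementation; the function names and all timestamps are made up for the example.

```python
from datetime import datetime, timedelta


def waiting_time(last_active: datetime, now: datetime) -> timedelta:
    """Time elapsed since the query was last actively processed (assumed meaning)."""
    return now - last_active


def duration_until(start: datetime, end: datetime) -> timedelta:
    """Elapsed time from query start to a chosen end point."""
    return end - start


# Hypothetical timeline: the query does 5 seconds of work, then the client
# leaves it open for another 10 minutes without closing it.
start = datetime(2019, 3, 25, 9, 0, 0)
last_active = datetime(2019, 3, 25, 9, 0, 5)
now = datetime(2019, 3, 25, 9, 10, 5)

print(waiting_time(last_active, now))      # 0:10:00
print(duration_until(start, last_active))  # 0:00:05
print(duration_until(start, now))          # 0:10:05
```

If the duration is measured up to the point the query is closed, the last figure is the one that would be reported, even though only 5 seconds of it were spent actually processing.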
03-24-2019
09:16 AM
Is the stack trace the same every time this crashes? If so, there is likely a bug somewhere in your UDAF. It's hard for me to say exactly where the bug is, but I would recommend trying to reproduce the issue outside Impala (perhaps by running the UDAF in a test harness). If you can reproduce the crash outside Impala, standard C++ debugging tools should help you pin down the issue.
03-23-2019
12:21 PM
Could you attach the full hs_err.log file? There should be a stack trace in the hs_err.log file of the thread that crashed. The impalad.INFO files would probably help as well.
03-22-2019
10:08 AM
The "Connection reset by peer" message should have just been printed by the client (e.g. Hue, impala-shell, etc.). If the impalad itself crashed, it should have printed some exit message or other logs that indicate why it crashed (logs should be in impalad.INFO). If the impalad logs don't have anything useful, then it should have written either a minidump or a core dump, which can be parsed to see what caused the crash.

How big are your input files? Is it possible that with only one input file, only a single scan range was used to read the table, and that with more input files, multiple scan ranges were spawned? If that is the case, it sounds like there might be a race condition somewhere.