Member since
03-22-2019
24
Posts
3
Kudos Received
3
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 509 | 02-24-2020 10:37 AM
 | 838 | 03-25-2019 09:20 AM
 | 1694 | 03-24-2019 09:57 AM
02-24-2020
10:43 AM
Does the user have access to the database RWE_2020_02921? The exception below seems to indicate he/she does not: "AuthorizationException: User 'zhengzhg' does not have privileges to access: RWE_2020_02921.*" What is the exact query that is being run against the RWE_2020_02921 database?
02-24-2020
10:37 AM
Something like this should work. It should just be a matter of using the correct string manipulation functions: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_string_functions.html

create table test1 (col1 string);
insert into table test1 values
  ("IT Strategy& Architecture BDC India [MITX 999]"),
  ("Corporate & IC Solution Delivery [SVII]"),
  ("Operations Solution Delivery [SVIA]"),
  ("Mainframe Service [MLEM]"),
  ("Strategy & Architecture [MLEL]");

select * from test1;
+------------------------------------------------+
| col1                                           |
+------------------------------------------------+
| IT Strategy& Architecture BDC India [MITX 999] |
| Corporate & IC Solution Delivery [SVII]        |
| Operations Solution Delivery [SVIA]            |
| Mainframe Service [MLEM]                       |
| Strategy & Architecture [MLEL]                 |
+------------------------------------------------+

create table test2 as
  select trim(split_part(col1, ' [', 1)),
         trim(concat(' [', split_part(col1, ' [', 2)))
  from test1;

select * from test2;
+-------------------------------------+------------+
| _c0                                 | _c1        |
+-------------------------------------+------------+
| IT Strategy& Architecture BDC India | [MITX 999] |
| Corporate & IC Solution Delivery    | [SVII]     |
| Operations Solution Delivery        | [SVIA]     |
| Mainframe Service                   | [MLEM]     |
| Strategy & Architecture             | [MLEL]     |
+-------------------------------------+------------+
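For comparison, the same split can be sketched in plain Python. This is only an illustration of the logic, not part of the original answer; the function name is made up here.

```python
# Separate the trailing "[CODE]" suffix from the description, mirroring
# the Impala split_part()/trim() approach above.
def split_code(value: str):
    # Split on the last " [" so bracketed codes containing spaces
    # (e.g. "[MITX 999]") survive, then re-attach the opening bracket.
    name, _, code = value.rpartition(" [")
    return name.strip(), "[" + code

rows = [
    "IT Strategy& Architecture BDC India [MITX 999]",
    "Corporate & IC Solution Delivery [SVII]",
]
for row in rows:
    print(split_code(row))
```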
02-05-2020
08:48 AM
datastream_sender_timeout_ms is an Impala startup flag. See https://impala.apache.org/docs/build/html/topics/impala_config_options.html for information on setting startup flag options. The "RPC client failed to connect" means that a client failed to connect to an impalad process running on p1i-hdp-srv09.lnt.com. It is hard to say which client failed to make the connection without more information. Are there additional client and impalad logs available? What version of Impala is running? Is KRPC enabled on your cluster?
02-05-2020
08:01 AM
Unlikely to be related to admission control. The error you included is referred to as a DATASTREAM_SENDER_TIMEOUT error in Impala and is thrown by Impala's RPC layer. What version of Impala are you running? This could be related to IMPALA-6818, which reports a similar error. You could try increasing datastream_sender_timeout_ms to a higher value such as 3600000.
01-28-2020
08:27 AM
The exchange operators seem to be the bottleneck:

16:EXCHANGE  1  26m14s  26m14s  22.55K    0  0  0  KUDU(KuduPartition(shift_timekey))
13:EXCHANGE  1  26m13s  26m13s  21.46K   -1  0  0  HASH(t.pnl_id,d.oper_code,d.factory)
12:EXCHANGE  1  24m17s  24m17s  108.37M  -1  0  0  BROADCAST

Each takes 20+ minutes. More importantly, the profile says "WARNING: The following tables are missing relevant table and/or column statistics. rptpid.i_f_r_mes_hist_pnl_mod, rptpid.i_f_t_mes_hist_gop_inout_fab". You should compute stats on these tables. Having accurate statistics is important for Impala performance.
01-28-2020
08:23 AM
https://impala.apache.org/docs/build/html/topics/impala_admission.html is probably a good place to start.
01-28-2020
08:15 AM
1 Kudo
https://impala.apache.org/docs/build/html/topics/impala_misc_functions.html#misc_functions__get_json_object has some good documentation on how to use the get_json_object function. What Impala version are you using? What is the type of the column that contains the JSON data?

I hit a couple of issues when parsing the JSON you posted. I believe the JSON standard does not allow single quotes, and standard online JSON parsers have trouble with the 'u' character as well. I was able to get the following to work on Impala master:

[localhost:21000] default> select get_json_object("{\"CH4: NO2 (we-aux)\": {\"unit\": \"mV\", \"value\": 4.852294921875}, \"CH6: Ox concentration (we-aux)\": {\"unit\": \"ppb\", \"value\": -84.73094995471016}}", '$.*');
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| get_json_object('{"ch4: no2 (we-aux)": {"unit": "mv", "value": 4.852294921875}, "ch6: ox concentration (we-aux)": {"unit": "ppb", "value": -84.73094995471016}}', '$.*')  |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [{"unit":"mV","value":4.852294921875},{"unit":"ppb","value":-84.73094995471016}]                                                                                          |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
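The single-quote problem is easy to check outside Impala. A quick Python sketch (the sample strings below are illustrative, shortened from the JSON in the question) shows that a strict JSON parser accepts double-quoted keys but rejects the Python-dict-style single quotes:

```python
import json

# Valid JSON: keys and strings use double quotes.
valid = '{"unit": "mV", "value": 4.852294921875}'
json.loads(valid)  # parses fine

# Not valid JSON: single quotes (and u'' prefixes) come from printing a
# Python dict, and strict parsers reject them.
invalid = "{'unit': u'mV', 'value': 4.852294921875}"
try:
    json.loads(invalid)
except json.JSONDecodeError as e:
    print("not valid JSON:", e)
```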
01-28-2020
07:59 AM
Offset means the offset into the actual CSV file. So in this case, that means the 2432696320th byte of the file foo_042019.csv. There are multiple tools that should allow you to open the file and seek to the desired offset. For example, you could open the file in vim and run :goto 2432696320, which should move the cursor to the 2432696320th byte of the file, and thus the offending row.
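If the file is too large to open comfortably in an editor, a few lines of Python can pull out the bytes around the offset instead. This is just a sketch; the function name is made up, and the file path from the error message is only shown in the comment:

```python
# Read a window of bytes around a given offset in a file, so the row
# referenced by an Impala "before offset" error can be inspected.
def bytes_around(path: str, offset: int, context: int = 200) -> bytes:
    with open(path, "rb") as f:
        # Seek a little before the offset so the start of the row is
        # visible, then read past it.
        f.seek(max(0, offset - context))
        return f.read(2 * context)

# e.g. bytes_around("foo_042019.csv", 2432696320)
```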
01-17-2020
03:48 PM
The error " Error converting column: 35 to TIMESTAMP" means there was an error when converting column 35 to the TIMESTAMP type. The error "Error parsing row: file: hdfs://blabla/foo_042019.csv, before offset: 2432696320" means there was an error while parsing the row at file offset 2432696320, in the file foo_042019.csv. So it looks like there are several rows in your dataset where certain fields cannot be converted to TIMESTAMPs. You should be able to open up the file, and seek to the specified offset to find the rows that are corrupted. I believe, Hive does not throw an exception when given the same dataset, instead it converts the corrupted rows to NULL. The same behavior can be emulated in Impala by setting 'abort_on_error=false'. However, be warned that setting this option can mask data corruption issues. See https://impala.apache.org/docs/build/html/topics/impala_abort_on_error.html for details.
01-17-2020
03:33 PM
This looks like an application issue (or possibly an Impala connector issue). The error message states that the Impala query was cancelled by another thread. Likely, another thread in your application got a handle to the query and cancelled it. The full runtime profile of the query would help as well.
11-06-2019
08:53 AM
IMPALA-8557 has been fixed. The fix should be coming in a CDH release soon.
09-24-2019
03:05 PM
It depends on how you are updating your partitions. Are you creating completely new partitions, or adding files to existing partitions? ALTER TABLE ... RECOVER PARTITIONS is specifically used for adding newly created partition directories to a partitioned table - https://impala.apache.org/docs/build/html/topics/impala_alter_table.html
09-24-2019
12:10 PM
Queries are in the "waiting to be closed" stage if they are in the EXCEPTION state or if all the rows from the query have been read. In either case, the query needs to be explicitly closed for it to be "completed". https://community.cloudera.com/t5/Support-Questions/Query-Cancel-and-idle-query-timeout-is-not-working/td-p/58104 might be useful as well.
09-24-2019
12:08 PM
The " InactiveTotalTIme" for the "KrpcDataStreamSender" corresponds to the amount of time spent waiting for in-flight RPCs to complete. RPCs are considered complete when they receive a response from the remote server, or if the RPC ends prematurely for other reasons (e.g. cancellation). Is you attach the full runtime profile of the query, it might be easier to debug.
09-23-2019
07:52 AM
The "TransmitData() to X.X.X.X:27000 failed" portion of the error message is thrown by the Impala RPC code. the "Connection timed out (error 110)" is a TCP error. TCP error code 110 corresponds to the error "Connection timed out". So, as the error message states, there was a TCP connection timeout between two Impala processes. It's hard to debug without more information. What query was being run? Can you post the full log files? What were the two processes that were trying to communicate with each other? In all likelihood, this looks like a network issue, does this happen consistently?
03-26-2019
09:35 AM
I believe OpenLDAP should work. Have you tried using "--ldap_ca_certificate" instead of "--ca_cert"? According to Impala, "--ldap_ca_certificate" is "The full path to the certificate file used to authenticate the LDAP server's certificate for SSL / TLS connections." Do the logs for the impalad you are trying to connect to contain any relevant debugging information? The main restriction I am aware of is lack of support for LDAP search / bind operations in Impala - https://issues.apache.org/jira/browse/IMPALA-2563
03-25-2019
09:20 AM
1 Kudo
What version of Impala are you using? I suspect the meaning of "duration" might have changed in IMPALA-1575 / IMPALA-5397. In general, it's possible that your definition of duration is different from Impala's. Depending on the version, Impala might include the time taken until the query has actually been closed (which would include fetching rows and releasing all resources). I *think* the waiting time is the difference between the current time and the time the query was last actively being processed. So this value could be high if the query has been completed, but the client has not closed the query (which is why "waiting time" shows up in the section "waiting to be closed").
03-24-2019
09:57 AM
1 Kudo
Impala scanners internally have a RowBatch queue that allows Impala to decouple I/O from CPU processing. The I/O threads read data into RowBatches and put them into a queue; CPU threads asynchronously fetch batches from the queue and process them. RowBatchQueueGetWaitTime is the amount of time CPU threads wait for data to arrive in the queue. Essentially, it means the CPU threads were waiting a long time for the I/O threads to read the data.
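The design can be sketched as a toy producer/consumer in Python. This is not Impala code, just a model of the pattern: an "I/O" thread fills a bounded batch queue while a "CPU" thread drains it, and the time the consumer spends blocked in get() plays the role of RowBatchQueueGetWaitTime.

```python
import queue
import threading
import time

batch_queue = queue.Queue(maxsize=4)  # bounded, like the RowBatch queue
wait_time = 0.0                       # analogue of RowBatchQueueGetWaitTime

def io_thread():
    for batch in range(8):
        time.sleep(0.01)       # simulate slow disk reads
        batch_queue.put(batch)
    batch_queue.put(None)      # sentinel: no more batches

def cpu_thread():
    global wait_time
    while True:
        start = time.monotonic()
        batch = batch_queue.get()  # blocks while the queue is empty
        wait_time += time.monotonic() - start
        if batch is None:
            break

producer = threading.Thread(target=io_thread)
consumer = threading.Thread(target=cpu_thread)
producer.start(); consumer.start()
producer.join(); consumer.join()
print(f"CPU thread waited {wait_time:.3f}s for I/O")
```

When the producer is the slow side, as here, the accumulated wait time is large; a high RowBatchQueueGetWaitTime in a real profile points at the same imbalance.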
03-24-2019
09:50 AM
https://www.cloudera.com/documentation/enterprise/5-16-x/topics/impala_ldap.html might be helpful here. Have you taken a look at the impalad logs on cdhmaster1?
03-24-2019
09:16 AM
Is the stack trace the same every time this crashes? If so, there is likely a bug somewhere in your UDAF. It's hard for me to say exactly where the bug is, but I would recommend trying to reproduce the issue outside Impala (perhaps run the UDAF in a test harness). If you can reproduce the crash outside Impala, standard C++ debugging tools should help you pin down the issue.
03-23-2019
12:21 PM
Could you attach the full hs_err.log file? There should be a stack trace in the hs_err.log file of the thread that crashed. The impalad.INFO files would probably help as well.
03-22-2019
10:08 AM
The " Connection reset by peer" message should have just been printed by the client (e.g. Hue, impala-shell, etc.). If the impalad itself crashed it should have printed some exit message or other logs that indicated why the impalad crashed (logs should be in impalad.INFO). If the impalad logs don't have anything useful, then it should have written either a minidump or core dump which can be parsed to see what caused it to crash. How big are your input files? Is it possible that with only one input file, only a single scan range was used to read the table, and that with more input files, multiple scan ranges were spawned? If that is the case, it sounds like there might be a race condition somewhere .
03-22-2019
09:47 AM
Agree with the comment from Tomas79, the filter will convert the LEFT JOIN to an INNER JOIN. I found this post helpful in explaining the behavior: http://wiki.lessthandot.com/index.php/WHERE_conditions_on_a_LEFT_JOIN You mentioned that things work correctly in Hive, what version of Hive are you using? I checked apache/hive master branch and it follows the same behavior as Impala.
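The behavior is easy to reproduce with any SQL engine; a minimal sqlite3 session (driven from Python here, with made-up table names) shows both the WHERE-filter pitfall and the fix of moving the condition into the ON clause:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER);
    CREATE TABLE t2 (id INTEGER, flag INTEGER);
    INSERT INTO t1 VALUES (1), (2);
    INSERT INTO t2 VALUES (1, 1);
""")

# Filter in WHERE: the NULL-extended row for t1.id = 2 fails the
# predicate (NULL = 1 is not true), so the LEFT JOIN acts like an
# INNER JOIN.
where_rows = conn.execute(
    "SELECT t1.id FROM t1 LEFT JOIN t2 ON t1.id = t2.id WHERE t2.flag = 1"
).fetchall()

# Filter in ON: the condition only restricts which t2 rows match, so
# LEFT JOIN semantics are preserved and both t1 rows survive.
on_rows = conn.execute(
    "SELECT t1.id FROM t1 LEFT JOIN t2 ON t1.id = t2.id AND t2.flag = 1"
).fetchall()

print(where_rows)  # [(1,)]
print(on_rows)     # [(1,), (2,)]
```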