Member since
03-22-2019
24
Posts
4
Kudos Received
3
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1206 | 02-24-2020 10:37 AM |
| | 2193 | 03-25-2019 09:20 AM |
| | 3291 | 03-24-2019 09:57 AM |
02-24-2020
10:37 AM
Something like this should work. It should just be a matter of using the correct string manipulation functions: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_string_functions.html

    create table test1 (col1 string);
    insert into table test1 values
      ("IT Strategy& Architecture BDC India [MITX 999]"),
      ("Corporate & IC Solution Delivery [SVII]"),
      ("Operations Solution Delivery [SVIA]"),
      ("Mainframe Service [MLEM]"),
      ("Strategy & Architecture [MLEL]");
    select * from test1;
    +------------------------------------------------+
    | col1                                           |
    +------------------------------------------------+
    | IT Strategy& Architecture BDC India [MITX 999] |
    | Corporate & IC Solution Delivery [SVII]        |
    | Operations Solution Delivery [SVIA]            |
    | Mainframe Service [MLEM]                       |
    | Strategy & Architecture [MLEL]                 |
    +------------------------------------------------+

    create table test2 as
      select trim(split_part(col1, ' [', 1)),
             trim(concat(' [', split_part(col1, ' [', 2)))
      from test1;
    select * from test2;
    +-------------------------------------+------------+
    | _c0                                 | _c1        |
    +-------------------------------------+------------+
    | IT Strategy& Architecture BDC India | [MITX 999] |
    | Corporate & IC Solution Delivery    | [SVII]     |
    | Operations Solution Delivery        | [SVIA]     |
    | Mainframe Service                   | [MLEM]     |
    | Strategy & Architecture             | [MLEL]     |
    +-------------------------------------+------------+
01-28-2020
08:41 AM
You are absolutely right. After struggling with various syntax, I realized that CDH 5.16 (impalad version 2.12.0-cdh5.16.1) doesn't support get_json_object(). So I finally used json.dumps(), which removed all the unicode u' prefixes, and I also replaced the strange characters in the JSON field names with normal ones, like CH4_NO2_WE_AUX. After that I ended up using Hive instead of Impala, with a query like the one below, to extract the values as columns. The json_column1 column is a string datatype.

    select b.b1, c.c1, c.c2, d.d1, d.d2
    from json_table1 a
    lateral view json_tuple(a.json_column1, 'CH4_NO2_WE_AUX', 'CH7_CO_CONCENTRATION_WE') b as b1, b2
    lateral view json_tuple(b.b1, 'unit', 'value') c as c1, c2
    lateral view json_tuple(b.b2, 'unit', 'value') d as d1, d2
    ;
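For anyone hitting the same problem, here is a minimal Python sketch of the two steps described in this post: serializing with json.dumps() so the column holds real JSON, then pulling out nested fields the way json_tuple does. The field names come from the query in this post; the units and values are invented for illustration, and the `json_tuple` helper is only a rough analogue of the Hive function (Hive returns every value as a string), not its implementation.

```python
import json

# Hypothetical sensor record: field names are from the post,
# units and values are made up for this example.
record = {
    "CH4_NO2_WE_AUX": {"unit": "mV", "value": 287.5},
    "CH7_CO_CONCENTRATION_WE": {"unit": "ppm", "value": 0.42},
}


def to_json_column(rec):
    """Serialize a dict to a JSON string suitable for a STRING column."""
    # json.dumps emits double-quoted JSON text, unlike str(dict)/repr(dict),
    # which produce Python literal syntax that JSON parsers reject.
    return json.dumps(rec, sort_keys=True)


def json_tuple(json_text, *keys):
    """Rough analogue of Hive's json_tuple: extract top-level keys."""
    parsed = json.loads(json_text)
    out = []
    for key in keys:
        value = parsed.get(key)  # missing keys come back as None
        # Nested objects are handed back as JSON text so they can be
        # passed to json_tuple again, as in the lateral view chain.
        if isinstance(value, (dict, list)):
            value = json.dumps(value, sort_keys=True)
        out.append(value)
    return tuple(out)


json_column1 = to_json_column(record)
b1, b2 = json_tuple(json_column1, "CH4_NO2_WE_AUX", "CH7_CO_CONCENTRATION_WE")
c1, c2 = json_tuple(b1, "unit", "value")
d1, d2 = json_tuple(b2, "unit", "value")
print(c1, c2, d1, d2)
```

Each `json_tuple` call here corresponds to one `lateral view` in the Hive query: the first splits the two channels, the next two split each channel into unit and value.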
01-28-2020
07:59 AM
Offset means the offset into the actual csv file. So in this case, that means the 2432696320th byte of the file foo_042019.csv. There are multiple tools that should allow you to open the file and seek to the desired offset. For example, you could open the file in vim and run :goto 2432696320 which should seek the cursor to the 2432696320th byte of the file, and thus the offending row.
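If the file is too large to open comfortably in an editor, the same seek can be done with a few lines of Python. This is a minimal sketch: the `line_at_offset` helper and its `lookback` parameter are names made up for this example, and the offset is assumed to be 0-based (shift it by one if your tool counts from 1).

```python
import os
import tempfile


def line_at_offset(path, offset, lookback=1 << 20):
    """Return (line_start, line_bytes) for the line containing byte `offset`.

    `offset` is treated as 0-based (an assumption). `lookback` bounds how
    far back we search for the start of the line.
    """
    with open(path, "rb") as f:
        window_start = max(0, offset - lookback)
        f.seek(window_start)
        before = f.read(offset - window_start)
        newline_at = before.rfind(b"\n")
        # No newline in the window: fall back to the window start, which is
        # exact only when the window reaches the beginning of the file.
        if newline_at == -1:
            line_start = window_start
        else:
            line_start = window_start + newline_at + 1
        f.seek(line_start)
        return line_start, f.readline()


# Small self-contained demo on a throwaway file.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"a,1\nbb,2\nccc,3\n")
demo_path = tmp.name
demo_result = line_at_offset(demo_path, 5)
print(demo_result)
os.remove(demo_path)
```

For the file in this thread, the call would be `line_at_offset("foo_042019.csv", 2432696320)`. Only the bytes around the offset are read, so the size of the file does not matter.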
09-24-2019
03:05 PM
It depends on how you are updating your partitions. Are you creating completely new partitions, or adding files to existing partitions? ALTER TABLE RECOVER PARTITIONS is specifically used for adding newly created partition directories to a partitioned table - https://impala.apache.org/docs/build/html/topics/impala_alter_table.html
09-24-2019
12:10 PM
Queries are in the "waiting to be closed" stage if they are in the EXCEPTION state or if all the rows from the query have been read. In either case, the query needs to be explicitly closed for it to be "completed". https://community.cloudera.com/t5/Support-Questions/Query-Cancel-and-idle-query-timeout-is-not-working/td-p/58104 might be useful as well.
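On the client side, the usual way to make sure a query is always closed is to close the cursor in a `with` or `finally` block. The sketch below assumes a client that follows the Python DB-API 2.0 (PEP 249), where closing the cursor is expected to release the query; `sqlite3` is used purely as a stand-in so the example runs anywhere, and `fetch_all_and_close` is a name made up for this example.

```python
import sqlite3
from contextlib import closing


def fetch_all_and_close(conn, sql):
    """Run `sql`, read every row, and always close the cursor afterwards."""
    # closing() calls cursor.close() on exit, even if execute() or
    # fetchall() raises, so the query never lingers waiting to be closed.
    with closing(conn.cursor()) as cur:
        cur.execute(sql)
        return cur.fetchall()


# sqlite3 is only a stand-in for a DB-API connection to Impala.
conn = sqlite3.connect(":memory:")
rows = fetch_all_and_close(conn, "select 1 union all select 2")
print(rows)
conn.close()
```

The important part is that the close happens on both the success path (all rows fetched) and the error path, which are the two cases described above.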
09-23-2019
07:52 AM
1 Kudo
The "TransmitData() to X.X.X.X:27000 failed" portion of the error message is thrown by the Impala RPC code. The "Connection timed out (error 110)" portion is a TCP error: error code 110 corresponds to "Connection timed out". So, as the error message states, there was a TCP connection timeout between two Impala processes. It's hard to debug without more information. What query was being run? Can you post the full log files? What were the two processes that were trying to communicate with each other? In all likelihood, this is a network issue. Does this happen consistently?
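If you want to check what an OS error number means on a given host, Python's standard library can look it up. This is a small sketch; `describe_errno` is a name made up for this example, and the mapping is platform-specific (on Linux, 110 is ETIMEDOUT; other operating systems number their errors differently).

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic_name, message) for an OS error number on this host."""
    # errno.errorcode maps numbers to names such as 'ETIMEDOUT';
    # unknown numbers fall back to a placeholder instead of raising.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# 110 is the code from the error message in this thread.
print(describe_errno(110))
```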
03-28-2019
10:19 AM
Hi, The Impala version I'm using is 2.11, so I have those changes there. One thing I noticed is that the duration, when the query is submitted from impala-shell, seems to match the duration of the query reported after all rows have been fetched, but that does not seem to be the case when we actually time the duration in the client that submitted the query. As you said, Impala is probably still dealing with further processing/closing/cleanup of the query at the time the client has already fetched all the results and printed the elapsed time. Thanks for your answer, Paulo.
03-26-2019
10:19 AM
Impala expects your UDF code and its dependencies to be in a single .so, so you'd have to statically link any libraries you depend on.
03-24-2019
09:57 AM
1 Kudo
Impala scanners internally have a RowBatch queue that allows Impala to decouple I/O from CPU processing. The I/O threads read data into RowBatches and put them into a queue; CPU threads asynchronously fetch data from the queue and process it. RowBatchQueueGetWaitTime is the amount of time CPU threads wait for data to arrive in the queue. Essentially, a high value means the CPU threads were waiting a long time for the I/O threads to read the data.
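To make the counter concrete, here is a toy Python model of the same producer/consumer arrangement. It is only an analogy, not Impala's implementation: `run_scan`, the batch sizes and the simulated I/O delay are all invented for this example. The time the consumer spends blocked in `get()` plays the role of RowBatchQueueGetWaitTime.

```python
import queue
import threading
import time


def run_scan(num_batches=5, io_delay=0.02, queue_size=2):
    """Simulate one I/O thread feeding one CPU thread through a bounded queue."""
    batches = queue.Queue(maxsize=queue_size)
    stats = {"rows": 0, "get_wait_time": 0.0}

    def io_thread():
        for i in range(num_batches):
            time.sleep(io_delay)        # pretend to read from disk
            batches.put([i] * 10)       # a "row batch" of 10 rows
        batches.put(None)               # sentinel: no more data

    def cpu_thread():
        while True:
            start = time.monotonic()
            batch = batches.get()       # blocks while the queue is empty
            stats["get_wait_time"] += time.monotonic() - start
            if batch is None:
                break
            stats["rows"] += len(batch)  # stand-in for real row processing

    threads = [
        threading.Thread(target=io_thread),
        threading.Thread(target=cpu_thread),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats


stats = run_scan()
print(stats)
```

Because the simulated I/O is slow and the processing is nearly free, almost all of the run time shows up as wait time on the queue, which is the pattern a high RowBatchQueueGetWaitTime points to.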