Member since: 04-08-2019
Posts: 37
Kudos Received: 1
Solutions: 1

My Accepted Solutions

Title | Views | Posted |
---|---|---|
 | 2630 | 10-11-2020 11:53 PM |
12-16-2020
04:52 AM
Hello Team, We are using Impala 3.2.0. There are some queries which were working 3-4 weeks back and which have now suddenly started throwing an "IllegalStateException: null" exception. In the last 3-4 weeks nothing has changed on our Impala cluster, and the error logs also do not suggest the cause of the error. Can anybody help us understand why this issue is arising? The query which is failing looks something like this:

WITH banner_test AS (WITH layout_id AS
(
SELECT <some columns>
FROM table_v1
WHERE
( layout_component_id = <some_id>)
),
layouts AS
(SELECT <some columns>,
coalesce() AS col1,
<some columns>
FROM table_v2 lch
WHERE lch.some_id IN
(SELECT some_id
FROM table_v3
WHERE some_field_v1 IN
(SELECT some_field from layout_id))
AND lch.start_time >= '2019-01-01T00:00:00.000+00:00'
),
full_data AS
(SELECT lctd.some_field_v1,
l.*,
lctd.some_field_v2,
min(l.start_time)over(PARTITION BY some_field_v1) AS min_start_time,
max(l.end_time)over(PARTITION BY some_field_v1) AS max_end_time
FROM tables_v4 l
INNER JOIN tables_v5 lctd ON l.some_id = lctd.some_id),
tables_v5 AS
(
SELECT <some_fields>,
case when some_field like 'val1%' then 'val1' else 'val2' end as business
FROM tables_v6
WHERE date_str >= '2019-01-01'
)
SELECT <some_fields>,
case when x3.some_field=x2.some_field then x3.some_field end as some_field,
x3.some_id as x3.some_id
FROM
(SELECT DISTINCT some_field,
more_fields
FROM tables_v7 sve
INNER JOIN tables_v8 lfd ON sve.field1 = lfd.field2
AND sve.type = sve.type
AND coalesce(sve.field,"somestring") = lfd.field
AND sve.created_at BETWEEN lfd.min_start_time AND coalesce(lfd.max_end_time,'2030-12-31T00:00:00.000+00:00')
INNER JOIN tables_v9 aau ON sve.some_id = aau.some_id
WHERE aau.some_id IS NOT NULL) x1
INNER JOIN tables_v10 ild on ild.hash_value=x1.hash_value
LEFT JOIN
(SELECT DISTINCT some_field,
<some_fields>
FROM tables_v10 sve
INNER JOIN full_data lfd ON sve.some_id = lfd.some_id)
x2 ON x1.some_id_1 = x2.some_id_1
AND x1.created_at < x2.created_at
and ild.lcid = x2.id
LEFT JOIN
(SELECT <some_fields>
FROM table_v11 where status='SUCCESS')
x3 ON x2._id = x3._id
AND x2.created_at < x3.checked_out_at )
SELECT
field_1,
field_2,
field_3,
TO_DATE(FROM_UTC_TIMESTAMP(started_date ,'Asia/Bangkok')) AS started_date_1,
TO_DATE(FROM_UTC_TIMESTAMP(ended_date ,'Asia/Bangkok')) AS ended_date_1,
(((DATEDIFF(FROM_UTC_TIMESTAMP(banner_ended ,'Asia/Bangkok'), '1970-01-04')%7 + 7)%7 - 1 + 7)%(7)) AS ended_day_of_week_index_1,
DAYNAME(FROM_UTC_TIMESTAMP(banner_ended ,'Asia/Bangkok')) AS _ended_day_of_week_1,
_present_on_screen_id AS _test_screen_id_1,
count(distinct _test.x2_session_id)/count(distinct _test.x1_session_id) AS test_ctr_sessions_1,
count(distinct _test.x2_anonymous_id)/count(distinct _test.x1_anonymous_id) AS test_ctr_aid_1
FROM banner_test
WHERE ((_test.banner_started >= TO_UTC_TIMESTAMP('1900-01-01 00:00:00.000000','Asia/Bangkok'))) AND (banner_test.region REGEXP '$') AND (banner_test.business REGEXP '$')
GROUP BY 1,2,3,4,5,6,7,8
ORDER BY _banner_started_date_1 DESC
LIMIT 500;

I have obfuscated some of the details here, but the information regarding joins, aggregations, GROUP BY, etc. is preserved. Regards Parth
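One way to narrow this down (a sketch, not taken from the original post) is to check whether planning alone reproduces the error; EXPLAIN_LEVEL is a standard Impala query option, and the WITH block below is only a stand-in for the obfuscated CTEs above:

SET EXPLAIN_LEVEL=3;
-- If EXPLAIN alone throws "IllegalStateException: null", the failure happens
-- during analysis/planning and the coordinator's impalad log should contain
-- the full Java stack trace for it.
EXPLAIN
WITH banner_test AS (SELECT 1 AS field_1)  -- stand-in for the real CTEs
SELECT field_1 FROM banner_test;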
Labels: Apache Impala
12-15-2020
09:07 AM
@Tim Armstrong If we implement admission control then we can reduce the memory exceptions, but we can still encounter situations where queries are not admitted. With admission control and resource pools we can prioritise queries from a certain pool to get resources first; please correct me if I am wrong. And w.r.t. scheduling: in Impala we are reading data from Kudu, and the Impala and Kudu services are located on different nodes. So how does scheduling work in this case? Parth
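For reference, a minimal sketch of what per-query pool routing looks like once admission control pools are defined; the pool name, table name and memory limit below are hypothetical examples, not settings from this cluster:

-- Route this session's statements to a specific admission control pool
-- (hypothetical pool name; it must already exist in the pool configuration).
SET REQUEST_POOL=high_priority;
-- Optional per-query cap so a single statement cannot exhaust the pool.
SET MEM_LIMIT=2gb;
SELECT count(*) FROM kudu_events;  -- hypothetical Kudu-backed table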
12-14-2020
12:26 AM
Hello Team, We have a 5-node Impala cluster (1 coordinator and 4 executors) running Impala 3.2.0. Each Impala node has 32 GB of memory and 4 cores. We are facing an issue where sometimes 2-3 of the 5 Impala executors are over-utilised (80-90% memory usage) and the others are not. For example, if executors 1 and 3 have memory usage of more than 80% and a new query is issued, it fails saying it could not allocate space (512 MB) on executor 1, even though there is more than enough memory on the other executors (2, 4 and 5), whose memory utilisation is under 20%. Following is the error which I receive:

Memory limit exceeded: Failed to allocate row batch EXCHANGE_NODE (id=148) could not allocate 16.00 KB without exceeding limit. Error occurred on backend impala-node-executor-1:22000 Memory left in process limit: 19.72 GB Query(2c4bc52a309929f9:2fa5f79d00000000): Reservation=998.62 MB ReservationLimit=21.60 GB OtherMemory=104.12 MB Total=1.08 GB Peak=1.08 GB Unclaimed reservations: Reservation=183.81 MB OtherMemory=0 Total=183.81 MB Peak=398.94 MB Fragment 2c4bc52a309929f9:2fa5f79d00000021: Reservation=0

How are query fragments distributed among the Impala executors? Is there a way to load balance the query load among executors when we have dedicated executors and a coordinator? What are the good practices for proper utilisation of an Impala cluster? Regards Parth
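As a small sketch of how this can be inspected from impala-shell (the table name below is hypothetical): after a statement finishes, SUMMARY shows per-operator peak memory and host counts, and PROFILE shows the full per-host breakdown, which makes it visible which executor each fragment instance ran on:

SELECT count(*) FROM big_fact_table;  -- hypothetical table
SUMMARY;   -- impala-shell command: per-operator peak memory and #hosts
PROFILE;   -- impala-shell command: full profile, incl. per-host fragment instances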
Labels: Apache Impala
10-15-2020
04:13 AM
@Tim Armstrong it worked like a charm after changing the gcc version. Thanks
10-11-2020
11:53 PM
@Tim Armstrong I am using gcc version 5.4.0 and the OS is Ubuntu 16.04 (Xenial). Will it work if I compile it with 4.9.2?
10-08-2020
07:47 PM
Hello Team, We are planning to use Impala UDFs and UDAs. To try things out we started by exploring the examples given in the Cloudera GitHub repo. After building the .so file using the make utility, when I try to create the function with the following statement:

create function has_vowels (string) returns boolean location '/user/hive/udfs/libudfsample.so' symbol='HasVowels';

we get the following error:

ERROR: AnalysisException: Could not load binary: /<hdfs_path>/udfs/libudfsample.so
Unable to load /var/lib/impala/udfs/libudfsample.9909.1.so
dlerror: /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/lib/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /var/lib/impala/udfs/libudfsample.9909.1.so)

Following are the error logs:

org.apache.impala.common.AnalysisException: Could not load binary: /user/parth.khatwani/udfs/libudfsample.so
Unable to load /var/lib/impala/udfs/libudfsample.9909.1.so
dlerror: /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/lib/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /var/lib/impala/udfs/libudfsample.9909.1.so)
at org.apache.impala.catalog.Function.lookupSymbol(Function.java:442)
at org.apache.impala.analysis.CreateUdfStmt.analyze(CreateUdfStmt.java:92)
at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:451)
at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:421)
at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1285)
at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1252)
at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1222)
at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:167)
I1009 02:42:52.638576 15136 status.cc:124] 3c48433373076c77:30eff6ff00000000] AnalysisException: Could not load binary: /user/parth.khatwani/udfs/libudfsample.so
Unable to load /var/lib/impala/udfs/libudfsample.9909.1.so

I am unable to figure out what's wrong here. Regards Parth
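The dlerror means the .so was compiled against a newer libstdc++ (needing GLIBCXX_3.4.21) than the one shipped in the CDH parcel, i.e. the g++ used for the UDF does not match the toolchain Impala was built with. After rebuilding the .so with a matching compiler (a follow-up in this thread confirms that changing the gcc version fixed it), a quick sanity check could look like this; the function name and path are taken from the post above:

-- Recreate and test the sample UDF after rebuilding it with a matching gcc.
drop function if exists has_vowels(string);
create function has_vowels (string) returns boolean
location '/user/hive/udfs/libudfsample.so' symbol='HasVowels';
show functions;               -- has_vowels should now be listed
select has_vowels('impala');  -- expect: true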
Labels: Apache Impala
05-13-2020
04:53 AM
@Tim Armstrong thanks for the detailed insights, this will be very helpful.
04-30-2020
06:24 AM
1 Kudo
Hello Team, We are using Impala to query data stored as Parquet on S3. This has been an awesome feature. Recently Amazon S3 announced a new feature called S3 Select, which helps speed up column projection when querying data stored on S3. As of now, Hive and Presto support S3 Select pushdown. Is Impala going to support S3 Select pushdown? Parth
Labels: Apache Impala
04-14-2020
05:52 AM
@Tim Armstrong thanks for pointing this out. We have observed that in our case the memory usage on the coordinator is not that high, so having a coordinator of the same size as an executor will lead to under-utilisation of resources on the coordinator. Alternatively, we could have multiple (8) executors of a smaller size, let's say 32 GB, instead of two with 128 GB. Please share your thoughts about it.