Member since: 04-08-2019
Posts: 37
Kudos Received: 1
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1237 | 10-11-2020 11:53 PM
02-09-2021
07:30 AM
But CDH 6.3.2 and Cloudera Manager 6.3.1 are still not closed source. There should at least be some way to download them openly.
02-08-2021
02:18 AM
@GangWar It worked. Thanks
02-08-2021
01:34 AM
@GangWar We will also have to update the following config to point to the local parcel repo, right?
02-08-2021
01:10 AM
@GangWar This will remain unchanged, as the paywall only affects access to the binaries. The catch is that if you have a local repo set up you can distribute parcels and expand the cluster, but downloading parcels over the internet will fail in this case. By creating the local repo for parcels, do you mean that we will have to create a repo hosting the .parcel files which are already downloaded on the nodes in the cluster? Parth
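For reference, a minimal sketch of that local-repo idea: host the already-downloaded parcel files over plain HTTP and point Cloudera Manager at the URL. The directory path and port below are hypothetical, and the directory would also need the matching .sha files and a manifest.json for Cloudera Manager to recognise it as a parcel repo.

```python
# Minimal local parcel repo sketch (assumptions: PARCEL_DIR and port are
# hypothetical; .sha files and manifest.json must already sit alongside the
# .parcel files for Cloudera Manager to pick them up).
import functools
import http.server
import threading

def serve_parcels(directory, port=8900):
    """Serve `directory` over HTTP so it can be used as a
    Remote Parcel Repository URL in Cloudera Manager."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory)
    httpd = http.server.ThreadingHTTPServer(("", port), handler)
    # Run the server in a background thread; caller keeps the handle.
    threading.Thread(target=httpd.serve_forever, daemon=True).start()
    return httpd
```

With this running on a host the cluster can reach, the Remote Parcel Repository URLs setting under Parcels > Configuration would point at http://&lt;host&gt;:8900/.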
02-07-2021
10:19 PM
Hello Team, We are not able to add new nodes to our CDH cluster, as it is asking for a username and password for downloading the archives. Up to 6.3.2 CDH is free, and only an upgrade to 6.3.3 requires a license. Now even the old parcel URL requires a username and password; where should I get them from? Will we be required to pay for a username and password even when using CDH 6.3.2? The "more information" link about the URL change is broken over here. Parth
Labels:
- Cloudera Manager
12-22-2020
06:24 AM
@Tim Armstrong Thanks for helping out here. My apologies for the misunderstanding w.r.t. the packaging information.
12-19-2020
07:39 AM
@Tim Armstrong The CDH 6.3.4 packaging information suggests that Impala is still at 3.2.0. Am I looking at the wrong page? We have been eagerly waiting for Impala 3.3.0 and 3.4.0 to be available in a CDH package since September 2019, as they have very good features like Ranger support and caching for remote filesystems like S3. Is there a way to run Impala 3.4.0 with Cloudera Manager 6.3.1, bypassing the Impala provided by CDH 6.3.2?
- Tags:
- Apache Impala
12-18-2020
09:43 AM
@Tim Armstrong Did the logs help?
12-16-2020
10:28 AM
@Tim Armstrong This is the stack trace I got from the Impala daemon INFO logs: I1216 18:24:31.520996 5823 jni-util.cc:256] 5348abd63d19b026:621ffa8a00000000] java.lang.IllegalStateException
at com.google.common.base.Preconditions.checkState(Preconditions.java:133)
at org.apache.impala.planner.SortNode.computeNodeResourceProfile(SortNode.java:258)
at org.apache.impala.planner.PlanFragment.computeResourceProfile(PlanFragment.java:234)
at org.apache.impala.planner.Planner.computeResourceReqs(Planner.java:388)
at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1178)
at org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:1466)
at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1345)
at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1252)
at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1222)
at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:167)
I1216 18:24:31.521031 5823 status.cc:124] 5348abd63d19b026:621ffa8a00000000] IllegalStateException: null Parth
12-16-2020
05:07 AM
@Tim Armstrong Got your point. Can you shed some more light on how the Impala query fragments are distributed when the Impala and Kudu services are located on different nodes? In that case the data locality principle won't hold; correct me if I am wrong. Parth
12-16-2020
04:52 AM
Hello Team, We are using Impala 3.2.0. Some queries which were working 3-4 weeks back have now suddenly started throwing an "IllegalStateException: null" exception, and in the last 3-4 weeks nothing has changed on our Impala cluster. The error logs also do not suggest the cause of the error. Can anybody help us understand why this issue is arising? The failing query looks something like this: WITH banner_test AS (WITH layout_id AS
(
SELECT <some columns>
FROM table_v1
WHERE
( layout_component_id = <some_id>)
),
layouts AS
(SELECT <some columns>,
coalesce() AS col1,
<some columns>
FROM table_v2 lch
WHERE lch.some_id IN
(SELECT some_id
FROM table_v3
WHERE some_field_v1 IN
(SELECT some_field from layout_id))
AND lch.start_time >= '2019-01-01T00:00:00.000+00:00'
),
full_data AS
(SELECT lctd.some_field_v1,
l.*,
lctd.some_field_v2,
min(l.start_time)over(PARTITION BY some_field_v1) AS min_start_time,
max(l.end_time)over(PARTITION BY some_field_v1) AS max_end_time
FROM tables_v4 l
INNER JOIN tables_v5 lctd ON l.some_id = lctd.some_id),
tables_v5 AS
(
SELECT <some_fields>,
case when some_field like 'val1%' then 'val1' else 'val2' end as business
FROM tables_v6
WHERE date_str >= '2019-01-01'
)
SELECT <some_fields>,
case when x3.some_field=x2.some_field then x3.some_field end as some_field,
x3.some_id as x3.some_id
FROM
(SELECT DISTINCT some_filed,
more_fields
FROM tables_v7 sve
INNER JOIN tables_v8 lfd ON sve.field1 = lfd.field2
AND sve.type = sve.type
AND coalesce(sve.field,"somestring") = lfd.field
AND sve.created_at BETWEEN lfd.min_start_time AND coalesce(lfd.max_end_time,'2030-12-31T00:00:00.000+00:00')
INNER JOIN tables_v9 aau ON sve.some_id = aau.some_id
WHERE aau.some_id IS NOT NULL) x1
INNER JOIN tables_v10 ild on ild.hash_value=x1.hash_value
LEFT JOIN
(SELECT DISTINCT some_filed,
<some_fields>
FROM tables_v10 sve
INNER JOIN full_data lfd ON sve.some_id = lfd.some_id)
x2 ON x1.some_id_1 = x2.some_id_1
AND x1.created_at < x2.created_at
and ild.lcid = x2.id
LEFT JOIN
(SELECT <some_fileds>
FROM table_v11 where status='SUCCESS')
x3 ON x2._id = x3._id
AND x2.created_at < x3.checked_out_at )
SELECT
field_1,
field_2,
field_3,
TO_DATE(FROM_UTC_TIMESTAMP(started_date ,'Asia/Bangkok')) AS started_date_1,
TO_DATE(FROM_UTC_TIMESTAMP(ended_date ,'Asia/Bangkok')) AS ended_date_1,
(((DATEDIFF(FROM_UTC_TIMESTAMP(banner_ended ,'Asia/Bangkok'), '1970-01-04')%7 + 7)%7 - 1 + 7)%(7)) AS ended_day_of_week_index_1,
DAYNAME(FROM_UTC_TIMESTAMP(banner_ended ,'Asia/Bangkok')) AS _ended_day_of_week_1,
_present_on_screen_id AS _test_screen_id_1,
count(distinct _test.x2_session_id)/count(distinct _test.x1_session_id) AS test_ctr_sessions_1,
count(distinct _test.x2_anonymous_id)/count(distinct _test.x1_anonymous_id) AS test_ctr_aid_1
FROM banner_test
WHERE ((_test.banner_started >= TO_UTC_TIMESTAMP('1900-01-01 00:00:00.000000','Asia/Bangkok'))) AND (banner_test.region REGEXP '$') AND (banner_test.business REGEXP '$')
GROUP BY 1,2,3,4,5,6,7,8
ORDER BY _banner_started_date_1 DESC
LIMIT 500; I have obfuscated some of the details here, but the information regarding joins, aggregations, group by, etc. is preserved. Regards Parth
- Tags:
- impala 3.2.0
Labels:
- Apache Impala
12-15-2020
09:07 AM
@Tim Armstrong If we implement admission control then we can reduce the memory exceptions, but we can still encounter situations where queries are not admitted. With admission control and resource pools we can prioritise queries from a certain query pool to get resources first; please correct me if I am wrong. And w.r.t. scheduling: in Impala we are reading data from Kudu, and the Impala and Kudu services are located on different nodes. So how does scheduling work in this case? Parth
12-14-2020
12:26 AM
Hello Team, We have a 5-node Impala cluster (1 coordinator and 4 executors) running Impala 3.2.0. Each Impala node has 32 GB of memory and 4 cores. We are facing an issue where sometimes 2-3 of the 5 Impala executors are over-utilised (80-90% memory usage) while the others are not. For example, if executors 1 and 3 have memory usage of more than 80% and a new query is issued, it fails saying it could not allocate space (512 MB) on executor 1, even though there is more than enough memory on the other executors (2, 4 and 5), whose memory utilisation is under 20%.

Following is the error I receive: Memory limit exceeded: Failed to allocate row batch EXCHANGE_NODE (id=148) could not allocate 16.00 KB without exceeding limit. Error occurred on backend impala-node-executor-1:22000 Memory left in process limit: 19.72 GB Query(2c4bc52a309929f9:2fa5f79d00000000): Reservation=998.62 MB ReservationLimit=21.60 GB OtherMemory=104.12 MB Total=1.08 GB Peak=1.08 GB Unclaimed reservations: Reservation=183.81 MB OtherMemory=0 Total=183.81 MB Peak=398.94 MB Fragment 2c4bc52a309929f9:2fa5f79d00000021: Reservation=0

How are query fragments distributed among the Impala executors? Is there a way to load balance the query load among executors when we have dedicated executors and a coordinator? What are the good practices for proper utilisation of an Impala cluster? Regards Parth
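The behaviour described above is consistent with per-node memory reservations: fragments of a query typically run on every executor, so the query needs its per-node share on each of them, and one hot executor can fail the whole allocation. A toy model for illustration (an assumption, not Impala's actual scheduler code; the free-memory numbers are made up to match the 80%/20% scenario above):

```python
# Toy model of per-executor allocation: a query's per-node memory share must
# fit on EVERY executor that runs its fragments, so a single saturated
# executor fails the allocation even when the others are nearly idle.
def can_allocate(free_mb_per_executor, needed_mb_per_executor):
    """True only if every executor can satisfy its share of the query."""
    return all(free >= needed_mb_per_executor for free in free_mb_per_executor)

# Hypothetical free memory: executors 1 and 3 hot, 2 and 4 mostly free.
free_mb = [4_000, 26_000, 5_000, 27_000]
can_allocate(free_mb, 512)    # True: 512 MB fits everywhere
can_allocate(free_mb, 4_500)  # False: the hottest executor cannot satisfy it
```

In this model, total cluster free memory is irrelevant; only the worst-off executor matters, which matches the error above occurring while 19.72 GB was still left in the process limit elsewhere.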
- Tags:
- impala 3.2.0
- Node
Labels:
- Apache Impala
10-15-2020
04:13 AM
@Tim Armstrong It worked like a charm after changing the gcc version. Thanks
10-11-2020
11:53 PM
@Tim Armstrong I am using gcc version 5.4.0 and the OS is Ubuntu 16.04 Xenial. Will it work if I compile it with 4.9.2?
10-08-2020
07:47 PM
Hello Team, We are planning to use Impala UDFs and UDAs. To try things out we started by exploring the examples given in the Cloudera GitHub repo. After building the .so file using the make utility, when I try to create the function using the following: create function has_vowels (string) returns boolean location '/user/hive/udfs/libudfsample.so' symbol='HasVowels'; we get the following error: ERROR: AnalysisException: Could not load binary: /<hdfs_path>/udfs/libudfsample.so
Unable to load /var/lib/impala/udfs/libudfsample.9909.1.so
dlerror: /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/lib/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /var/lib/impala/udfs/libudfsample.9909.1.so) Following are the error logs org.apache.impala.common.AnalysisException: Could not load binary: /user/parth.khatwani/udfs/libudfsample.so
Unable to load /var/lib/impala/udfs/libudfsample.9909.1.so
dlerror: /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/lib/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /var/lib/impala/udfs/libudfsample.9909.1.so)
at org.apache.impala.catalog.Function.lookupSymbol(Function.java:442)
at org.apache.impala.analysis.CreateUdfStmt.analyze(CreateUdfStmt.java:92)
at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:451)
at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:421)
at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1285)
at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1252)
at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1222)
at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:167)
I1009 02:42:52.638576 15136 status.cc:124] 3c48433373076c77:30eff6ff00000000] AnalysisException: Could not load binary: /user/parth.khatwani/udfs/libudfsample.so
Unable to load /var/lib/impala/udfs/libudfsample.9909.1.so I am unable to figure out what's wrong here. Regards Parth
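The dlerror in the logs above is a toolchain mismatch: the UDF .so was compiled with a gcc whose libstdc++ requires GLIBCXX_3.4.21, while the libstdc++ bundled in the CDH parcel is older. A small sketch of the version check involved (symbol names follow the real GLIBCXX_x.y.z scheme, but the `parcel_symbols` list here is illustrative, not dumped from the actual parcel):

```python
# Sketch of the version-symbol check behind the dlerror: the runtime
# libstdc++ must export a GLIBCXX version at least as new as the one the
# UDF binary was compiled against.
def glibcxx_key(symbol):
    # "GLIBCXX_3.4.21" -> (3, 4, 21)
    return tuple(int(part) for part in symbol.split("_", 1)[1].split("."))

def satisfies(available_symbols, required_symbol):
    """True if any exported GLIBCXX version covers the required one."""
    required = glibcxx_key(required_symbol)
    return any(glibcxx_key(s) >= required for s in available_symbols)

# Illustrative: libstdc++ from gcc 4.9 tops out around GLIBCXX_3.4.20,
# so a binary needing GLIBCXX_3.4.21 (gcc 5+) fails to load against it.
parcel_symbols = ["GLIBCXX_3.4", "GLIBCXX_3.4.19", "GLIBCXX_3.4.20"]
satisfies(parcel_symbols, "GLIBCXX_3.4.21")  # False -> load fails
```

This is consistent with the fix reported in the later reply: rebuilding the UDF with an older gcc removes the GLIBCXX_3.4.21 requirement.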
Labels:
- Apache Impala
10-08-2020
09:41 AM
@gilli Did you find any solution to this? I am facing the same issue.
05-13-2020
04:53 AM
@Tim Armstrong Thanks for the detailed insights; this will be very helpful.
04-30-2020
06:24 AM
1 Kudo
Hello Team, We are using Impala to query data stored as Parquet on S3, and this has been an awesome feature. Recently Amazon S3 announced a new feature called S3 Select, which helps speed up column projection when querying data stored on S3. As of now Hive and Presto support S3 Select pushdown. Is Impala going to support S3 Select pushdown? Parth
- Tags:
- impala
Labels:
- Apache Impala
04-14-2020
05:52 AM
@Tim Armstrong Thanks for pointing this out. We have observed that in our case the memory usage on the coordinator is not that high, so having a coordinator of the same size as an executor will lead to under-utilisation of resources on the coordinator. Alternatively, we could have multiple (8) executors of a smaller size, say 32 GB, instead of two with 128 GB. Please share your thoughts about this.
04-09-2020
03:54 AM
We are using Impala version 3.2.0.
04-09-2020
02:23 AM
@EricL Can you help me with this?
04-07-2020
04:56 AM
Hello Team, Following is our Impala cluster config: two dedicated executors, each with 128 GB and 16 cores, and one dedicated coordinator with 8 GB and 2 cores. As most of the query processing happens on the executors, we have kept the executors large and the coordinator small. Now, when we enable Impala admission control, we run into situations where queries whose memory requirements exceed the coordinator's memory get rejected with an insufficient-memory error, even though the cluster as a whole has enough memory to process the request.

This is the error message I get: Rejected query from pool root.default: request memory needed 8.29 GB per node is greater than memory available for admission 6.00 GB of <coordinator's DNS>. Use the MEM_LIMIT query option to indicate how much memory is required per node.

Following is the admission control config (as of now we have only one default pool): Max Memory: 200 GB; Default Memory Limit: no default limit; Max queued queries: 200; Queue Timeout: 5 minutes.

What can be the reasons for this error? Do the executors' and coordinator's hardware sizes need to be the same? Can I exclude the coordinator from admission control? Regards Parth
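A toy illustration of the rejection above (this is not Impala's actual admission code, and the executor numbers are hypothetical; only the 8.29 GB per-node estimate and the ~6 GB coordinator figure come from the error message): admission checks the per-node estimate against every node in the query's schedule, so the small coordinator becomes the bottleneck.

```python
# Toy per-node admission check. Hypothetical available-for-admission figures
# per node; only the coordinator's ~6 GB and the 8.29 GB per-node estimate
# are taken from the error message quoted above.
nodes_gb = {"executor-1": 96.0, "executor-2": 96.0, "coordinator": 6.0}
needed_per_node_gb = 8.29

bottlenecks = [name for name, avail in nodes_gb.items()
               if avail < needed_per_node_gb]
rejected = bool(bottlenecks)
# rejected is True solely because of the coordinator, even though both
# executors and the 200 GB pool limit have plenty of headroom.
```

In this model, lowering the per-node estimate with the MEM_LIMIT query option (as the error suggests) or giving the coordinator a larger memory limit changes the outcome of the check.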
- Tags:
- impala
Labels:
- Apache Impala
02-25-2020
10:23 AM
Hello Team, Following is our Kudu cluster configuration: Kudu 1.8; number of masters: 3; number of tablet servers: 5, each with 64 GB RAM and an 8-core CPU. The total number of tables is 254 and the total number of tablets is 1265 with a replication factor of 3, so post replication there are 3795 replicas; the replica distribution across the tablet servers is even at 759 per tablet server. --block-cache-size is 2 GB and --memory-hard-limit is 50 GB.

For running queries on top of Kudu we use Impala. For ingesting data to Kudu we use a multi-threaded application built on top of the Kudu Java client, with which we ingest data in real time. Under normal load, memory usage on all the tablet servers is between 35-40 GB.

The issue arises when we backfill the data for some of the tables. Let's say we are backfilling the data for a table, say table_v1, for the last two years. There can be around 200-300 million records which will get updated (we are using the Kudu client's upsert method), and the rate of ingestion during this backfill is around 5 million records per hour. After this, the memory usage on all the Kudu tablet servers increases by 7-10 GB, which is expected, but it never comes down after the backfill is done. We have even tried stopping all reads and writes to Kudu, but the memory usage on the tablet servers never comes down.

So what can be the possible reasons for the memory usage not going down, and what can I do to bring it down? Because of this we often hit the memory threshold, Kudu stops accepting writes, and we are forced to restart Kudu.
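As a quick sanity check, the replica numbers in the post are internally consistent:

```python
# Replica arithmetic from the cluster description above.
tablets, replication_factor, tablet_servers = 1265, 3, 5
total_replicas = tablets * replication_factor          # 3795, as stated
replicas_per_server = total_replicas // tablet_servers  # 759 per tablet server
```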
Labels:
- Apache Kudu
11-28-2019
07:19 AM
Hey @EricL, apologies for the terribly delayed response. The reason we don't want to install Kerberos is that we are using a third-party BI tool which connects to our DW infrastructure (consisting of Kudu, Hive Metastore and Impala), and the tool does not support Kerberos authentication. We are heavily dependent on this BI tool and cannot do away with it. Sentry enforces the use of Kerberos for authentication, while Ranger does not, as it supports other authentication mechanisms as well. That's why we are eagerly waiting for Cloudera to release a version of Impala that supports Ranger (support arrived in 3.3; the latest released by Cloudera is 3.2). Also, it would be great if you could suggest any other solution for enforcing access controls that does not involve Kerberos. Regards Parth
09-26-2019
08:20 AM
Hello Team,
We were planning to implement Role-Based Access Control (RBAC) for our Impala tables. We tried Sentry, but it has a prerequisite of installing Kerberos, which we don't want to use. So Apache Ranger is the other option we have, but the latest CDH version, 6.3, ships Impala 3.2, which does not support Ranger. I just wanted to know whether there is going to be a release of a new CDH bundle which will have Impala 3.3.0 and support for Ranger, and what the approximate release date of that bundle will be.
Regards
Parth
09-25-2019
07:59 AM
Hello Team,
We are planning to use Sentry for implementing authorization for Impala tables. In the prerequisites for installing Sentry, the last point suggests implementing Kerberos authentication, but point number 6 mentions that either Kerberos or LDAP is required. This is a bit confusing; can you help clarify whether it is necessary to implement Kerberos authentication for configuring Sentry?
Regards
Parth
Labels:
- Apache Impala
- Apache Sentry
09-03-2019
08:29 AM
Hello Team,
We use Impala and Hue to query our Kudu tables. We have now installed Sentry for access control, using which we are able to restrict a user's access to a particular column of a table. I wanted to know if we can restrict access to a particular range partition of a table. For example, if table T1 has five columns, say c1, c2, c3, c4 and c5, and 3 range partitions p1, p2 and p3: as of now, using Sentry, I am able to restrict access only to columns c1, c2 and c5. I also wanted to know if I can restrict a user's access to only partition p1.
Regards
Parth
04-29-2019
08:58 AM
Thanks @wdberkeley, I will try out the options you suggested. Can you also suggest best practices or options for the following scenario: my Kudu cluster, managed through Cloudera Manager and CDH, is currently hosted in AWS, and I want to move it from AWS to Azure or some other cloud service provider. I will try the options you suggested for this scenario, but is there anything else you would recommend? Regards Parth