Member since: 04-08-2019
Posts: 37
Kudos Received: 1
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2630 | 10-11-2020 11:53 PM |
02-09-2021
07:30 AM
But CDH 6.3.2 and Cloudera Manager 6.3.1 are still not closed source. There should at least be some way to download them openly.
02-08-2021
02:18 AM
@GangWar It worked. Thanks
12-22-2020
06:24 AM
@Tim Armstrong Thanks for helping out here. My apologies for the misunderstanding w.r.t. the packing information.
12-16-2020
12:11 PM
1 Kudo
In that case, regarding scheduling of remote reads: for Kudu it's based on distributing the work for each scan across nodes as evenly as possible. We randomize the assignment somewhat to even things out, but the distribution is not based on resource availability. I.e. we generate the schedule and then wait for the resources to become available on the nodes we picked. I understand that reversing that (i.e. finding available nodes first, then distributing work across them) would be desirable in some cases, but there are pros and cons to doing that. For remote reads from filesystems/object stores, on more recent versions, we do something a bit different: each file has affinity to a set of executors and we try to schedule it on those, so that we're more likely to get hits in the remote data cache.
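To illustrate the difference between the two strategies, here is a rough Python sketch. This is purely illustrative pseudologic, not Impala's actual scheduler code; the function names and the hashing scheme are made up.

```python
import hashlib
import random

def schedule_kudu_scans(scan_tokens, executors):
    """Illustrative sketch: spread Kudu scan tokens across executors as
    evenly as possible, with some randomization to even out load across
    many queries. Resource availability is NOT consulted here; the
    schedule is fixed first and resources are waited on afterwards."""
    shuffled = list(scan_tokens)
    random.shuffle(shuffled)  # randomize the assignment somewhat
    assignment = {e: [] for e in executors}
    for i, token in enumerate(shuffled):
        assignment[executors[i % len(executors)]].append(token)
    return assignment

def schedule_remote_file_scans(files, executors, affinity_width=3):
    """Illustrative sketch: give each file affinity to a fixed subset of
    executors (here via hashing the file name), so repeated scans of the
    same file land on the same nodes and are more likely to hit the
    remote data cache."""
    assignment = {e: [] for e in executors}
    for f in files:
        h = int(hashlib.md5(f.encode()).hexdigest(), 16)
        candidates = [executors[(h + i) % len(executors)]
                      for i in range(affinity_width)]
        assignment[random.choice(candidates)].append(f)
    return assignment
```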
10-15-2020
04:13 AM
@Tim Armstrong it worked like a charm after changing the gcc version. Thanks
05-14-2020
09:44 AM
Hi @parthk ,
I'm happy to see you have found the resolution to your issue. Can you kindly mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future?
Thanks,
Vidya
04-14-2020
05:52 AM
@Tim Armstrong thanks for pointing this out. We have observed that in our case the memory usage on the coordinator is not that high, so having a coordinator of the same size as an executor will lead to under-utilisation of resources on the coordinator. Alternatively, we could have multiple (8) executors of a smaller size, let's say 32 GB, instead of two with 128 GB. A quick comparison of the two layouts is sketched below. Please share your thoughts about this.
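A back-of-the-envelope comparison of the two layouts (the sizes come from the question above; the comments are general Impala sizing considerations, not confirmed specifics for this cluster):

```python
# Both layouts provide the same total memory; the difference is in how
# that memory is sliced across nodes.
layouts = {
    "2 x 128 GB": {"nodes": 2, "mem_gb": 128},
    "8 x 32 GB":  {"nodes": 8, "mem_gb": 32},
}

for name, cfg in layouts.items():
    total = cfg["nodes"] * cfg["mem_gb"]
    # More, smaller executors mean more nodes scanning in parallel, but
    # memory-hungry operators (large hash joins/aggregations) are capped
    # by the per-node slice, and each query now depends on more machines.
    print(f"{name}: total = {total} GB, "
          f"per-node cap = {cfg['mem_gb']} GB, "
          f"parallel nodes = {cfg['nodes']}")
```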
11-28-2019
05:19 PM
1 Kudo
Hi @parthk, No problem. Impala + Ranger is under construction for the CDP release. From what I can see, phase one is done and there are a few more phases to go through. So it is still at an early stage and I do not have an ETA. You probably just have to wait and ask the question again a few months down the track. Cheers, Eric
09-25-2019
10:59 AM
1 Kudo
@parthk You can definitely use Sentry for RBAC-style access control in Impala; you don't strictly need Kerberos, but it's highly advised to have it. Why? Historically, Sentry has been the weakest link in the security architecture of Cloudera, which is the reason it was dropped in favor of Ranger in the upcoming new offering, CDP.

Having said that, Sentry role-based access control (RBAC) is an approach to restricting system access to authorized users, whereas Kerberos using keytabs is like a biometric passport: the password is known only to the keytab and the principal. It allows a process (a client) running on behalf of a principal (a user) to prove its identity to a verifier (an application server, or just server) without sending data across the network that might allow an attacker or the verifier to subsequently impersonate the principal. Kerberos optionally provides integrity and confidentiality for data sent between the client and the server.

You can safely build your cluster without Kerberos, especially for self-study and development, but not for production. There are two types of Kerberos setup: MIT and AD. Active Directory is a directory services implementation that provides all sorts of functionality like authentication, group and user management, policy administration, and more in a centralized manner. LDAP (Lightweight Directory Access Protocol) is an open and cross-platform protocol used for directory services authentication, hence the pointer in the Cloudera documentation to use LDAP/LDAPS.

HTH, happy hadooping
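To make the keytab point concrete, here is a minimal sketch using the standard MIT Kerberos client tools (`kinit`/`klist`); the principal name and keytab path are hypothetical examples, and this assumes a keytab has already been exported for that principal:

```python
import subprocess

# Hypothetical principal and keytab path for illustration.
PRINCIPAL = "parthk@EXAMPLE.COM"
KEYTAB = "/etc/security/keytabs/parthk.keytab"

# kinit with a keytab: no password is typed or sent over the network;
# the secret lives only in the keytab and the KDC's database.
subprocess.run(["kinit", "-kt", KEYTAB, PRINCIPAL], check=True)

# Verify we obtained a ticket-granting ticket (TGT).
subprocess.run(["klist"], check=True)
```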
04-29-2019
09:42 AM
For now, I think the two possible methods I outlined might work. Additionally, you could export the data to something like Parquet or Avro, using Spark or Impala, and then reload the data in the new cluster, as sketched below.
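A rough sketch of that export/reload approach with PySpark; the table and path names are made up for illustration, and it assumes the intermediate storage location is reachable from both clusters (or the files are copied across, e.g. with distcp):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cluster-migration").getOrCreate()

# On the old cluster: export a table to Parquet on portable storage.
df = spark.table("sales.transactions")
df.write.mode("overwrite").parquet("hdfs:///migration/sales_transactions")

# On the new cluster: read the Parquet files back and recreate the table.
restored = spark.read.parquet("hdfs:///migration/sales_transactions")
restored.write.mode("overwrite").saveAsTable("sales.transactions")
```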