About saranvisa

DanielWhite · ‎08-27-2018

The fix for this was to upgrade CDH from 5.5.1 to 5.13.3

saranvisa · ‎08-21-2018

@rupertlssmith Whether you have to initialize sc depends on how you are executing your code. If you are using the spark-shell command line, you don't need to initialize sc, as it is initialized by default when you start the shell. If you are developing code in a third-party tool and executing it from there, you have to initialize it yourself. You can add the lines below before you call rddFromParquetHdfsFile:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

val conf = new SparkConf().setAppName("your topic").setMaster("yarn-client")
val sc = new SparkContext(conf)

ebeb · ‎08-13-2018

This issue was resolved by following the instructions on this site: http://vijayjt.blogspot.com/2016/02/how-to-connect-to-kerberised-chd-hadoop.html We needed to copy the Java JCE unlimited strength policy files and the krb5.conf file into the jdk/jre/lib/security folder where SQL Developer is installed. After this, the Hive connection via Kerberos was successful.

kratka · ‎08-13-2018

Assigning resources under the Resources tab is per host. We need to limit CPU at the Impala cluster level. Also, we are using dynamic pools, not static pools, so I believe cpu.shares will not be helpful.

saranvisa · ‎08-02-2018

@vratmuri Oh, then you can use the Cloudera API.

Cloudera API reference: https://www.cloudera.com/documentation/enterprise/5-9-x/topics/cm_intro_api.html

Service-specific properties (you may need to explore a little for Impala queries), which may help you: https://www.cloudera.com/documentation/enterprise/5-9-x/topics/cm_intro_api.html#xd_583c10bfdbd326ba--7f25092b-13fba2465e5--7f20__example_txn_qcw_yr
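As a rough illustration of the Cloudera Manager API route, the sketch below only builds the request URL for the impalaQueries endpoint of a service; it does not contact any server. The host name, port, cluster name, service name and API version (v11 here) are assumptions for the example, so check them against your own deployment and the API reference for your CM release.

```python
from urllib.parse import quote, urlencode


def impala_queries_url(base_url, cluster, service, start_iso, end_iso,
                       query_filter="", limit=100, api_version="v11"):
    """Build a Cloudera Manager API URL that lists Impala queries.

    base_url, cluster, service and api_version are placeholders
    (assumptions); use the values from your own CM installation.
    """
    # Cluster and service names can contain spaces, so percent-encode them.
    path = "/api/{}/clusters/{}/services/{}/impalaQueries".format(
        api_version, quote(cluster, safe=""), quote(service, safe=""))
    # 'from' and 'to' are ISO 8601 timestamps bounding the search window.
    params = {"from": start_iso, "to": end_iso, "limit": limit}
    if query_filter:
        # Optional filter expression, for example "user = alice".
        params["filter"] = query_filter
    return base_url.rstrip("/") + path + "?" + urlencode(params)


if __name__ == "__main__":
    print(impala_queries_url(
        "http://cm-host.example.com:7180",  # hypothetical CM host
        "Cluster 1", "impala",
        "2018-08-01T00:00:00", "2018-08-02T00:00:00",
        query_filter="user = alice", limit=10))
```

The resulting URL can then be fetched with any HTTP client using your CM credentials; the JSON response would still need to be aggregated per user or per table on your side.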

saranvisa · ‎07-31-2018

@chriswalton007 There are different types of latency: NameNode RPC latency, JournalNode fsync latency, network latency, etc.

There are a few points here:
1. Your network latency will vary based on the traffic in your cluster. It may create trouble during peak hours.
2. Latency can lead to the following: as we know, the master daemons expect an update from the child daemons every few seconds. If an update is delayed, the master will consider the child unavailable/dead and look for an alternate, which is unnecessary unless there is a real issue with the child.
3. As far as NN RPC is concerned, in an HA cluster the active and standby NN have to talk to each other and stay in sync within a few seconds. If they are not in sync and something goes wrong on the active NN, the standby becomes active but may not be up to date, which will lead to confusion.

At the end of the day, every second matters in a distributed cluster.

In your use case, I'm not sure whether you are going to use Cloudera Director. If so, the link that you shared says it will not allow you to create a mixed cloud/on-premise cluster. If you are going to use a different tool that allows you to configure a mixed cloud/on-prem cluster, then you can go ahead based on the below:
1. If you are going to try this in a non-prod environment first
2. If you have light workloads

HJ · ‎07-27-2018

Thank you for the solution. I have a SQL query with a GROUP BY ROLLUP clause, and it goes like the following:

from tablename group by rollup(decode(-----some expression here-----),decode(some expression here),decode (some expression here))

As I said, I have to modify the query so that it works on Kudu using impala-shell. When I ran the query I got the following exception:

ERROR: AnalysisException: default.rollup() unknown

Please help me with what has to be done in this case. Thank you.

bgooley · ‎07-24-2018

@martinbo, As mentioned by others, there are some options to ease the management of users and groups. Common ones are:
1 - SSSD, IPA, Centrify: OS-level integration so that application calls to the OS are handled by those tools, which query a central LDAP source. This requires a good deal of configuration, but it is a robust, enterprise-grade solution.
2 - Manage your group and passwd files with automation tools like Puppet, Chef, etc. (modify once, "push out" changes to all hosts).
3 - Configure LdapGroupsMapping in HDFS so that Hadoop services do group lookups directly against LDAP.

NOTE: If you intend to let users run jobs directly on YARN, you will still need to create local users on each host with a NodeManager, since containers require the OS user to be present.

saranvisa · ‎07-19-2018

@scratch28 You can use Cloudera Navigator to generate this report. Log in to CM as a full admin, then go to Cloudera Management Service -> Navigator Metadata Server -> Cloudera Navigator (menu) -> search for 'impala' -> choose from the options on the left side. You can choose up to the last 365 days or a custom period.

omaritec · ‎07-18-2018

OK, I understand your point, but what if mappers are failing? YARN already sets up as many mappers as there are files; should I increase this further? Since only a minority of my jobs are failing, how can I tune YARN to use more mappers for these particular jobs?

Last Visited	‎08-10-2019 05:12 PM

Member Since	‎09-02-2016 11:35 AM
Posts	523
Kudos received	96

Cloudera Community

Re: Promoting Metadata

Re: Mix on premise and cloud nodes

Re: impala-shell

Re: How do I see user usage stats by table in Impa...

Re: Replica Not FoundException

Re: BDR jobs don't replicate or replicate very lit...

Re: Tutorial Exercise 3 - not found: value sc

Re: Kerberos authentication with hive JDBC driver

Re: control cpu usage by impala

Re: Create Hive Table from JSON Files

Re: Sentry + Kerberos + Impala : manage users

Re: Map and Reduce Error: Java heap space