About EricL

EricL · 08-29-2019

Running Impala queries through the driver from Spark is not currently supported by Cloudera. Why not just use Spark SQL instead? Why add an extra Impala layer here? Cheers Eric

Nekkanti · 08-28-2019

Hi, yes, I configured the Sqoop gateway on both hosts. Please tell me how to run the Sqoop saved jobs on the master node itself. Thanks, Akhila.
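A saved job is started with `sqoop job --exec <job-name>` on the host where you want it to run. A minimal Python sketch of that, assuming the job was saved to a metastore the master node can see: by default Sqoop keeps saved jobs in a per-user metastore on the host that created them, so `--meta-connect` pointing at a shared metastore may be needed. The job name and the metastore URL below are hypothetical.

```python
import shlex
import subprocess
from typing import List, Optional


def sqoop_exec_command(job_name: str, meta_connect: Optional[str] = None) -> List[str]:
    """Build the argv for running a saved Sqoop job.

    meta_connect is a hypothetical shared-metastore JDBC URL; leave it as None
    to use the default per-user metastore on the host where this runs.
    """
    cmd = ["sqoop", "job"]
    if meta_connect:
        # Use a shared metastore so jobs saved from another host are visible here.
        cmd += ["--meta-connect", meta_connect]
    cmd += ["--exec", job_name]
    return cmd


def run_saved_job(job_name: str, meta_connect: Optional[str] = None) -> int:
    """Run the saved job on this host (e.g. the master node) and return its exit code."""
    cmd = sqoop_exec_command(job_name, meta_connect)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```

For example, `run_saved_job("daily_import")` executed on the master node runs the hypothetical `daily_import` job there.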

EricL · 08-28-2019

@ChineduLB If you go to CM > Sentry > Configuration and search for "database", you should see the database options; the one you need is "Sentry Server Database Password". You also need to make sure that the username and password you use here can connect to the Sentry database. Cheers Eric

jsensharma · 08-28-2019

@vinodnerella How much heap you should allocate to ZooKeeper depends on the scenario. In your case, if you keep noticing that the ZooKeeper heap reaches its 1 GB maximum, it is better to increase it to a larger value and, if needed, enable GC logging for ZooKeeper so you can monitor GC usage over a period of time and work out the approximate heap your environment requires. As you have already set the ZooKeeper heap to 4 GB, that should be good for now; we can monitor it for a while. A common cause of ZooKeeper OutOfMemory errors is clients submitting requests faster than ZooKeeper can process them, especially when there are many clients: the pending requests pile up in memory, which can lead to OOM errors. You can also look at parameters such as "zookeeper.snapCount", but it is better to monitor ZooKeeper with the 4 GB heap for some time before tuning them.
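To see why requests arriving faster than they are processed end in OOM, a back-of-the-envelope Python sketch may help. It is a simple fluid model, not ZooKeeper's actual request pipeline; the request rates and the 1 KB average request size are made-up illustration numbers.

```python
def backlog_after(seconds: int, arrival_rate: float, service_rate: float,
                  initial: float = 0.0) -> float:
    """Requests still queued after `seconds`, in a simple fluid model.

    Each second the backlog changes by (arrival_rate - service_rate) and can
    never drop below zero.
    """
    backlog = initial
    for _ in range(seconds):
        backlog = max(0.0, backlog + arrival_rate - service_rate)
    return backlog


def approx_backlog_mib(backlog: float, avg_request_bytes: int) -> float:
    """Rough memory held by queued requests, in MiB (ignores JVM object overhead)."""
    return backlog * avg_request_bytes / 2**20


# Hypothetical load: 12,000 req/s arriving, 10,000 req/s processed.
queued = backlog_after(60, 12_000, 10_000)
print(queued, approx_backlog_mib(queued, 1024))
```

With these numbers the backlog grows by 2,000 requests every second, so after one minute 120,000 requests (about 117 MiB at 1 KB each) are waiting; when the service rate keeps up, the backlog stays at zero. Any sustained gap keeps growing until the heap runs out, which is why monitoring the heap over time is the first step.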

pollard · 08-19-2019

> "Your cluster does sound unhappy"
LOL. I'd say more like pi$$ed. 🙂
> If it's a JVM issue, we've seen in some cases that increasing heap sizes helps. Setting ipc.client.rpc-timeout.ms to 60000
I'd say it's more like a set of overloaded NameNodes. Also, based on the research I've done so far for another problem, I have a theory that using a more up-to-date GC should increase performance and reduce the number of ZooKeeper failures we have. Our ZK failures are sporadic and happen every 2 to 3 days, sometimes more often, sometimes less. Moving ZK to separate nodes is not an option at this point, and I'm doing all I can to reduce the number of failures short of moving the service. I'll check our settings and see if we can do one or both. I suspect we have already increased the JVM heap, but I'm not sure.
> We've also seen the file handle cache that got enabled by default in CDH5.15 help a lot in reducing namenode load
I assume this was available before that version but was not enabled by default??? I'll look it up and see...
> I agree 100%. I think whoever named it was either overly optimistic and assumed there wouldn't be a significant gap in time, or it was named from the point of view of the code rather than the external system
So, my question is: is there an indicator in the Query Details that shows something was returned? I know I get an initial set of results back. Since the "fetch" metric doesn't mean what the word actually says, I don't know what indicates how long it took to get the first set of records back.
Back to the original issue... The last query issued in Hue tends to show up as still executing 2.2 hours later, even though it returned a count almost immediately. Obviously, the idle timeout parameters for sessions and queries are not marking the session as closed, so it appears to still be executing. Is this causing resource issues because the session is held open and appears to still be executing? I would assume so, as it is waiting on a fetch of subsequent sets of records. What parameter(s) will close the session for the last query executed?
Just to let you know, I've come in late to the game and am still learning Cloudera Manager. I understand a lot, but with thousands of parameters across all the apps and an ample supply of problems, it'll take a while. 🙂 Thanks for all your help. It is nice to have a response on this forum; the last couple of posts were not very useful. We do have a service contract, and although I am one of the 3 admins, they are working on adding me to the support contract so I can put in tickets and get support directly. Until then, I appreciate the help!
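How an idle timeout decides whether to close a query that is left open can be sketched in a few lines of Python. This is a simplified model, not Impala's or Hue's code. In Impala, the corresponding impalad startup flags are `--idle_query_timeout` and `--idle_session_timeout`, both in seconds, where 0 means idle queries or sessions are never expired; the timestamps below are made up to match the 2.2-hour case.

```python
from dataclasses import dataclass


@dataclass
class QueryState:
    last_activity_s: float  # when the client last fetched rows or sent an RPC
    closed: bool = False    # True once the client explicitly closed the query


def should_expire(state: QueryState, now_s: float, idle_timeout_s: int) -> bool:
    """Return True if an idle, still-open query should be cancelled.

    Simplified model: a timeout of 0 (or less) disables expiry, and a query
    expires only after the client has been quiet for longer than the timeout.
    """
    if state.closed or idle_timeout_s <= 0:
        return False
    return now_s - state.last_activity_s > idle_timeout_s


# Hue fetched the first batch at t=0 and never came back; now it is 2.2 h later.
q = QueryState(last_activity_s=0.0)
print(should_expire(q, 2.2 * 3600, 0))    # timeout disabled: stays "executing"
print(should_expire(q, 2.2 * 3600, 600))  # 10-minute timeout: gets cancelled
```

The point of the model: a query whose client never fetches the remaining rows or closes it only goes away if a non-zero idle timeout is in effect.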

nanda_bigdata · 08-14-2019

Hi Andre, your solution is right, but my situation was a little different. Below are the checks and fixes I did, with Cloudera Support helping me in the process:
1. From the HiveServer2 logs, we found that one of the HiveServer2 instances was not talking to the ZooKeeper quorum (only when querying HBase data).
2. Installed HBase gateway services on all the Hue instances and HiveServer2 instances.
3. Restarted the HBase services and deployed the client configuration.
4. Restarted the HiveServer2 instance that had been trying to connect to localhost:2181 as the ZooKeeper quorum.
Then I submitted the query from Beeline and Hue, and everything worked as expected this time.
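Step 4 is the symptom of a host without HBase client configuration: if `hbase.zookeeper.quorum` is not set in the `hbase-site.xml` the client sees, it falls back to the default of localhost. A small Python sketch that checks a deployed `hbase-site.xml` for this; the ZooKeeper hostnames in the usage note are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def zookeeper_quorum(hbase_site_xml: str) -> List[str]:
    """Return the hosts listed in hbase.zookeeper.quorum, or [] if it is unset."""
    root = ET.fromstring(hbase_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == "hbase.zookeeper.quorum":
            value = prop.findtext("value") or ""
            return [h.strip() for h in value.split(",") if h.strip()]
    return []


def falls_back_to_localhost(hbase_site_xml: str) -> bool:
    """True if a client using this config would end up talking to localhost."""
    hosts = zookeeper_quorum(hbase_site_xml)
    if not hosts:
        return True  # unset: the HBase default quorum is localhost
    return all(h.split(":")[0] in ("localhost", "127.0.0.1") for h in hosts)
```

Running `falls_back_to_localhost(open("/etc/hbase/conf/hbase-site.xml").read())` on each HiveServer2 and Hue host (path assumed) should return `False` once the HBase gateway and client configuration are deployed there.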

EricL · 08-14-2019

For number 2: for ANY change made outside of Impala, you will need INVALIDATE METADATA; if only new data was added, REFRESH will do. Work is underway to improve this: https://issues.apache.org/jira/browse/IMPALA-3124 Cheers Eric
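The rule above fits in a tiny Python helper that picks the statement to run in Impala after a change made elsewhere (for example from Hive). It is only a sketch of the decision; the table name in the usage note is hypothetical.

```python
def metadata_statement(table: str, new_data_files_only: bool) -> str:
    """Choose the Impala statement to run after a change made outside Impala."""
    if new_data_files_only:
        # Only new data files were added to an existing table: REFRESH is enough.
        return f"REFRESH {table}"
    # Any other change outside Impala (new table, schema change, ...).
    return f"INVALIDATE METADATA {table}"
```

For instance, after loading new files into `sales.orders` from Hive, `metadata_statement("sales.orders", True)` gives `REFRESH sales.orders`; after adding a column from Hive, pass `False` to get `INVALIDATE METADATA sales.orders`.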

EricL · 08-13-2019

Hi @Harish19, there is an SSL Options button in the ODBC driver configuration window; please click through and confirm whether SSL is enabled on the client side. Cheers Eric
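If you connect with a DSN-less connection string instead of the dialog, the same setting should appear as a key in the string. A Python sketch that checks for it, assuming the driver's key is named `SSL` with `1` meaning enabled (check your driver's guide); the driver name and host in the usage note are hypothetical.

```python
from typing import Dict


def parse_conn_str(conn_str: str) -> Dict[str, str]:
    """Split 'Key=Value;Key=Value' into a dict with lower-cased keys."""
    pairs: Dict[str, str] = {}
    for part in conn_str.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            pairs[key.strip().lower()] = value.strip()
    return pairs


def ssl_enabled(conn_str: str) -> bool:
    # Assumption: the ODBC driver uses the key "SSL" and the value 1 to enable it.
    return parse_conn_str(conn_str).get("ssl") == "1"
```

For example, `ssl_enabled("Driver=Cloudera ODBC Driver for Impala;Host=impalad.example.com;Port=21050;SSL=1")` returns `True`, while a string without `SSL=1` returns `False`, which would match the client-side SSL errors when the server requires SSL.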

EricL · 08-12-2019

@Sona, sorry I missed your question in May. For (1), please refer to my previous update. For (2), yes: all paths that store Hive databases/tables should be managed by Hive/Sentry, so those paths should be configured under the Sentry Synchronization Path Prefixes setting and need to be owned by "hive:hive". The idea of Sentry is to have everything managed by "hive" so that no one can make direct modifications without going through Hive/Sentry. Cheers Eric
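The two requirements for (2) can be written as a quick Python check on a path's location and ownership. This is a sketch, not a Sentry API: you would feed it owner/group values from something like `hdfs dfs -ls`, and the example paths and prefix are hypothetical.

```python
import posixpath
from typing import Iterable


def under_prefix(path: str, prefixes: Iterable[str]) -> bool:
    """True if `path` equals, or sits below, one of the synchronization prefixes."""
    norm = posixpath.normpath(path)
    for prefix in prefixes:
        p = posixpath.normpath(prefix)
        # Compare on whole path components so /data/hive2 is not under /data/hive.
        if norm == p or norm.startswith(p.rstrip("/") + "/"):
            return True
    return False


def misconfigured(path: str, owner: str, group: str, prefixes: Iterable[str]) -> bool:
    """Flag a Hive data path that is outside the prefixes or not owned by hive:hive."""
    return not under_prefix(path, list(prefixes)) or (owner, group) != ("hive", "hive")
```

With a prefix of `/user/hive/warehouse`, a table directory under it owned by `hive:hive` passes, while the same directory owned by `etl:hive`, or a table stored under `/data/tables`, gets flagged.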

EricL · 08-12-2019

Go to the ResourceManager (RM) web UI to see the amount of resources in your cluster, and check whether your job requires more than that. This will confirm whether you are out of resources. Cheers Eric
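The same numbers the RM web UI shows are available as JSON from the ResourceManager REST endpoint `/ws/v1/cluster/metrics` (fields such as `totalMB`, `availableMB`, `totalVirtualCores`, `availableVirtualCores`). A rough Python sketch of the comparison, with made-up figures for the job's request. It ignores per-node and per-queue limits, so treat "fits now" as necessary, not sufficient.

```python
import json


def capacity_verdict(metrics_json: str, needed_mb: int, needed_vcores: int) -> str:
    """Compare a job's total request with the cluster metrics returned by the RM."""
    m = json.loads(metrics_json)["clusterMetrics"]
    if needed_mb > m["totalMB"] or needed_vcores > m["totalVirtualCores"]:
        return "never fits"  # more than the whole cluster has
    if needed_mb > m["availableMB"] or needed_vcores > m["availableVirtualCores"]:
        return "waiting for resources"  # e.g. stuck at map 0% reduce 0%
    return "fits now"
```

Feed it the body of `curl http://<rm-host>:8088/ws/v1/cluster/metrics` (host assumed, 8088 being the usual RM web port) together with the memory and vcores your job asks for.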

Last Visited	08-12-2020 03:17 AM

Member Since	03-23-2015 01:24 PM
Posts	1,288
Kudos received	113

Cloudera Community

Re: max() function generating an error in sqoop

Re: Add a dynamic variable to a Hive view

Re: Hive Server 2 failing to start CDP ,Cloudera M...

Re: Sqoop export from hive to teradata - > issue ...

Re: Cloudera Hadoop internal workings

Re: Spark sql with impala on kerberos returning on...

Re: How to do Sqoop Incremental Import through Ooz...

Re: Upgrade Sentry Database Tables fails after upg...

Re: Zookeeper Heap Space issue

Re: impala running jobs does not finish in time

Re: Zookeeper client connecting to localhost:2181 ...

Re: When I have to Refresh / Invalidate Metadata a...

Re: Impala ODBC connection fails with ssl errors.

Re: hive impersonation and sentry

Re: mapreduce job stuck at map 0% reduce 0%