Member since: 03-23-2015
Posts: 1288
Kudos Received: 114
Solutions: 98
My Accepted Solutions
Views | Posted |
---|---|
4331 | 06-11-2020 02:45 PM |
5933 | 05-01-2020 12:23 AM |
3765 | 04-21-2020 03:38 PM |
4038 | 04-14-2020 12:26 AM |
2986 | 02-27-2020 05:51 PM |
08-29-2019 04:13 PM
Running Impala queries through a driver from Spark is not currently supported by Cloudera. Why not just use SparkSQL instead? Why do you need the extra layer of Impala here? Cheers, Eric
08-28-2019 02:16 AM
Hi, yes, I have configured the Sqoop gateway on both hosts. Please tell me how to run the Sqoop saved jobs on the master node itself. Thanks, Akhila.
08-28-2019 12:44 AM
@ChineduLB If you go to CM > Sentry > Configuration and search for "database", you should see the database options; the one you need is "Sentry Server Database Password". You also need to make sure that the username and password you set here can connect to the Sentry database. Cheers, Eric
08-28-2019 12:36 AM
@vinodnerella How much heap you should allocate to ZooKeeper depends on the scenario. In your case, if you keep noticing that the ZooKeeper heap is reaching its 1 GB maximum, it is better to increase the heap to a larger value and, if needed, enable GC logging for ZooKeeper so you can monitor GC usage over a period of time and find the approximate heap your environment requires. As you have already set the ZooKeeper heap to 4 GB, it should be good for now; we can monitor it for some time. A common cause of ZooKeeper OutOfMemory errors is clients submitting requests faster than ZooKeeper can process them, especially when there are a lot of clients. You can also look into parameters such as "zookeeper.snapCount", but it is better to monitor ZooKeeper with the 4 GB heap for some time before tuning such parameters.
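As a rough illustration of the "monitor GC, then size the heap" idea, here is a minimal Python sketch. It assumes you have already extracted post-GC heap usage samples (in MB) from the ZooKeeper GC log yourself. The function name `suggest_heap_mb`, the 1.5x headroom factor, and the 1024 MB floor are hypothetical choices for this example, not Cloudera or ZooKeeper recommendations.

```python
import math


def suggest_heap_mb(post_gc_used_mb, headroom=1.5, floor_mb=1024):
    """Suggest a heap size in MB from post-GC heap usage samples.

    post_gc_used_mb: heap usage (MB) observed right after GC cycles,
                     collected over a representative period.
    headroom:        multiplier applied to the observed peak (assumed value).
    floor_mb:        never suggest less than this (assumed value).
    """
    if not post_gc_used_mb:
        raise ValueError("need at least one GC sample")
    peak = max(post_gc_used_mb)
    return max(floor_mb, math.ceil(peak * headroom))


# Example: samples taken over a monitoring window.
samples = [600, 900, 2000]
print(suggest_heap_mb(samples))
```

The point is only that the suggestion should come from the observed peak over time plus some margin, rather than from a single reading.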
08-19-2019 06:22 AM
> "Your cluster does sound unhappy"

LOL. I'd say more like pi$$ed. 🙂

> If it's a JVM issue, we've seen in some cases that increasing heap sizes helps. Setting ipc.client.rpc-timeout.ms to 60000

I'd say it's more like a set of overloaded namenodes. Also, based on research I've done so far for another problem, I have a theory that a more up-to-date GC should increase performance and reduce the number of ZooKeeper failures we have. Our ZK failures are sporadic and happen every 2 to 3 days, sometimes more, sometimes less. Moving ZK to separate nodes is not an option at this point, and I'm doing all I can to reduce the number of failures short of moving the service. I'll check our settings on this and see if we can do one or both. I suspect we have already increased the JVM heap, but I'm not sure.

> We've also seen the file handle cache that got enabled by default in CDH5.15 help a lot in reducing namenode load,

I assume this was available before this version but was not enabled by default? I'll look it up and see...

> I agree 100%. I think whoever named it was either overly optimistic and assumed there wouldn't be a significant gap in time, or it was named from the point of view of the code rather than the external system

So, my question is: is there an indicator in the Query Details that shows something was returned? I know I get an initial set of results back. If that "fetch" metric doesn't mean what the word actually says, I don't know what indicates how long it took to get the first set of records back.

Back to the original issue... The issue appears to be that the last query issued in Hue tends to show up as still executing 2.2 hours later, even though it returned a count almost immediately. Obviously, the idle-timeout parameters for sessions and queries are not marking the session as closed, so it appears to still be executing. Is this causing resource issues because the session is being held open and appears to be still executing? I would assume so, as it is waiting on a fetch of subsequent sets of records. What parameter(s) will close the session from the last query executed?

Just to let you know, I've come in late to the game and am still learning CM and Cloudera Manager. I understand a lot, but with 1000s of parameters for all the apps and an ample supply of problems, it'll take a while. 🙂

Thanks for all your help. It is nice to have a response on this forum. The last couple of posts were not very useful. We do have a service contract, and although I am one of the 3 admins, they are still working on adding me to the support contract so I can put in tickets and get support directly. Until then, I appreciate the help!
08-14-2019 06:46 PM
1 Kudo
Hi Andre, Your solution is right, but my situation was a little different. Below are the checks and fixes I did, with Cloudera support helping me in the process:
1. From the HiveServer2 logs we found that one of the HiveServer2 instances was not talking to the ZooKeeper quorum (only when querying HBase data).
2. Installed HBase gateway services on all the Hue instances and HiveServer2 instances.
3. Restarted the HBase services and deployed the client configuration.
4. Restarted the HiveServer2 instance that had the problem of trying to connect to localhost:2181 as the ZooKeeper quorum.
Then I submitted the query from beeline and Hue. All worked as expected this time.
08-14-2019 05:27 PM
For number 2, after ANY change made outside of Impala you will need INVALIDATE METADATA; if only new data was added, REFRESH will do. Work is underway to improve it: https://issues.apache.org/jira/browse/IMPALA-3124 Cheers, Eric
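To make the rule of thumb above concrete, here is a small Python sketch that only builds the statement text to run in Impala. The helper name `impala_metadata_statement` and the `"new_data"` label are made up for this example; the table name is a placeholder.

```python
def impala_metadata_statement(table: str, change: str) -> str:
    """Return the Impala statement to run after a change made outside Impala.

    change: "new_data" when only new data files were added to an existing
            table; any other value is treated as a broader change.
    """
    if change == "new_data":
        # New data only: a REFRESH of the table is enough.
        return f"REFRESH {table}"
    # Any other change made outside Impala: reload the table's metadata.
    return f"INVALIDATE METADATA {table}"


print(impala_metadata_statement("mydb.events", "new_data"))
print(impala_metadata_statement("mydb.events", "schema_change"))
```

You would then run the returned statement through impala-shell or whichever client you already use.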
08-13-2019 04:44 PM
Hi @Harish19, There is an SSL Options button in the ODBC driver configuration window; please click through and confirm whether you have SSL enabled on the client side. Cheers, Eric
08-12-2019 04:50 PM
@Sona, Sorry I missed your question in May. For (1), please refer to my previous update. For (2), yes, all paths that store Hive databases/tables should be managed by Hive/Sentry, so those paths should be configured under the Sentry Synchronization Path Prefixes setting and need to be owned by "hive:hive". The idea of Sentry is to have everything managed by "hive" so that no one can make direct modifications without going through Hive/Sentry. Cheers, Eric
08-12-2019 04:04 PM
1 Kudo
Go to the RM web UI to see the amount of resources you have in your cluster and check whether your job requires more than that. This can confirm whether you are out of resources. Cheers, Eric
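If you prefer to script the same check, the ResourceManager also exposes cluster metrics over its REST API at /ws/v1/cluster/metrics. Below is a minimal Python sketch under a few assumptions: the RM address is a placeholder you must supply, the helper names `fetch_cluster_metrics` and `job_fits` are invented for this example, and the job's memory and vcore needs are numbers you work out yourself.

```python
import json
from urllib.request import urlopen


def fetch_cluster_metrics(rm_url: str) -> dict:
    """Fetch the clusterMetrics object from the ResourceManager REST API.

    rm_url: base URL of the RM web UI, e.g. "http://<rm-host>:8088"
            (placeholder; use your own host and port).
    """
    with urlopen(f"{rm_url}/ws/v1/cluster/metrics", timeout=10) as resp:
        return json.load(resp)["clusterMetrics"]


def job_fits(metrics: dict, need_mb: int, need_vcores: int) -> bool:
    """Return True if the job's request does not exceed the cluster totals."""
    return (need_mb <= metrics["totalMB"]
            and need_vcores <= metrics["totalVirtualCores"])


# Offline example with a hand-written metrics dict (values are made up).
sample = {"totalMB": 8192, "availableMB": 2048,
          "totalVirtualCores": 8, "availableVirtualCores": 2}
print(job_fits(sample, need_mb=4096, need_vcores=4))
print(job_fits(sample, need_mb=16384, need_vcores=4))
```

Comparing against availableMB and availableVirtualCores instead of the totals would tell you whether the job fits right now rather than whether it can ever fit.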