Member since: 08-16-2016
Posts: 642
Kudos Received: 131
Solutions: 68

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3157 | 10-13-2017 09:42 PM
 | 5407 | 09-14-2017 11:15 AM
 | 2808 | 09-13-2017 10:35 PM
 | 4461 | 09-13-2017 10:25 PM
 | 5083 | 09-13-2017 10:05 PM
05-30-2017 03:01 PM · 2 Kudos

Well, that topic is pretty broad. Let me try to get you going. Now that Kerberos is configured, Hadoop and all the tools that run on it will use Kerberos authentication. User and group mappings are still handled at the OS level, though, so if you do not have LDAP integrated at the OS level, you will need to create local users and groups in the OS on all nodes.

Second, both CM and HUE have their own authentication backend and authorization configuration. You can integrate both with LDAP, if you have it, or with SAML/SSO. If not, you will need to create the users in those systems as well. The HUE usernames need to match the first portion, the username, of the principal in the KDC.

Now for the specific item you mentioned, the HUE admin not having access to the Security app: HUE has its own groups and permissions, so you will need to grant that user access to the Security app. You will need a HUE superuser account to do this (this is probably the HUE admin you mentioned).
05-30-2017 02:05 PM

@csguna The ZKFC is the NN HA failover controller that uses ZK for fencing. The timeout would be for the ZK client within the ZKFC. In my experience I have needed to set all ZK client timeouts to 30 seconds, up from the default of 15 seconds; I believe future releases use 30 seconds as the default anyway. I am having trouble finding the exact setting name. I'll keep looking and come back when I find it, if someone else hasn't first.
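If memory serves, the property is likely `ha.zookeeper.session-timeout.ms` in core-site.xml, but treat both the name and the default as assumptions to verify against your Hadoop release. A sketch:

```xml
<!-- core-site.xml: session timeout for the ZK client inside the ZKFC.
     Property name and default are from memory; verify for your version. -->
<property>
  <name>ha.zookeeper.session-timeout.ms</name>
  <value>30000</value> <!-- 30 seconds, up from the default -->
</property>
```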
05-30-2017 01:57 PM

I'll look at this some more later, but it at least seems like your DN heap may be too low for the workload. What is it set to? How many total blocks do you have on the cluster? How many blocks' worth of data are touched by this job on both the read and write side? You could also just try increasing the heap to see if the GC pauses disappear and the performance stabilizes. Those pauses could be the source, as they are 3-, 6-, and 1-second pauses where nothing else was occurring. That isn't all of the difference, but it does indicate that HDFS performance was degraded overall.
05-28-2017 11:48 PM

You should set the location for the table. If you don't want to move the data, then set it to /user/Cloudera/QSM. Setting it to any other location, even one still outside the warehouse, will still cause the data to move from the original location to the table's location. On the last statement, are you saying that after loading the data into the table it was no longer in the original location, but you also weren't getting data returned from the table?
05-26-2017 12:50 PM

This post covers it pretty well. The short version is that your cluster entered an inconsistent state for NN HA due to some other issue, which the post details. https://community.hortonworks.com/questions/41255/how-to-debug-the-issue-ipcs-epoch-x-is-less-than-t.html
05-26-2017 12:32 PM

Have you examined the job metrics in depth? Where is it spending more time: a specific stage, or all stages? I know you mentioned it operating on the same amount of data, but it can still end up shuffling more data around. Is it? Is it spending more time in GC? That should help narrow down where on the cluster you should be looking. You just call it a Spark transformation, but what is being done? Just the high-level work during each stage. What services does it interact with, i.e. is it just reading from and writing to HDFS?
05-25-2017 10:47 AM

Where did you load the files in HDFS in the first step? You did not specify a location in the CREATE EXTERNAL TABLE; I believe it then defaults to the warehouse directory. LOAD DATA INPATH does move the data from the path specified to the table's location. I think it moved it from /user/cloudera/CPC/QSM/QSM_MarToApr2016.csv to /user/hive/warehouse/abc/...
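A sketch of the difference, using the path from this thread; the table and column names are made up for illustration:

```sql
-- External table pinned to the existing directory: no LOAD DATA needed,
-- and the files stay where they are. Table/column names are hypothetical;
-- only the path comes from this thread.
CREATE EXTERNAL TABLE qsm (line STRING)
LOCATION '/user/cloudera/CPC/QSM';

-- Without a LOCATION clause, the table lives under the warehouse
-- directory, and LOAD DATA INPATH moves (not copies) the file there,
-- e.g. from /user/cloudera/CPC/QSM/QSM_MarToApr2016.csv into the
-- table's warehouse directory.
```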
05-25-2017 10:34 AM

Best practice is to include some information from the link provided, as it may not work or may not exist in the future. I have only ever seen this type of behavior when dynamic allocation is enabled, and I attributed it to that. I believe that setting the number of executors should override that and not use dynamic allocation; I may be wrong, or it may be misbehaving. Have you tried turning off dynamic allocation altogether? Are other jobs being launched around the time the decay starts?
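To rule it out, you could disable dynamic allocation explicitly at submit time. The property names are standard Spark settings; the class, jar, and executor count below are placeholders:

```shell
# Force a static executor allocation; if the slow ramp-down disappears,
# dynamic allocation was the culprit. Class/jar/count are placeholders.
spark-submit \
  --conf spark.dynamicAllocation.enabled=false \
  --num-executors 10 \
  --class com.example.MyJob \
  myjob.jar
```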
05-24-2017 02:37 PM

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html I recommend opening a new topic if you have any other questions on storage policies. That way this discussion can stay on topic.
05-24-2017 02:10 PM

@naveen This is a threshold within Cloudera Manager and has no effect on how HDFS writes and replicates the data. The threshold just triggers a Warning or Critical alert in CM once it is exceeded. A cluster with different-sized nodes will always have the smaller DNs fill up faster, as HDFS doesn't know better. You could use storage policies to manage it and keep the distribution in check.
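As a sketch, storage policies are assigned per path with the `hdfs storagepolicies` tool (the directory below is a placeholder):

```shell
# List the policies this cluster supports (HOT, WARM, COLD, ...).
hdfs storagepolicies -listPolicies

# Pin a directory to a policy so its blocks land on specific storage
# types; /data/archive is a placeholder path.
hdfs storagepolicies -setStoragePolicy -path /data/archive -policy COLD
hdfs storagepolicies -getStoragePolicy -path /data/archive
```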