Member since: 01-16-2014
Posts: 336
Kudos Received: 43
Solutions: 31

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3393 | 12-20-2017 08:26 PM
 | 3371 | 03-09-2017 03:47 PM
 | 2841 | 11-18-2016 09:00 AM
 | 5007 | 05-18-2016 08:29 PM
 | 3850 | 02-29-2016 01:14 AM
06-09-2015
12:41 AM
Good to hear that this has been fixed! We have seen this issue in early CDH 5 releases, but it was fixed in CM/CDH 5.2 and later. Cloudera Manager should have deployed that configuration setting for you in the client configuration on all nodes. If you did not use CM, that could explain it; otherwise I would not know how that could have happened. Wilfred
06-01-2015
05:24 AM
If you are not running the yarn command as the owner of the application, you might need to add -appOwner <username> to the yarn logs command line. If you do not have access, the error you showed could be thrown; we do not distinguish between not getting access and the aggregation not having finished. Wilfred
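For example, with a placeholder application ID and owner:

yarn logs -applicationId application_1433212988000_0001 -appOwner someuser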
06-01-2015
04:35 AM
There is a known issue in releases before CDH 5.3.3 that could cause this behaviour. That issue was introduced by the fix for a similar problem in an earlier release. Both issues were intermittent and related to HA. Unless you are on CDH 5.3.3 or later, you could be hitting one of them. Wilfred
05-28-2015
03:50 AM
Sorry, this slipped through the cracks. If you have already turned off the ACL then you should be able to get the logs via the command line. Run: yarn logs -applicationId <APPLICATION ID> That should return the full log and also follow the normal process through all the proxies and checks to get the files, which should hopefully tell us in more detail what is going on. Wilfred
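For example (the application ID is a placeholder), redirecting the output to a file makes it easier to search:

yarn logs -applicationId application_1433212988000_0001 > app_logs.txt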
05-26-2015
06:11 PM
In CM & CDH 5.4 you should unset it and let Spark use the assembly that is already on the nodes. That is much faster. Wilfred
05-25-2015
06:45 PM
Why are you using SparkFiles? The path that you try to open is not defined because SparkFiles expects paths to files added through SparkContext.addFile(). Unless you have done that, you should be using sc.textFile() and passing in the URI for the file (hdfs://... or something like it), as in the sketch below. Wilfred
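A minimal PySpark sketch of the two approaches (the NameNode address and file names are placeholders):

from pyspark import SparkContext, SparkFiles

sc = SparkContext(appName="example")

# Reading data straight from HDFS: no SparkFiles involved
rdd = sc.textFile("hdfs://namenode:8020/user/someuser/data.txt")

# SparkFiles only resolves files previously added with addFile()
sc.addFile("hdfs://namenode:8020/user/someuser/lookup.txt")
local_path = SparkFiles.get("lookup.txt")  # local path on the worker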
05-25-2015
06:34 PM
In a recent version (CM/CDH 5.4 as an example) the directory should look just like what you have now. We do not push the assembly separately any more: by default it uses the assembly installed on the nodes, which is faster than using the one from HDFS. The setting is still there to allow custom assemblies to be used. The setting should be entered without the hdfs:// in front, and the path will be pushed out with hdfs:// in front (CM will handle that for you). Which version of CDH and CM are you using? Wilfred
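As an illustration only (the path below is a placeholder, not necessarily your layout): you would enter something like /user/spark/share/lib/spark-assembly.jar in the setting, and the generated client configuration would then reference it as hdfs://<namenode>/user/spark/share/lib/spark-assembly.jar.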
05-25-2015
05:13 PM
1 Kudo
A1: check the HDFS Design page for details on what is stored where. The edits log and file system image are on the NN; look for the section on the persistence of file system data. For more detail on setting up the cluster, follow the Cluster Setup page. A2: if you have the disks, then having a mirrored disk will make it more resilient. Making a backup is still a good idea 😉 Wilfred
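As a related option (not a substitute for backups): dfs.namenode.name.dir accepts a comma-separated list of directories, for example file:///data/1/dfs/nn,file:///data/2/dfs/nn (placeholder paths), and the NameNode writes the fsimage and edits to each of them, which gives you redundancy on top of a mirrored disk.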
05-22-2015
08:42 AM
1 Kudo
On the master node HDFS will store things like the FSImage, the edits file and other relevant files on disk. Not huge, but it needs quick access. For the DN:
- Even or odd number of disks does not matter; it can handle what you give it.
- The number of spindles (disks) is important for the number of containers you can run on the host. We normally say about 2 containers per disk can be supported. Since you have a large number of CPU cores and a lot of memory, a larger number of disks will allow you to run more containers on the node; decreasing the number of disks means you should also lower the number of containers.
Looking at the CPU cores and disks: they seem to be nicely balanced the way you have it now with the 300GB disks. Wilfred
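As a rough worked example (the counts here are placeholders, not the exact numbers from this thread): a node with 12 data disks would, by the 2-containers-per-disk rule of thumb, support about 24 containers; dropping to 6 disks would bring that down to about 12, so the vcore and memory allocations per node would need to be scaled down to match.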
05-21-2015
11:18 PM
1 Kudo
You do not need to mirror the disks (besides the OS disks) if you are running HDFS HA. On the master nodes: use one disk just for HDFS and store all logs on the other disk. One disk dedicated to HDFS will give you the best performance, since writes to that disk are synchronous. Also make sure that the CM services store their logs and databases on the disk that does not have HDFS on it. On the DATA nodes: if you have 2 disks for the OS (mirrored) and thus have 300 GB available, I would not use the other 300 GB for apps and logs; add those 2 disks to your HDFS disks. The logs and apps can live on the OS disk on those nodes. If you are going to use Spark, make sure that you use Spark on YARN. We recommend that instead of the standalone mode: it saves resources and it has been tested far better. We do have recommendations about vcores/memory/disks in our YARN tuning documentation. Wilfred
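For example, a YARN-mode submission looks like this (the jar path is a placeholder; SparkPi is the example class that ships with Spark):

spark-submit --master yarn-cluster --class org.apache.spark.examples.SparkPi /path/to/spark-examples.jar 10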