Member since: 07-15-2014
Posts: 57
Kudos Received: 9
Solutions: 6

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6733 | 06-05-2015 05:09 PM |
| | 1625 | 12-19-2014 01:03 PM |
| | 3163 | 12-17-2014 08:23 PM |
| | 8102 | 12-16-2014 03:07 PM |
| | 13603 | 08-30-2014 11:14 PM |
12-09-2015
07:32 AM
The role-level APIs carry the state, but you're querying at the service level. Use the role IDs returned by the service-level call to then query the roles directly.
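For concreteness, here is a minimal sketch of that two-step lookup against the Cloudera Manager REST API; the host, port, API version, credentials, and the cluster/service names below are placeholders, so adjust them to your deployment.

```bash
# Minimal sketch -- hypothetical CM host, credentials, cluster and service names.
CM=http://cm-host:7180/api/v10
AUTH=admin:admin

# Step 1: list the roles of the service to obtain their role names (IDs).
curl -s -u "$AUTH" "$CM/clusters/Cluster1/services/hdfs1/roles" \
  | jq -r '.items[].name' \
  | while read -r role; do
      # Step 2: query each role directly; the role-level object carries the state.
      curl -s -u "$AUTH" "$CM/clusters/Cluster1/services/hdfs1/roles/$role" \
        | jq '{name: .name, roleState: .roleState, healthSummary: .healthSummary}'
    done
```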
01-17-2015
03:59 PM
1 Kudo
Cloudera doesn't have an official position on this tool yet. The project page does mention that Hadoop 2.0 is not supported, so I doubt it will work with CDH 5.2.1. You're of course welcome to provide feedback if you do try it. Note that Cloudera Manager does allow you to monitor several metrics across the cluster, so if you have specific questions about tasks, we may be able to assist without an extra tool.
01-06-2015
11:56 PM
I looked at the URL you gave me and it's still very thin for service providers. It just says: "Service providers deliver Apache Hadoop-based solutions and services to end users interested in leveraging Cloudera's training resources, support and brand."
* How is the support offered to a partner any different from the normal Cloudera support I would get if I purchased the Data Hub Edition?
* Are there regular training events, or just one training given at the time of signing the partner agreement? If there are periodic trainings, is there a schedule for them?
* Can you elaborate on what "brand" means above?
12-29-2014
06:22 PM
When managing multiple clusters with the same instance of Cloudera Manager, each cluster can be on a different version and can be upgraded independently of the others. While having all three environments managed by the same Cloudera Manager seems attractive, it doesn't offer the flexibility of having three truly independent clusters, each with its own Cloudera Manager instance.
12-24-2014
02:57 PM
I had to delete the directories in HDFS manually. It could be that the kite-dataset delete command only does a logical delete, i.e. it only removes the metadata. In any case, running kite-dataset delete and then deleting the data manually in HDFS works for me.
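For reference, this is roughly the sequence I mean; the dataset name and HDFS path below are hypothetical examples, not the actual ones from my cluster.

```bash
# Hypothetical dataset name and path -- substitute your own dataset and location.
kite-dataset delete events                      # removes the dataset (metadata)
hdfs dfs -rm -r /user/hive/warehouse/events     # then remove the leftover files in HDFS
```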
12-19-2014
01:03 PM
1 Kudo
I figured it out. There is a directory called /dfs; I deleted this directory on each machine, retried, and it worked perfectly.
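In case it helps anyone else, this is roughly what I mean; the hostnames below are placeholders, and deleting /dfs wipes the old HDFS data, so only do this on a cluster you are rebuilding anyway.

```bash
# Hypothetical hostnames; assumes passwordless SSH. This destroys the old HDFS
# data directories, so only run it on a cluster you are rebuilding.
for host in node1 node2 node3; do
  ssh "$host" 'sudo rm -rf /dfs'
done
```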
12-17-2014
08:23 PM
OK, I resolved the problem. On hd3work the call to python -c 'import socket; socket.gethostbyname(socket.getfqdn())' was failing. I troubleshot for a very long time and could not figure out why the problem was occurring, so I deleted the network adapter of the VM and recreated it. Then it suddenly started to work.
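For anyone hitting the same thing, these are the checks I would run on each host; they should all resolve to the host's real IP rather than 127.0.0.1.

```bash
# Hostname-resolution sanity checks; run on each cluster host (hd3work here).
hostname -f                                   # the FQDN the host reports
getent hosts "$(hostname -f)"                 # should map to the real IP, not 127.0.0.1
python -c 'import socket; print(socket.gethostbyname(socket.getfqdn()))'
```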
12-16-2014
03:07 PM
This command did not solve the issue, so I deleted my cluster and rebuilt it using the right IP addresses.
07-29-2014
04:35 PM
Thank you so much. Your answer is absolutely correct. I went to each server and ran:

* nn1: service zookeeper-server init --myid=1 --force
* nn2: service zookeeper-server init --myid=2 --force
* jt1: service zookeeper-server init --myid=3 --force

Earlier I had chosen an ID of 1 on every machine. I also corrected my zoo.cfg to make sure the entries were right. Now it works and I am able to run sudo -u hdfs hdfs zkfc -formatZK. Thank you so much!
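For completeness, the zoo.cfg entries that have to line up with those --myid values look roughly like this; I'm assuming a package-based install where zoo.cfg lives at /etc/zookeeper/conf/zoo.cfg and the default ZooKeeper peer/election ports.

```bash
# server.N must match the --myid chosen on each host; default ports assumed.
cat >> /etc/zookeeper/conf/zoo.cfg <<'EOF'
server.1=nn1:2888:3888
server.2=nn2:2888:3888
server.3=jt1:2888:3888
EOF
```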
07-27-2014
06:43 AM
1 Kudo
(1) The "driver" part of run/main code that sets up and submits a job executes where you invoke it. It does not execute remotely. (2) See (1), cause it invalidates the supposition. But for the actual Map and Reduce code execution instead, the point is true. (3) This is true as well. (4) This is incorrect. All output "collector" received data is stored to disk (in an MR-provided storage termed 'intermediate storage') after it runs through the partitioner (which divides them into individual local files pertaining to each target reducer), and the sorter (which runs quick sorts on the whole individual partition segments). (5) Functionally true, but it is actually the Reduce that "pulls" the map outputs stored across the cluster, instead of something sending reducers the data (i.e. push). The reducer fetches its specific partition file from all executed maps that produced one such file, and merge sorts all these segments before invoking the user API of reduce(…) function. The merge sorter does not require that the entire set of segments fit into memory at once - it does the work in phases if it does not have adequate memory. However, if the entire fetched output does not fit into the alloted disk of the reduce task host, the reduce task will fail. We try a bit to approximate and not schedule reduces on such a host, but if no host can fit the aggregate data, then you likely will want to increase the number of reducers (partitions) to divide up the amount of data received per reduce task as a natural solution.