Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311
My Accepted Solutions
Views | Posted |
---|---|
1542 | 07-09-2019 12:53 AM |
9287 | 06-23-2019 08:37 PM |
8049 | 06-18-2019 11:28 PM |
8676 | 05-23-2019 08:46 PM |
3473 | 05-20-2019 01:14 AM |
08-21-2014
09:52 PM
1 Kudo
Yes, the user groups on the Impala node and the Hive nodes are the same. I finally found the answer to my question: if I set "hive.sentry.restrict.defaultDB" to true in sentry-site.xml, Impala and Hive behave the same way. This is because "hive.sentry.restrict.defaultDB" defaults to false. See line 48 of HiveAuthzConf.java in the Sentry source code.
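To check which value a given sentry-site.xml actually sets, a small Python sketch along these lines may help. It assumes the file uses the usual Hadoop-style configuration/property layout; the helper name `read_property` and the sample XML are made up for illustration, and the "false" fallback mirrors the default described above.

```python
import xml.etree.ElementTree as ET


def read_property(xml_text, name, default=None):
    """Return the value of a Hadoop-style <property> entry, or default if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return default


# Illustrative sample only; in practice, read the text of your own sentry-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>hive.sentry.restrict.defaultDB</name>
    <value>true</value>
  </property>
</configuration>"""

restrict = read_property(SAMPLE, "hive.sentry.restrict.defaultDB", default="false")
print(restrict)
```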
08-18-2014
03:29 PM
We are still experiencing periodic problems with applications hanging when a number of jobs are submitted in parallel. We have reduced 'maxRunningApps', increased the virtual core count, and also increased 'oozie.service.callablequeueservice.threads' to 40. In many cases the applications do not hang, but this is not consistent. Regarding YARN issue number 1913 (https://issues.apache.org/jira/browse/YARN-1913), is this patch incorporated in CDH 5.1.0, the version we are using? YARN-1913 lists 2.3.0 as the affected version and 2.5.0 as the fix version. Our Hadoop version in CDH 5.1.0 is 2.3.0. Thank you, Michael Reynolds
08-12-2014
09:43 PM
I fixed the problem. I found that it was not a Flume issue but purely an HDFS issue, so I took the following steps:
Step 1: Stopped all the services.
Step 2: Started the NameNode. When I then tried to start the DataNodes on the 3 servers, one of the servers threw these error messages:
/var/log/ ---- No such file/directory
/var/run -- No such file/directory
These directories did exist, so I checked their permissions and found that they differed between the second server and the third server. I made the permissions on those directories consistent across the servers and then started all the services. After that, Flume worked fine. Thank you.
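The permission comparison described above can be scripted. This is a rough sketch, assuming you run it on each server and compare the output by hand; the function name `describe` is made up for illustration, and /var/log and /var/run are the two directories from the error message.

```python
import os
import stat


def describe(path):
    """Return (mode string, uid, gid) for path, e.g. ('drwxr-xr-x', 0, 0)."""
    st = os.stat(path)
    return stat.filemode(st.st_mode), st.st_uid, st.st_gid


# Print mode and ownership for the two directories named in the error message.
for path in ("/var/log", "/var/run"):
    try:
        print(path, *describe(path))
    except OSError as exc:
        print(path, "could not be inspected:", exc)
```

If the mode string or the owner differs between servers for the same path, that is the mismatch to correct before restarting the DataNodes.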
08-06-2014
01:09 AM
We upgraded our cluster to CDH 5.1.1 and the problem disappeared, so currently I cannot reproduce it. Thanks for the tip about WebHDFS.
08-05-2014
08:34 PM
You are correct, though, that this does not exist as a current feature. Please consider filing an HBase project JIRA upstream requesting this (implementation patches are welcome too!) at https://issues.apache.org/jira/browse/HBASE.
07-27-2014
06:43 AM
1 Kudo
(1) The "driver" part of the run/main code that sets up and submits a job executes where you invoke it. It does not execute remotely.
(2) See (1), because it invalidates the supposition. For the actual Map and Reduce code execution, however, the point is true.
(3) This is true as well.
(4) This is incorrect. All data received by the output "collector" is stored to disk (in MR-provided storage termed 'intermediate storage') after it runs through the partitioner (which divides it into individual local files, one per target reducer) and the sorter (which runs quick sorts on each whole partition segment).
(5) Functionally true, but it is actually the Reduce that "pulls" the map outputs stored across the cluster, rather than something sending ("pushing") the data to the reducers. The reducer fetches its specific partition file from every executed map that produced one, and merge sorts all these segments before invoking the user's reduce(…) function. The merge sorter does not require that the entire set of segments fit into memory at once; it does the work in phases if it does not have adequate memory. However, if the entire fetched output does not fit into the allotted disk of the reduce task host, the reduce task will fail. We try to approximate this and avoid scheduling reduces on such a host, but if no host can fit the aggregate data, you will likely want to increase the number of reducers (partitions) so that each reduce task receives less data.
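Points (4) and (5) can be illustrated with a toy, in-memory Python simulation. This is only a sketch of the data flow (partition, per-segment sort, reducer-side pull and merge), not how Hadoop implements it: real MapReduce writes segments to local disk and fetches them over the network. All names here (`partition`, `run_map`, `run_reduce`, `NUM_REDUCERS`) are made up for illustration.

```python
import heapq
from itertools import groupby
from operator import itemgetter

NUM_REDUCERS = 2  # illustrative value


def partition(key):
    """Toy stand-in for a partitioner: choose a target reducer from the key."""
    return sum(key.encode("utf-8")) % NUM_REDUCERS


def run_map(records):
    """Map side: split output into one segment per reducer, then sort each segment."""
    segments = [[] for _ in range(NUM_REDUCERS)]
    for key, value in records:
        segments[partition(key)].append((key, value))
    return [sorted(seg, key=itemgetter(0)) for seg in segments]


def run_reduce(reducer_id, all_map_outputs):
    """Reduce side: pull this reducer's segment from every map, merge, then reduce."""
    fetched = [out[reducer_id] for out in all_map_outputs]
    # heapq.merge streams the already-sorted segments instead of re-sorting them.
    merged = heapq.merge(*fetched, key=itemgetter(0))
    return {k: sum(v for _, v in grp)
            for k, grp in groupby(merged, key=itemgetter(0))}


map_outputs = [
    run_map([("b", 1), ("a", 1), ("c", 1)]),
    run_map([("a", 1), ("c", 1), ("a", 1)]),
]
totals = {}
for reducer_id in range(NUM_REDUCERS):
    totals.update(run_reduce(reducer_id, map_outputs))
print(sorted(totals.items()))  # [('a', 3), ('b', 1), ('c', 2)]
```

Raising `NUM_REDUCERS` in this sketch spreads the keys over more, smaller segments, which is the same lever as increasing the number of reducers when one reduce host cannot hold the fetched data.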
07-21-2014
01:38 PM
I initially found this confusing, because the Python library for the Cloudera Manager API lacks helper functions for this API endpoint. Nonetheless, it is easy to implement the API call in Python. I will look into adding a helper class to the open-source Python library.

import urllib.parse
import requests

HOST = 'myhost'
CLUSTER_NAME = 'mycluster'
SERVICE = 'mapreduce1'
ACTIVITY_ID = 'your_activity_job_id'
USERNAME = 'your_username'  # placeholder credentials
PASSWORD = 'your_password'

# Build the activity-metrics path and percent-encode it (slashes are kept).
parameters = 'clusters/%s/services/%s/activities/%s/metrics' % (
    CLUSTER_NAME, SERVICE, ACTIVITY_ID)
# Plain HTTP on port 7180 is assumed here; adjust if TLS is enabled.
url = 'http://%s:7180/api/v1/%s' % (HOST, urllib.parse.quote(parameters))
r = requests.get(url, auth=(USERNAME, PASSWORD))
print(r.json())
07-21-2014
10:36 AM
Thank you, Harsh, for your email! I was hitting the issue below. I increased "dfs.image.transfer.timeout" and that fixed it. https://issues.apache.org/jira/browse/HDFS-4301 Checkpointing had been working fine, but the issue started when my fsimage size reached 2.1 GB. Best regards, Bommuraj
07-21-2014
10:27 AM
Thank you, Harsh. It's working!
07-20-2014
09:30 AM
5 Kudos
With CDH4 and CDH5 there is no longer a 'HADOOP_HOME' env-var. It has instead been renamed to 'HADOOP_PREFIX', which for a default parcel environment can be set to /opt/cloudera/parcels/CDH/lib/hadoop.
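A script that needs the Hadoop directory could resolve it along these lines. This is a minimal sketch: the function name `resolve_hadoop_prefix` is made up, and falling back to a legacy HADOOP_HOME before the parcel default is an assumption about what a script might want, not documented CDH behavior.

```python
import os

# Default parcel location mentioned above.
DEFAULT_PARCEL_HADOOP = "/opt/cloudera/parcels/CDH/lib/hadoop"


def resolve_hadoop_prefix(env=None):
    """Prefer HADOOP_PREFIX, then a legacy HADOOP_HOME, then the parcel default."""
    env = os.environ if env is None else env
    return (env.get("HADOOP_PREFIX")
            or env.get("HADOOP_HOME")
            or DEFAULT_PARCEL_HADOOP)


print(resolve_hadoop_prefix())
```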