09-27-2015
07:13 AM
> How can I have only 68 blocks?

That depends on how much data your HDFS is carrying. Is the number much lower than expected, and does it not match the output of a 'hadoop fs -ls -R /' listing of all files? The space report says only about 23 MB is used by HDFS, so the block count looks OK to me.

> Also, when I run hive job, it does not go beyond "Running job: job_1443147339086_0002". Could it be related?

That would be unrelated, but to resolve it, consider raising the values under YARN -> Configuration -> Container Memory (NodeManager) and Container Virtual CPUs (NodeManager).
09-24-2015
09:56 PM
(1) While the CDH codebase does carry the initial 2.6 node-label implementation, many more node-labelling changes and enhancements made it upstream only in 2.8, so it is a feature still under some development. You can certainly utilise the 2.6 features in CDH 5.4.x, but only via the CapacityScheduler (following the upstream docs), because the code support does exist in the sources: https://github.com/cloudera/hadoop-common/tree/cdh5.4.7-release/

(2) FairScheduler support is not upstream yet. We do have node-labelling for the FairScheduler on our roadmap for a future release, but I don't have a shareable ETA for it yet.
09-24-2015
03:58 AM
The TTL values are stored as Cell-level tags [1]. To retrieve them, fetch the Cell via a Get (or similar), then use the Tags-relevant APIs on the Cell object: http://archive.cloudera.com/cdh5/cdh/5/hbase/apidocs/org/apache/hadoop/hbase/Cell.html#getTagsArray(), and deserialise the array of tags via http://archive.cloudera.com/cdh5/cdh/5/hbase/apidocs/org/apache/hadoop/hbase/CellUtil.html#tagsIterator(byte[],%20int,%20int)

[1] - https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L3483-L3486
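Roughly, a minimal sketch of that flow (assuming an HBase 1.0 / CDH 5.4 client, with placeholder table 't1' and row 'row1'; note also that tags only reach the client if the RPC codec carries them, e.g. KeyValueCodecWithTags, so verify that in your setup):

```java
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.Tag;
import org.apache.hadoop.hbase.TagType;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadTtlTags {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("t1"))) {  // placeholder table
      Result result = table.get(new Get(Bytes.toBytes("row1")));  // placeholder row
      for (Cell cell : result.rawCells()) {
        // Walk the serialised tags block of each Cell
        Iterator<Tag> tags = CellUtil.tagsIterator(
            cell.getTagsArray(), cell.getTagsOffset(), cell.getTagsLength());
        while (tags.hasNext()) {
          Tag tag = tags.next();
          if (tag.getType() == TagType.TTL_TAG_TYPE) {
            // The TTL tag payload is the TTL in milliseconds, as a long
            System.out.println("Cell TTL (ms): " + Bytes.toLong(tag.getValue()));
          }
        }
      }
    }
  }
}
```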
09-24-2015
03:11 AM
You can edit this via the API just as you would with the Pools API, by making JSON edits to the same property the UI itself writes into. See https://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_props_cdh540_yarn_mr2included_.html#concept_5v9_49n_yk_unique_1__table_kqj_eb1_wk_unique_1, specifically the entry named "Fair Scheduler Allocations", described as "JSON representation of all the configurations that the Fair Scheduler can take on across all schedules. Typically edited using the Pools configuration UI.". Use http://cloudera.github.io/cm_api/apidocs/v10/path__clusters_-clusterName-_services_-serviceName-_config.html to update it like any other config, then call http://cloudera.github.io/cm_api/apidocs/v10/path__clusters_-clusterName-_commands_poolsRefresh.html to push the updated pool configs out (if you use Dynamic Resource Pools).
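As a rough sketch of those two calls (the host, cluster/service names and credentials are placeholders, and I'm assuming the property's API name is "yarn_fs_scheduled_allocations" - please verify that against the config reference linked above):

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class UpdateFairSchedulerAllocations {
  // Minimal helper: issue a CM API request with basic auth and an optional JSON body
  static void call(String method, String url, String jsonBody) throws Exception {
    HttpURLConnection c = (HttpURLConnection) new URL(url).openConnection();
    c.setRequestMethod(method);
    c.setRequestProperty("Authorization", "Basic " + Base64.getEncoder()
        .encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8)));
    if (jsonBody != null) {
      c.setRequestProperty("Content-Type", "application/json");
      c.setDoOutput(true);
      try (OutputStream os = c.getOutputStream()) {
        os.write(jsonBody.getBytes(StandardCharsets.UTF_8));
      }
    }
    System.out.println(method + " " + url + " -> HTTP " + c.getResponseCode());
  }

  // Escape the allocations JSON so it can be embedded as a JSON string value
  static String quote(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  public static void main(String[] args) throws Exception {
    String api = "http://cm-host.example.com:7180/api/v10";  // placeholder CM host
    String allocations = "{}";  // placeholder: your edited allocations JSON
    // 1. Update the "Fair Scheduler Allocations" property on the YARN service
    call("PUT", api + "/clusters/cluster1/services/yarn1/config",
        "{\"items\":[{\"name\":\"yarn_fs_scheduled_allocations\",\"value\":"
            + quote(allocations) + "}]}");
    // 2. Refresh the pools so the new configuration takes effect
    call("POST", api + "/clusters/cluster1/commands/poolsRefresh", null);
  }
}
```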
09-24-2015
02:34 AM
Does your WebSphere app load a custom set of configs to talk to the remote cluster? If so, are the JHS configs part of that config set? The properties below are all necessary for the MR2 job to register itself with the JHS for post-job persistence - get their values to precisely match the working 'hadoop jar' command host's /etc/hadoop/conf/mapred-site.xml (a sketch of setting them programmatically follows):

mapreduce.jobhistory.address
mapreduce.jobhistory.webapp.address (OR) mapreduce.jobhistory.webapp.https.address
yarn.app.mapreduce.am.staging-dir
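If the app builds its own Configuration object rather than loading a mapred-site.xml from its classpath, it could look roughly like this (all host names and values here are placeholders - copy the real ones from the working host):

```java
import org.apache.hadoop.conf.Configuration;

public class JhsClientConf {
  public static Configuration withJhs() {
    Configuration conf = new Configuration();
    // Placeholders: copy the real values from the working host's
    // /etc/hadoop/conf/mapred-site.xml
    conf.set("mapreduce.jobhistory.address", "jhs-host.example.com:10020");
    conf.set("mapreduce.jobhistory.webapp.address", "jhs-host.example.com:19888");
    conf.set("yarn.app.mapreduce.am.staging-dir", "/user");
    return conf;
  }
}
```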
09-23-2015
07:15 AM
> Wildcard addresses is being used on datanode/namenode
> dfs.client.use.datanode.hostname

This is your solution here, if (and only if) your client hosts can resolve the very same DN hostname, just over a different IP. Is that true in your environment? You mention you've tried this - could you elaborate? The setting needs to be applied in the HDFS client's configuration for it to properly take effect. Is your 'edge host' that lies outside the cluster, or your Java application (if it is run standalone), configured with this set to true in its hdfs-site.xml/Configuration object? A sketch of the latter follows.
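For the standalone Java case, roughly (the NameNode URI and path are placeholders):

```java
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HostnameBasedClient {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://nn-host.example.com:8020");  // placeholder NN
    // Connect to DataNodes by their hostnames, instead of the
    // (wildcard-bound/internal) IPs the NameNode reports back
    conf.setBoolean("dfs.client.use.datanode.hostname", true);
    try (FileSystem fs = FileSystem.get(conf);
         InputStream in = fs.open(new Path("/tmp/test.txt"))) {  // placeholder path
      System.out.println("First byte: " + in.read());
    }
  }
}
```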
09-22-2015
10:17 AM
Glad to know! Please consider marking the thread resolved, so others with a similar question can find a solution quicker. Feel free to post a new thread with any further questions.
09-22-2015
06:19 AM
Right, it was suggested as an optimisation aside from the summing question, given the described example. Does the bc command not solve your original question?
09-22-2015
02:26 AM
Yes, that could work too (or a file with them, passed via -f or such).
09-21-2015
11:56 PM
To add onto Wilfred's response, what is your CDH version? HDFS does cache all positive entries for 5 minutes, but negative caching wasn't supported until CDH 5.2.0 (via HADOOP-10755). See also http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-project-dist/hadoop-common/core-default.xml#hadoop.security.groups.negative-cache.secs (which lists negative caching's default TTL as 30s, vs. the positive cache's 300s). NSCD also does negative caching by default, which could explain why the problem is gone; that depends on how many negative (WARN group-lookup failure) entries you observe in the log.
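If you are on CDH 5.2.0+ and want to tune those TTLs, they are ordinary Hadoop configuration keys, read by the daemons from their own core-site.xml - as a quick illustration (the 300s value below is just an example):

```java
import org.apache.hadoop.conf.Configuration;

public class GroupCacheTtls {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // core-default.xml defaults: 300s positive cache, 30s negative cache
    System.out.println(conf.getLong("hadoop.security.groups.cache.secs", 300));
    System.out.println(conf.getLong("hadoop.security.groups.negative-cache.secs", 30));
    // Example: raise the negative-cache TTL to 5 minutes (the daemons read
    // this from their core-site.xml, not from client-side code like this)
    conf.setLong("hadoop.security.groups.negative-cache.secs", 300);
  }
}
```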