Member since 06-07-2016
923 Posts
322 Kudos Received
115 Solutions
My Accepted Solutions
| Views | Posted |
|---|---|
| 4080 | 10-18-2017 10:19 PM |
| 4335 | 10-18-2017 09:51 PM |
| 14829 | 09-21-2017 01:35 PM |
| 1837 | 08-04-2017 02:00 PM |
| 2414 | 07-31-2017 03:02 PM |
08-04-2016
07:13 PM
@Ram D What is the value of dfs.datanode.balance.bandwidthPerSec (called dfs.balance.bandwidthPerSec in older releases)? Which version of Hadoop are you using? The balancer also has built-in protection that limits how many blocks can be moved at once. This matters because other processes may be running on the cluster, and moving all the data as fast as possible would slow them down. What is the value of dfs.datanode.balance.max.concurrent.moves? A low value limits how many threads can move data concurrently. You can increase concurrency, but keep in mind that all concurrent moves on a DataNode share the bandwidth cap set by dfs.datanode.balance.bandwidthPerSec, so raising the thread count alone will not help if that cap is already the bottleneck.
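To answer the first questions, both values can be read straight from the cluster's hdfs-site.xml. The Python sketch below assumes the standard Hadoop configuration/property/name/value XML layout and prefers the current key over the deprecated dfs.balance.bandwidthPerSec. A setting that is absent from the file means the hdfs-default.xml default applies. The function names are hypothetical helpers, not part of any Hadoop tool.

```python
import xml.etree.ElementTree as ET

# Current key first, then the deprecated older name (hypothetical lookup order).
BANDWIDTH_KEYS = (
    "dfs.datanode.balance.bandwidthPerSec",
    "dfs.balance.bandwidthPerSec",
)
CONCURRENCY_KEY = "dfs.datanode.balance.max.concurrent.moves"


def read_hadoop_conf(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml string."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def balancer_settings(xml_text):
    """Pick out the two balancer settings; None means 'not set in this file'."""
    props = read_hadoop_conf(xml_text)
    bandwidth = next((props[k] for k in BANDWIDTH_KEYS if k in props), None)
    return {
        "bandwidthPerSec": bandwidth,
        "max.concurrent.moves": props.get(CONCURRENCY_KEY),
    }


if __name__ == "__main__":
    sample = """<configuration>
  <property>
    <name>dfs.datanode.balance.bandwidthPerSec</name>
    <value>10485760</value>
  </property>
</configuration>"""
    print(balancer_settings(sample))
    # {'bandwidthPerSec': '10485760', 'max.concurrent.moves': None}
```

If the bandwidth cap turns out to be the limit, it can also be raised on running DataNodes with hdfs dfsadmin -setBalancerBandwidth <bytes per second>; that change does not survive a DataNode restart, so update hdfs-site.xml as well.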
08-04-2016
06:39 PM
2 Kudos
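@John Jackson Running more than two NameNodes is not currently available, but it will ship with Hadoop 3.0. You will be able to configure more than two NameNodes; the Hadoop 3 HA documentation recommends three and suggests not exceeding five because of communication overhead. Please see the following JIRA: https://issues.apache.org/jira/browse/HDFS-6440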
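For reference, HA NameNode IDs are listed in a single dfs.ha.namenodes.<nameservice> property, with one dfs.namenode.rpc-address.<nameservice>.<id> entry per ID, so adding NameNodes mostly means extending that list. The Python sketch below generates only that subset of hdfs-site.xml keys; a working HA setup also needs HTTP addresses, shared edits, and a client failover proxy provider. The nameservice name, hostnames, and port 8020 are illustrative assumptions.

```python
import warnings


def ha_namenode_properties(nameservice, hosts, rpc_port=8020):
    """Build the per-NameNode HA keys for one nameservice.

    hosts maps a NameNode ID (e.g. "nn1") to its hostname; IDs are kept in
    insertion order.
    """
    if len(hosts) < 2:
        raise ValueError("HA requires at least two NameNodes")
    if len(hosts) > 5:
        # Hadoop 3 docs suggest at most five (three recommended).
        warnings.warn("more than five NameNodes is not recommended")
    props = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(hosts),
    }
    for nn_id, host in hosts.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = f"{host}:{rpc_port}"
    return props


if __name__ == "__main__":
    # Hypothetical cluster with three NameNodes.
    hosts = {"nn1": "nn1.example.com", "nn2": "nn2.example.com", "nn3": "nn3.example.com"}
    for key, value in ha_namenode_properties("mycluster", hosts).items():
        print(f"{key} = {value}")
```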
08-04-2016
03:42 AM
1 Kudo
@Charles Chen I can't write the code for you, but have you looked at the Spark section of the Apache HBase Reference Guide? It should cover what you are looking for: https://hbase.apache.org/book.html#spark
08-04-2016
02:47 AM
@Eon kitex I cannot speak on behalf of the product team. If you are a Hortonworks customer, this is something you can discuss with your account team; I am sure they will be able to answer your questions.
08-04-2016
02:35 AM
1 Kudo
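@SBandaru Try adding -D mapred.output.compress=false to your command. In Hadoop 2 this property name is deprecated in favor of mapreduce.output.fileoutputformat.compress, but the old name is still honored.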
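Where the -D goes matters. Generic options are parsed by Hadoop's GenericOptionsParser (through ToolRunner, which TestDFSIO uses), so they must come after the program name but before the tool's own arguments. The Python sketch below only assembles the argument list; the jar file name and the TestDFSIO arguments are illustrative assumptions, not taken from your command.

```python
import shlex


def hadoop_jar_command(jar, program, generic_opts, tool_args):
    """Build argv for: hadoop jar <jar> <program> [-D key=value ...] <tool args>.

    The -D pairs are placed before tool_args so GenericOptionsParser sees them.
    """
    argv = ["hadoop", "jar", jar, program]
    for key, value in generic_opts.items():
        argv += ["-D", f"{key}={value}"]
    argv += list(tool_args)
    return argv


if __name__ == "__main__":
    cmd = hadoop_jar_command(
        "hadoop-mapreduce-client-jobclient-tests.jar",  # hypothetical jar name
        "TestDFSIO",
        {"mapred.output.compress": "false"},
        ["-write", "-nrFiles", "10"],  # illustrative arguments
    )
    print(shlex.join(cmd))
    # hadoop jar hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -D mapred.output.compress=false -write -nrFiles 10
```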
08-04-2016
01:52 AM
2 Kudos
@SBandaru Does the user running this job have permission to write to this location? Does the directory /benchmarks/TestDFSIO exist in HDFS? The job is trying to write /benchmarks/TestDFSIO/io_write/part-00000.
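To check the first point, hdfs dfs -ls /benchmarks shows the owner, group, and permission bits of the TestDFSIO directory. HDFS applies POSIX-style rules to those bits, and creating a file requires write permission on its parent directory. The Python sketch below approximates that write check: owner bits if you own the path, group bits if you are in its group, otherwise the "other" bits. It is a simplification that ignores ACLs, supergroup membership, and the execute permission needed to traverse parent directories, and it assumes the HDFS superuser is named hdfs (in reality it is whichever user started the NameNode).

```python
import stat


def can_write(user, user_groups, owner, group, mode, superuser="hdfs"):
    """Approximate HDFS's POSIX-style write check on one path.

    mode is the octal permission, e.g. 0o755. ACLs and traversal
    (execute on ancestors) are deliberately not modeled.
    """
    if user == superuser:
        return True  # the superuser bypasses permission checks
    if user == owner:
        return bool(mode & stat.S_IWUSR)
    if group in user_groups:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


if __name__ == "__main__":
    # Hypothetical: directory owned by hdfs:hdfs with mode 755, job run as alice.
    print(can_write("alice", ["users"], "hdfs", "hdfs", 0o755))  # False
```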
08-04-2016
01:45 AM
@Amila De Silva Please check under /var/log/hive. Logs for the other services can similarly be found under /var/log/<service>, for example hdfs, spark, or hbase.
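If the directory holds many files, sorting by modification time is a quick way to find the one that was just written. The Python sketch below does that for one service directory; the /var/log/<service> layout comes from the answer above and may differ if your log directories were customized, and newest_logs is a hypothetical helper name.

```python
from pathlib import Path


def newest_logs(service, log_root="/var/log", limit=5):
    """Return up to `limit` files under <log_root>/<service>, newest first.

    Subdirectories are searched too, in case a service nests its logs.
    """
    base = Path(log_root) / service
    if not base.is_dir():
        return []
    files = [p for p in base.rglob("*") if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


if __name__ == "__main__":
    for path in newest_logs("hive"):
        print(path)
```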
08-03-2016
09:44 PM
@Adel Ouazani As you already know, edge nodes are for running your client processes. They do not run cluster processes and usually do not store data, unless you use the edge node for data ingestion and stage your data there. So edge node configuration can be customized quite a bit based on your needs. I have not seen customers with a separate edge node for each project, but I don't see anything particularly wrong with it, except that it increases the number of ways your cluster can be accessed, which increases the chance of security holes. One main consideration, however, is to make sure you have good network connectivity and bandwidth between your cluster and all of the edge nodes. Other than that, provided the edge nodes have reasonable resources (CPU, memory, and disk, especially if you are staging data for ingest), this should be fine. I would also recommend reading the accepted answer on this thread for more details to help you make a decision: https://community.hortonworks.com/questions/34872/staging-on-edge-nodes.html
08-03-2016
07:58 PM
@Karthik Rajamanickam I think you should be able to use the Metrics Collector API. Have you tried it? https://cwiki.apache.org/confluence/display/AMBARI/Metrics+Collector+API+Specification
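As a starting point, the collector is queried with an HTTP GET on its /ws/v1/timeline/metrics endpoint. The Python sketch below only builds such a URL. The default port 6188, the parameter names (metricNames, appId, hostname, startTime, endTime), the example metric cpu_user with appId HOST, and the hostnames are assumptions to verify against the specification linked above, including the expected timestamp units.

```python
from urllib.parse import urlencode


def ams_metrics_url(collector_host, metric_names, app_id, hostname=None,
                    start_time=None, end_time=None, port=6188):
    """Build a GET URL for the Ambari Metrics Collector timeline endpoint.

    Optional filters are only added when provided; timestamps are passed
    through unchanged (check the API spec for the expected units).
    """
    params = {"metricNames": ",".join(metric_names), "appId": app_id}
    if hostname is not None:
        params["hostname"] = hostname
    if start_time is not None:
        params["startTime"] = start_time
    if end_time is not None:
        params["endTime"] = end_time
    return f"http://{collector_host}:{port}/ws/v1/timeline/metrics?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical hosts; cpu_user/HOST are example host-level metric values.
    print(ams_metrics_url("ams.example.com", ["cpu_user"], "HOST",
                          hostname="node1.example.com"))
    # http://ams.example.com:6188/ws/v1/timeline/metrics?metricNames=cpu_user&appId=HOST&hostname=node1.example.com
```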