About mqureshi

mqureshi · ‎08-04-2016

@Ram D What is the value of dfs.balance.bandwidthPerSec (named dfs.datanode.balance.bandwidthPerSec in Hadoop 2 and later)? Which version of Hadoop are you using? There is also built-in protection that limits how many blocks can be moved at once. This matters because you may have other processes running, and moving all the data quickly will impact them. What is the value of dfs.datanode.balance.max.concurrent.moves? A low value limits how many threads can move data concurrently. You can increase concurrency, but be mindful of the bandwidth you will use and keep it below your configured dfs.balance.bandwidthPerSec.
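To make the two settings concrete, here is a minimal Python sketch, not an official tool. It assumes you can read your hdfs-site.xml, and the function names are hypothetical. It pulls out the balancer properties that are explicitly set, then gives a rough best-case estimate of how long a rebalance would take at a given bandwidth cap. The estimate assumes every DataNode moves data in parallel at the full cap, which a real cluster will not reach.

```python
import xml.etree.ElementTree as ET

# Property names discussed above; the second is the older spelling.
BALANCER_KEYS = (
    "dfs.datanode.balance.bandwidthPerSec",
    "dfs.balance.bandwidthPerSec",
    "dfs.datanode.balance.max.concurrent.moves",
)


def read_balancer_settings(hdfs_site_xml: str) -> dict:
    """Return balancer-related properties explicitly set in hdfs-site.xml text.

    Properties left at their defaults do not appear in the file, so a
    missing key here only means "not overridden".
    """
    root = ET.fromstring(hdfs_site_xml)
    settings = {}
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        if name in BALANCER_KEYS:
            settings[name] = (prop.findtext("value") or "").strip()
    return settings


def hours_to_move(bytes_to_move: int, bandwidth_per_sec: int, datanodes: int) -> float:
    """Best-case hours to move the data.

    Assumption: all DataNodes transfer in parallel at the full
    per-node bandwidth cap for the whole run.
    """
    return bytes_to_move / (bandwidth_per_sec * datanodes) / 3600
```

For example, calling hours_to_move with the number of bytes the balancer reports it needs to move, your configured bandwidth, and your DataNode count gives a lower bound on the run time. If that bound is already too long, raising concurrency alone will not help.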

mqureshi · ‎08-04-2016

@John Jackson This feature is not currently available but will ship with Hadoop 3.0, where you will be able to have up to 5 NameNodes. Please see the following JIRA: https://issues.apache.org/jira/browse/HDFS-6440

mqureshi · ‎08-04-2016

@Charles Chen Well, I cannot write the code for you, but have you looked at this? Everything you are looking for is available at the link below: https://hbase.apache.org/book.html#spark

mqureshi · ‎08-04-2016

@Eon kitex I cannot speak on behalf of the product team. If you are a Hortonworks customer, this is something you can discuss with your account team. I am sure they will be able to answer your questions.

mqureshi · ‎08-04-2016

@SBandaru Try adding the following to your run: -D mapred.output.compress=false

mqureshi · ‎08-04-2016

@SBandaru Does the user who is running this job have permission to write to this location? Does the directory /benchmarks/TestDFSIO exist in HDFS? /benchmarks/TestDFSIO/io_write/part-00000
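One way to check whether the directory exists from a script is sketched below. It assumes the hdfs client is on the PATH of the user running the job. The helper name and the injectable run parameter are hypothetical, added only so the function can be exercised without a cluster. It relies on `hdfs dfs -test -d`, which exits with status 0 when the path is a directory.

```python
import subprocess


def hdfs_dir_exists(path, run=subprocess.run):
    """Return True if `hdfs dfs -test -d <path>` exits with status 0.

    `run` defaults to subprocess.run; it is a parameter only so that a
    stand-in can be passed where no hdfs client is installed.
    """
    result = run(["hdfs", "dfs", "-test", "-d", path])
    return result.returncode == 0
```

If hdfs_dir_exists("/benchmarks/TestDFSIO") returns True, running `hdfs dfs -ls -d /benchmarks/TestDFSIO` as the same user shows the owner and permission bits, which tells you whether that user can write there.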

mqureshi · ‎08-04-2016

@Amila De Silva Please check under /var/log/hive. Logs for the other services can be found in the same way under /var/log/<hdfs,spark,hbase etc>

mqureshi · ‎08-03-2016

@Adel Ouazani As you already know, edge nodes are for running your client processes. They do not run your cluster processes and usually do not store data, unless you use them for data ingestion and stage your data there. So edge node configuration can be customized quite a bit based on your needs.

I have not seen customers use separate edge nodes for each project, but I don't see anything particularly wrong with it, except that it increases the number of ways your cluster can be accessed, which increases the chance of security holes. One main consideration, however, is to make sure you have good network and bandwidth support between your cluster and all of the edge nodes. Other than that, given reasonable resources (CPU, memory, and disk, especially if you are staging data for ingest), this should be fine.

I would also recommend reading the accepted answer on this thread for more details to help you make a decision: https://community.hortonworks.com/questions/34872/staging-on-edge-nodes.html

mqureshi · ‎08-03-2016

@Karthik Rajamanickam I think you should be able to use the Metrics Collector API. Did you try this? https://cwiki.apache.org/confluence/display/AMBARI/Metrics+Collector+API+Specification
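As a rough illustration of what a query against the collector looks like, the Python sketch below only builds the request URL. The endpoint path /ws/v1/timeline/metrics, the default port 6188, and the parameter names (metricNames, appId, hostname, startTime, endTime) are written from memory of the specification linked above and should be treated as assumptions to verify against it. The function name is hypothetical.

```python
from urllib.parse import urlencode


def metrics_url(collector_host, metric_names, app_id, start_ms, end_ms,
                hostname=None, port=6188):
    """Build a GET URL for the Ambari Metrics Collector.

    Assumed (check against the linked spec): the endpoint path, the
    default port, and the query parameter names used below.
    Times are epoch milliseconds.
    """
    params = {
        "metricNames": ",".join(metric_names),
        "appId": app_id,
        "startTime": start_ms,
        "endTime": end_ms,
    }
    if hostname:
        params["hostname"] = hostname
    return f"http://{collector_host}:{port}/ws/v1/timeline/metrics?{urlencode(params)}"
```

The resulting URL can be fetched with any HTTP client. The host name, metric names, and appId you pass in depend on your own cluster.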

mqureshi · ‎08-03-2016

Yes, you are using version 1.3.1.

Member Since	‎06-07-2016 09:05 AM
Last Visited	‎10-31-2017 03:17 AM
Posts	923
Kudos received	310

Cloudera Community

Re: YARN recommended configuration

Re: How to resolve for NULL values when they are c...

Re: Why is spark has better speed than Hadoop

Re: Is it possible to assign Hadoop queues to Hado...

Re: Kafka NiFi HDF Installation

Re: Even when i ran balancer, load one data node i...

Re: Can you have multiple (more than 1) secondary ...

Re: Read HBase Table by using Spark/Scala

Re: How to upgrade to spark 2?

Re: TestDFSIO Output Error

Re: TestDFSIO Output Error

Re: Viewing logs for Hive query Executions

Re: Dedicated edge nodes

Re: How to Access / Read the Embedded Ambari Metr...

Re: Question on Spark Versioning