Member since: 06-07-2016
Posts: 923
Kudos Received: 322
Solutions: 115
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 4099 | 10-18-2017 10:19 PM |
|  | 4345 | 10-18-2017 09:51 PM |
|  | 14851 | 09-21-2017 01:35 PM |
|  | 1840 | 08-04-2017 02:00 PM |
|  | 2424 | 07-31-2017 03:02 PM |
02-03-2017
08:09 PM
1 Kudo
HDP is Apache Hadoop and its suite of products (HDFS, MapReduce, YARN, ZooKeeper, HBase, Hive, etc.). In a manual install you don't need Ambari; that's why it's all manual. Ambari manages the HDP stack that it installs itself, so if you install manually, you need other tools to monitor and manage the cluster. When you do a manual install and run "yum install hadoop hadoop-hdfs hadoop-libhdfs hadoop-yarn hadoop-mapreduce hadoop-client openssl", where do you think these packages are being installed from? It is the repo that you set up in the following step when you configure the remote HDP repositories, so all of this is actually part of HDP. http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_installing_manually_book/content/config-remote-repositories.html If you have a follow-up question, please add a comment instead of posting a new answer.
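To make that concrete, here is a rough sketch of the manual flow (the repo URL is illustrative; copy the exact one for your OS version from the docs page above):

```bash
# 1. Register the remote HDP repository that yum will install from
#    (URL shown is an example pattern; use the one from the HDP docs)
wget -nv http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.3.4.0/hdp.repo \
  -O /etc/yum.repos.d/hdp.repo

# 2. Confirm yum can now see the HDP repo
yum repolist

# 3. The packages below are then pulled from that HDP repo
yum install hadoop hadoop-hdfs hadoop-libhdfs hadoop-yarn \
  hadoop-mapreduce hadoop-client openssl
```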
02-03-2017
07:04 PM
Starting from version 2.1, you should see better read performance for colocated clients; for writes, not so much. Reads are faster because the client is on the same machine as the data block. If my answer helped, please accept it.
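If the colocated reads here are HDFS reads, one related knob worth knowing is short-circuit local reads, which let a client on the same machine read block files directly from disk instead of going through the DataNode. A hedged hdfs-site.xml sketch (the socket path is a commonly used default; adjust for your install):

```xml
<!-- Sketch: enable short-circuit local reads for colocated clients -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
```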
02-03-2017
05:54 PM
Whether the 'edge node' is a datanode? No. If you want, you can put edge processes (client configs, client programs) on the same node as a datanode, but that doesn't make the datanode an edge node. Ideally this is not recommended, but if you have a very small cluster, then sure, there's no problem with that.
02-03-2017
05:51 PM
Is it simply a machine with Hadoop software to facilitate interaction with HDFS? Yes.
02-03-2017
05:29 PM
1 Kudo
@Avijeet Dash
I would recommend reading the following link: http://www.solrtutorial.com/basic-solr-concepts.html First, to answer your question: you cannot keep your data in HBase/HDFS and have SOLR create an index to search that data. SOLR only searches its own index. Here is the concept: data stored in SOLR is called documents (an analogy from the database world: each document is a row in a table). Before you can store data in SOLR, you have to define a schema in a file called schema.xml (similar to a table schema in a database). This is where you specify whether each field (think of a column in a database) is indexed as well as stored. You already understand "indexed", which is what SOLR uses to search. But what exactly is "stored"? Well, are you only going to get back the indexed fields? Assume a document with 50 fields. Maybe you want to search on only 5 of them, and when you get your search results back, you probably want more than the indexed fields. So you get back your stored fields. The more fields you store and index, the higher the storage requirements. Read that link and you'll have a good idea. And to reiterate my earlier point: no, you cannot have data in HDFS/HBase and index it from SOLR. SOLR is a complete solution. SOLR can use HDFS to store its own data and index, but it's not going to create an index on your HBase files or your ORC/text files on HDFS.
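To make the indexed/stored distinction concrete, here is a minimal, hypothetical schema.xml fragment (field names and types are illustrative, not from your use case):

```xml
<!-- Hypothetical schema.xml fragment: each field declares whether SOLR
     indexes it (searchable) and/or stores it (returned in results) -->
<fields>
  <!-- searchable AND returned in results -->
  <field name="title" type="text_general" indexed="true" stored="true"/>
  <!-- searchable, but its raw text is NOT returned (saves storage) -->
  <field name="body" type="text_general" indexed="true" stored="false"/>
  <!-- not searchable, but returned with results (display-only metadata) -->
  <field name="created_on" type="date" indexed="false" stored="true"/>
</fields>
```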
02-03-2017
04:59 PM
1 Kudo
@Ganesan Vetri Like Michael mentions, files are not deleted immediately; they are moved to a trash folder if you did not use the "-skipTrash" option when deleting. You can run "hadoop fs -expunge" explicitly to empty the trash. Even better, your trash is a folder called ".Trash" under your HDFS home directory (/user/<username>/.Trash); just clear that up with the "rm" command and you'll reclaim the space: hdfs dfs -rm -r /user/<username>/.Trash, just like any other path.
See how Trash works for a better understanding:
http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Space_Reclamation
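A quick sketch of the commands above (the trash path assumes the default per-user location; substitute your own username and paths):

```bash
# Force an immediate checkpoint/purge of the current user's trash
hadoop fs -expunge

# Or remove the trash directory directly, bypassing trash itself
hdfs dfs -rm -r -skipTrash /user/<username>/.Trash

# For future deletes where you want the space back immediately
hdfs dfs -rm -r -skipTrash /path/to/folder
```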
02-03-2017
04:42 PM
1 Kudo
Yes, your understanding is correct. The automated and recommended way is to first install Ambari and then set up the cluster through Ambari.
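At a high level, the Ambari-first flow looks like this (a sketch assuming a RHEL/CentOS host with the Ambari repo already registered; see the Ambari install guide for the exact repo URL for your OS):

```bash
yum install ambari-server   # install the Ambari server package
ambari-server setup         # interactive setup (JDK, database, etc.)
ambari-server start         # then open the Ambari UI on port 8080
                            # and run the cluster-install wizard
```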
02-03-2017
04:29 PM
@Avijeet Dash I agree with you. It is much more reliable if, after your streaming job, your data lands in Kafka and is then written to HBase/HDFS. This decouples your streaming job from the writes. I wouldn't recommend using Flume; go with the combination of NiFi and Kafka.
02-02-2017
11:44 PM
@Divakar Annapureddy First things first: yes, it is possible and supported. Here is the link: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_hadoop-ha/content/ch_HA-HiveServer2.html The rest is my personal preference and opinion, so take it based on how you do things in your organization. First, if it's not broken, why fix it? Second, and this may be important depending on your utilization and number of requests per second: Zookeeper is very sensitive to timeouts, especially if the same Zookeeper is being used for things like HBase or even Kafka (Kafka should have its own Zookeeper regardless). That is why one best practice is to give Zookeeper its own dedicated disk. If your namenode is the only thing being managed by Zookeeper, then it's fine, but if you already have HBase or Kafka pointing to the same Zookeeper, why add one more component, especially if what you have is working just fine? As for what others are doing, I am not sure about the Zookeeper approach because I have only seen customers use a load balancer like F5. I can say confidently that the Zookeeper approach is less deployed in industry, probably because it's a newer feature.
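For reference, this is roughly what a client connection looks like with ZooKeeper-based HiveServer2 discovery (hostnames are illustrative; the namespace must match what your HiveServer2 instances register under, "hiveserver2" by default):

```bash
# ZooKeeper picks an available HiveServer2 instance for the client
beeline -u "jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
```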
02-02-2017
08:09 PM
I see the following in your hbase-site.xml when I open it: