Member since: 09-18-2015
3274 Posts | 1159 Kudos Received | 426 Solutions
01-30-2016
07:28 PM
@Shaofeng Shi Thanks for sharing all the comments. I wonder if it's possible to post them as an article... please?
01-29-2016
06:01 PM
@rgarcia Do you have information on the latest versions of Ambari and HDP that are compatible with Centrify?
01-23-2016
11:30 PM
3 Kudos
Node labels enable you to partition a cluster into sub-clusters so that jobs can be run on nodes with specific characteristics. For example, you can use node labels to run memory-intensive jobs only on nodes with a larger amount of RAM. Node labels can be assigned to cluster nodes and marked as exclusive or shareable, and you can then associate them with Capacity Scheduler queues. Each node can have only one node label.
Demo:
Use case: two node labels (node1 and node2) plus the default and spark queues; submit jobs to specific labeled nodes.
Add the node labels: yarn rmadmin -addToClusterNodeLabels "node1(exclusive=true),node2(exclusive=false)"
Assign labels to nodes: yarn rmadmin -replaceLabelsOnNode "phdns02.cloud.hortonworks.com=node2 phdns01.cloud.hortonworks.com=node1"
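For the queues to actually use the labels, they also need access to them in the Capacity Scheduler configuration. A minimal sketch of the relevant capacity-scheduler properties, assuming the default/spark queue layout of this demo (the 50/50 capacity split is illustrative; per label, sibling queue capacities must sum to 100):

yarn.scheduler.capacity.root.accessible-node-labels.node1.capacity=100
yarn.scheduler.capacity.root.accessible-node-labels.node2.capacity=100
yarn.scheduler.capacity.root.spark.accessible-node-labels=node1,node2
yarn.scheduler.capacity.root.spark.accessible-node-labels.node1.capacity=50
yarn.scheduler.capacity.root.spark.accessible-node-labels.node2.capacity=50
yarn.scheduler.capacity.root.default.accessible-node-labels=node1,node2
yarn.scheduler.capacity.root.default.accessible-node-labels.node1.capacity=50
yarn.scheduler.capacity.root.default.accessible-node-labels.node2.capacity=50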
Job Submission:
Job sent to node1 only and assigned to the spark queue:
hadoop jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -shell_command "sleep 100" -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -queue spark -node_label_expression node1
Job sent to node2 only and assigned to the spark queue:
hadoop jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -shell_command "sleep 100" -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -queue spark -node_label_expression node2
Job sent to node1 only and assigned to the default queue:
hadoop jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -shell_command "sleep 100" -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -queue default -node_label_expression node1
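To verify the labels were registered and assigned before submitting jobs, you can list them from the CLI (available in recent Hadoop 2.x releases; the Node Labels page in the ResourceManager UI shows the same information):

yarn cluster --list-node-labels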
More details - Doc link
SlideShare
01-20-2016
03:10 PM
3 Kudos
Apache Ranger lets you control which users can submit jobs to YARN queues.
Demo 1) Default queue
User demouser can submit jobs to the default queue; access is controlled from the Ranger UI.
Demo 2) Non-default queue
User demouser can submit jobs to the spark queue; access is controlled from the Ranger UI.
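A quick way to exercise such a policy is to submit a test job as the user in question; a sketch using the distributed shell example from the node-labels article above (assuming demouser exists as an OS user on the client node):

su - demouser
hadoop jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -shell_command "sleep 100" -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -queue spark

If the Ranger policy denies demouser on the spark queue, the submission is rejected; flip the policy in the Ranger UI and retry to see the effect.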
Happy Hadooping!!
More information
01-19-2016
12:41 PM
4 Kudos
Linkedin Post

Have you heard of computing on traditional disk-based or flash-based technologies? We all use disk/flash storage in our laptops, desktops, or servers sitting in the data center.

What is Apache Ignite In-Memory Data Fabric? It's a high-performance, integrated, and distributed in-memory platform for computing and transacting on large-scale data sets in real time, "orders of magnitude faster than possible with traditional disk-based or flash-based technologies."

Fabric: In information technology, fabric is a synonym for framework or platform. You can view Ignite as a collection of independent, well-integrated, in-memory components geared to improve the performance and scalability of your application. Source

How does Ignite fit with HDFS? (Ignite File System, IGFS) IGFS shakes hands with HDFS: Hadoop can run over IGFS in plug-n-play fashion, significantly reducing I/O and improving both latency and throughput.

Why do we need another layer on top of HDFS? IGFS supports dual mode: it can be deployed as the main file system, or it can sit on top of HDFS as a caching layer with highly configurable read-through and write-through behaviour. In the latter mode, IGFS serves as an in-memory caching layer over disk-based HDFS.

Installation: scroll down to "In-Memory Hadoop Accelerator". The best part of the install is HADOOP_README.txt and the useful documentation. Check out the HDP and Ignite setup guide, Hive and Ignite, Spark and Ignite, Zeppelin and Ignite, and Ignite and DataGrid.
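To give a flavour of the plug-n-play deployment, here is a minimal core-site.xml sketch for pointing Hadoop at IGFS, based on the file system class names shipped with the Ignite Hadoop Accelerator (the igfs://igfs@localhost/ URI and single-node layout are assumptions for a test setup):

<!-- Register the IGFS implementations for the old and new Hadoop FileSystem APIs -->
<property>
  <name>fs.igfs.impl</name>
  <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.igfs.impl</name>
  <value>org.apache.ignite.hadoop.fs.v2.IgniteHadoopFileSystem</value>
</property>

Jobs can then read and write igfs://igfs@localhost/ paths instead of hdfs:// ones, with IGFS caching or proxying the underlying HDFS depending on its configured mode.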
01-19-2016
12:36 PM
8 Kudos
Apache Drill is an open source, low-latency query engine for Hadoop that delivers secure, interactive SQL analytics at petabyte scale. With the ability to discover schemas on the fly, Drill is a pioneer in delivering self-service data exploration capabilities on data stored in multiple formats in files or NoSQL databases. By adhering to ANSI SQL standards, Drill does not require a learning curve and integrates seamlessly with visualization tools. Source

Tutorial:

Grab the latest version of Drill:
wget http://getdrill.org/drill/download/apache-drill-1.2.0.tar.gz
tar xvfz apache-drill-1.2.0.tar.gz
/root/drill/apache-drill-1.2.0
[root@node1 apache-drill-1.2.0]# cd bin/

Start Drill in distributed mode (you can start in embedded mode too):
[root@node1 conf]# ../bin/drillbit.sh start
starting drillbit, logging to /root/drill/apache-drill-1.2.0/log/drillbit.out

Drill Web Console - http://host:8047

Enable storage plugins: click Storage --> Enable hbase, hive, mongo.

Modify the storage plugins for Hive and HBase as per your Hadoop cluster setup. For example: click Update for hive under storage plugins and modify hive.metastore.uris (see the JSON sketch at the end of this post).

Hive test: launch the hive shell
hive> create table drill_hive (info string);
hive> insert into table drill_hive values ('This is Hive and you are using Drill');

[root@node1 bin]# ./drill-conf
apache drill 1.2.0
"start your sql engine"
0: jdbc:drill:> use hive;
0: jdbc:drill:> select info from drill_hive;

HBase: change the storage plugin properties in case HBase's zookeeper.znode.parent points to /hbase-unsecure:
add "zookeeper.znode.parent": "/hbase-unsecure"

Let's check the query plan and metrics: click Profiles. You will see queries under completed queries; click a query to see its execution stats.

What is the Foreman? The Drillbit that receives a query from the client becomes the Foreman for that query: it parses the SQL, builds the distributed execution plan, and coordinates execution across the cluster.
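For reference, the hive storage plugin edited above is a JSON document in the Drill web console; a minimal sketch of its shape (the metastore host and fs.default.name values are assumptions for this cluster):

{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://node1:9083",
    "fs.default.name": "hdfs://node1:8020/"
  }
}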
Link
More information
For ODBC/JDBC setup
Happy Hadooping!!!!
01-19-2016
01:15 AM
6 Kudos
Linkedin Post

Presto is a tool designed to efficiently query vast amounts of data using distributed queries. We will install Presto in single-server mode, access Hive, and then add a worker node. Cross-catalog queries span RDBMS, Hive, and NoSQL sources (see the sketch at the end of this post).

Tutorial

**Java 8 is a must**

Install - link (for the latest versions)
wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.122/presto-server-0.122.tar.gz
tar xvfz presto-server-0.122.tar.gz

Let's start with the single-node setup (master and worker on the same node):
cd presto-server-0.122
mkdir etc
[root@ns2 presto-server-0.122]# cd etc/
mkdir catalog

We will create the files shown below:
[root@ns2 etc]# ls
catalog config.properties jvm.config log.properties node.properties

[root@ns2 etc]# cat config.properties
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=9080
query.max-memory=10GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://ns2:9080

[root@ns2 etc]# cat log.properties
com.facebook.presto=INFO

[root@ns2 etc]# cat node.properties
node.environment=production
node.id=presto1
node.data-dir=/var/presto/data

Details on the properties are here.

Now, let's create the hive properties file (I have created hive.properties already):
cd catalog/
[root@ns2 catalog]# ls
hive.properties jmx.properties
[root@ns2 catalog]# cat hive.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://ns3:9083
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml

All set to start the Presto server:
[root@ns2 bin]# pwd
/root/presto-server-0.122/bin
[root@ns2 bin]# nohup ./launcher run &
[1] 11722
[root@ns2 bin]# nohup: ignoring input and appending output to `nohup.out'
[root@ns2 bin]# tail -f nohup.out

The last lines will be:
2015-10-18T16:49:49.935-0400 INFO main com.facebook.presto.metadata.CatalogManager -- Added catalog hive using connector hive-hadoop2 --
2015-10-18T16:49:50.005-0400 INFO main com.facebook.presto.server.PrestoServer ======== SERVER STARTED ========

Hit http://host:9080

Let's access Hive tables. Download the Presto CLI (link for the latest release):
mv presto-cli-0.122-executable.jar presto
[root@ns2 bin]# ./presto --server ns2:9080 --catalog hive
presto> show tables from default;

Create a table in Hive.

Presto UI: click one of the queries to check the stats, and click the Execution link to get the execution plan.

Let's add a worker node and remove the master from worker duty. Node name - ns4. Repeat the installation steps on the new node as above, then make the following changes:
/root/presto-server-0.122/etc
[root@ns4 etc]# cat config.properties
coordinator=false
http-server.http.port=9080
query.max-memory=10GB
query.max-memory-per-node=1GB
discovery.uri=http://ns2:9080 (It points to the master server)

[root@ns4 etc]# cat node.properties (node.id needs to be unique)
node.environment=production
node.id=presto2
node.data-dir=/var/presto/data

[root@ns4 etc]# cd ..
[root@ns4 presto-server-0.122]# cd bin/
[root@ns4 bin]# nohup ./launcher run &

[root@ns2 bin]# ./presto --server ns2:9080 --catalog hive

Happy Hadooping!!! Read Presto: Interacting with petabytes of data at Facebook
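To illustrate the cross-catalog querying mentioned at the top, a sketch assuming a second catalog has been configured (e.g. etc/catalog/mysql.properties with connector.name=mysql plus its connection properties); the schema, table, and column names here are hypothetical:

presto> show catalogs;
presto> select o.id, c.name
     -> from hive.default.orders o
     -> join mysql.crm.customers c on o.customer_id = c.id;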
01-19-2016
01:10 AM
@Mark Petronic