Member since: 09-18-2015
Posts: 3274
Kudos Received: 1159
Solutions: 426
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2625 | 11-01-2016 05:43 PM |
| | 8759 | 11-01-2016 05:36 PM |
| | 4925 | 07-01-2016 03:20 PM |
| | 8268 | 05-25-2016 11:36 AM |
| | 4434 | 05-24-2016 05:27 PM |
01-19-2016
12:36 PM
8 Kudos
Apache Drill is an open source, low-latency query engine for Hadoop that delivers secure, interactive SQL analytics at petabyte scale. With the ability to discover schemas on the fly, Drill is a pioneer in delivering self-service data exploration on data stored in multiple formats in files or NoSQL databases. Because it adheres to ANSI SQL standards, Drill has virtually no learning curve and integrates seamlessly with visualization tools. Source Tutorial:

Grab the latest version of Drill and unpack it:

    wget http://getdrill.org/drill/download/apache-drill-1.2.0.tar.gz
    tar xvfz apache-drill-1.2.0.tar.gz
    /root/drill/apache-drill-1.2.0
    [root@node1 apache-drill-1.2.0]# cd bin/

Start Drill in distributed mode (you can start in embedded mode too):

    [root@node1 conf]# ../bin/drillbit.sh start
    starting drillbit, logging to /root/drill/apache-drill-1.2.0/log/drillbit.out

Drill Web Console: http://host:8047

Enable storage plugins: click Storage --> Enable hbase, hive and mongo. Then modify the storage plugins for Hive and HBase as per your Hadoop cluster setup. For example, click Update for hive under Storage Plugins and modify hive.metastore.uris.

Hive test: launch the hive shell and seed a table:

    hive> create table drill_hive (info string);
    hive> insert into table drill_hive values ('This is Hive and you are using Drill');

Now query it from Drill:

    [root@node1 bin]# ./drill-conf
    apache drill 1.2.0
    "start your sql engine"
    0: jdbc:drill:> use hive;
    0: jdbc:drill:> select info from drill_hive;

HBase: change the storage plugin properties in case HBase's zookeeper.znode.parent points to /hbase-unsecure; add "zookeeper.znode.parent": "/hbase-unsecure".

Let's check the query plan and metrics. Click Profiles and you will see your queries under Completed Queries. Click a query to see its execution stats. What is the foreman? It is the Drillbit that receives the query from the client and drives its execution.
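Beyond the web console, you can also submit queries to Drill's REST API straight from the shell. A minimal sketch, assuming the web console is reachable at node1:8047 (the host name is an assumption) and the hive plugin is enabled:

```
# Submit a SQL query to the Drillbit REST endpoint and print the JSON result.
# node1:8047 is an assumed host:port; use your Drillbit's web console address.
curl -s -X POST -H "Content-Type: application/json" \
  -d '{"queryType": "SQL", "query": "select info from hive.drill_hive"}' \
  http://node1:8047/query.json
```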
Link: more information and ODBC/JDBC setup. Happy Hadooping!!!!
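For JDBC specifically, Drill ships with sqlline; a minimal sketch of connecting through the JDBC URL, assuming ZooKeeper runs on node1:2181 (adjust to your quorum):

```
# Connect to the Drill cluster via the JDBC URL (the ZK address is an assumption).
cd /root/drill/apache-drill-1.2.0
bin/sqlline -u "jdbc:drill:zk=node1:2181"
```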
01-19-2016
11:33 AM
@Mehdi TAZI Very good point. It goes back to ELT: the source of truth ("raw data") lands in HDFS, we run transformations on that data, and we load the results into Hive or HBase based on the use case. There is a significant cost difference between storing the source of truth in Hadoop and storing it on an expensive SAN or EDW. You don't have to store it in HDFS, though; you can load data directly into Hive or HBase tables. The most basic use case is data archival: you can "move" data from the EDW into Hive using Sqoop, and the data goes directly into Hive tables, as sketched below.
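To make that archival path concrete, here is a minimal Sqoop sketch; the connection string, credentials, and table names are assumptions for illustration, not from this thread:

```
# Import an EDW table straight into a Hive table (all names hypothetical).
sqoop import \
  --connect jdbc:oracle:thin:@//edw-host:1521/ORCL \
  --username etl_user -P \
  --table SALES_HISTORY \
  --hive-import \
  --hive-table sales_history_archive
```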
01-19-2016
11:16 AM
@Andrea Squizzato You will have to open a support ticket for this. Can you share entries from the log files so we can look further?
01-19-2016
11:14 AM
@Mehdi TAZI No, and I have never heard of duplicating data with Parquet. I hope you are not referring to the HDFS replication factor; if you are, please see this.
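In case the replication factor is what was meant, here is a quick way to inspect and change it from the shell; the warehouse path is hypothetical:

```
# The second column of -ls output is the replication factor of each file.
hdfs dfs -ls /apps/hive/warehouse/mytable
# Set it explicitly and wait (-w) until the blocks reach that replication.
hdfs dfs -setrep -w 3 /apps/hive/warehouse/mytable
```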
01-19-2016
11:10 AM
5 Kudos
@Zeev Lazarev The root user does not have permission to create a directory under /. You can copy and paste this into your ssh window:

    su - hdfs
    hdfs dfs -mkdir -p /mp2/links
    hdfs dfs -chown -R root:hdfs /mp2/links
    exit
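An optional follow-up check, if you want to confirm the directory landed with the right ownership:

```
# Should list /mp2/links owned by root:hdfs.
hdfs dfs -ls /mp2
```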
01-19-2016
10:55 AM
2 Kudos
@Mehdi TAZI
I am a big fan of ORC: http://hortonworks.com/blog/orcfile-in-hdp-2-better-compression-better-performance/
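In case it helps, a minimal sketch of creating an ORC-backed table from the shell; the table name and columns are made up for illustration:

```
# Create an ORC table with Zlib compression (Zlib is ORC's default codec).
hive -e "CREATE TABLE page_views_orc (user_id BIGINT, url STRING)
STORED AS ORC
TBLPROPERTIES ('orc.compress'='ZLIB');"
```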
01-19-2016
10:51 AM
1 Kudo
@John Smith Generally, you don't have to do anything except import the sandbox image. 1) Networks. 2) This is set up during the install.
01-19-2016
04:04 AM
@Kumar Ratan The system is running out of memory: Hive is trying to create a Tez container and the system does not have enough memory. Check the VM memory and see if you can increase it.
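If the VM cannot be given more memory, you can instead shrink the Tez container so it fits. A minimal sketch; the sizes below are assumptions to illustrate, so tune them to what your VM actually has:

```
# Ask Hive for smaller Tez containers (size in MB; heap ~80% of container).
hive --hiveconf hive.tez.container.size=512 \
     --hiveconf hive.tez.java.opts="-Xmx410m"
```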
01-19-2016
02:32 AM
@Benson Shih Check this: https://github.com/abajwa-hw/security-workshops/blob/master/Setup-ranger-23.md#setup-kafka-plugin-for-ranger