Member since: 09-18-2015
Posts: 3274
Kudos Received: 1159
Solutions: 426
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2125 | 11-01-2016 05:43 PM |
| | 6442 | 11-01-2016 05:36 PM |
| | 4112 | 07-01-2016 03:20 PM |
| | 7051 | 05-25-2016 11:36 AM |
| | 3423 | 05-24-2016 05:27 PM |
07-05-2016
04:03 AM
1 Kudo
DC/OS - a new kind of operating system that spans all of the servers in a physical or cloud-based datacenter, and runs on top of any Linux distribution.
Source
Projects
More details https://docs.mesosphere.com/overview/components/
Let's cover Mesos in this post
Frameworks (applications running on Mesos): http://mesos.apache.org/documentation/latest/frameworks/
I used http://mesos.apache.org/gettingstarted/ to install Mesos on my local machine. I am launching the C++, Java and Python example frameworks in this demo.
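For reference, the steps roughly follow the getting-started page; here is a minimal single-node sketch (the version number, download URL, work_dir and 127.0.0.1 addresses are illustrative and may differ for your build):
# Build Mesos from source (version shown is illustrative)
wget http://www.apache.org/dist/mesos/1.0.0/mesos-1.0.0.tar.gz
tar -zxf mesos-1.0.0.tar.gz
cd mesos-1.0.0
mkdir build && cd build
../configure
make
# Start a master and an agent locally (mesos-slave.sh in older releases)
./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/var/lib/mesos &
./bin/mesos-agent.sh --master=127.0.0.1:5050 --work_dir=/var/lib/mesos &
# Launch the example frameworks: C++, Java, Python
./src/test-framework --master=127.0.0.1:5050
./src/examples/java/test-framework 127.0.0.1:5050
./src/examples/python/test-framework 127.0.0.1:5050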
Mesos demo
More reading
... View more
07-04-2016
01:51 PM
4 Kudos
Hive: Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis.
HBase: Apache HBase™ is the Hadoop database, a distributed, scalable, big data store
Hawq: http://hawq.incubator.apache.org/
PXF: PXF is an extensible framework that allows HAWQ to query external system data
Let's learn about query federation.
This topic describes how to access Hive data using PXF.
Link
Previously, in order to query Hive tables using HAWQ and PXF, you needed to create an external table in PXF that described the target table's Hive metadata. Since HAWQ is now integrated with HCatalog, HAWQ can use metadata stored in HCatalog instead of external tables created for PXF. HCatalog is built on top of the Hive metastore and incorporates Hive's DDL. This provides several advantages:
- You do not need to know the table schema of your Hive tables.
- You do not need to manually enter information about Hive table location or format.
- If Hive table metadata changes, HCatalog provides updated metadata. This is in contrast to the use of static external PXF tables to define Hive table metadata for HAWQ.
HAWQ retrieves table metadata from HCatalog using PXF. HAWQ creates in-memory catalog tables from the retrieved metadata. If a table is referenced multiple times in a transaction, HAWQ uses its in-memory metadata to reduce external calls to HCatalog. PXF queries Hive using table metadata that is stored in the HAWQ in-memory catalog tables. Table metadata is dropped at the end of the transaction.
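As a quick illustration of this integration, a Hive table can be queried straight through the reserved hcatalog database; a minimal sketch, where the HAWQ database "mydb" and the Hive table default.sales_part are placeholders:
# Query a Hive table through HAWQ's HCatalog integration
psql -d mydb -c 'SELECT * FROM hcatalog.default.sales_part LIMIT 10;'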
Demo
Tools used
Hive, HAWQ, Zeppelin
HBase tables
Follow this to create the HBase tables:
perl create_hbase_tables.pl
Create a table in HAWQ to access the HBase table.
Note: the PXF port is 51200, not 50070.
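A rough sketch of such an external table is below; the HBase table name sales_table, the namenode host, and the column family/qualifiers are placeholders, only the PXF port 51200 comes from the note above:
# HAWQ external table over an HBase table via PXF (table/host/column names are placeholders)
psql -d mydb <<'SQL'
CREATE EXTERNAL TABLE hbase_sales (
    recordkey TEXT,
    "cf1:saleid" INTEGER,
    "cf1:comments" VARCHAR
)
LOCATION ('pxf://namenode:51200/sales_table?PROFILE=HBase')
FORMAT 'CUSTOM' (formatter='pxfwritable_import');
SQL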
Links
Gist
PXF docs
Must see this
Zeppelin interpreter settings
... View more
Labels:
07-04-2016
01:45 PM
2 Kudos
I found this article: "For mobile analytics, Yahoo is in the process of replacing HBase with Druid."
History: Druid was open sourced on 24th Oct, 2012.
To test out the setup, I have deployed Druid in two clusters: the first deployment is on my multi-node cluster, and the second deployment uses this repo.
Details are on this blog
Demo - PS: it's a 10-minute demo.
We load pageviews and then execute queries. See the links at the bottom to download the Git repo and code.
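The queries in a demo like this are just JSON POSTed to the broker; a rough sketch, where the datasource name "pageviews", the "views" metric, the interval and the broker host/port are assumptions:
# Example timeseries query against the Druid broker (datasource/metric/host are placeholders)
curl -X POST -H 'Content-Type: application/json' http://brokernode:8082/druid/v2/?pretty -d '{
  "queryType": "timeseries",
  "dataSource": "pageviews",
  "granularity": "hour",
  "aggregations": [ { "type": "longSum", "name": "views", "fieldName": "views" } ],
  "intervals": [ "2016-06-27/2016-06-28" ]
}'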
Gif
Download
I use this to control gif movement.
Links:
Page view queries and data
Spin up the environment on your Mac or Windows (not sure about Windows).
Git link. This will spin up Druid, ZooKeeper, Hadoop and Postgres.
Gist
Happy Hadooping!!!!
... View more
Labels:
07-03-2016
12:23 AM
6 Kudos
"Druid is fast column-oriented distributed data store". Druid is an open source data store designed for OLAP queries on event data. Architecture
Historical nodes are the workhorses that handle storage and querying of "historical" (non-realtime) data. Historical nodes download segments from deep storage, respond to queries from broker nodes about these segments, and return results to the broker nodes. They announce themselves and the segments they are serving in ZooKeeper, and also use ZooKeeper to monitor for signals to load or drop new segments.
Coordinator nodes monitor the grouping of historical nodes to ensure that data is available, replicated and in a generally "optimal" configuration. They do this by reading segment metadata information from metadata storage to determine which segments should be loaded in the cluster, using ZooKeeper to determine which Historical nodes exist, and creating ZooKeeper entries to tell Historical nodes to load and drop new segments.
Broker nodes receive queries from external clients and forward those queries to Realtime and Historical nodes. When Broker nodes receive results, they merge these results and return them to the caller. To learn the topology, Broker nodes use ZooKeeper to determine which Realtime and Historical nodes exist.
Indexing Service nodes form a cluster of workers to load batch and real-time data into the system, as well as allow alterations to the data stored in the system.
Realtime nodes also load real-time data into the system. They are simpler to set up than the indexing service, at the cost of several limitations for production use.
Segments are stored in deep storage. You can use S3, HDFS or a local mount.
Queries go from the client to the broker, and from there to Realtime or Historical nodes.
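As a quick way to poke at this layout from the outside, the Coordinator exposes a small REST API; a sketch below, where the coordinatornode host, port 8081 and the "pageviews" datasource are assumptions:
# List the datasources the cluster is currently serving (coordinator host/port assumed)
curl http://coordinatornode:8081/druid/coordinator/v1/datasources
# List the segments loaded for one datasource
curl http://coordinatornode:8081/druid/coordinator/v1/datasources/pageviews/segments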
LAMBDA Architecture
Dependencies
Indexing service - Source
ZK, Storage and Metadata
- A running ZooKeeper cluster for cluster service discovery and maintenance of current data topology
- A metadata storage instance for maintenance of metadata about the data segments that should be served by the system
- A "deep storage" LOB store/file system to hold the stored segments
Source
Part 2 - Demo: Druid and HDFS as deep storage.
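For the HDFS-as-deep-storage part, the relevant settings live in conf/druid/_common/common.runtime.properties; a minimal sketch, assuming the druid-hdfs-storage extension and HDFS paths that are placeholders:
# conf/druid/_common/common.runtime.properties (sketch; paths are placeholders)
druid.extensions.loadList=["druid-hdfs-storage"]
# Deep storage on HDFS
druid.storage.type=hdfs
druid.storage.storageDirectory=/druid/segments
# Indexing task logs on HDFS as well
druid.indexer.logs.type=hdfs
druid.indexer.logs.directory=/druid/indexing-logs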
... View more
Labels:
07-01-2016
03:24 PM
1 Kudo
@roy p See this if it helps. Link
... View more
07-01-2016
03:20 PM
1 Kudo
@ed day You can manage this by maintaining /etc/hosts, but whenever an IP changes you have to update the entries in the hosts file. FQDNs (resolved through DNS) are recommended because changing an IP in the environment does not require any changes in the cluster, and users do not need a local /etc/hosts entry in their environment to reach the cluster.
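If you do go the /etc/hosts route, the entries would look roughly like this on every node and client (hostnames and IPs below are made up):
# /etc/hosts (illustrative entries only)
192.168.1.10   master1.example.com   master1
192.168.1.11   worker1.example.com   worker1
192.168.1.12   worker2.example.com   worker2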
... View more
05-25-2016
11:36 AM
I was able to fix the above issue by adding the Hadoop jars to the classpath while starting the components.
Start Coordinator and Overlord (ns03)
java `cat conf/druid/coordinator/jvm.config | xargs` -cp conf/druid/_common:conf/druid/coordinator:lib/*:/usr/hdp/2.4.2.0-258/hadoop/lib/*:/usr/hdp/2.4.2.0-258/hadoop-yarn/lib/*:/usr/hdp/2.4.2.0-258/hadoop-yarn/*:/usr/hdp/2.4.2.0-258/hadoop/client/*:/usr/hdp/2.4.2.0-258/hadoop-mapreduce/*:/usr/hdp/2.4.2.0-258/hadoop-mapreduce/lib/* io.druid.cli.Main server coordinator &
java `cat conf/druid/overlord/jvm.config | xargs` -cp conf/druid/_common:conf/druid/overlord:lib/*:/usr/hdp/2.4.2.0-258/hadoop/lib/*:/usr/hdp/2.4.2.0-258/hadoop-yarn/lib/*:/usr/hdp/2.4.2.0-258/hadoop-yarn/*:/usr/hdp/2.4.2.0-258/hadoop/client/*:/usr/hdp/2.4.2.0-258/hadoop-mapreduce/*:/usr/hdp/2.4.2.0-258/hadoop-mapreduce/lib/* io.druid.cli.Main server overlord &
Start Historicals and MiddleManagers (ns02)
java `cat conf/druid/historical/jvm.config | xargs` -cp conf/druid/_common:conf/druid/historical:lib/*:/usr/hdp/2.4.2.0-258/hadoop/lib/*:/usr/hdp/2.4.2.0-258/hadoop-yarn/lib/*:/usr/hdp/2.4.2.0-258/hadoop-yarn/*:/usr/hdp/2.4.2.0-258/hadoop/client/*:/usr/hdp/2.4.2.0-258/hadoop-mapreduce/*:/usr/hdp/2.4.2.0-258/hadoop-mapreduce/lib/* io.druid.cli.Main server historical &
java `cat conf/druid/middleManager/jvm.config | xargs` -cp conf/druid/_common:conf/druid/middleManager:lib/*:/usr/hdp/2.4.2.0-258/hadoop/lib/*:/usr/hdp/2.4.2.0-258/hadoop-yarn/lib/*:/usr/hdp/2.4.2.0-258/hadoop-yarn/*:/usr/hdp/2.4.2.0-258/hadoop/client/*:/usr/hdp/2.4.2.0-258/hadoop-mapreduce/*:/usr/hdp/2.4.2.0-258/hadoop-mapreduce/lib/* io.druid.cli.Main server middleManager &
Start Druid Broker
java `cat conf/druid/broker/jvm.config | xargs` -cp conf/druid/_common:conf/druid/broker:lib/*:/usr/hdp/2.4.2.0-258/hadoop/lib/*:/usr/hdp/2.4.2.0-258/hadoop-yarn/lib/*:/usr/hdp/2.4.2.0-258/hadoop-yarn/*:/usr/hdp/2.4.2.0-258/hadoop/client/*:/usr/hdp/2.4.2.0-258/hadoop-mapreduce/*:/usr/hdp/2.4.2.0-258/hadoop-mapreduce/lib/* io.druid.cli.Main server broker &
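The same classpath is repeated in every command above; one way to keep it readable is to build it once in a shell variable, e.g. (a sketch, assuming the same HDP 2.4.2.0-258 layout):
# Build the HDP client classpath once and reuse it for each Druid service
HDP=/usr/hdp/2.4.2.0-258
HADOOP_CP="$HDP/hadoop/lib/*:$HDP/hadoop-yarn/lib/*:$HDP/hadoop-yarn/*:$HDP/hadoop/client/*:$HDP/hadoop-mapreduce/*:$HDP/hadoop-mapreduce/lib/*"
java `cat conf/druid/broker/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/broker:lib/*:$HADOOP_CP" io.druid.cli.Main server broker &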
... View more
05-24-2016
05:27 PM
1 Kudo
@karthik sai Make this your landing point: http://docs.hortonworks.com/HDPDocuments/HDF1/HDF-1.2.0.1/index.html
Release notes: http://docs.hortonworks.com/HDPDocuments/HDF1/HDF-1.2.0.1/bk_HDF_RelNotes/content/ch_hdf_relnotes.html#release_summary
Supported Operating Systems
- Red Hat Enterprise Linux / CentOS 6 (64-bit)
- Red Hat Enterprise Linux / CentOS 7 (64-bit)
- Ubuntu Precise (12.04) (64-bit)
- Ubuntu Trusty (14.04) (64-bit)
- Debian 6
- Debian 7
- SUSE Enterprise Linux 11 - SP3 (64-bit)
2) http://docs.hortonworks.com/HDPDocuments/HDF1/HDF-1.2.0.1/bk_HDF_InstallSetup/content/hdf_supported_hdp.html
3) HDF 1.2 hardware recommendations: https://docs.hortonworks.com/HDPDocuments/HDF1/HDF-1.2/bk_HDF_InstallSetup/content/hdf_isg_hardware.html
Demo idea link: https://community.hortonworks.com/articles/961/a-collection-of-nifi-examples.html
... View more
05-24-2016
04:13 PM
1 Kudo
@Alex Raj Hive tuning: http://hortonworks.com/blog/5-ways-make-hive-queries-run-faster/
SparkSQL: http://hortonworks.com/hadoop-tutorial/using-hive-with-orc-from-apache-spark/
... View more
05-24-2016
04:07 PM
1 Kudo
HDP 2.4.2, Ambari 2.2.2, druid-0.9.0
I am following this http://druid.io/docs/latest/tutorials/quickstart.html and running:
[root@nss03 druid-0.9.0]# curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikiticker-index.json http://overlordnode:8090/druid/indexer/v1/task
{"task":"index_hadoop_wikiticker_2016-05-24T11:38:51.681Z"}
[root@nss03 druid-0.9.0]#
I can see that the job is submitted to the YARN queue. RM UI error details:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/yarn/local/filecache/10/mapreduce.tar.gz/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/yarn/local/filecache/130/log4j-slf4j-impl-2.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /hadoop/yarn/log/application_1464036814491_0009/container_e04_1464036814491_0009_01_000001 (Is a directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
at org.apache.hadoop.yarn.ContainerLogAppender.activateOptions(ContainerLogAppender.java:55)
at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:64)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:285)
at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:155)
at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:132)
at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:657)
at org.apache.hadoop.service.AbstractService.<clinit>(AbstractService.java:43)
May 24, 2016 4:39:11 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class
May 24, 2016 4:39:11 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
May 24, 2016 4:39:11 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource class
... View more
Labels:
- Apache Hadoop
- Apache YARN