Member since
09-18-2015
3274
Posts
1159
Kudos Received
426
Solutions
11-01-2016
05:31 PM
Demo
Extract data from images and store it in HDFS. Documents smaller than 10 MB are stored in HBase.
Documents larger than 10 MB land in HDFS, with their metadata stored in HBase.
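The size-based routing described above can be sketched roughly as follows. This is only an illustration in Python, not the demo's actual code: the name `choose_store`, the returned labels, and the handling of a document of exactly 10 MB (sent to HDFS here) are assumptions, and "10mb" is taken to mean 10 * 1024 * 1024 bytes.

```python
# Hypothetical sketch of the routing rule; names and labels are illustrative.
SIZE_THRESHOLD_BYTES = 10 * 1024 * 1024  # assumed meaning of "10mb"


def choose_store(size_bytes):
    """Return where a document's content and metadata would be stored."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes < SIZE_THRESHOLD_BYTES:
        # Small documents are stored in HBase.
        return {"content": "hbase", "metadata": "hbase"}
    # Large documents land in HDFS; only their metadata goes to HBase.
    return {"content": "hdfs", "metadata": "hbase"}


if __name__ == "__main__":
    for size in (2 * 1024 * 1024, 50 * 1024 * 1024):
        print(size, choose_store(size))
```

Keeping the threshold in one constant makes it easy to adjust if the cut-off between HBase and HDFS changes.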
Part 1 - https://www.linkedin.com/pulse/cds-content-data-store-nosql-part-1-co-dev-neeraj-sabharwal
07-16-2016
12:05 AM
2 Kudos
OpenHAB - Build your smart home in no time!
Welcome to http://www.openhab.org/
A vendor and technology agnostic open source automation software for your home.
OpenHAB is a mature, open source home automation platform that runs on a variety of hardware and is protocol agnostic, meaning it can connect to nearly any home automation hardware on the market today. If you’ve been frustrated with the number of manufacturer specific apps you need to run just to control your lights, then I’ve got great news for you: OpenHAB is the solution you’ve been looking for – it’s the most flexible smart home hub you’ll ever find. Source
Demo:
Go to http://www.openhab.org/getting-started/downloads.html
Download Runtime core and Demo files
Extract the Runtime core files into a directory called openHAB, then extract the Demo files into that same openHAB directory. See the following:
Now download the openHAB app on your smartphone. I am using iOS. Once you launch the app, disable the DEMO tab and enter https://192.x.x.x:8443 (the IP address of the machine running openHAB on your local network), as shown below.
You can then control the room settings from your phone while openHAB runs on your machine or Raspberry Pi.
For now, just for fun, I am running this on my Mac and controlling it from my iOS device.
Docs and Examples
If you want to test it like a "pro" then follow this example
07-16-2016
12:03 AM
4 Kudos
INTRODUCING THE HORTONWORKS CONNECTED DATA CLOUD TECHNICAL PREVIEW
To this end, we are introducing the “Hortonworks Connected Data Cloud” Technical Preview. This Technical Preview gives you a way to quickly spin up Apache Hive and Apache Spark clusters that are ready to run ephemeral workloads in your Amazon Web Services (AWS) environment. Source
Step 1
Follow the video at http://hortonworks.github.io/hdp-aws/launch/
Step 2
Create and Terminate a cluster.
Ambari Services starting
Install Completed
GUI access different views
07-05-2016
03:00 PM
Chronos is a replacement for cron.
It is a fault-tolerant job scheduler for Mesos that handles dependencies and ISO 8601-based schedules.
Marathon is a framework for Mesos that is designed to launch long-running applications and, in Mesosphere, serves as a replacement for a traditional init system.
In Mesosphere, Chronos complements Marathon, as it provides another way to run applications: according to a schedule or other conditions, such as the completion of another job. It is also capable of scheduling jobs on multiple Mesos slave nodes, and provides statistics about job failures and successes. Source
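To make the two ways of triggering a job concrete, here is a small Python sketch that builds one scheduled job and one dependent job. The field names (`name`, `command`, `schedule`, `parents`, `owner`) follow the Chronos job format as I remember it, so check them against the Chronos docs; the job names, commands, and owner address are made up for illustration.

```python
import json


def repeating_schedule(start_iso, period, repeats=None):
    """Build an ISO 8601 repeating interval: R[n]/<start>/<period>.

    Leaving `repeats` as None yields "R/...", meaning repeat indefinitely.
    """
    count = "" if repeats is None else str(repeats)
    return "R{}/{}/{}".format(count, start_iso, period)


# A scheduled job: starts at the given UTC time and repeats every 24 hours.
nightly_export = {
    "name": "nightly-export",          # hypothetical job name
    "command": "echo exporting",       # hypothetical command
    "schedule": repeating_schedule("2016-07-05T15:00:00Z", "PT24H"),
    "owner": "ops@example.com",        # hypothetical owner
}

# A dependent job: no schedule of its own, it runs after its parent completes.
nightly_report = {
    "name": "nightly-report",
    "command": "echo reporting",
    "parents": ["nightly-export"],
    "owner": "ops@example.com",
}

if __name__ == "__main__":
    print(json.dumps(nightly_export, indent=2))
    print(json.dumps(nightly_report, indent=2))
```

The dependent job carries `parents` instead of `schedule`, which is how the "completion of another job" condition is expressed in this sketch.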
Install https://mesos.github.io/chronos/docs/ and gist
Demo
Part 1 - https://www.linkedin.com/pulse/data-center-operating-system-dcos-part-1-neeraj-sabharwal
Part 2 - https://www.linkedin.com/pulse/apache-marathon-part-2-neeraj-sabharwal
07-05-2016
11:30 AM
1 Kudo
You need Mesos to run this - Post 1
What is Apache Marathon?
Marathon is a production-grade container orchestration platform for Mesosphere's Datacenter Operating System (DCOS) and Apache Mesos.
I am launching multiple applications using Marathon, with Mesos providing the underlying cluster resources on which those applications run.
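As a rough idea of what "launching an application" involves, the Python sketch below builds minimal Marathon app definitions. The fields `id`, `cmd`, `cpus`, `mem`, and `instances` are standard Marathon app fields; the application names, commands, and resource values are invented for illustration and are not the ones used in the demo.

```python
import json


def marathon_app(app_id, cmd, cpus=0.1, mem=32, instances=1):
    """Build a minimal Marathon app definition as a plain dict."""
    if not app_id.startswith("/"):
        app_id = "/" + app_id  # Marathon app ids are path-like
    return {
        "id": app_id,
        "cmd": cmd,
        "cpus": cpus,            # CPU shares per instance
        "mem": mem,              # memory in MB per instance
        "instances": instances,  # how many copies Marathon keeps running
    }


# Hypothetical applications for illustration only.
apps = [
    marathon_app("web", "python3 -m http.server 8080", instances=2),
    marathon_app("sleeper", "sleep 3600"),
]

if __name__ == "__main__":
    # Each definition would typically be POSTed as JSON to Marathon's /v2/apps endpoint.
    for app in apps:
        print(json.dumps(app, indent=2))
```

Marathon then asks Mesos for resources matching `cpus` and `mem` and keeps the requested number of instances alive.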
Demo
More reading https://mesosphere.github.io/marathon/ Gist & Application example
07-05-2016
04:03 AM
1 Kudo
DC/OS - a new kind of operating system that spans all of the servers in a physical or cloud-based datacenter, and runs on top of any Linux distribution.
Source
Projects
More details https://docs.mesosphere.com/overview/components/
Let's cover Mesos in this post
Frameworks (applications running on Mesos) http://mesos.apache.org/documentation/latest/frameworks/
I used http://mesos.apache.org/gettingstarted/ to install Mesos on my local machine. I am launching the C++, Java, and Python example frameworks in this demo.
Mesos demo
More reading
07-04-2016
01:51 PM
4 Kudos
Hive: Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis.
HBase: Apache HBase™ is the Hadoop database, a distributed, scalable, big data store.
HAWQ: http://hawq.incubator.apache.org/
PXF: PXF is an extensible framework that allows HAWQ to query external system data.
Let's learn query federation
This topic describes how to access Hive data using PXF.
Link
Previously, in order to query Hive tables using HAWQ and PXF, you needed to create an external table in PXF that described the target table's Hive metadata. Since HAWQ is now integrated with HCatalog, HAWQ can use metadata stored in HCatalog instead of external tables created for PXF. HCatalog is built on top of the Hive metastore and incorporates Hive's DDL. This provides several advantages:
You do not need to know the table schema of your Hive tables.
You do not need to manually enter information about Hive table location or format.
If Hive table metadata changes, HCatalog provides updated metadata. This is in contrast to the use of static external PXF tables to define Hive table metadata for HAWQ.
HAWQ retrieves table metadata from HCatalog using PXF. HAWQ creates in-memory catalog tables from the retrieved metadata. If a table is referenced multiple times in a transaction, HAWQ uses its in-memory metadata to reduce external calls to HCatalog. PXF queries Hive using table metadata that is stored in the HAWQ in-memory catalog tables. Table metadata is dropped at the end of the transaction.
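The per-transaction metadata handling described above can be modelled with a tiny cache. This is a conceptual Python sketch, not HAWQ's implementation: the class `TransactionMetadataCache`, its methods, and the lookup callable are hypothetical names used only to illustrate "fetch once, reuse within the transaction, drop at the end".

```python
class TransactionMetadataCache:
    """Toy model of in-memory catalog entries that live for one transaction."""

    def __init__(self, fetch_from_hcatalog):
        # fetch_from_hcatalog: callable(table_name) -> metadata; stands in for
        # the external HCatalog call that goes through PXF.
        self._fetch = fetch_from_hcatalog
        self._tables = {}

    def get(self, table_name):
        # The first reference triggers the external call; later references
        # within the same transaction reuse the in-memory copy.
        if table_name not in self._tables:
            self._tables[table_name] = self._fetch(table_name)
        return self._tables[table_name]

    def end_transaction(self):
        # Metadata is dropped when the transaction ends.
        self._tables.clear()


if __name__ == "__main__":
    external_calls = []

    def fake_lookup(name):
        external_calls.append(name)
        return {"table": name, "columns": ["id", "name"]}

    cache = TransactionMetadataCache(fake_lookup)
    cache.get("default.sales")
    cache.get("default.sales")   # served from memory
    cache.end_transaction()
    cache.get("default.sales")   # fetched again in the next transaction
    print("external calls:", len(external_calls))
```

Because the cache is cleared at the end of each transaction, a later transaction always sees any metadata changes made in Hive in the meantime.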
Demo
Tools used
Hive, HAWQ, Zeppelin
HBase tables
Follow this to create HBase tables:
perl create_hbase_tables.pl
Create a table in HAWQ to access the HBase table
Note: The port is 51200 (the PXF service port), not 50070 (the NameNode web UI port)
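Since the port is easy to get wrong, here is a small Python helper that assembles the kind of external-table statement involved. The `pxf://host:port/table?PROFILE=HBase` location and the `pxfwritable_import` formatter follow the HAWQ PXF documentation as I recall it, so verify them against the PXF docs; the host, table, and column names are placeholders and not the ones from the demo.

```python
PXF_PORT = 51200  # PXF service port, not the NameNode web UI port (50070)


def pxf_hbase_location(host, hbase_table, port=PXF_PORT):
    """Build the PXF LOCATION URI for an HBase table."""
    return "pxf://{}:{}/{}?PROFILE=HBase".format(host, port, hbase_table)


def external_table_ddl(hawq_table, columns, location):
    """Assemble a CREATE EXTERNAL TABLE statement as a string.

    columns: list of (name, sql_type) pairs; HBase columns are written as
    quoted "family:qualifier" names.
    """
    cols = ", ".join("{} {}".format(name, sql_type) for name, sql_type in columns)
    return (
        "CREATE EXTERNAL TABLE {} ({})\n"
        "LOCATION ('{}')\n"
        "FORMAT 'CUSTOM' (formatter='pxfwritable_import');"
    ).format(hawq_table, cols, location)


if __name__ == "__main__":
    # Placeholder host, table, and columns for illustration only.
    location = pxf_hbase_location("namenode.example.com", "sales")
    columns = [("recordkey", "bytea"), ('"cf1:saleid"', "int")]
    print(external_table_ddl("hbase_sales", columns, location))
```

The resulting statement can be pasted into psql or a Zeppelin paragraph bound to a HAWQ interpreter.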
Links
Gist
PXF docs
Must see this
Zeppelin interpreter settings
07-04-2016
01:45 PM
2 Kudos
I found this article: "For mobile analytics, Yahoo is in the process of replacing HBase with Druid"
History: 24th Oct, 2012
To test the setup, I deployed Druid in two environments. The first deployment is on my multi-node cluster and the second uses this repo.
Details are on this blog
Demo - PS: It's a 10-minute demo
We are loading pageviews and then executing queries. See the links at the bottom to download the Git repo and code.
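For readers who have not seen a Druid query before, the Python sketch below builds a simple native timeseries query that counts rows. The structure (`queryType`, `dataSource`, `granularity`, `intervals`, `aggregations`) is Druid's native JSON query format; the data source name `pageviews` and the interval are assumptions for illustration and may differ from what the demo uses.

```python
import json


def pageviews_count_query(interval, granularity="all", data_source="pageviews"):
    """Build a native Druid timeseries query that counts ingested rows."""
    return {
        "queryType": "timeseries",
        "dataSource": data_source,   # assumed data source name
        "granularity": granularity,
        "intervals": [interval],     # ISO 8601 interval: start/end
        "aggregations": [{"type": "count", "name": "rows"}],
    }


if __name__ == "__main__":
    query = pageviews_count_query("2015-09-01/2015-09-02")
    # The JSON body would typically be POSTed to the broker's /druid/v2/ endpoint.
    print(json.dumps(query, indent=2))
```

Changing `granularity` to a value such as `hour` would return one count per hour instead of a single total.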
Gif
Download
I use this to control gif movement
Links:
Page view queries and data
Spin up the environment on your Mac or Windows machine (I am not sure about Windows)
Git link. This will spin up Druid, ZooKeeper, Hadoop, and Postgres
Gist
Happy Hadooping!!!!
07-03-2016
12:23 AM
6 Kudos
"Druid is a fast column-oriented distributed data store". Druid is an open source data store designed for OLAP queries on event data.
Architecture
Historical nodes are the workhorses that handle storage and querying on "historical" data (non-realtime). Historical nodes download segments from deep storage, respond to the queries from broker nodes about these segments, and return results to the broker nodes. They announce themselves and the segments they are serving in Zookeeper, and also use Zookeeper to monitor for signals to load or drop new segments.
Coordinator nodes monitor the grouping of historical nodes to ensure that data is available, replicated and in a generally "optimal" configuration. They do this by reading segment metadata information from metadata storage to determine what segments should be loaded in the cluster, using Zookeeper to determine what Historical nodes exist, and creating Zookeeper entries to tell Historical nodes to load and drop new segments.
Broker nodes receive queries from external clients and forward those queries to Realtime and Historical nodes. When Broker nodes receive results, they merge these results and return them to the caller. For knowing topology, Broker nodes use Zookeeper to determine what Realtime and Historical nodes exist.
Indexing Service nodes form a cluster of workers to load batch and real-time data into the system as well as allow for alterations to the data stored in the system.
Realtime nodes also load real-time data into the system. They are simpler to set up than the indexing service, at the cost of several limitations for production use.
Segments are stored in deep storage. You can use S3, HDFS, or a local mount. Queries go from the client to the broker, and then to the Realtime or Historical nodes.
LAMBDA Architecture
Dependencies
Indexing service - Source
ZK, Storage and Metadata
A running ZooKeeper cluster for cluster service discovery and maintenance of current data topology
A metadata storage instance for maintenance of metadata about the data segments that should be served by the system
A "deep storage" LOB store/file system to hold the stored segments
Source
Part 2 - Demo Druid and HDFS as deep storage.
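The query path in the architecture notes above (client to broker, broker to Realtime and Historical nodes, results merged by the broker) can be pictured with a toy scatter-gather model. This Python sketch is not Druid code: the `Node` and `Broker` classes and the page-count data are invented, and it ignores ZooKeeper discovery, the coordinator, and deep storage entirely.

```python
from collections import Counter


class Node:
    """Stands in for a Historical or Realtime node serving some segments."""

    def __init__(self, name, segments):
        self.name = name
        self.segments = segments  # each segment: dict of page -> view count

    def query(self):
        # Aggregate over the segments this node serves.
        partial = Counter()
        for segment in self.segments:
            partial.update(segment)
        return partial


class Broker:
    """Forwards a query to every known node and merges the partial results."""

    def __init__(self, nodes):
        self.nodes = nodes

    def query(self):
        merged = Counter()
        for node in self.nodes:
            merged.update(node.query())  # Counter.update adds counts
        return dict(merged)


if __name__ == "__main__":
    historical = Node("historical", [{"/home": 3, "/about": 1}, {"/home": 1}])
    realtime = Node("realtime", [{"/home": 2}])
    print(Broker([historical, realtime]).query())
```

The caller only ever talks to the broker, which is why adding or removing data nodes does not change how clients issue queries.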
05-20-2016
03:55 AM
2 Kudos
Hive: Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis.
HBase: Apache HBase™ is the Hadoop database, a distributed, scalable, big data store.
HAWQ: http://hawq.incubator.apache.org/
PXF: PXF is an extensible framework that allows HAWQ to query external system data.
Let's learn query federation
This topic describes how to access Hive data using PXF. Link
Previously, in order to query Hive tables using HAWQ and PXF, you needed to create an external table in PXF that described the target table's Hive metadata. Since HAWQ is now integrated with HCatalog, HAWQ can use metadata stored in HCatalog instead of external tables created for PXF. HCatalog is built on top of the Hive metastore and incorporates Hive's DDL. This provides several advantages:
You do not need to know the table schema of your Hive tables.
You do not need to manually enter information about Hive table location or format.
If Hive table metadata changes, HCatalog provides updated metadata. This is in contrast to the use of static external PXF tables to define Hive table metadata for HAWQ.
HAWQ retrieves table metadata from HCatalog using PXF. HAWQ creates in-memory catalog tables from the retrieved metadata. If a table is referenced multiple times in a transaction, HAWQ uses its in-memory metadata to reduce external calls to HCatalog. PXF queries Hive using table metadata that is stored in the HAWQ in-memory catalog tables. Table metadata is dropped at the end of the transaction.
Demo
Tools used
Hive, HAWQ, Zeppelin
HBase tables
Follow this to create HBase tables:
perl create_hbase_tables.pl
Create a table in HAWQ to access the HBase table
Note: The port is 51200 (the PXF service port), not 50070 (the NameNode web UI port)
Links
Gist
PXF docs
Must see this
Zeppelin interpreter settings