Created 12-12-2016 06:24 AM
Hi,
I have a fundamental question about the storage engine options (Hive/HBase/Solr) on HDFS.
Q1. If we ingest data into HDFS and then build a Solr index on it, is that the same as ingesting the data directly into Solr, in terms of storage usage and layout?
Q2. Is there an approach that lets us ingest once and still use all three data-access options optimally: Hive for faster scans, HBase for bulk retrieval, and Solr for record-level search?
Thanks,
Avijeet
Created 12-12-2016 10:54 AM
Hi @Avijeet Dash,
The Solr index requires persistent storage as well.
There are several options for reading HBase from Hive and Solr from Hive. They all rely on storage handlers and SerDes, such as https://github.com/lucidworks/hive-solr and https://github.com/chimpler/hive-solr for Solr.
For Hive/HBase integration, see https://cwiki.apache.org/confluence/display/Hive/StorageHandlers
Hope this helps.
/Best regards, Mats
Created 12-12-2016 04:46 PM
You can use the Lily HBase Indexer, which automatically indexes data from HBase into Solr in near real time.
You can also use Apache NiFi: ingest once and fork the flow to Solr, HBase, HDFS, and Hive. It is a highly flexible implementation model.
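To make the "ingest once, fork to several stores" idea concrete, here is a minimal Python sketch of the fan-out pattern. It is not NiFi code and does not talk to real services: the `FanOut` class and the `to_hdfs`, `to_hbase`, and `to_solr` functions are hypothetical in-memory stand-ins, assumed only for illustration. In a real flow, each sink would be replaced by the matching NiFi processor or client library.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


class FanOut:
    """Deliver each ingested record to every registered sink (hypothetical helper)."""

    def __init__(self) -> None:
        self.sinks: Dict[str, Callable[[Record], None]] = {}

    def register(self, name: str, sink: Callable[[Record], None]) -> None:
        self.sinks[name] = sink

    def ingest(self, records: Iterable[Record]) -> int:
        count = 0
        for record in records:
            for sink in self.sinks.values():
                # Each sink gets its own copy so one sink cannot mutate another's input.
                sink(dict(record))
            count += 1
        return count


# In-memory stand-ins for the three stores (assumptions, not real clients).
hdfs_lines: List[str] = []               # flat file for Hive-style scans
hbase_rows: Dict[str, Record] = {}       # row-key lookup
solr_index: Dict[str, List[str]] = {}    # token -> ids, a toy inverted index


def to_hdfs(record: Record) -> None:
    hdfs_lines.append(",".join(record[key] for key in sorted(record)))


def to_hbase(record: Record) -> None:
    hbase_rows[record["id"]] = record


def to_solr(record: Record) -> None:
    for token in record["text"].lower().split():
        solr_index.setdefault(token, []).append(record["id"])


pipeline = FanOut()
pipeline.register("hdfs", to_hdfs)
pipeline.register("hbase", to_hbase)
pipeline.register("solr", to_solr)

ingested = pipeline.ingest([
    {"id": "1", "text": "Hello HDFS"},
    {"id": "2", "text": "hello Solr"},
])
```

The point of the sketch is that the source is read only once, while each store keeps its own copy in its own layout. That also bears on Q1: the Solr index is separate persistent storage, in addition to the raw data kept on HDFS.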