HIVE/HBASE/SOLR

Hi,

I have a fundamental question about these storage engine options (Hive/HBase/Solr) on HDFS.

Q1. If we ingest data into HDFS and then build a Solr index, is that the same as ingesting the data directly into Solr, in terms of storage usage and layout?

Q2. Is there an approach that lets us ingest once and still use all three data-access options optimally: Hive for fast scans, HBase for bulk retrieval, and Solr for record-level search?

Thanks,

Avijeet

1 ACCEPTED SOLUTION

Hi @Avijeet Dash,

The Solr index requires persistent storage as well.

There are several options for reading HBase from Hive and Solr from Hive. They all rely on storage handlers and SerDes, such as https://github.com/lucidworks/hive-solr and https://github.com/chimpler/hive-solr.

For Hive/HBase integration, see also https://cwiki.apache.org/confluence/display/Hive/StorageHandlers
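
To make the storage handler idea a bit more concrete, here is a rough sketch in Python that only builds the Hive DDL text for a table backed by an existing HBase table. The storage handler class and the `hbase.columns.mapping` / `hbase.table.name` properties come from the Hive HBase integration docs. The helper function, the table names, and the column family `d` are made up for illustration, so adjust them to your schema.

```python
def hbase_backed_table_ddl(hive_table, hbase_table, key_col, mappings):
    """Build a CREATE EXTERNAL TABLE statement over an existing HBase table.

    key_col:  (hive_column, hive_type) mapped to the HBase row key.
    mappings: list of (hive_column, hive_type, "family:qualifier") tuples.
    All names passed in are hypothetical examples.
    """
    key_name, key_type = key_col
    columns = [f"{key_name} {key_type}"]
    columns += [f"{name} {typ}" for name, typ, _ in mappings]
    # ":key" maps the first Hive column to the HBase row key.
    hbase_cols = [":key"] + [target for _, _, target in mappings]
    return (
        f"CREATE EXTERNAL TABLE {hive_table} ({', '.join(columns)})\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        f"WITH SERDEPROPERTIES ('hbase.columns.mapping' = '{','.join(hbase_cols)}')\n"
        f"TBLPROPERTIES ('hbase.table.name' = '{hbase_table}')"
    )


# Hypothetical table and column names, for illustration only.
ddl = hbase_backed_table_ddl(
    "events_hbase",
    "events",
    ("rowkey", "string"),
    [("payload", "string", "d:payload"), ("source", "string", "d:source")],
)
print(ddl)
```

You would run the printed statement in Hive (beeline or similar). After that, Hive queries against the table read from HBase rather than from a separate copy of the data.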

Hope this helps.

/Best regards, Mats

2 REPLIES

You can use the Lily HBase Indexer, which automatically indexes data from HBase into Solr.

You can also use Apache NiFi: ingest once and fork to Solr, HBase, HDFS, and Hive. It is a highly flexible implementation model.
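
The "ingest once, fork to several stores" idea can be sketched in plain Python. This is not NiFi code and uses no NiFi, HBase, or Solr API. The sinks below are in-memory stand-ins, and every name (`fan_out`, the record fields `id` and `body`) is an assumption made for illustration.

```python
from typing import Callable, Dict, Iterable

Record = Dict[str, str]
Sink = Callable[[Record], None]


def fan_out(records: Iterable[Record], sinks: Dict[str, Sink]) -> Dict[str, int]:
    """Read each record once and hand it to every sink; return per-sink counts."""
    counts = {name: 0 for name in sinks}
    for record in records:
        for name, write in sinks.items():
            write(record)
            counts[name] += 1
    return counts


# In-memory stand-ins for the real stores (hypothetical, for illustration).
hdfs_rows = []    # raw records, as they would land in HDFS for Hive scans
hbase_rows = {}   # keyed by row id, as in an HBase table
solr_docs = []    # search documents, as they would be sent to Solr

sinks = {
    "hdfs": hdfs_rows.append,
    "hbase": lambda r: hbase_rows.__setitem__(r["id"], r),
    "solr": lambda r: solr_docs.append({"id": r["id"], "text": r["body"]}),
}

counts = fan_out(
    [{"id": "1", "body": "first event"}, {"id": "2", "body": "second event"}],
    sinks,
)
print(counts)
```

Each store still keeps its own copy in its own layout, so this saves you from ingesting three times but not from storing the data in three places.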