10-31-2016 02:04 PM
We are planning to index Hive tables in Cloudera Solr so that we can find relevant tables through data search. We couldn't find any documentation on the Cloudera site for this setup. We did find a generic document at the link below on how to index Hive tables using Solr, but the problem is that we need to build the JAR with a third-party tool, Gradle, and we are not sure whether it supports Cloudera Solr.
Could you please guide me on how to index Hive tables in Cloudera Solr? Thanks
11-04-2016 09:14 AM
04-26-2017 05:21 AM
06-09-2017 07:53 AM - edited 06-09-2017 07:53 AM
You basically have two options:
- either the file format is simple enough and you can index it directly using the MapReduceIndexerTool, as suggested by pdvorak (you access the files directly)
- or the file format is too complicated (or dynamic), in which case you need to code your own indexer that runs the query on Hive, gets the result, and pushes it to Solr.
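
To give an idea of what the second option could look like, here is a minimal sketch in Python, not a tested production indexer. It assumes you already have a DB-API 2.0 cursor connected to HiveServer2 (for example from a client library such as PyHive) and that your collection accepts documents on Solr's JSON update handler. The function names, the batch size, and the update URL are all hypothetical, and the Solr schema is assumed to have fields matching the Hive column names.

```python
import json
import urllib.request


def rows_to_docs(columns, rows):
    """Turn DB-API result rows into Solr documents (one dict per row)."""
    return [dict(zip(columns, row)) for row in rows]


def iter_doc_batches(cursor, batch_size=500):
    """Yield lists of documents from a cursor whose query was already executed."""
    # Hive may report columns as "table.column"; keep only the column part.
    columns = [desc[0].split(".")[-1] for desc in cursor.description]
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows_to_docs(columns, rows)


def post_batch(update_url, docs):
    """POST one batch of documents to Solr's JSON update handler."""
    request = urllib.request.Request(
        update_url,
        # default=str is a simplification so dates/decimals serialize as text.
        data=json.dumps(docs, default=str).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_query(cursor, query, update_url, batch_size=500):
    """Run the query on Hive and push the result to Solr batch by batch."""
    cursor.execute(query)
    total = 0
    for docs in iter_doc_batches(cursor, batch_size):
        post_batch(update_url, docs)
        total += len(docs)
    return total
```

You would call `index_query(cursor, "SELECT ...", update_url)` with an update URL of the form `http://<solr-host>:8983/solr/<collection>/update?commit=true` (host and collection are placeholders). Authentication, error handling, and retries are left out on purpose, and a secured cluster will need more setup than this.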
06-12-2017 08:54 AM
So, the takeaway is that there isn't an official indexer (just like for MySQL) for Hive tables.
Is it possible that we will see one in the future? Or does it even make sense?
I mean, I see a clear use case behind that. If Solr can index Hive tables, it would become so easy to make your Hadoop data searchable.
06-20-2017 01:08 AM
You should look at this: https://chimpler.wordpress.com/2013/03/20/playing-with-apache-hive-and-solr/
The content seems to be what you are looking for.
I have not tested it myself.
06-21-2017 08:52 AM
Thanks for sharing that link; I will test with it.
However, I am a little skeptical about deploying it in production even if it works.
Does Cloudera have any plans to develop and release such a connector/handler/library?
As I mentioned previously, letting users search through Hive tables seems to be a valid use case.