We are planning to index Hive tables in Cloudera Solr so that we can find the relevant tables using data search. We could not find any documentation on the Cloudera site for this setup. We did find a generic document at the link below on how to index Hive tables using Solr, but the problem is that we would need to build the JAR with a third-party tool (Gradle), and we are not sure whether it supports Cloudera Solr.
Could you please guide me on how to index Hive tables in Cloudera Solr? Thanks.
You basically have two options:
- If the file format is simple enough, you can index it directly using the MapReduceIndexerTool, as suggested by pdvorak (you access the files directly).
- If the file format is too complicated (or dynamic), you need to code your own indexer that runs the query on Hive, gets the result, and then pushes it to Solr.
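For the second option, here is a minimal Python sketch of what such a custom indexer could look like. It is an illustration under assumptions, not an official or tested Cloudera approach. It assumes you already have a DB-API cursor connected to HiveServer2 (for example from a client library such as PyHive) and that your Solr collection accepts JSON documents on its `/update` endpoint. The function names, the batch size, and the `id_column` parameter are all made up for illustration.

```python
import json
import urllib.request


def rows_to_docs(cursor, id_column):
    """Yield one Solr document (a dict) per row of an already executed cursor."""
    # Hive may report column names as "table.column"; keep only the column part.
    columns = [col[0].split(".")[-1] for col in cursor.description]
    for row in iter(cursor.fetchone, None):
        doc = dict(zip(columns, row))
        # Solr schemas normally require a unique key; here it is assumed to be "id".
        doc["id"] = str(doc[id_column])
        yield doc


def batches(docs, size):
    """Group an iterable of documents into lists of at most `size` items."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def post_batch(solr_update_url, batch):
    """POST a list of documents to Solr's JSON update endpoint."""
    request = urllib.request.Request(
        solr_update_url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_query(cursor, sql, id_column, solr_update_url,
                batch_size=500, send=post_batch):
    """Run `sql` on the cursor and push the result to Solr in batches.

    Returns the number of documents sent. `send` can be swapped out,
    for example to do a dry run without a Solr server.
    """
    cursor.execute(sql)
    total = 0
    for batch in batches(rows_to_docs(cursor, id_column), batch_size):
        send(solr_update_url, batch)
        total += len(batch)
    return total
```

You would call it with something like `index_query(cursor, "SELECT * FROM my_table", "id", "http://solr-host:8983/solr/my_collection/update?commit=true")`, where the table, host, and collection names are placeholders. The sketch does not handle Kerberos or other authentication, retries, or type mapping between Hive and the Solr schema, so you would have to add those for a real cluster.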
So, the takeaway is that there isn't an official indexer for Hive tables (just like for MySQL).
Is it possible that we will see one in the near future? Or does it even make sense?
I mean, I see a clear use case behind that. If Solr can index Hive tables, it would become so easy to make your Hadoop data searchable.
Thanks for sharing that link; I will test with it.
However, I am a little skeptical about deploying it in production even if it works.
Does Cloudera have any plans to develop and release such a connector/handler/library?
As I mentioned previously, this seems to be a valid use case: allowing users to search through Hive tables.