Re: Solr indexing on hive table


@pdvorak wrote:
Beyond files, metastore tables and Hive SQL queries are also supported.

Does that part of your answer mean that MRIT supports Hive queries as a data source for Solr indexing?

If yes, how?

Re: Solr indexing on hive table


Thanks for the detailed explanation of the issues with the DIH approach.


I agree that it's better to send data to Solr while you are ingesting it into HDFS/Hive tables, but what about data that is already in Hive tables for a different type of use case?


Assume a scenario where the initial use case is to bring RDBMS data from two different sources into Hive tables and mash them up. In this case, the data will be in some kind of container format like Parquet. After the initial use case is proven, i.e. the data mashup is done for both sources and an ongoing processing pipeline is defined, another use case comes up where you need to be able to search through that data. How do you think that would be achieved?

Re: Solr indexing on hive table

What is the format of your Hive tables in HDFS? You can use MRIT (MapReduceIndexerTool) [1] to index the table's files in HDFS with the appropriate morphline read commands. If they are CSV, then you would just need the readCSV command. You could also use readAvro or readAvroParquetFile.
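
To make that concrete, here is a rough sketch of what a morphline for a CSV-backed Hive table might look like. It is only an illustration under assumptions: the collection name, ZooKeeper address, and column names (id, name, city) are hypothetical placeholders, and the separator must match how your table is actually stored.

```
# Hypothetical Solr target; replace with your own collection and ZooKeeper ensemble.
SOLR_LOCATOR : {
  collection : hive_table_collection
  zkHost : "zk-host.example.com:2181/solr"
}

morphlines : [
  {
    id : hiveTableToSolr
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]

    commands : [
      {
        # Parse each line of the table's data files into a record.
        # Hive text tables default to Ctrl-A ("\u0001") as the field
        # delimiter; "," is assumed here for a CSV table.
        readCSV {
          separator : ","
          columns : [id, name, city]   # assumed column names, in file order
          ignoreFirstLine : false
          trim : true
          charset : UTF-8
        }
      }

      # Drop any record fields that are not defined in the Solr schema.
      { sanitizeUnknownSolrFields { solrLocator : ${SOLR_LOCATOR} } }

      # Send the record to Solr.
      { loadSolr { solrLocator : ${SOLR_LOCATOR} } }
    ]
  }
]
```

You would then pass this file to MRIT with --morphline-file and point the job at the table's directory in HDFS. For a Parquet-backed table, the readCSV step would instead be readAvroParquetFile, typically followed by a command such as extractAvroPaths to map the Avro fields onto Solr fields. I have not shown that variant because the paths depend on your schema.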



Re: Solr indexing on hive table

I have the same scenario, and I am not able to find any documentation from Cloudera about it. If you found a solution, please share.