We are trying to query a table that was created in Hive using the OpenCSVSerde, but we are hitting the error below. As far as we know, this SerDe ships by default with a CDH installation, so Impala should support it.
Is there any reason why we are not able to query the table?
Query: select * from master_staging.rms_dxc_data_mc_cal_reps limit 5
ERROR: AnalysisException: Failed to load metadata for table: 'master_staging.rms_dxc_data_mc_cal_reps'
CAUSED BY: TableLoadingException: Failed to load metadata for table: master_staging.rms_dxc_data_mc_cal_reps
CAUSED BY: InvalidStorageDescriptorException: Impala does not support tables of this type. REASON: SerDe library 'org.apache.hadoop.hive.serde2.OpenCSVSerde' is not supported.
Impala doesn't support this Hive SerDe. In general, Impala uses its own optimised parsing code instead of Hive's SerDe infrastructure. If you're ingesting data from CSV and using the SerDe to do the conversion, I'd recommend using Hive to do the ETL and convert the data to a more efficient storage format, e.g. Parquet.
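As a rough sketch of that conversion, something like the following could be run in Hive (not Impala), since Hive can still read the OpenCSVSerde table. The target table name rms_dxc_data_mc_cal_reps_parquet is only a placeholder, and I'm assuming a plain copy of all columns is what you want.

```sql
-- Run in Hive (beeline or the Hive editor), not in Impala.
-- The target table name below is hypothetical; pick your own.
CREATE TABLE master_staging.rms_dxc_data_mc_cal_reps_parquet
STORED AS PARQUET
AS
SELECT *
FROM master_staging.rms_dxc_data_mc_cal_reps;

-- Note: OpenCSVSerde exposes every column as STRING, so the copy will
-- also be all STRING columns unless you list the columns explicitly
-- and CAST them to the types you need in the SELECT.
```

The new table is created outside Impala, so Impala will likely need its metadata reloaded before it can see it.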
I tried loading a .parquet file using the Metastore Manager.
When I queried the table using the Impala editor, I was able to see the contents of the table.
So the query works in Impala when a .parquet file is loaded in Hive.
When querying a Parquet table created in Hive from Impala, be sure to run this first:
invalidate metadata <table name>;
After this you should see the results.
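For example, a minimal sequence in impala-shell or the Impala editor might look like this. The name my_db.my_parquet_table is a placeholder for your own database and table.

```sql
-- Run in Impala. my_db.my_parquet_table is a placeholder name.
-- Needed when the table was created outside Impala (e.g. in Hive).
INVALIDATE METADATA my_db.my_parquet_table;

-- The table should now be visible to Impala.
SELECT * FROM my_db.my_parquet_table LIMIT 5;

-- If Impala already knows the table and only new data files were
-- added through Hive, the lighter-weight REFRESH is usually enough.
REFRESH my_db.my_parquet_table;
```

Qualifying the table with its database name avoids depending on whichever database is currently selected in the session.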
So what you are saying is that a CSV file loaded in Hive to create a table cannot be queried from the Impala editor?