Hi,
I'm trying to load logs into a Hive external table. Some records contain SQL queries that reference multiple columns, so the field values themselves contain commas, but commas are also the field delimiter. Other questions here recommend using org.apache.hadoop.hive.serde2.OpenCSVSerde. So I used:
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION 'hdfs://hdfs/path/to/directory/external/2020Jun30'
TBLPROPERTIES ('skip.header.line.count'='1');
This didn't handle the additional unescaped commas in the log file, though. So I tried:
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = ",",
"quoteChar" = '"',
"escapeChar" = "\\"
)
STORED AS TEXTFILE
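Putting it all together, the full statement for the second attempt looks roughly like this (the column names are made up; the table name and location come from my actual setup):

CREATE EXTERNAL TABLE db.ext_table (
  log_time STRING,
  host STRING,
  query STRING,
  status STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar" = '"',
  "escapeChar" = "\\"
)
STORED AS TEXTFILE
LOCATION 'hdfs://hdfs/path/to/directory/external/2020Jun30'
TBLPROPERTIES ('skip.header.line.count'='1');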
However, in Hue I get this error when using org.apache.hadoop.hive.serde2.OpenCSVSerde:
AnalysisException: Failed to load metadata for table: 'db.ext_table' CAUSED BY: TableLoadingException: Failed to load metadata for table: db.ext_table CAUSED BY: InvalidStorageDescriptorException: Impala does not support tables of this type. REASON: SerDe library 'org.apache.hadoop.hive.serde2.OpenCSVSerde' is not supported.
I can read from the table via Beeline, but not via Hue (which, going by the error, seems to be querying through Impala). Is this something (a library?) that I need to add to Hue?
Or do I need to create the table in some other way?
How do I deal with commas inside SQL queries in a comma-delimited log file?
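For reference, a typical record looks something like this (values invented for illustration):

2020-06-30 12:00:01,server01,"SELECT id, name, email FROM users WHERE active = 1",200

The third field contains commas, so unless that field is quoted and the SerDe honors the quoteChar, the row splits into too many columns.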