Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

XML Data processing in Hive

XML Data processing in Hive

New Contributor

In below XML raw table author name was split by comma. But I need to insert a single value in the Main table.

XML Raw table (Staging table)

CREATE TABLE xml_raw(line string);
LOAD DATA LOCAL INPATH ‘/home/xxxxx/book_simple.txt’ INTO TABLE xml_raw;

Eg: SELECT XPATH_STRING(line,‘catalog/book/author’) FROM xml_raw;
Gambardella, Matthew
Ralls, Kim
Corets, Eva

Main Table

CREATE TABLE books (author string, title string, genre string, price double, publish_date string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n’
STORED AS TEXTFILE;

Below is the insert command to load the main table, but the result comes as not expected.

INSERT OVERWRITE TABLE books
SELECT XPATH_STRING(line,‘catalog/book/author’),
XPATH_STRING(line,‘catalog/book/title’),
XPATH_STRING(line,‘catalog/book/genre’),
XPATH_DOUBLE(line,‘catalog/book/price’),
XPATH_STRING(line,‘catalog/book/publish_date’)
FROM xml_raw;

Result

SELECT * FROM books ;
OK
books.author books.title books.genre books.price books.publish_date
Gambardella Matthew XML Developer’s Guide NULL 44.95
Ralls Kim Midnight Rain NULL 5.95
Corets Eva Maeve Ascendant NULL 5.95

But required result to be like
books.author books.title books.genre books.price books.publish_date
Corets,Eva Maeve Ascendant Fantasy 5.95 2000-11-17

I need author name to be in the single column. Awaiting your response.

1 REPLY 1
Highlighted

Re: XML Data processing in Hive

Contributor

@VENKATESH M

You can try escaping the delimiter in your dataset and then populate the table.

CREATE TABLE books (author string, title string, genre string, price double, publish_date string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',' ESCAPED BY '\\'

LINES TERMINATED BY '\n’

STORED AS TEXTFILE;