Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

How to load an unequal csv dataset into Hive table?

Highlighted

How to load an unequal csv dataset into Hive table?

New Contributor

I want to load the extracted data(without semi-colon, double-quote and comma) from this csv dataset into Hive table. I don't want to use any external programming language how it can be done?

INPUT BOOK-RATING.csv DATASET :

"User-ID;""ISBN"";""Book-Rating""",,
"276725;""034545104X"";""0""",,
"276726;""0155061224"";""5""",,
"276727;""0446520802"";""0""",,
"276729;""052165615X"";""3""",,
"276729;""0521795028"";""6""",,
"276733;""2080674722"";""0""",,
"276736;""3257224281"";""8""",,
"276737;""0600570967"";""6""",,
"276744;""038550120X"";""7""",,
"276745;""342310538"";""10""",,

AFTER LOADING INTO HIVE TABLE VIEW SHOULD BE IN 3 COLUMNS:

276725	034545104X	0
276726	0155061224	5
276727	0446520802	0
276729	052165615X	3
276729	0521795028	6
276733	2080674722	0
276736	3257224281	8
276737	0600570967	6
276744	038550120X	7
276745	342310538	10
1 REPLY 1

Re: How to load an unequal csv dataset into Hive table?

Hi @Bhaskar Das

Use csvserde to load the csv file into a staging Hive table. Once done write a select clause using regex which select only numbers & alphabets for every column and insert it into the target table. By that way you can eliminate the other characters.

Use [a-zA-Z0-9_.] in the regex function to select only numbers & alphabets. Hope it helps!!