
Nested CSV for Impala Tables?

New Contributor

Hi Impalas,

I have text data already in HDFS. It's comma-separated, but the cells themselves contain commas, so a straightforward table creation with the clause

ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

also splits on commas inside a cell, which I would like to prevent. There is a custom SerDe for Hive that lets you choose separators to avoid this problem (see http://dev.bizo.com/2010/11/csv-and-hive.html).

Is there something similar for Impala? If not, is there a workaround?

I'm using CDH4.4.

version=2.0.0-cdh4.4.0
git.hash=c0eba6cd38c984557e96a16ccd7356b7de835e79
cloudera.hash=c0eba6cd38c984557e96a16ccd7356b7de835e79
cloudera.base-branch=cdh4-base-2.0.0
cloudera.build-branch=cdh4-2.0.0_4.4.0
cloudera.pkg.version=2.0.0+1475
cloudera.pkg.release=1.cdh4.4.0.p0.23
cloudera.cdh.release=cdh4.4.0
cloudera.build.time=2013.09.04-01:49:02GMT

cloudera.pkg.name=hadoop-hdfs

Thanks for the help and for being a great service,

Chris

2 REPLIES

Re: Nested CSV for Impala Tables?

Contributor
Impala does not support custom SerDes, but it natively supports text files with any separator you define. For example:

create table tsv(id int, s string, n int, t timestamp, b boolean)
row format delimited fields terminated by '\t'
stored as textfile;



Here's the relevant doc:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Im...

Re: Nested CSV for Impala Tables?

Cloudera Employee

If you want to keep the comma as the separator, you can use the ESCAPED BY clause to define an escape character for the table (usually ESCAPED BY '\\', for the familiar backslash escape), and then rewrite any commas within the field values as \, in the data files.
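
As a rough sketch of the ESCAPED BY approach, the table definition might look like the following. It assumes the data files have already been rewritten so that embedded commas appear as \, and the table name, column names, and HDFS path are hypothetical placeholders rather than anything from the original question.

```sql
-- Hypothetical table, columns, and path; adjust them to the actual data.
CREATE EXTERNAL TABLE csv_escaped (
  id INT,
  description STRING,  -- may contain commas, written as \, in the files
  amount INT
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  ESCAPED BY '\\'
STORED AS TEXTFILE
LOCATION '/user/chris/csv_escaped';
```

With a definition like this, a line such as 1,red\, green\, blue,42 should be read as three columns, with the second one holding the text "red, green, blue".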

 

Alternatively, if the data files can be written with a different separator such as | or \t, you can use FIELDS TERMINATED BY with that character, as Gwen suggested.

 

Hope this helps,

John
