Support Questions

Find answers, ask questions, and share your expertise

hive csv serde not working properly

Expert Contributor

Some of my rows come out broken, like this:

admin,base,Default,configurable,"50,76,188,467",IN1541MTODREWHT-187,1,Plush Maxi Dress,Buy Plush Maxi Dress Online | Maxi Dresses | StalkBuyLove,/i/n/in1541mtodrewht-187-front.jpg,/i/n/in1541mtodrewht-187-front.jpg,plush-maxi-dress,plush-maxi-dress-97763-SBLPR.html,"""","""",Block after Info Column,"""","""","""", ,No,"""",No,Tax 5,"""",NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL ",61.8""",



The value 61.8 actually spills over into the next row, and when I open the file in Excel, the field looks like:

,61.8"

I don't want my rows breaking. It happens after I use the CSV SerDe and then download the data from Hive.

1 ACCEPTED SOLUTION


Without seeing the input file, you may need to do a pre-processing step in Pig, or do a second 'create table as' step, in order to reformat the data correctly.

It's great to use csv-serde since it does such a good job of stripping out quoted text among other things, but you may need that extra processing in order to use csv-serde effectively (handling NULLs and double-quotes the way you want it to).
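As an illustration only (the table and column names below are placeholders, not from this thread), a second 'create table as' pass could normalize the quote-only artifacts and stray quotes in one sweep:

```sql
-- Hypothetical cleanup pass over a raw CSV-backed table.
-- NULLIF turns the literal " left behind in empty fields back into NULL;
-- regexp_replace strips stray double-quotes from values like 61.8".
CREATE TABLE products_clean
STORED AS ORC
AS
SELECT
    sku,
    NULLIF(description, '"')                          AS description,
    CAST(regexp_replace(weight, '"', '') AS DOUBLE)   AS weight
FROM products_raw;
```

Writing the cleaned copy to ORC (or another binary format) also sidesteps the CSV quoting rules entirely for downstream queries.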


3 REPLIES 3


Hi @Simran Kaur.

This looks like a very complex data set with a wildly varying structure: pipe characters, empty strings (""""), forward slashes, quoted and unquoted text, overloaded string fields delimited internally with pipes, and so on.

From what you state, it appears this data came from a source (file or RDBMS), was then loaded into HDFS, and then a Hive table structure was placed on the file using CSVSerde.

Are you using a comma as the delimiter for the table? If you could paste in the table DDL, it would help.

My suspicion is that the data in the source file is not correctly formatted, so it is causing the CSVSerde to produce a weird value in that column. I suspect that, in the Hive table, the final column in question actually holds this 14-character literal string:

NULL ",61.8"""

Please validate if that is true via a select statement.
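For example (the table and column names here are guesses, since the DDL isn't shown in the thread), a check could look like:

```sql
-- Hypothetical validation query: replace my_table / last_col
-- with the real table and column names from your DDL.
SELECT last_col, length(last_col)
FROM my_table
WHERE last_col LIKE '%61.8%';
-- If length() reports 14 characters, the column really does hold
-- the literal string rather than two separate fields.
```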

Also please confirm that the source file does not have a <CR><LF> character after the NULL and before the double-quote.

In either case, the file may require some data cleansing, and it may make sense to use a different delimiter in the source file, perhaps a tilde (~).
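A sketch of what a tilde-delimited table could look like with Hive's OpenCSVSerde (the table name, columns, and HDFS location are placeholders; the separatorChar/quoteChar/escapeChar property names are the standard OpenCSVSerde SERDEPROPERTIES):

```sql
-- Hypothetical DDL for a tilde-delimited export of the same data.
CREATE EXTERNAL TABLE products_raw (
    sku    STRING,
    name   STRING,
    weight STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    "separatorChar" = "~",
    "quoteChar"     = "\"",
    "escapeChar"    = "\\"
)
STORED AS TEXTFILE
LOCATION '/data/products_raw';
```

Since a tilde is unlikely to appear inside product descriptions, the embedded commas and quotes stop mattering for field splitting. Note that OpenCSVSerde reads every column as STRING, so numeric casts still belong in a follow-up query.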

Expert Contributor

@bpreachuk: You are right about that. I exported it from an RDBMS and mapped it to Hive, but my CSV seems to have just the right structure. It's just that when I map it with CSVSerde, it changes NULL values into """", and then I see " as the value in the field instead of NULL. Similarly, it gives me values like 61.8 with stray quotes, and that is creating the problem.
