<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: hive csv serde not working properly in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/hive-csv-serde-not-working-properly/m-p/128351#M39250</link>
    <description>&lt;P&gt;@bpreachuk: You are right about that. I exported it from an RDB and mapped it to Hive, but my CSV seems to have just the right structure. It is just that when I map it to CSVSerde, it changes null values into """" and then I see " as the value in the field instead of NULL. Similarly, it gives me values like 61.8 with random quotes, and that is creating the problem.&lt;/P&gt;</description>
    <pubDate>Wed, 31 Aug 2016 18:39:59 GMT</pubDate>
    <dc:creator>simran_k</dc:creator>
    <dc:date>2016-08-31T18:39:59Z</dc:date>
    <item>
      <title>hive csv serde not working properly</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/hive-csv-serde-not-working-properly/m-p/128349#M39248</link>
      <description>&lt;P&gt;I get some rows broken like:&lt;/P&gt;&lt;PRE&gt;admin,base,Default,configurable,"50,76,188,467",IN1541MTODREWHT-187,1,Plush Maxi Dress,Buy Plush Maxi Dress Online | Maxi Dresses | StalkBuyLove,/i/n/in1541mtodrewht-187-front.jpg,/i/n/in1541mtodrewht-187-front.jpg,plush-maxi-dress,plush-maxi-dress-97763-SBLPR.html,"""","""",Block after Info Column,"""","""","""", ,No,"""",No,Tax 5,"""",NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL ",61.8""",



&lt;/PRE&gt;&lt;P&gt;The 61.8 value actually lands on the next row, and when I open the file in Excel, the field looks like:&lt;/P&gt;&lt;PRE&gt;,61.8"&lt;/PRE&gt;&lt;P&gt;I don't want my rows to break. This happens after I use the CSV SerDe and then download the data from Hive.&lt;/P&gt;</description>
      <pubDate>Wed, 31 Aug 2016 16:27:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/hive-csv-serde-not-working-properly/m-p/128349#M39248</guid>
      <dc:creator>simran_k</dc:creator>
      <dc:date>2016-08-31T16:27:55Z</dc:date>
    </item>
    <item>
      <title>Re: hive csv serde not working properly</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/hive-csv-serde-not-working-properly/m-p/128350#M39249</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/10486/simrank.html" nodeid="10486"&gt;@Simran Kaur&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;This looks like a very complex set of data with a wildly varying structure: pipe characters, empty strings (""""), forward slashes, quoted and unquoted text, overloaded string fields delimited internally with pipes, and so on.&lt;/P&gt;&lt;P&gt;From what you state, it appears that this data came from a source (file or RDBMS), was then loaded into HDFS, and then had a Hive table structure placed on the file using CSVSerde.&lt;/P&gt;&lt;P&gt;Are you using a comma as the delimiter for the table? If you could paste in the table DDL, it would help.&lt;/P&gt;&lt;P&gt;My suspicion is that the data in the source file is not correctly formatted, which is causing CSVSerde to produce a weird value in that column. I suspect that [in the Hive table] the final column in question actually holds this 14-character literal string:&lt;/P&gt;&lt;PRE&gt;NULL ",61.8"""&lt;/PRE&gt;&lt;P&gt;Please validate whether that is true with a SELECT statement.&lt;/P&gt;&lt;P&gt;Also, please confirm that the source file does not have a &amp;lt;CR&amp;gt;&amp;lt;LF&amp;gt; sequence after the NULL and before the double quote.&lt;/P&gt;&lt;P&gt;In either case, the file may require some data cleansing, and it may make sense to use a different delimiter in the source file, perhaps a tilde "~".&lt;/P&gt;</description>
      <pubDate>Wed, 31 Aug 2016 17:38:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/hive-csv-serde-not-working-properly/m-p/128350#M39249</guid>
      <dc:creator>bpreachuk</dc:creator>
      <dc:date>2016-08-31T17:38:52Z</dc:date>
    </item>
    <item>
      <title>Re: hive csv serde not working properly</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/hive-csv-serde-not-working-properly/m-p/128351#M39250</link>
      <description>&lt;P&gt;@bpreachuk: You are right about that. I exported it from an RDB and mapped it to Hive, but my CSV seems to have just the right structure. It is just that when I map it to CSVSerde, it changes null values into """" and then I see " as the value in the field instead of NULL. Similarly, it gives me values like 61.8 with random quotes, and that is creating the problem.&lt;/P&gt;</description>
      <pubDate>Wed, 31 Aug 2016 18:39:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/hive-csv-serde-not-working-properly/m-p/128351#M39250</guid>
      <dc:creator>simran_k</dc:creator>
      <dc:date>2016-08-31T18:39:59Z</dc:date>
    </item>
    <item>
      <title>Re: hive csv serde not working properly</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/hive-csv-serde-not-working-properly/m-p/128352#M39251</link>
      <description>&lt;P&gt;Without seeing the input file, I can only suggest that you may need a pre-processing step in Pig or a second 'create table as' step to reformat the data correctly.&lt;/P&gt;&lt;P&gt;It's great to use csv-serde, since it does such a good job of stripping the quotes from quoted text, among other things, but you may need that extra processing to use csv-serde effectively (so that NULLs and double quotes are handled the way you want).&lt;/P&gt;</description>
      <pubDate>Wed, 31 Aug 2016 20:25:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/hive-csv-serde-not-working-properly/m-p/128352#M39251</guid>
      <dc:creator>bpreachuk</dc:creator>
      <dc:date>2016-08-31T20:25:25Z</dc:date>
    </item>
  </channel>
</rss>

