<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Hive and comma delimited fields in managed table in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Hive-and-comma-delimited-fields-in-managed-table/m-p/299629#M219773</link>
    <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am uploading Navigator logs into Hive for analysis. I used this:&amp;nbsp;&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'&lt;BR /&gt;WITH SERDEPROPERTIES (&lt;BR /&gt;"separatorChar" = ",",&lt;BR /&gt;"quoteChar" = '"',&lt;BR /&gt;"escapeChar" = "\\"&lt;BR /&gt;)&lt;BR /&gt;STORED AS TEXTFILE&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Based off:&amp;nbsp;&lt;A href="https://community.cloudera.com/t5/Support-Questions/Hive-escaping-field-delimiter-in-column-value/m-p/233346#M195173" target="_blank" rel="noopener"&gt;https://community.cloudera.com/t5/Support-Questions/Hive-escaping-field-delimiter-in-column-value/m-p/233346#M195173&lt;/A&gt;&amp;nbsp;to get an external table to handle the commas in queries. I think it's working fine (but will test tomorrow to make sure).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, I then do a move to a managed partitioned table, but it is not handling the commas in queries correctly. I created it as:&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;PARTITIONED BY(event_day INT, event_month INT, event_year INT);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;How do I handle the commas in queries as part of this move?&lt;/P&gt;&lt;P&gt;Does the managed table need the same delimiter &amp;amp; escape info as the external table?&lt;/P&gt;&lt;P&gt;If that's the case, what's the syntax for it?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Schwifty&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 14:38:11 GMT</pubDate>
    <dc:creator>getschwifty</dc:creator>
    <dc:date>2022-09-16T14:38:11Z</dc:date>
    <item>
      <title>Hive and comma delimited fields in managed table</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-and-comma-delimited-fields-in-managed-table/m-p/299629#M219773</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am uploading Navigator logs into Hive for analysis. I used this:&amp;nbsp;&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'&lt;BR /&gt;WITH SERDEPROPERTIES (&lt;BR /&gt;"separatorChar" = ",",&lt;BR /&gt;"quoteChar" = '"',&lt;BR /&gt;"escapeChar" = "\\"&lt;BR /&gt;)&lt;BR /&gt;STORED AS TEXTFILE&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Based off:&amp;nbsp;&lt;A href="https://community.cloudera.com/t5/Support-Questions/Hive-escaping-field-delimiter-in-column-value/m-p/233346#M195173" target="_blank" rel="noopener"&gt;https://community.cloudera.com/t5/Support-Questions/Hive-escaping-field-delimiter-in-column-value/m-p/233346#M195173&lt;/A&gt;&amp;nbsp;to get an external table to handle the commas in queries. I think it's working fine (but will test tomorrow to make sure).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, I then do a move to a managed partitioned table, but it is not handling the commas in queries correctly. I created it as:&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;PARTITIONED BY(event_day INT, event_month INT, event_year INT);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;How do I handle the commas in queries as part of this move?&lt;/P&gt;&lt;P&gt;Does the managed table need the same delimiter &amp;amp; escape info as the external table?&lt;/P&gt;&lt;P&gt;If that's the case, what's the syntax for it?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Schwifty&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 14:38:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-and-comma-delimited-fields-in-managed-table/m-p/299629#M219773</guid>
      <dc:creator>getschwifty</dc:creator>
      <dc:date>2022-09-16T14:38:11Z</dc:date>
    </item>
    <item>
      <title>Re: Hive and comma delimited fields in managed table</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-and-comma-delimited-fields-in-managed-table/m-p/299712#M219806</link>
      <description>&lt;P&gt;So, this:&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;&lt;SPAN&gt;ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;WITH SERDEPROPERTIES (&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;"separatorChar" = ",",&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;"quoteChar" = '"',&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;"escapeChar" = "\\"&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;STORED AS TEXTFILE&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;does not handle commas in CSV records, even when the field is enclosed in double quotes (e.g.&amp;nbsp;"CREATE TABLE db.table AS ( SELECT db.col1, db.col2, db.col3, db.col3,... etc").&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;There has to be a way to do this. Does anyone know?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 14 Jul 2020 23:58:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-and-comma-delimited-fields-in-managed-table/m-p/299712#M219806</guid>
      <dc:creator>getschwifty</dc:creator>
      <dc:date>2020-07-14T23:58:45Z</dc:date>
    </item>
    <item>
      <title>Re: Hive and comma delimited fields in managed table</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-and-comma-delimited-fields-in-managed-table/m-p/299714#M219808</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This isn't meant to be a blog post. Here's the answer:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The CSV file has lots of newline characters inside records. Probably (I'm guessing) from where the devs were writing their queries? Who knows. Either way, it looked like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;Timestamp,Username,"IP Address","Service Name",Operation,Resource,Allowed,Impersonator,sub_operation,entity_id,stored_object_name,additional_info,collection_name,solr_version,operation_params,service,operation_text,url,operation_text,table_name,resource_path,database_name,object_type,Source,Destination,Permissions,"Delegation Token ID","Table Name",Family,Qualifier,"Operation Text","Database Name","Table Name","Object Type","Resource Path","Usage Type","Operation Text","Database Name","Table Name","Object Type","Resource Path","Usage Type","Operation Text","Query ID","Session ID",Status,"Database Name","Table Name","Object Type",Privilege$&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;2020-07-01T22:49:13.000Z,user1,::ffff:xx.xx.xx.xx,IMPALA,QUERY,env_db:table_&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;$&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;name,true,"&lt;A href="mailto:hue/dcakne19.csda.gov.au@INTERNAL.DEPT.LOCAL" target="_blank"&gt;hue/host19.fqdn@DOMAIN&lt;/A&gt;",,,,,,,,,,,"select * fro$&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;m table_name",table_name2 etc etc",,,&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Crazy. Newline characters in the middle of words. Found this online:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;awk 'NR == 1{ printf "%s", $0; next }; { printf "%s%s", (/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]+/? ORS : ""), $0 } END{ print "" }' inputfile.csv &amp;gt; outputfile.csv&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am terrible at regex, but it works. As far as I can tell, it checks each line against the timestamp that every record starts with:&amp;nbsp;&lt;STRONG&gt;2020-07-08T23:49&lt;/STRONG&gt;:13.000Z. A line that starts with a timestamp gets a newline printed in front of it, so it begins a new record; any other line is glued onto the end of the previous one.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Strip off the header line first, then run the newline stripper code:&lt;/P&gt;&lt;P&gt;sed -i 1d inputfile.csv&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Testing looks good. Time for lunch.&lt;/P&gt;</description>
      <pubDate>Wed, 15 Jul 2020 03:08:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-and-comma-delimited-fields-in-managed-table/m-p/299714#M219808</guid>
      <dc:creator>getschwifty</dc:creator>
      <dc:date>2020-07-15T03:08:45Z</dc:date>
    </item>
  </channel>
</rss>

