<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Creating an Impala External Table from fixed width csv in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Creating-an-Impala-External-Table-from-fixed-width-csv/m-p/341277#M233482</link>
    <description>&lt;P&gt;Here is the code.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;create external table testtable1 
(code string, codesystem string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
     "input.regex" = "(.{27})(.{50})"
     )
LOCATION '/data/raw/testtable1';&lt;/LI-CODE&gt;&lt;P&gt;The error message is:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;ParseException: Syntax error in line 3:undefined: ROW FORMAT SERDE 'org.apache.hadoop.hiv... ^ Encountered: IDENTIFIER Expected: DELIMITED CAUSED BY: Exception: Syntax error&lt;/LI-CODE&gt;&lt;P&gt;It looks like an Impala table only accepts "Row Format Delimited".&lt;/P&gt;&lt;P&gt;How, then, can I create a Hive table with a fixed-width layout? Should I create it outside Impala, through Hive, and then run other data operations on this table via Impala?&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
    <pubDate>Tue, 12 Apr 2022 22:46:42 GMT</pubDate>
    <dc:creator>Seaport</dc:creator>
    <dc:date>2022-04-12T22:46:42Z</dc:date>
    <item>
      <title>Creating an Impala External Table from fixed width csv</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Creating-an-Impala-External-Table-from-fixed-width-csv/m-p/341277#M233482</link>
      <description>&lt;P&gt;Here is the code.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;create external table testtable1 
(code string, codesystem string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
     "input.regex" = "(.{27})(.{50})"
     )
LOCATION '/data/raw/testtable1';&lt;/LI-CODE&gt;&lt;P&gt;The error message is:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;ParseException: Syntax error in line 3:undefined: ROW FORMAT SERDE 'org.apache.hadoop.hiv... ^ Encountered: IDENTIFIER Expected: DELIMITED CAUSED BY: Exception: Syntax error&lt;/LI-CODE&gt;&lt;P&gt;It looks like an Impala table only accepts "Row Format Delimited".&lt;/P&gt;&lt;P&gt;How, then, can I create a Hive table with a fixed-width layout? Should I create it outside Impala, through Hive, and then run other data operations on this table via Impala?&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 12 Apr 2022 22:46:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Creating-an-Impala-External-Table-from-fixed-width-csv/m-p/341277#M233482</guid>
      <dc:creator>Seaport</dc:creator>
      <dc:date>2022-04-12T22:46:42Z</dc:date>
    </item>
    <item>
      <title>Re: Creating an Impala External Table from fixed width csv</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Creating-an-Impala-External-Table-from-fixed-width-csv/m-p/341317#M233490</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/45630"&gt;@Seaport&lt;/a&gt;, the "RegexSerDe" is in the contrib package, which is not officially supported. You can use it in some parts of the platform, but individual components may not fully support it.&lt;/P&gt;&lt;P&gt;I would recommend preprocessing the data files into a commonly consumable format (CSV) before ingesting them into the cluster.&lt;/P&gt;&lt;P&gt;Alternatively, you can ingest the data into a table that has only a single (string) column, and then do the processing, validation, formatting, and transformation while inserting it into a proper final table with the columns you need. During the insert you can still use "regex" or "substring" type functions / UDFs to extract the fields you need from the fixed-width data files (from the single-column table).&lt;/P&gt;&lt;P&gt;I hope this helps,&lt;/P&gt;&lt;P&gt;Best regards, Miklos&lt;/P&gt;</description>
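      <!--
      A minimal sketch of the single-column approach described in this reply, written in Impala SQL. The staging table name testtable1_raw and the column name line are hypothetical. The widths 27 and 50 are taken from the regex in the question. Trimming the padding and storing the final table as Parquet are assumptions, not something the thread specifies.

      ```sql
      /* Staging table: each fixed-width record lands in one STRING column.
         Assumes the data contains no Ctrl-A (0x01) characters, which is the
         default field delimiter for text tables.
         testtable1_raw and line are hypothetical names. */
      CREATE EXTERNAL TABLE testtable1_raw (line STRING)
      STORED AS TEXTFILE
      LOCATION '/data/raw/testtable1';

      /* Final table: slice each record by position (substr is 1-based).
         Widths 27 and 50 come from the regex in the question;
         trim() drops the fixed-width padding (an assumption). */
      CREATE TABLE testtable1 STORED AS PARQUET AS
      SELECT
        trim(substr(line, 1, 27))  AS code,
        trim(substr(line, 28, 50)) AS codesystem
      FROM testtable1_raw;
      ```

      Keeping the raw lines in a staging table means the column boundaries can be adjusted later without re-ingesting the files.
      -->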
      <pubDate>Wed, 13 Apr 2022 09:36:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Creating-an-Impala-External-Table-from-fixed-width-csv/m-p/341317#M233490</guid>
      <dc:creator>mszurap</dc:creator>
      <dc:date>2022-04-13T09:36:14Z</dc:date>
    </item>
    <item>
      <title>Re: Creating an Impala External Table from fixed width csv</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Creating-an-Impala-External-Table-from-fixed-width-csv/m-p/341470#M233533</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/12885"&gt;@mszurap&lt;/a&gt;&amp;nbsp;Thanks for the response. I actually took the 2nd option you mentioned: ingesting it into a table which has only a single (string) column. But I am not sure whether it is the right approach. I appreciate the confirmation.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;</description>
      <pubDate>Thu, 14 Apr 2022 17:42:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Creating-an-Impala-External-Table-from-fixed-width-csv/m-p/341470#M233533</guid>
      <dc:creator>Seaport</dc:creator>
      <dc:date>2022-04-14T17:42:19Z</dc:date>
    </item>
  </channel>
</rss>

