<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Create a hive table upon a fixed-width log file but the last column width is not fixed in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Create-a-hive-table-upon-a-fixed-width-log-file-but-the-last/m-p/241453#M203256</link>
    <description>&lt;P&gt;Basically, I am trying to analyze SQL server log files using Hive.&lt;/P&gt;&lt;P&gt;The layout is the log file is&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;char(23) - for timestamp&lt;/LI&gt;&lt;LI&gt;char(1) - space&lt;/LI&gt;&lt;LI&gt;char(12) - source&lt;/LI&gt;&lt;LI&gt;the rest of the row, and the length varies. &lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;The row delimiter is CR+LF.&lt;/P&gt;&lt;P&gt; Below are some entries in the log.&lt;/P&gt;&lt;PRE&gt;2019-05-28 07:29:55.03 Server &amp;nbsp; &amp;nbsp; &amp;nbsp;UTC adjustment: -7:00 
2019-05-28 07:29:55.03 Server &amp;nbsp; &amp;nbsp; &amp;nbsp;(c) Microsoft Corporation. 
2019-05-28 07:29:55.03 Server &amp;nbsp; &amp;nbsp; &amp;nbsp;All rights reserved. 
2019-05-28 07:29:55.03 Server &amp;nbsp; &amp;nbsp; &amp;nbsp;Server process ID is 3368.&lt;/PRE&gt;&lt;P&gt;There are several posts here regarding fixed-width column layout. But in my case, the last column is identified not by the width but by the row delimiter.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 31 May 2019 21:39:35 GMT</pubDate>
    <dc:creator>Seaport</dc:creator>
    <dc:date>2019-05-31T21:39:35Z</dc:date>
    <item>
      <title>Create a hive table upon a fixed-width log file but the last column width is not fixed</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Create-a-hive-table-upon-a-fixed-width-log-file-but-the-last/m-p/241453#M203256</link>
      <description>&lt;P&gt;Basically, I am trying to analyze SQL server log files using Hive.&lt;/P&gt;&lt;P&gt;The layout is the log file is&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;char(23) - for timestamp&lt;/LI&gt;&lt;LI&gt;char(1) - space&lt;/LI&gt;&lt;LI&gt;char(12) - source&lt;/LI&gt;&lt;LI&gt;the rest of the row, and the length varies. &lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;The row delimiter is CR+LF.&lt;/P&gt;&lt;P&gt; Below are some entries in the log.&lt;/P&gt;&lt;PRE&gt;2019-05-28 07:29:55.03 Server &amp;nbsp; &amp;nbsp; &amp;nbsp;UTC adjustment: -7:00 
2019-05-28 07:29:55.03 Server &amp;nbsp; &amp;nbsp; &amp;nbsp;(c) Microsoft Corporation. 
2019-05-28 07:29:55.03 Server &amp;nbsp; &amp;nbsp; &amp;nbsp;All rights reserved. 
2019-05-28 07:29:55.03 Server &amp;nbsp; &amp;nbsp; &amp;nbsp;Server process ID is 3368.&lt;/PRE&gt;&lt;P&gt;There are several posts here regarding fixed-width column layout. But in my case, the last column is identified not by the width but by the row delimiter.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 31 May 2019 21:39:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Create-a-hive-table-upon-a-fixed-width-log-file-but-the-last/m-p/241453#M203256</guid>
      <dc:creator>Seaport</dc:creator>
      <dc:date>2019-05-31T21:39:35Z</dc:date>
    </item>
    <item>
      <title>Re: Create a hive table upon a fixed-width log file but the last column width is not fixed</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Create-a-hive-table-upon-a-fixed-width-log-file-but-the-last/m-p/241454#M203257</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/112879/li-h.html"&gt;@Haijin Li&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Use &lt;A rel="noopener noreferrer noopener noreferrer noopener noreferrer noopener noreferrer noopener noreferrer noopener noreferrer" href="https://community.hortonworks.com/questions/76198/can-we-create-external-hive-table-on-top-of-fixed.html" target="_blank"&gt;hive Regex serde&lt;/A&gt; and your matching regex will be&lt;/P&gt;&lt;PRE&gt;(.{22})(.{1})(.{12})(.*)&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;(.{22}) -&amp;gt; 1st capture group for 22 characters&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;(.{1}) -&amp;gt; 2nd for 1 character&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;(.{12}) -&amp;gt; 3rd for 12 characters&lt;BR /&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;(.*) -&amp;gt; 4th capture group matches for rest of the row.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 01 Jun 2019 22:57:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Create-a-hive-table-upon-a-fixed-width-log-file-but-the-last/m-p/241454#M203257</guid>
      <dc:creator>Shu_ashu</dc:creator>
      <dc:date>2019-06-01T22:57:11Z</dc:date>
    </item>
    <item>
      <title>Re: Create a hive table upon a fixed-width log file but the last column width is not fixed</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Create-a-hive-table-upon-a-fixed-width-log-file-but-the-last/m-p/241455#M203258</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/18929/yaswanthmuppireddy.html" nodeid="18929"&gt;@Shu&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Thanks for your help. I am currently on the road and will test your solution once I am back in office.&lt;/P&gt;</description>
      <pubDate>Sun, 02 Jun 2019 10:22:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Create-a-hive-table-upon-a-fixed-width-log-file-but-the-last/m-p/241455#M203258</guid>
      <dc:creator>Seaport</dc:creator>
      <dc:date>2019-06-02T10:22:25Z</dc:date>
    </item>
    <item>
      <title>Re: Create a hive table upon a fixed-width log file but the last column width is not fixed</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Create-a-hive-table-upon-a-fixed-width-log-file-but-the-last/m-p/241456#M203259</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/18929/yaswanthmuppireddy.html" nodeid="18929" target="_blank"&gt;@Shu&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Your code put me on the right track. Thanks again. &lt;/P&gt;&lt;P&gt;However, I got some strange returned records. Here are my guesses of the possible causes.&lt;/P&gt;&lt;P&gt;1. The CR and LF as the row delimiter and they are not the default row delimiter of the regex SerDe. How can I specify the row delimiter?&lt;/P&gt;&lt;P&gt;2. There are two strange characters in the first column of the first row. It might be related to the row delimiter too. &lt;/P&gt;&lt;P&gt;Below is the create-table script.&lt;/P&gt;&lt;P&gt;Create External Table slog(LogTime string, LogSource string, LogMessage string) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' WITH SERDEPROPERTIES ("input.regex" = "(.{46})(.{24})(.*)") LOCATION '/path/to/slog/';&lt;/P&gt;&lt;P&gt;I attached a screenshot of the log file (in notepad ++) and the hive query result.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="109151-captureserde.png" style="width: 978px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/13730i3FECCA5675A0E21F/image-size/medium?v=v2&amp;amp;px=400" role="button" title="109151-captureserde.png" alt="109151-captureserde.png" /&gt;&lt;/span&gt; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 17 Aug 2019 22:14:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Create-a-hive-table-upon-a-fixed-width-log-file-but-the-last/m-p/241456#M203259</guid>
      <dc:creator>Seaport</dc:creator>
      <dc:date>2019-08-17T22:14:55Z</dc:date>
    </item>
    <item>
      <title>Re: Create a hive table upon a fixed-width log file but the last column width is not fixed</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Create-a-hive-table-upon-a-fixed-width-log-file-but-the-last/m-p/241457#M203260</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/112879/li-h.html" nodeid="112879"&gt;@Haijin Li&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Try with &lt;A rel="noopener noreferrer noopener noreferrer noopener noreferrer noopener noreferrer" href="https://stackoverflow.com/questions/26588319/how-to-use-regex-serde-for-as-delimiter-in-file" target="_blank"&gt;this&lt;/A&gt; and &lt;A rel="noopener noreferrer noopener noreferrer noopener noreferrer noopener noreferrer" href="https://stackoverflow.com/questions/28278682/regex-how-to-create-regex-for-hive-regex-serde" target="_blank"&gt;this&lt;/A&gt; approaches and also to test out the regex serde functionality, Create new file using &lt;STRONG&gt;vi editor&lt;/STRONG&gt; in shell and move it to HDFS directory and create table on top of this directory.&lt;/P&gt;</description>
      <pubDate>Tue, 04 Jun 2019 10:04:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Create-a-hive-table-upon-a-fixed-width-log-file-but-the-last/m-p/241457#M203260</guid>
      <dc:creator>Shu_ashu</dc:creator>
      <dc:date>2019-06-04T10:04:58Z</dc:date>
    </item>
  </channel>
</rss>

