<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: I need to know how to use regex for new line in pig latin in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/I-need-to-know-how-to-use-regex-for-new-line-in-pig-latin/m-p/120515#M47151</link>
    <description>&lt;P&gt;Unfortunately, Pig loads data line by line. It also processes data one record at a time and does not hold earlier records in memory, so there is no way to operate over multiple lines.&lt;/P&gt;&lt;P&gt;For the same reason, a newline in your regex never matches: each record is a single line, and once records are loaded you are forced to operate record by record (though of course you can aggregate into a sum, average, etc.).&lt;/P&gt;&lt;P&gt;There is one exception for multi-line input, but it does not apply to your case: if you have fields in double quotes that contain a newline, you can use piggybank's CSVExcelStorage to handle them. Since you are working with log data rather than quoted fields, this will not work for you.&lt;/P&gt;&lt;P&gt;&lt;A href="https://pig.apache.org/docs/r0.14.0/api/org/apache/pig/piggybank/storage/class-use/CSVExcelStorage.Multiline.html" target="_blank"&gt;https://pig.apache.org/docs/r0.14.0/api/org/apache/pig/piggybank/storage/class-use/CSVExcelStorage.Multiline.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You will have to preprocess the data with another tool to group your lines together (the INFO line and the next n lines).&lt;/P&gt;&lt;P&gt;Suggestions are:&lt;/P&gt;&lt;UL&gt;
&lt;LI&gt;Spark&lt;/LI&gt;&lt;LI&gt;A MapReduce program in which you implement your own InputFormat or RecordReader&lt;/LI&gt;&lt;LI&gt;NiFi (using the ExtractText processor and a regex, where Enable Multiline Mode = false), typically outside of Hadoop&lt;/LI&gt;&lt;LI&gt;awk or sed (outside of Hadoop)&lt;/LI&gt;&lt;LI&gt;Java or Groovy (outside of Hadoop)&lt;/LI&gt;&lt;LI&gt;Python, R, etc. (outside of Hadoop)&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;These Spark-based approaches look like a good fit for you:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;A href="http://stackoverflow.com/questions/32408123/how-to-parse-log-lines-using-spark-that-could-span-multiple-lines" target="_blank"&gt;http://stackoverflow.com/questions/32408123/how-to-parse-log-lines-using-spark-that-could-span-multiple-lines&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;&lt;A href="http://apache-spark-user-list.1001560.n3.nabble.com/multi-line-elements-td51.html" target="_blank"&gt;http://apache-spark-user-list.1001560.n3.nabble.com/multi-line-elements-td51.html&lt;/A&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;EM&gt;If this is what you are looking for, let me know by accepting the answer; otherwise, let me know of any gaps or follow-up questions.&lt;/EM&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 24 Nov 2016 20:47:25 GMT</pubDate>
    <dc:creator>gkeys</dc:creator>
    <dc:date>2016-11-24T20:47:25Z</dc:date>
    <item>
      <title>I need to know how to use regex for new line in pig latin</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/I-need-to-know-how-to-use-regex-for-new-line-in-pig-latin/m-p/120514#M47150</link>
      <description>&lt;P&gt;I am working with a Catalina log, and my input looks like the lines below. I have no problem with the date and the other fields, but I need to read the next line, the one that starts with INFO. I have tried a lot but do not know how to bring in the next line. I have used \\n and \\r, but they did not work.&lt;/P&gt;&lt;P&gt;My script looks like this:&lt;/P&gt;&lt;P&gt;A = LOAD 'catalina.log' USING TextLoader AS (line:chararray);&lt;/P&gt;&lt;P&gt;B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,
 '^([a-zA-Z]{3}\\s[0-9]{1,2},\\s[0-9]{4}\\s[0-9]{1,2}:[0-9]{2}:[0-9]{2}\\s[A-Z]{2})\\n+(.*)INFO:(.*)'));&lt;/P&gt;&lt;P&gt;DUMP B;&lt;/P&gt;&lt;P&gt;Input:
Nov 3, 2016 11:00:06 AM org.apache.catalina.startup.HostConfig deployDescriptor&lt;/P&gt;&lt;P&gt;INFO: Deploying configuration descriptor host-manager.xml&lt;/P&gt;&lt;P&gt;Output: Nov 3, 2016 11:00:06 AM org.apache.catalina.startup.HostConfig deployDescriptor&lt;/P&gt;</description>
      <pubDate>Thu, 24 Nov 2016 17:15:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/I-need-to-know-how-to-use-regex-for-new-line-in-pig-latin/m-p/120514#M47150</guid>
      <dc:creator>zolghadr_nima</dc:creator>
      <dc:date>2016-11-24T17:15:03Z</dc:date>
    </item>
    <item>
      <title>Re: I need to know how to use regex for new line in pig latin</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/I-need-to-know-how-to-use-regex-for-new-line-in-pig-latin/m-p/120515#M47151</link>
      <description>&lt;P&gt;Unfortunately, Pig loads data line by line. It also processes data one record at a time and does not hold earlier records in memory, so there is no way to operate over multiple lines.&lt;/P&gt;&lt;P&gt;For the same reason, a newline in your regex never matches: each record is a single line, and once records are loaded you are forced to operate record by record (though of course you can aggregate into a sum, average, etc.).&lt;/P&gt;&lt;P&gt;There is one exception for multi-line input, but it does not apply to your case: if you have fields in double quotes that contain a newline, you can use piggybank's CSVExcelStorage to handle them. Since you are working with log data rather than quoted fields, this will not work for you.&lt;/P&gt;&lt;P&gt;&lt;A href="https://pig.apache.org/docs/r0.14.0/api/org/apache/pig/piggybank/storage/class-use/CSVExcelStorage.Multiline.html" target="_blank"&gt;https://pig.apache.org/docs/r0.14.0/api/org/apache/pig/piggybank/storage/class-use/CSVExcelStorage.Multiline.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You will have to preprocess the data with another tool to group your lines together (the INFO line and the next n lines).&lt;/P&gt;&lt;P&gt;Suggestions are:&lt;/P&gt;&lt;UL&gt;
&lt;LI&gt;Spark&lt;/LI&gt;&lt;LI&gt;A MapReduce program in which you implement your own InputFormat or RecordReader&lt;/LI&gt;&lt;LI&gt;NiFi (using the ExtractText processor and a regex, where Enable Multiline Mode = false), typically outside of Hadoop&lt;/LI&gt;&lt;LI&gt;awk or sed (outside of Hadoop)&lt;/LI&gt;&lt;LI&gt;Java or Groovy (outside of Hadoop)&lt;/LI&gt;&lt;LI&gt;Python, R, etc. (outside of Hadoop)&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;These Spark-based approaches look like a good fit for you:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;A href="http://stackoverflow.com/questions/32408123/how-to-parse-log-lines-using-spark-that-could-span-multiple-lines" target="_blank"&gt;http://stackoverflow.com/questions/32408123/how-to-parse-log-lines-using-spark-that-could-span-multiple-lines&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;&lt;A href="http://apache-spark-user-list.1001560.n3.nabble.com/multi-line-elements-td51.html" target="_blank"&gt;http://apache-spark-user-list.1001560.n3.nabble.com/multi-line-elements-td51.html&lt;/A&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;EM&gt;If this is what you are looking for, let me know by accepting the answer; otherwise, let me know of any gaps or follow-up questions.&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 24 Nov 2016 20:47:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/I-need-to-know-how-to-use-regex-for-new-line-in-pig-latin/m-p/120515#M47151</guid>
      <dc:creator>gkeys</dc:creator>
      <dc:date>2016-11-24T20:47:25Z</dc:date>
    </item>
    <item>
      <title>Re: I need to know how to use regex for new line in pig latin</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/I-need-to-know-how-to-use-regex-for-new-line-in-pig-latin/m-p/120516#M47152</link>
      <description>&lt;P&gt;OK, thanks a lot for the help. I will try it.&lt;/P&gt;</description>
      <pubDate>Sat, 03 Dec 2016 04:46:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/I-need-to-know-how-to-use-regex-for-new-line-in-pig-latin/m-p/120516#M47152</guid>
      <dc:creator>zolghadr_nima</dc:creator>
      <dc:date>2016-12-03T04:46:40Z</dc:date>
    </item>
  </channel>
</rss>

