<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Patterns for batch processing time-series data? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Patterns-for-batch-processing-time-series-data/m-p/97304#M10765</link>
    <description>&lt;P&gt;What patterns or practices exist for dealing with time-series data specifically in batch mode, i.e., Tez or MR as opposed to Spark? Sorting orders the data within a block or ORC split, but how are boundaries between blocks usually handled? For instance, finding derivatives, inflection points, etc., which depends on neighboring records, breaks down at file boundaries. Are there standard patterns or libraries to deal with this?&lt;/P&gt;</description>
    <pubDate>Thu, 19 Nov 2015 21:16:49 GMT</pubDate>
    <dc:creator>pcoates</dc:creator>
    <dc:date>2015-11-19T21:16:49Z</dc:date>
    <item>
      <title>Patterns for batch processing time-series data?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Patterns-for-batch-processing-time-series-data/m-p/97304#M10765</link>
      <description>&lt;P&gt;What patterns or practices exist for dealing with time-series data specifically in batch mode, i.e., Tez or MR as opposed to Spark? Sorting orders the data within a block or ORC split, but how are boundaries between blocks usually handled? For instance, finding derivatives, inflection points, etc., which depends on neighboring records, breaks down at file boundaries. Are there standard patterns or libraries to deal with this?&lt;/P&gt;</description>
      <pubDate>Thu, 19 Nov 2015 21:16:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Patterns-for-batch-processing-time-series-data/m-p/97304#M10765</guid>
      <dc:creator>pcoates</dc:creator>
      <dc:date>2015-11-19T21:16:49Z</dc:date>
    </item>
    <item>
      <title>Re: Patterns for batch processing time-series data?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Patterns-for-batch-processing-time-series-data/m-p/97305#M10766</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/99/bwilson.html" nodeid="99"&gt;@Brandon Wilson&lt;/A&gt; &lt;/P&gt;</description>
      <pubDate>Fri, 20 Nov 2015 02:17:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Patterns-for-batch-processing-time-series-data/m-p/97305#M10766</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2015-11-20T02:17:26Z</dc:date>
    </item>
    <item>
      <title>Re: Patterns for batch processing time-series data?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Patterns-for-batch-processing-time-series-data/m-p/97306#M10767</link>
      <description>&lt;P&gt;One option that comes to mind is to use a custom &lt;A href="https://hadoop.apache.org/docs/r2.6.2/api/org/apache/hadoop/mapreduce/InputFormat.html"&gt;InputFormat&lt;/A&gt;. HDFS splits a file into blocks without regard to record boundaries, so it is the input format that ensures records are not cut awkwardly between blocks when the files are read. With this approach, you can define your own notion of a record, whether it be a line of text (&lt;A href="https://hadoop.apache.org/docs/r2.6.2/api/org/apache/hadoop/mapreduce/lib/input/TextInputFormat.html"&gt;TextInputFormat&lt;/A&gt;) or a window that could encapsulate multiple records. &lt;/P&gt;&lt;P&gt;You can then use this custom InputFormat to read the data into an MR job, or you can use it to develop your own custom Pig loader to work with your data in Pig. &lt;/P&gt;&lt;P&gt;I am not personally aware of any libraries that have been built to address time-series specifically.&lt;/P&gt;</description>
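      <content:encoded><![CDATA[
To make the "window that could encapsulate multiple records" idea concrete, here is a minimal sketch in plain Python rather than actual Hadoop code. It assumes the series is already sorted by timestamp and that one record of overlap past each split is enough (true for a first difference; a wider stencil would need a larger overlap). The function names and the `overlap` parameter are hypothetical, chosen only for illustration, and are not part of any Hadoop API.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp, value), assumed sorted by timestamp


def read_split_with_overlap(series: List[Sample], start: int, end: int,
                            overlap: int = 1) -> List[Sample]:
    """Return the records in [start, end) plus up to `overlap` records past `end`.

    This mimics a record reader that is allowed to read slightly beyond its
    split so that a window spanning the boundary stays whole.
    """
    return series[start:min(end + overlap, len(series))]


def derivatives(window: List[Sample]) -> List[Sample]:
    """First differences between consecutive samples, keyed by the later timestamp."""
    return [(t1, (v1 - v0) / (t1 - t0))
            for (t0, v0), (t1, v1) in zip(window, window[1:])]


def process_in_splits(series: List[Sample], split_size: int,
                      overlap: int = 1) -> List[Sample]:
    """Process each split independently, as separate map tasks would."""
    results: List[Sample] = []
    for start in range(0, len(series), split_size):
        window = read_split_with_overlap(series, start, start + split_size, overlap)
        results.extend(derivatives(window))
    return results


if __name__ == "__main__":
    series = [(0.0, 1.0), (1.0, 3.0), (2.0, 2.0), (3.0, 5.0), (4.0, 5.0), (5.0, 9.0)]
    # With overlap, every boundary-crossing difference is computed exactly once.
    print(process_in_splits(series, split_size=2))
    # With overlap=0, the differences that straddle split boundaries are lost.
    print(process_in_splits(series, split_size=2, overlap=0))
```

Because each split reads forward only, no difference is computed twice, so the per-split results can simply be concatenated. In a real job the same logic would live in a custom RecordReader behind the InputFormat; that part is not shown here.
]]></content:encoded>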
      <pubDate>Sat, 21 Nov 2015 01:25:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Patterns-for-batch-processing-time-series-data/m-p/97306#M10767</guid>
      <dc:creator>bwilson</dc:creator>
      <dc:date>2015-11-21T01:25:13Z</dc:date>
    </item>
    <item>
      <title>Re: Patterns for batch processing time-series data?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Patterns-for-batch-processing-time-series-data/m-p/97307#M10768</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/99/bwilson.html" nodeid="99"&gt;@Brandon Wilson&lt;/A&gt; Thanks! &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 21 Nov 2015 05:09:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Patterns-for-batch-processing-time-series-data/m-p/97307#M10768</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2015-11-21T05:09:12Z</dc:date>
    </item>
  </channel>
</rss>

