<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Overall questions about Oryx 2 in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Overall-questions-about-Oryx-2/m-p/31521#M7106</link>
    <description>&lt;P&gt;Another question about Oryx 2.&lt;/P&gt;&lt;P&gt;The CSV training data includes a Unix timestamp.&lt;/P&gt;&lt;P&gt;(1) What is it for?&lt;/P&gt;&lt;P&gt;(2) Does it matter whether the unit is seconds or milliseconds?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;&lt;P&gt;Jason&lt;/P&gt;</description>
    <pubDate>Thu, 03 Sep 2015 22:18:33 GMT</pubDate>
    <dc:creator>JasonChen</dc:creator>
    <dc:date>2015-09-03T22:18:33Z</dc:date>
    <item>
      <title>Overall questions about Oryx 2</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Overall-questions-about-Oryx-2/m-p/31100#M7102</link>
      <description>&lt;P&gt;Sean,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Several questions about Oryx 2:&lt;/P&gt;&lt;P&gt;(1) I know Oryx 2 uses Kafka for its data pipeline. Does Oryx 2 also use Spark Streaming?&lt;/P&gt;&lt;P&gt;(2) Regarding the update and input topics stored in Kafka: if the model is big (say, ~50 GB), it takes up Kafka memory (and disk), right? Is there a way for the serving layer to get the model from HDFS directly, while the speed layer is still able to approximate predictions based on real-time events?&lt;/P&gt;&lt;P&gt;(3) Is the model saved in Kafka distributed across the cluster nodes?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;&lt;P&gt;Jason&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 09:38:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Overall-questions-about-Oryx-2/m-p/31100#M7102</guid>
      <dc:creator>JasonChen</dc:creator>
      <dc:date>2022-09-16T09:38:51Z</dc:date>
    </item>
    <item>
      <title>Re: Overall questions about Oryx 2</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Overall-questions-about-Oryx-2/m-p/31101#M7103</link>
      <description>&lt;P&gt;Yes, it uses Spark Streaming for the batch and speed layers.&lt;/P&gt;&lt;P&gt;Really big models are just 'passed' to the topic as an HDFS location. The maximum size of a model sent inline is configurable but is about 16 MB. This tends to matter only for decision forests or ALS models with large numbers of users and items.&lt;/P&gt;&lt;P&gt;The data in Kafka topics is replicated according to the topic config. Yes, it can potentially be replicated across the machines that serve as brokers.&lt;/P&gt;</description>
      <pubDate>Sun, 23 Aug 2015 19:26:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Overall-questions-about-Oryx-2/m-p/31101#M7103</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2015-08-23T19:26:07Z</dc:date>
    </item>
    <item>
      <title>Re: Overall questions about Oryx 2</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Overall-questions-about-Oryx-2/m-p/31217#M7104</link>
      <description>&lt;P&gt;Quick check...&lt;/P&gt;&lt;P&gt;Does that imply the Oryx 2 serving layer can read the model from HDFS directly (if the model is big)?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Jason&lt;/P&gt;</description>
      <pubDate>Wed, 26 Aug 2015 02:08:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Overall-questions-about-Oryx-2/m-p/31217#M7104</guid>
      <dc:creator>JasonChen</dc:creator>
      <dc:date>2015-08-26T02:08:33Z</dc:date>
    </item>
    <item>
      <title>Re: Overall questions about Oryx 2</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Overall-questions-about-Oryx-2/m-p/31242#M7105</link>
      <description>&lt;P&gt;Oops, thanks for catching that. Yes, the serving layer needs to see HDFS to read big models. You can change a few Kafka and Oryx configs to allow very big Kafka messages, and thus bigger models, if needed, though ideally the serving layer can just see HDFS.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I had also envisioned that the serving layer is often run in or next to the cluster, and isn't publicly visible. It's a service to other front-end systems, or at least sits behind a load balancer. So exposing a machine with cluster access isn't so crazy, as it need not be open to the world.&lt;/P&gt;</description>
      <pubDate>Wed, 26 Aug 2015 08:02:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Overall-questions-about-Oryx-2/m-p/31242#M7105</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2015-08-26T08:02:57Z</dc:date>
    </item>
    <item>
      <title>Re: Overall questions about Oryx 2</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Overall-questions-about-Oryx-2/m-p/31521#M7106</link>
      <description>&lt;P&gt;Another question about Oryx 2.&lt;/P&gt;&lt;P&gt;The CSV training data includes a Unix timestamp.&lt;/P&gt;&lt;P&gt;(1) What is it for?&lt;/P&gt;&lt;P&gt;(2) Does it matter whether the unit is seconds or milliseconds?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;&lt;P&gt;Jason&lt;/P&gt;</description>
      <pubDate>Thu, 03 Sep 2015 22:18:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Overall-questions-about-Oryx-2/m-p/31521#M7106</guid>
      <dc:creator>JasonChen</dc:creator>
      <dc:date>2015-09-03T22:18:33Z</dc:date>
    </item>
    <item>
      <title>Re: Overall questions about Oryx 2</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Overall-questions-about-Oryx-2/m-p/31522#M7107</link>
      <description>&lt;P&gt;The timestamp is for ordering, and for determining decay of the strength factor. The ordering of events is not guaranteed by HDFS / Kafka, and does matter to some extent, especially if there are 'delete' events. It also matters when figuring out how old a data point is and how much its value has decayed, if decay is enabled.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You could use seconds or milliseconds, I suppose, if you used them consistently. However, the serving layer uses a standard ms timestamp, so that's probably best to emulate.&lt;/P&gt;</description>
      <pubDate>Thu, 03 Sep 2015 22:26:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Overall-questions-about-Oryx-2/m-p/31522#M7107</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2015-09-03T22:26:49Z</dc:date>
    </item>
  </channel>
</rss>

