<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Using filter in joined dataset in spark ? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-filter-in-joined-dataset-in-spark/m-p/18762#M2934</link>
    <description>&lt;P&gt;I am joining two datasets: the first comes from a stream and the second is stored in HDFS.&lt;/P&gt;&lt;P&gt;After joining the two datasets, I need to apply a filter to the joined result, but I am running into an issue. Please help me resolve it.&lt;/P&gt;&lt;P&gt;I am using the code below:&lt;/P&gt;&lt;PRE&gt;val streamkv = streamrecs.map(_.split("~")).map(r =&amp;gt; (r(0), (r(5), r(6))))
val HDFSlines = sc.textFile("/user/Rest/sample.dat").map(_.split("~")).map(r =&amp;gt; (r(1), (r(0), r(3), r(4))))
val streamwindow = streamkv.window(Minutes(1))&lt;/PRE&gt;&lt;PRE&gt;val join1 = streamwindow.transform(joinRDD =&amp;gt; { joinRDD.join(HDFSlines) })&lt;/PRE&gt;&lt;P&gt;I get the following error when I use this filter:&lt;/P&gt;&lt;PRE&gt;val tofilter = join1.filter {
  case (_, (_, _),(_,_,device)) =&amp;gt;
    device.contains("iPhone")
}.count()&lt;/PRE&gt;&lt;P&gt;&lt;SPAN&gt;error: constructor cannot be instantiated to expected type;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;found &amp;nbsp; : (T1, T2, T3)&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;required: (String, ((String, String), (String, String, String)))&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;case (_, (_, _),(_,_,device)) =&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;How can I solve this error?&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 09:07:39 GMT</pubDate>
    <dc:creator>ArunShell</dc:creator>
    <dc:date>2022-09-16T09:07:39Z</dc:date>
    <item>
      <title>Using filter in joined dataset in spark ?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-filter-in-joined-dataset-in-spark/m-p/18762#M2934</link>
      <description>&lt;P&gt;I am joining two datasets: the first comes from a stream and the second is stored in HDFS.&lt;/P&gt;&lt;P&gt;After joining the two datasets, I need to apply a filter to the joined result, but I am running into an issue. Please help me resolve it.&lt;/P&gt;&lt;P&gt;I am using the code below:&lt;/P&gt;&lt;PRE&gt;val streamkv = streamrecs.map(_.split("~")).map(r =&amp;gt; (r(0), (r(5), r(6))))
val HDFSlines = sc.textFile("/user/Rest/sample.dat").map(_.split("~")).map(r =&amp;gt; (r(1), (r(0), r(3), r(4))))
val streamwindow = streamkv.window(Minutes(1))&lt;/PRE&gt;&lt;PRE&gt;val join1 = streamwindow.transform(joinRDD =&amp;gt; { joinRDD.join(HDFSlines) })&lt;/PRE&gt;&lt;P&gt;I get the following error when I use this filter:&lt;/P&gt;&lt;PRE&gt;val tofilter = join1.filter {
  case (_, (_, _),(_,_,device)) =&amp;gt;
    device.contains("iPhone")
}.count()&lt;/PRE&gt;&lt;P&gt;&lt;SPAN&gt;error: constructor cannot be instantiated to expected type;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;found &amp;nbsp; : (T1, T2, T3)&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;required: (String, ((String, String), (String, String, String)))&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;case (_, (_, _),(_,_,device)) =&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;How can I solve this error?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 09:07:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-filter-in-joined-dataset-in-spark/m-p/18762#M2934</guid>
      <dc:creator>ArunShell</dc:creator>
      <dc:date>2022-09-16T09:07:39Z</dc:date>
    </item>
    <item>
      <title>Re: Using filter in joined dataset in spark ?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-filter-in-joined-dataset-in-spark/m-p/18766#M2935</link>
      <description>&lt;P&gt;Your pattern is just a little bit off. The result of a join is not a triple, but a pair whose second element is itself a pair: joining (K, V) with (K, W) yields (K, (V, W)). Here V is (String, String) and W is (String, String, String), which matches the "required" type in the error message.&lt;/P&gt;&lt;P&gt;You have:&lt;/P&gt;&lt;PRE&gt;(_, (_, _),(_,_,device))&lt;/PRE&gt;&lt;P&gt;but I think you need:&lt;/P&gt;&lt;PRE&gt;(_, ((_, _),(_,_,device)))&lt;/PRE&gt;</description>
      <pubDate>Mon, 15 Sep 2014 12:19:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-filter-in-joined-dataset-in-spark/m-p/18766#M2935</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2014-09-15T12:19:03Z</dc:date>
    </item>
  </channel>
</rss>

