<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to create a Spark PairRDD in Scala? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-create-spark-PairRDD-in-scala/m-p/30811#M6890</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;To create a pair RDD from an RDD,&amp;nbsp;I used the "keyBy" transformation to extract the key from each value:&lt;/P&gt;&lt;PRE&gt;val fileC = sc.textFile("hdfs://.../user/.../myfile.txt")
                          .keyBy(line =&amp;gt; line.substring(5,13).trim())
                          .mapValues(line =&amp;gt; (    line.substring(87,92).trim()
                                              ,   line.substring(99,112).trim()
                                              ,   line.substring(120,126).trim()
                                              ,   line.substring(127,131).trim()
                                              )
                                    )&lt;/PRE&gt;&lt;P&gt;The "keyBy" gives me a new pair RDD whose key is a substring of each text value.&lt;/P&gt;&lt;P&gt;The "mapValues" transformation then operates like a "map" on each value of the pair RDD, leaving the keys untouched.&lt;/P&gt;</description>
    <pubDate>Fri, 14 Aug 2015 14:42:33 GMT</pubDate>
    <dc:creator>Grg</dc:creator>
    <dc:date>2015-08-14T14:42:33Z</dc:date>
    <item>
      <title>How to create a Spark PairRDD in Scala?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-create-spark-PairRDD-in-scala/m-p/30512#M6889</link>
      <description>&lt;P&gt;I am trying to verify cogroup, join, and groupByKey for pair RDDs. I was able to do this with the Spark Java API, but cannot get it to work in my Scala project.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Below is the simple code I tried; let me know where I made a mistake.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;object PairsCheck {&lt;/P&gt;&lt;P&gt;def main(args: Array[String]) = {&lt;BR /&gt;&lt;BR /&gt;val conf = new SparkConf;&lt;BR /&gt;val sc = new SparkContext(conf)&lt;BR /&gt;&lt;BR /&gt;val lines = sc.textFile("/home/test1.txt")&lt;/P&gt;&lt;P&gt;val lines2 = sc.textFile("/home/test2.txt")&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;val words = lines.flatMap { x =&amp;gt; x.split("\\W+") }&lt;BR /&gt;val words2 = lines2.flatMap { x =&amp;gt; x.split("\\W+") }&lt;BR /&gt;&lt;BR /&gt;val pairs: RDD[(Int, String)] = words.map {case(x) =&amp;gt; (x.length(), x) }&lt;BR /&gt;&lt;SPAN&gt;val pairs2: RDD[(Int, String)] = words2.map {case(x) =&amp;gt; (x.length(), x) }&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;import org.apache.spark.SparkContext._&lt;BR /&gt;// --&amp;gt; Here I tried to call the join/cogroup functions that apply to pair RDDs, but could not. If I call join, it throws an error.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you in advance.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 09:36:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-create-spark-PairRDD-in-scala/m-p/30512#M6889</guid>
      <dc:creator>Srini_D</dc:creator>
      <dc:date>2022-09-16T09:36:58Z</dc:date>
    </item>
    <item>
      <title>Re: How to create a Spark PairRDD in Scala?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-create-spark-PairRDD-in-scala/m-p/30811#M6890</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;To create a pair RDD from an RDD,&amp;nbsp;I used the "keyBy" transformation to extract the key from each value:&lt;/P&gt;&lt;PRE&gt;val fileC = sc.textFile("hdfs://.../user/.../myfile.txt")
                          .keyBy(line =&amp;gt; line.substring(5,13).trim())
                          .mapValues(line =&amp;gt; (    line.substring(87,92).trim()
                                              ,   line.substring(99,112).trim()
                                              ,   line.substring(120,126).trim()
                                              ,   line.substring(127,131).trim()
                                              )
                                    )&lt;/PRE&gt;&lt;P&gt;The "keyBy" gives me a new pair RDD whose key is a substring of each text value.&lt;/P&gt;&lt;P&gt;The "mapValues" transformation then operates like a "map" on each value of the pair RDD, leaving the keys untouched.&lt;/P&gt;</description>
      <pubDate>Fri, 14 Aug 2015 14:42:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-create-spark-PairRDD-in-scala/m-p/30811#M6890</guid>
      <dc:creator>Grg</dc:creator>
      <dc:date>2015-08-14T14:42:33Z</dc:date>
    </item>
  </channel>
</rss>