<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: RDD questions in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/RDD-questions/m-p/319363#M227775</link>
    <description>&lt;P&gt;Thanks for the update.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;scala&amp;gt; val myRDD=spark.read.textFile("/devsh_loudacre/frostroad.txt")&lt;BR /&gt;myRDD: org.apache.spark.sql.Dataset[String] = [value: string]&lt;/P&gt;&lt;P&gt;Why doesn't parallelize work on myRDD above?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;scala&amp;gt; val myRDD1=sc.parallelize(myRDD)&lt;BR /&gt;&amp;lt;console&amp;gt;:26: error: type mismatch;&lt;BR /&gt;found : org.apache.spark.sql.Dataset[String]&lt;BR /&gt;required: Seq[?]&lt;BR /&gt;Error occurred in an application involving default arguments.&lt;BR /&gt;val myRDD1=sc.parallelize(myRDD)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Does the above mean a Dataset has been created?&lt;/P&gt;&lt;P&gt;What is the difference between the command above and the one below?&lt;/P&gt;&lt;P&gt;val myRDD2=sc.textFile("/devsh_loudacre/frostroad.txt")&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can I use .parallelize with the command above?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Roshan&lt;/P&gt;</description>
    <pubDate>Sat, 26 Jun 2021 11:05:16 GMT</pubDate>
    <dc:creator>roshanbi</dc:creator>
    <dc:date>2021-06-26T11:05:16Z</dc:date>
    <item>
      <title>RDD questions</title>
      <link>https://community.cloudera.com/t5/Support-Questions/RDD-questions/m-p/319276#M227751</link>
      <description>&lt;P&gt;Hello Team,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am working through the tutorial on RDDs and am having some difficulty understanding a few of the commands.&lt;/P&gt;&lt;P&gt;Can you please advise what steps 3-8 do?&lt;/P&gt;&lt;P&gt;3. Encode the schema in a string&lt;/P&gt;&lt;PRE&gt;val schemaString = "name age"&lt;/PRE&gt;&lt;P&gt;4. Generate the schema based on the schema string&lt;/P&gt;&lt;PRE&gt;val fields = schemaString.split(" ").map(fieldName =&amp;gt; StructField(fieldName, StringType, nullable = true))

val schema = StructType(fields)&lt;/PRE&gt;&lt;P&gt;5. Convert records of the RDD (people) to Rows&lt;/P&gt;&lt;PRE&gt;val rowRDD = peopleRDD.map(_.split(",")).map(attributes =&amp;gt; Row(attributes(0), attributes(1).trim))&lt;/PRE&gt;&lt;P&gt;6. Apply the schema to the RDD&lt;/P&gt;&lt;PRE&gt;val peopleDF = spark.createDataFrame(rowRDD, schema)&lt;/PRE&gt;&lt;P&gt;7. Create a temporary view using the DataFrame&lt;/P&gt;&lt;PRE&gt;peopleDF.createOrReplaceTempView("people")&lt;/PRE&gt;&lt;P&gt;8. Run SQL over the temporary view&lt;/P&gt;&lt;PRE&gt;val results = spark.sql("SELECT name FROM people")&lt;/PRE&gt;&lt;P&gt;The results of SQL queries are DataFrames and support all the normal RDD operations. The columns of a row in the result can be accessed by field index or by field name:&lt;/P&gt;&lt;PRE&gt;results.map(attributes =&amp;gt; "Name: " + attributes(0)).show()&lt;/PRE&gt;&lt;P&gt;&lt;A href="https://www.cloudera.com/tutorials/dataframe-and-dataset-examples-in-spark-repl.html" target="_blank" rel="noopener"&gt;https://www.cloudera.com/tutorials/dataframe-and-dataset-examples-in-spark-repl.html&lt;/A&gt;&lt;/P&gt;&lt;H3&gt;&lt;A href="https://www.cloudera.com/tutorials/dataframe-and-dataset-examples-in-spark-repl.html#programmatically-specifying-schema" target="_blank" rel="noopener"&gt;Programmatically Specifying Schema&lt;/A&gt;&lt;/H3&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What does the code below do?&lt;/P&gt;&lt;PRE&gt;val ds = Seq(1, 2, 3).toDS()&lt;/PRE&gt;&lt;PRE&gt;val ds = Seq(Person("Andy", 32)).toDS()&lt;/PRE&gt;&lt;P&gt;The Dataset API section is clear: if we need to map a JSON file to a class, we use ".as[ClassName]". So to map a file to a class we use ".as[ClassName]"?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Roshan&lt;/P&gt;</description>
      <pubDate>Thu, 24 Jun 2021 14:05:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/RDD-questions/m-p/319276#M227751</guid>
      <dc:creator>roshanbi</dc:creator>
      <dc:date>2021-06-24T14:05:04Z</dc:date>
    </item>
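The numbered steps in the question above can be read as one pipeline that builds a DataFrame from an RDD when the schema is only known at runtime. Below is a hedged end-to-end sketch of steps 3-8, assuming a spark-shell session where `spark` (SparkSession) and `sc` (SparkContext) already exist, and a hypothetical input file `people.txt` containing lines like `Michael, 29`:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val peopleRDD = sc.textFile("people.txt")           // RDD[String], one element per text line

val schemaString = "name age"                        // step 3: the schema as a plain string
val fields = schemaString.split(" ")                 // step 4: one StructField per column name
  .map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)

val rowRDD = peopleRDD                               // step 5: "Michael, 29" becomes Row("Michael", "29")
  .map(_.split(","))
  .map(attributes => Row(attributes(0), attributes(1).trim))

val peopleDF = spark.createDataFrame(rowRDD, schema) // step 6: attach the schema to the Rows
peopleDF.createOrReplaceTempView("people")           // step 7: register a SQL-queryable view
val results = spark.sql("SELECT name FROM people")   // step 8: query it; the result is a DataFrame
results.map(attributes => "Name: " + attributes(0)).show()
```

The point of steps 3-4 is that the column list lives in a string, so the same code can build a schema for any space-separated list of column names without recompiling.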
    <item>
      <title>Re: RDD questions</title>
      <link>https://community.cloudera.com/t5/Support-Questions/RDD-questions/m-p/319360#M227773</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/81707"&gt;@roshanbi&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;val ds = Seq(1, 2, 3).toDS()&lt;/PRE&gt;&lt;P&gt;This creates a sequence of numbers and then converts it into a Dataset. There are multiple ways to create a Dataset; the above is one of them.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If you have created a DataFrame from a case class and want to convert it into a Dataset, you can use &lt;STRONG&gt;dataframe.as[ClassName]&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Here you can find different ways of creating a Dataset:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;A href="https://www.educba.com/spark-dataset/" target="_blank"&gt;https://www.educba.com/spark-dataset/&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Please let me know if you have any doubts.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Please accept as solution once you are satisfied with the answer.&lt;/P&gt;</description>
      <pubDate>Fri, 25 Jun 2021 23:28:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/RDD-questions/m-p/319360#M227773</guid>
      <dc:creator>RangaReddy</dc:creator>
      <dc:date>2021-06-25T23:28:50Z</dc:date>
    </item>
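The two creation paths mentioned in the answer above can be sketched side by side. This is a minimal sketch assuming a spark-shell session where `spark` is in scope (its implicits provide `toDS`, `toDF`, and the encoder for `.as`); the `Person` case class is the one from the tutorial:

```scala
import spark.implicits._

case class Person(name: String, age: Long)

// 1) From a local Scala collection: Seq(1, 2, 3) is an ordinary in-memory
//    sequence, and .toDS() turns it into a distributed Dataset[Int].
val ds1 = Seq(1, 2, 3).toDS()

// 2) From a DataFrame: .as[Person] maps untyped Rows onto the case class,
//    matching columns to fields by name, yielding a typed Dataset[Person].
val df = Seq(Person("Andy", 32)).toDF()
val ds2 = df.as[Person]
```

So `.as[ClassName]` does not read a file by itself; it re-types an already-loaded DataFrame (whether that came from JSON, a table, or a local Seq).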
    <item>
      <title>Re: RDD questions</title>
      <link>https://community.cloudera.com/t5/Support-Questions/RDD-questions/m-p/319363#M227775</link>
      <description>&lt;P&gt;Thanks for the update.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;scala&amp;gt; val myRDD=spark.read.textFile("/devsh_loudacre/frostroad.txt")&lt;BR /&gt;myRDD: org.apache.spark.sql.Dataset[String] = [value: string]&lt;/P&gt;&lt;P&gt;Why doesn't parallelize work on myRDD above?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;scala&amp;gt; val myRDD1=sc.parallelize(myRDD)&lt;BR /&gt;&amp;lt;console&amp;gt;:26: error: type mismatch;&lt;BR /&gt;found : org.apache.spark.sql.Dataset[String]&lt;BR /&gt;required: Seq[?]&lt;BR /&gt;Error occurred in an application involving default arguments.&lt;BR /&gt;val myRDD1=sc.parallelize(myRDD)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Does the above mean a Dataset has been created?&lt;/P&gt;&lt;P&gt;What is the difference between the command above and the one below?&lt;/P&gt;&lt;P&gt;val myRDD2=sc.textFile("/devsh_loudacre/frostroad.txt")&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can I use .parallelize with the command above?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Roshan&lt;/P&gt;</description>
      <pubDate>Sat, 26 Jun 2021 11:05:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/RDD-questions/m-p/319363#M227775</guid>
      <dc:creator>roshanbi</dc:creator>
      <dc:date>2021-06-26T11:05:16Z</dc:date>
    </item>
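The type-mismatch error in the question above follows from the signature of `parallelize`: it takes a local `Seq` and distributes it, so handing it an already-distributed Dataset cannot typecheck. A hedged sketch of the working alternatives, assuming a spark-shell session where `spark` and `sc` exist and the HDFS path from the question:

```scala
// parallelize is for local collections only: Seq becomes RDD[String].
val localSeq = Seq("a", "b", "c")
val fromSeq = sc.parallelize(localSeq)

// spark.read.textFile already returns a distributed Dataset[String];
// no parallelize is needed, and .rdd converts it to an RDD if required.
val myDS = spark.read.textFile("/devsh_loudacre/frostroad.txt")
val asRDD = myDS.rdd
```

So yes, the first command did create a Dataset (the REPL prints its type, `org.apache.spark.sql.Dataset[String]`), and `sc.textFile` on the same path would produce an `RDD[String]` directly instead.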
    <item>
      <title>Re: RDD questions</title>
      <link>https://community.cloudera.com/t5/Support-Questions/RDD-questions/m-p/319382#M227790</link>
      <description>&lt;P&gt;What does the code below do?&lt;/P&gt;&lt;PRE&gt;val conf = new SparkConf().setMaster("local").setAppName("testApp")
val sc = SparkContext.getOrCreate(conf)&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Reference:&amp;nbsp;&lt;A href="https://www.educba.com/spark-rdd-operations/" target="_blank"&gt;https://www.educba.com/spark-rdd-operations/&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 27 Jun 2021 10:51:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/RDD-questions/m-p/319382#M227790</guid>
      <dc:creator>roshanbi</dc:creator>
      <dc:date>2021-06-27T10:51:22Z</dc:date>
    </item>
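The two lines asked about above are the manual setup that spark-shell normally does for you: `SparkConf` holds the configuration, and `SparkContext.getOrCreate` either reuses a context that already exists in the JVM or starts a new one from that configuration. A minimal sketch of what each piece does (the `testApp` name and `local` master are the values from the question):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local")      // run on this machine rather than a cluster
  .setAppName("testApp")   // name shown in the Spark UI and logs
val sc = SparkContext.getOrCreate(conf)

// The resulting context is the entry point for RDD operations:
val rdd = sc.parallelize(Seq(1, 2, 3))
```

Inside spark-shell you should not run this yourself: `sc` is already created for you, and `getOrCreate` would simply hand back that existing context.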
    <item>
      <title>Re: RDD questions</title>
      <link>https://community.cloudera.com/t5/Support-Questions/RDD-questions/m-p/319392#M227793</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/81707"&gt;@roshanbi&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Please find the difference:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;val textFileDF : Dataset[String] = spark.read.textFile("/path")      // returns a Dataset object
val textFileRDD : RDD[String] = spark.sparkContext.textFile("/path") // returns an RDD object&lt;/PRE&gt;&lt;P&gt;If you are satisfied, please accept as solution.&lt;/P&gt;</description>
      <pubDate>Sun, 27 Jun 2021 16:06:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/RDD-questions/m-p/319392#M227793</guid>
      <dc:creator>RangaReddy</dc:creator>
      <dc:date>2021-06-27T16:06:56Z</dc:date>
    </item>
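The difference shown in the answer above is purely about which API the lines are read into, and the two are convertible in both directions. A hedged sketch, assuming a spark-shell session where `spark` is in scope and a hypothetical text file at `/path`:

```scala
// Same file, two entry points:
val textFileDS = spark.read.textFile("/path")           // Dataset[String]: typed, optimized by the SQL engine
val textFileRDD = spark.sparkContext.textFile("/path")  // RDD[String]: the lower-level API

// Converting between the two when needed:
val dsToRDD = textFileDS.rdd                            // Dataset to RDD
import spark.implicits._
val rddToDS = textFileRDD.toDS()                        // RDD to Dataset
```

In current Spark the Dataset/DataFrame path is generally preferred because the optimizer can plan the query; the RDD path remains useful when you need low-level control over partitioning or arbitrary record types.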
  </channel>
</rss>

