RDD questions
Labels: Apache Spark
Created 06-24-2021 07:05 AM
Hello Team,
I am working through the tutorial on RDDs.
I am having some difficulties understanding some of the commands.
Can you please advise what steps 3-9 do?
3. Encode the schema in a string
val schemaString = "name age"
4. Generate the schema based on the string of schema
val fields = schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)
5. Convert records of the RDD (people) to Rows
val rowRDD = peopleRDD.map(_.split(",")).map(attributes => Row(attributes(0), attributes(1).trim))
6. Apply the schema to the RDD
val peopleDF = spark.createDataFrame(rowRDD, schema)
7. Create a temporary view using the DataFrame
peopleDF.createOrReplaceTempView("people")
8. SQL can be run over a temporary view created using DataFrames
val results = spark.sql("SELECT name FROM people")
9. The results of SQL queries are DataFrames and support all the normal RDD operations. The columns of a row in the result can be accessed by field index or by field name.
results.map(attributes => "Name: " + attributes(0)).show()
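For reference, here is a minimal end-to-end version of steps 3-9 that runs in spark-shell (the imports and the peopleRDD line are my additions; the people.txt path is illustrative and is assumed to hold comma-separated "name, age" lines as in the tutorial):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Assumed input: lines such as "Michael, 29" (the path is hypothetical)
val peopleRDD = spark.sparkContext.textFile("examples/src/main/resources/people.txt")

// 3. Encode the schema in a string
val schemaString = "name age"

// 4. Generate the schema based on the string of schema
val fields = schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)

// 5. Convert records of the RDD (people) to Rows
val rowRDD = peopleRDD.map(_.split(",")).map(attributes => Row(attributes(0), attributes(1).trim))

// 6. Apply the schema to the RDD
val peopleDF = spark.createDataFrame(rowRDD, schema)

// 7. Create a temporary view using the DataFrame
peopleDF.createOrReplaceTempView("people")

// 8. Run SQL over the temporary view
val results = spark.sql("SELECT name FROM people")

// 9. Access the columns of each result Row by index (or by field name)
import spark.implicits._  // provides the Encoder needed by the map below
results.map(attributes => "Name: " + attributes(0)).show()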
https://www.cloudera.com/tutorials/dataframe-and-dataset-examples-in-spark-repl.html
Programmatically Specifying Schema
What does the code below do?
val ds = Seq(1, 2, 3).toDS()
val ds = Seq(Person("Andy", 32)).toDS()  // assumes case class Person(name: String, age: Long)
The Dataset API section is clear. If we need to map the JSON file to a class, we use .as(class name).
So to map a file to a class, we use ".as[ClassName]"?
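i.e., something like this (my understanding; the people.json path and the Person case class are just for illustration, and spark-shell already imports the needed implicits):

case class Person(name: String, age: Long)

// read JSON into a DataFrame, then map it onto the case class as a Dataset[Person]
val peopleDS = spark.read.json("people.json").as[Person]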
Thanks,
Roshan
Created 06-25-2021 04:28 PM
Hi @roshanbi
val ds = Seq(1, 2, 3).toDS()
This creates a sequence of numbers and then converts it into a Dataset.
There are multiple ways to create a Dataset; the above is one of them.
If you have created a DataFrame with a case class and want to convert it into a Dataset, you can use dataframe.as[ClassName], as in the sketch below.
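For example, a minimal sketch in spark-shell (the Employee case class and the values are made up for illustration; spark-shell auto-imports the needed implicits):

case class Employee(name: String, age: Long)

// a Dataset directly from a local Seq
val numsDS = Seq(1, 2, 3).toDS()

// a Dataset of a case class from a local Seq
val empDS = Seq(Employee("Andy", 32)).toDS()

// a DataFrame converted to a Dataset with .as[ClassName]
val empDF = Seq(("Andy", 32L)).toDF("name", "age")
val empDS2 = empDF.as[Employee]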
Here you can find more ways of creating a Dataset:
https://www.educba.com/spark-dataset/
Please let me know if you have any doubts.
Please Accept as Solution once you are satisfied with the above answer.
Created 06-26-2021 04:05 AM
Thanks for the update.
scala> val myRDD=spark.read.textFile("/devsh_loudacre/frostroad.txt")
myRDD: org.apache.spark.sql.Dataset[String] = [value: string]
Why does sc.parallelize not work on the above?
scala> val myRDD1=sc.parallelize(myRDD)
<console>:26: error: type mismatch;
found : org.apache.spark.sql.Dataset[String]
required: Seq[?]
Error occurred in an application involving default arguments.
val myRDD1=sc.parallelize(myRDD)
Does the above mean a Dataset has been created?
What is the difference between the above and the below?
val myRDD2=sc.textFile("/devsh_loudacre/frostroad.txt")
Can I use the .parallelize function with the above command?
Thanks,
Roshan
Created 06-27-2021 03:51 AM
What does the code below do?
val conf = new SparkConf().setMaster("local").setAppName("testApp")
val sc = SparkContext.getOrCreate(conf)
Reference: https://www.educba.com/spark-rdd-operations/
Created on 06-27-2021 09:04 AM - edited 06-27-2021 09:06 AM
Hi @roshanbi
Please find the difference:
val textFileDF : Dataset[String] = spark.read.textFile("/path") // returns Dataset object
val textFileRDD : RDD[String] = spark.sparkContext.textFile("/path") // returns RDD object
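To connect this to the parallelize error above: sc.parallelize expects a local Scala collection (a Seq), not a Dataset. A small sketch of how the two sides interoperate (the values are illustrative; spark-shell auto-imports the needed implicits):

// Dataset -> RDD: use .rdd
val rddFromDS = textFileDF.rdd

// RDD -> Dataset: wrap it with createDataset (needs an implicit Encoder)
val dsFromRDD = spark.createDataset(textFileRDD)

// parallelize is for local collections such as a Seq, not for a Dataset
val localRDD = spark.sparkContext.parallelize(Seq("line one", "line two"))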
If you are satisfied, please Accept as Solution.
