Support Questions

Saeed.Barghi · ‎07-21-2015

Hi all,

I am trying to create a DataFrame of a text file which gives me error: "value toDF is not a member of org.apache.spark.rdd.RDD"

The only solution I can find online is to import SQLContext.implicits._ which in trun throws "not found: value SQLContext"

I googled this new error but couldn't find anything. The funny part is that the piece of code I am using works in Spark-Shell, but fails when I try to build it using sbt package

I am suing Cloudera's QuickStart VM and My Spark Version is 1.3.0 and my Scala Version: 2.10.4 .

Any help is highly appreciated,

Cheers.

Here comes my piece of code:

import...........

import SQLContext.implicits._

...

class Class_1() extends Runnable {
val conf = new SparkConf().setAppName("TestApp")
val sc = new SparkContext(conf)

val sqlContext= new org.apache.spark.sql.SQLContext(sc)
var fDimCustomer = sc.textFile("DimCustomer.txt")

def loadData(fileName:String) {

fDimCustomer = sc.textFile("DimCustomer.txt")

case class DimC(ID:Int, Name:String)
var dimCustomer1 = fDimCustomer.map(_.split(',')).map(r=>DimC(r(0).toInt,r(1))).toDF
dimCustomer1.registerTempTable("Cust_1")

val customers = sqlContext.sql("select * from Cust_1")
customers.show()

}

......

Saeed.Barghi · ‎07-23-2015

Ok, I finally fixed the issue. 2 things needed to be done:

1- Import implicits:

Note that this should be done only after an instance of org.apache.spark.sql.SQLContext is created. It should be written as:

val sqlContext= new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

2- Move case class outside of the method:

case class, by use of which you define the schema of the DataFrame, should be defined outside of the method needing it. You can read more about it here:

https://issues.scala-lang.org/browse/SI-6649

Cheers.

View solution in original post

Tong · ‎09-04-2015

Can you show me how you write case class to define schema and how to use it in your method? Thanks so much

sumeet89 · ‎10-28-2015

Hi,

Can you shae your program.

I am getting one single error mentioned below:-

[info] Compiling 1 Scala source to /home/sumeet/SimpleSparkProject/target/scala-2.11/classes...
[error] /home/sumeet/SimpleSparkProject/src/main/scala/SimpleApp.scala:16: value toDF is not a member of org.apache.spark.rdd.RDD[Auction]
[error] val auction = ebay.toDF()
[error] ^

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.sql._
object SimpleApp {
def main(args: Array[String]) {
val sc = new SparkContext("local", "Simple App", "/usr/local/spark-1.4.0-incubating",
List("target/scala-2.10/simple-project_2.10-1.0.jar"))
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
val ebayText = sc.textFile("/home/sumeet/Desktop/useful huge sample data/ebay.csv")
ebayText.first()
case class Auction(auctionid: String, bid: Float, bidtime: Float, bidder: String, bidderrate: Integer, openbid: Float, price: Float)
val ebay = ebayText.map(_.split(",")).map(p => Auction(p(0),p(1).toFloat,p(2).toFloat,p(3),p(4).toInt,p(5).toFloat,p(6).toFloat))
ebay.first()
ebay.count()
val auction = ebay.toDF()
auction.show()
}
}

AshwiniPatil · ‎01-15-2017

Hi, Thank You! It resolved the similar issue that I was facing. However, coulc you please share your knowledge on why is this done? And what exactly implicit does in this case. Reply appreciated. Sorry for reopening this post.

abhideoria23 · ‎10-26-2017

package org.example.textclassification

import org.apache.predictionio.controller.P2LAlgorithm
import org.apache.predictionio.controller.Params

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.UserDefinedFunction

import grizzled.slf4j.Logger

case class LRAlgorithmParams(regParam: Double) extends Params

class LRAlgorithm(val ap: LRAlgorithmParams)
extends P2LAlgorithm[PreparedData, LRModel, Query, PredictedResult] {

@transient lazy val logger = Logger[this.type]

def train(sc: SparkContext, pd: PreparedData): LRModel = {

// Import SQLContext for creating DataFrame.
val sql: SQLContext = new SQLContext(sc)
import sql.implicits._

val lr = new LogisticRegression()
.setMaxIter(10)
.setThreshold(0.5)
.setRegParam(ap.regParam)

val labels: Seq[Double] = pd.categoryMap.keys.toSeq

val data = labels.foldLeft(pd.transformedData.toDF)( //transform to Spark DataFrame
// Add the different binary columns for each label.
(data: DataFrame, label: Double) => {
// function: multiclass labels --> binary labels
val f: UserDefinedFunction = functions.udf((e : Double) => if (e == label) 1.0 else 0.0)

data.withColumn(label.toInt.toString, f(data("label")))
}
ubuntu@ip-172-20-9-118:/spark/tracxn/predictionio/classification/isCompany$ cat src/main/scala/LRAlgorithm.scala
package org.example.textclassification

import org.apache.predictionio.controller.P2LAlgorithm
import org.apache.predictionio.controller.Params

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.UserDefinedFunction

import grizzled.slf4j.Logger

case class LRAlgorithmParams(regParam: Double) extends Params

class LRAlgorithm(val ap: LRAlgorithmParams)
extends P2LAlgorithm[PreparedData, LRModel, Query, PredictedResult] {

@transient lazy val logger = Logger[this.type]

def train(sc: SparkContext, pd: PreparedData): LRModel = {

// Import SQLContext for creating DataFrame.
val sql: SQLContext = new SQLContext(sc)
import sql.implicits._

val lr = new LogisticRegression()
.setMaxIter(10)
.setThreshold(0.5)
.setRegParam(ap.regParam)

val labels: Seq[Double] = pd.categoryMap.keys.toSeq

val data = labels.foldLeft(pd.transformedData.toDF)( //transform to Spark DataFrame
// Add the different binary columns for each label.
(data: DataFrame, label: Double) => {
// function: multiclass labels --> binary labels
val f: UserDefinedFunction = functions.udf((e : Double) => if (e == label) 1.0 else 0.0)

data.withColumn(label.toInt.toString, f(data("label")))
}
)

// Create a logistic regression model for each class.
val lrModels : Seq[(Double, LREstimate)] = labels.map(
label => {
val lab = label.toInt.toString

val fit = lr.setLabelCol(lab).fit(
data.select(lab, "features")
)

// Return (label, feature coefficients, and intercept term.
(label, LREstimate(fit.weights.toArray, fit.intercept))

}
)

new LRModel(
tfIdf = pd.tfIdf,
categoryMap = pd.categoryMap,
lrModels = lrModels
)
}

def predict(model: LRModel, query: Query): PredictedResult = {
model.predict(query.text)
}
}

case class LREstimate (
coefficients : Array[Double],
intercept : Double
)

class LRModel(
val tfIdf: TFIDFModel,
val categoryMap: Map[Double, String],
val lrModels: Seq[(Double, LREstimate)]) extends Serializable {

/** Enable vector inner product for prediction. */
private def innerProduct (x : Array[Double], y : Array[Double]) : Double = {
x.zip(y).map(e => e._1 * e._2).sum
}

/** Define prediction rule. */
def predict(text: String): PredictedResult = {
val x: Array[Double] = tfIdf.transform(text).toArray

// Logistic Regression binary formula for positive probability.
// According to MLLib documentation, class labeled 0 is used as pivot.
// Thus, we are using:
// log(p1/p0) = log(p1/(1 - p1)) = b0 + xTb =: z
// p1 = exp(z) * (1 - p1)
// p1 * (1 + exp(z)) = exp(z)
// p1 = exp(z)/(1 + exp(z))
val pred = lrModels.map(
e => {
val z = scala.math.exp(innerProduct(e._2.coefficients, x) + e._2.intercept)
(e._1, z / (1 + z))
}
).maxBy(_._2)

PredictedResult(categoryMap(pred._1), pred._2)
}

override def toString = s"LR model"
}

Getting same error in my code . Can you help me how to fix it

raviverma · ‎03-25-2021

Import implicit

where sc=

val sc = SparkSession
  .builder()
  .appName("demo")
  .master("local")
  .getOrCreate()

import sc.implicits._

Cloudera Community

Support Questions

Spark/Scala Error: value toDF is not a member of org.apache.spark.rdd.RDD