sbt scala compilation error

Master Collaborator

I am getting the following error during compilation. The build.sbt file and the source code are below:

Error

[info] Done updating.
[info] Compiling 4 Scala sources to /root/weblogs/target/scala-2.11/classes...
[error] /root/weblogs/src/main/scala/LogSQL.scala:60: value createOrReplaceTempView is not a member of org.apache.spark.sql.DataFrame
[error]       requestsDataFrame.createOrReplaceTempView("requests")
[error]                         ^
[error] one error found
[error] (compile:compile) Compilation failed

Scala code

[root@hadoop1 scala]# more LogSQL.scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext, Time}
import org.apache.spark.storage.StorageLevel
import org.apache.spark.sql.SQLContext
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext
import java.util.regex.Pattern
import java.util.regex.Matcher
import Utilities._
/** Illustrates using SparkSQL with Spark Streaming, to issue queries on
 *  Apache log data extracted from a stream on port 9999.
 */
object LogSQL {
  def main(args: Array[String]) {
    // Create the context with a 1 second batch size
    val ssc = new StreamingContext("local[*]", "LogSQL", Seconds(1))
    setupLogging()
    // Construct a regular expression (regex) to extract fields from raw Apache log lines
    val pattern = apacheLogPattern()
    // Create a socket stream to read log data published via netcat on port 9999 locally
    val lines = ssc.socketTextStream("127.0.0.1", 9998, StorageLevel.MEMORY_AND_DISK_SER)
    // Extract the (URL, status, user agent) we want from each log line
    val requests = lines.map(x => {
      val matcher:Matcher = pattern.matcher(x)
      if (matcher.matches()) {
        val request = matcher.group(5)
        val requestFields = request.toString().split(" ")
        val url = util.Try(requestFields(1)) getOrElse "[error]"
        (url, matcher.group(6).toInt, matcher.group(9))
      } else {
        ("error", 0, "error")
      }
    })
    // Process each RDD from each batch as it comes in
    requests.foreachRDD((rdd, time) => {
      // So we'll demonstrate using SparkSQL in order to query each RDD
      // using SQL queries.
      // Get the singleton instance of SQLContext
      val sqlContext = SQLContextSingleton.getInstance(rdd.sparkContext)
      import sqlContext.implicits._
      // SparkSQL can automatically create DataFrames from Scala "case classes".
      // We created the Record case class for this purpose.
      // So we'll convert each RDD of tuple data into an RDD of "Record"
      // objects, which in turn we can convert to a DataFrame using toDF()
      val requestsDataFrame = rdd.map(w => Record(w._1, w._2, w._3)).toDF()
      // Create a SQL table from this DataFrame
      requestsDataFrame.createOrReplaceTempView("requests")
      // Count up occurrences of each user agent in this RDD and print the results.
      // The powerful thing is that you can do any SQL you want here!
      // But remember it's only querying the data in this RDD, from this batch.
      val wordCountsDataFrame =
        sqlContext.sql("select agent, count(*) as total from requests group by agent")
      println(s"========= $time =========")
      wordCountsDataFrame.show()
      // If you want to dump data into an external database instead, check out the
      // org.apache.spark.sql.DataFrameWriter class! It can write dataframes via
      // jdbc and many other formats! You can use the "append" save mode to keep
      // adding data from each batch.
    })
    // Kick it off
    ssc.checkpoint("C:/checkpoint/")
    ssc.start()
    ssc.awaitTermination()
  }
}
/** Case class for converting RDD to DataFrame */
case class Record(url: String, status: Int, agent: String)
/** Lazily instantiated singleton instance of SQLContext
 *  (Straight from included examples in Spark)  */
object SQLContextSingleton {
  @transient  private var instance: SQLContext = _
  def getInstance(sparkContext: SparkContext): SQLContext = {
    if (instance == null) {
      instance = new SQLContext(sparkContext)
    }
    instance
  }
}

build.sbt

[root@hadoop1 weblogs]# more build.sbt
name := "weblogs"
version := "1.0"
scalaVersion := "2.11.6"
resolvers ++= Seq(
"Apache HBase" at "http://repository.apache.org/content/repositories/releases",
"Typesafe repository" at "http://repo.typesafe.com/typesafe/releases/"
)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.2",
  "org.apache.spark" %% "spark-streaming" % "1.6.2",
  "org.apache.spark" %% "spark-sql" % "1.6.2",
  "org.apache.spark" %% "spark-mllib" % "1.6.2"
)

1 ACCEPTED SOLUTION

Master Collaborator

I fixed this issue by upgrading from Spark 1.6.2 to Spark 2.0. I actually upgraded my HDC 2.4 cluster to HDC 2.5.


6 REPLIES

@Sami Ahmad

Is the following missing from your import statements?

import org.apache.spark.sql.DataFrame

Master Collaborator

I added this line, but I am still getting the same error:

[info] Compiling 4 Scala sources to /root/weblogs/target/scala-2.11/classes...
[error] /root/weblogs/src/main/scala/LogSQL.scala:60: value createOrReplaceTempView is not a member of org.apache.spark.sql.DataFrame
[error]       requestsDataFrame.createOrReplaceTempView("requests")
[error]                         ^
[error] one error found
[error] (compile:compile) Compilation failed
[error] Total time: 14 s, completed Sep 27, 2016 12:04:22 PM

Master Collaborator

Can anyone compile the attached Scala code, please?

Master Collaborator

I found two solutions on the web to a problem similar to the one I am facing. Do they apply to my code?

1- Import implicits:

Note that this should be done only after an instance of org.apache.spark.sql.SQLContext is created. It should be written as:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

2- Move case class outside of the method:

The case class that defines the schema of the DataFrame should be declared outside of the method that needs it. You can read more about it here:

https://issues.scala-lang.org/browse/SI-6649
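
To illustrate the second point, here is a minimal, self-contained sketch (the Click case class and CaseClassExample object are made-up names for the example) with the case class declared at the top level, next to the object that uses it, so the implicits behind toDF() can resolve it:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Declared at the top level, not inside main(), so the reflection/implicits
// behind toDF() can find it (see the SI-6649 discussion linked above).
case class Click(url: String, status: Int)

object CaseClassExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CaseClassExample").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // toDF() works here because Click is visible at the top level
    val df = sc.parallelize(Seq(Click("/index.html", 200), Click("/about.html", 404))).toDF()
    df.show()

    sc.stop()
  }
}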

Master Collaborator

I fixed this issue by upgrading from Spark 1.6.2 to Spark 2.0. I actually upgraded my HDC 2.4 cluster to HDC 2.5.
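
For anyone following along, a rough sketch of what build.sbt might look like after such an upgrade (the version numbers below are illustrative and should be aligned with the Spark 2.x build installed on the cluster):

name := "weblogs"
version := "1.0"
scalaVersion := "2.11.8"
// Illustrative versions only; match them to the Spark 2.x shipped with your cluster.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "2.0.0",
  "org.apache.spark" %% "spark-streaming" % "2.0.0",
  "org.apache.spark" %% "spark-sql"       % "2.0.0",
  "org.apache.spark" %% "spark-mllib"     % "2.0.0"
)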

@Sami Ahmad

The upgrade addressed it, but I guess that we still don't know the cause.
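
A likely explanation, for readers who hit the same error: createOrReplaceTempView was introduced in Spark 2.0, so it does not exist on the DataFrame class in the Spark 1.6.2 libraries the project was compiling against; the Spark 1.6 equivalent is registerTempTable. A minimal sketch of the one-line change that should let the original code compile without upgrading:

// Spark 1.6.x: DataFrame exposes registerTempTable
requestsDataFrame.registerTempTable("requests")

// Spark 2.0+: registerTempTable is deprecated in favour of
// requestsDataFrame.createOrReplaceTempView("requests")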
