sbt scala compilation error

Master Collaborator

I am getting the following error during compilation. The build.sbt file and the source code are below:

Error

[info] Done updating.
[info] Compiling 4 Scala sources to /root/weblogs/target/scala-2.11/classes...
[error] /root/weblogs/src/main/scala/LogSQL.scala:60: value createOrReplaceTempView is not a member of org.apache.spark.sql.DataFrame
[error]       requestsDataFrame.createOrReplaceTempView("requests")
[error]                         ^
[error] one error found
[error] (compile:compile) Compilation failed

Scala code

[root@hadoop1 scala]# more LogSQL.scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext, Time}
import org.apache.spark.storage.StorageLevel
import org.apache.spark.sql.SQLContext
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext
import java.util.regex.Pattern
import java.util.regex.Matcher
import Utilities._
/** Illustrates using SparkSQL with Spark Streaming, to issue queries on
 *  Apache log data extracted from a stream on port 9999.
 */
object LogSQL {
  def main(args: Array[String]) {
    // Create the context with a 1 second batch size
    val ssc = new StreamingContext("local[*]", "LogSQL", Seconds(1))
    setupLogging()
    // Construct a regular expression (regex) to extract fields from raw Apache log lines
    val pattern = apacheLogPattern()
    // Create a socket stream to read log data published via netcat on port 9999 locally
    val lines = ssc.socketTextStream("127.0.0.1", 9998, StorageLevel.MEMORY_AND_DISK_SER)
    // Extract the (URL, status, user agent) we want from each log line
    val requests = lines.map(x => {
      val matcher:Matcher = pattern.matcher(x)
      if (matcher.matches()) {
        val request = matcher.group(5)
        val requestFields = request.toString().split(" ")
        val url = util.Try(requestFields(1)) getOrElse "[error]"
        (url, matcher.group(6).toInt, matcher.group(9))
      } else {
        ("error", 0, "error")
      }
    })
    // Process each RDD from each batch as it comes in
    requests.foreachRDD((rdd, time) => {
      // So we'll demonstrate using SparkSQL in order to query each RDD
      // using SQL queries.
      // Get the singleton instance of SQLContext
      val sqlContext = SQLContextSingleton.getInstance(rdd.sparkContext)
      import sqlContext.implicits._
      // SparkSQL can automatically create DataFrames from Scala "case classes".
      // We created the Record case class for this purpose.
      // So we'll convert each RDD of tuple data into an RDD of "Record"
      // objects, which in turn we can convert to a DataFrame using toDF()
      val requestsDataFrame = rdd.map(w => Record(w._1, w._2, w._3)).toDF()
      // Create a SQL table from this DataFrame
      requestsDataFrame.createOrReplaceTempView("requests")
      // Count up occurrences of each user agent in this RDD and print the results.
      // The powerful thing is that you can do any SQL you want here!
      // But remember it's only querying the data in this RDD, from this batch.
      val wordCountsDataFrame =
        sqlContext.sql("select agent, count(*) as total from requests group by agent")
      println(s"========= $time =========")
      wordCountsDataFrame.show()
      // If you want to dump data into an external database instead, check out the
      // org.apache.spark.sql.DataFrameWriter class! It can write dataframes via
      // jdbc and many other formats! You can use the "append" save mode to keep
      // adding data from each batch.
    })
    // Kick it off
    ssc.checkpoint("C:/checkpoint/")
    ssc.start()
    ssc.awaitTermination()
  }
}
/** Case class for converting RDD to DataFrame */
case class Record(url: String, status: Int, agent: String)
/** Lazily instantiated singleton instance of SQLContext
 *  (Straight from included examples in Spark)  */
object SQLContextSingleton {
  @transient  private var instance: SQLContext = _
  def getInstance(sparkContext: SparkContext): SQLContext = {
    if (instance == null) {
      instance = new SQLContext(sparkContext)
    }
    instance
  }
}

build.sbt

[root@hadoop1 weblogs]# more build.sbt
name := "weblogs"
version := "1.0"
scalaVersion := "2.11.6"
resolvers ++= Seq(
"Apache HBase" at "http://repository.apache.org/content/repositories/releases",
"Typesafe repository" at "http://repo.typesafe.com/typesafe/releases/"
)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.2",
  "org.apache.spark" %% "spark-streaming" % "1.6.2",
  "org.apache.spark" %% "spark-sql" % "1.6.2",
  "org.apache.spark" %% "spark-mllib" % "1.6.2"
)

1 ACCEPTED SOLUTION

Master Collaborator

I fixed this issue by upgrading from Spark 1.6.2 to Spark 2.0. I actually upgraded my HDC 2.4 cluster to HDC 2.5.


6 REPLIES

@Sami Ahmad

Is the following missing from your import statements?

import org.apache.spark.sql.DataFrame

Master Collaborator

I added this line, but I am still getting the same error:

[info] Compiling 4 Scala sources to /root/weblogs/target/scala-2.11/classes...
[error] /root/weblogs/src/main/scala/LogSQL.scala:60: value createOrReplaceTempView is not a member of org.apache.spark.sql.DataFrame
[error]       requestsDataFrame.createOrReplaceTempView("requests")
[error]                         ^
[error] one error found
[error] (compile:compile) Compilation failed
[error] Total time: 14 s, completed Sep 27, 2016 12:04:22 PM

Master Collaborator

Can anyone compile the attached Scala code, please?

Master Collaborator

I found two solutions on the web to a problem similar to the one I am facing. Do they apply to my code?

1- Import implicits:

Note that this should be done only after an instance of org.apache.spark.sql.SQLContext is created. It should be written as:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

2- Move case class outside of the method:

The case class that defines the schema of the DataFrame should be declared outside of the method that needs it. You can read more about it here:

https://issues.scala-lang.org/browse/SI-6649
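
To illustrate the second point, here is a minimal, self-contained sketch (the Click case class and CaseClassExample object are made-up names for the example) with the case class declared at the top level, next to the object that uses it, so the implicits behind toDF() can resolve it:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Declared at the top level, not inside main(), so the reflection/implicits
// behind toDF() can find it (see the SI-6649 discussion linked above).
case class Click(url: String, status: Int)

object CaseClassExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CaseClassExample").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // toDF() works here because Click is visible at the top level
    val df = sc.parallelize(Seq(Click("/index.html", 200), Click("/about.html", 404))).toDF()
    df.show()

    sc.stop()
  }
}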

Master Collaborator

I fixed this issue by upgrading from Spark 1.6.2 to Spark 2.0. I actually upgraded my HDC 2.4 cluster to HDC 2.5.
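
For anyone following along, a rough sketch of what build.sbt might look like after such an upgrade (the version numbers below are illustrative and should be aligned with the Spark 2.x build installed on the cluster):

name := "weblogs"
version := "1.0"
scalaVersion := "2.11.8"
// Illustrative versions only; match them to the Spark 2.x shipped with your cluster.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "2.0.0",
  "org.apache.spark" %% "spark-streaming" % "2.0.0",
  "org.apache.spark" %% "spark-sql"       % "2.0.0",
  "org.apache.spark" %% "spark-mllib"     % "2.0.0"
)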

@Sami Ahmad

The upgrade addressed it, but I guess that we still don't know the cause.
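
A likely explanation, for readers who hit the same error: createOrReplaceTempView was introduced in Spark 2.0, so it does not exist on the DataFrame class in the Spark 1.6.2 libraries the project was compiling against; the Spark 1.6 equivalent is registerTempTable. A minimal sketch of the one-line change that should let the original code compile without upgrading:

// Spark 1.6.x: DataFrame exposes registerTempTable
requestsDataFrame.registerTempTable("requests")

// Spark 2.0+: registerTempTable is deprecated in favour of
// requestsDataFrame.createOrReplaceTempView("requests")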
