Member since: 04-22-2016
Posts: 931
Kudos Received: 46
Solutions: 26
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1498 | 10-11-2018 01:38 AM |
|  | 1866 | 09-26-2018 02:24 AM |
|  | 1825 | 06-29-2018 02:35 PM |
|  | 2416 | 06-29-2018 02:34 PM |
|  | 5361 | 06-20-2018 04:30 PM |
09-29-2016
03:32 PM
Also, if you look at the message, it says "SPARK_MAJOR_VERSION is set to 2".
09-29-2016
03:30 PM
Hi lgeorge, if you look at my message, I am showing that the variable is set, so this is not the issue.
09-29-2016
02:59 PM
I upgraded my HDP 2.4 to HDP 2.5 and it installed Spark2 successfully; the HDP console shows it as green with no errors. But when I check the version on the command line, it still reports 1.6.2:
[root@hadoop5 ~]# spark-shell --version
SPARK_MAJOR_VERSION is set to 2, using Spark2
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.2
      /_/
Type --help for more information.
[root@hadoop5 ~]# echo $SPARK_MAJOR_VERSION
2
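As a side note, one way to double-check which Spark runtime the shell actually launched, independent of the SPARK_MAJOR_VERSION message, is to ask the running context itself. A minimal sketch, meant to be run inside spark-shell where sc is the shell's predefined SparkContext:

```scala
// Inside spark-shell: report the version of the Spark runtime that actually started.
sc.version                      // e.g. "1.6.2"
org.apache.spark.SPARK_VERSION  // same information, read from the Spark build constants
```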
09-28-2016
02:29 PM
I am using HDP 2.4 with Spark 1.6.2. I need to upgrade Spark to v2.0; can that be done in HDP 2.4? If not, can I install two releases of Spark together on the same machine?
09-27-2016
07:01 PM
I found two solutions on the web to a problem similar to the one I am facing; do they apply to my code?
1. Import implicits: note that this should be done only after an instance of org.apache.spark.sql.SQLContext is created. It should be written as:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
2. Move the case class outside of the method: the case class used to define the schema of the DataFrame should be defined outside of the method that needs it. You can read more about it here: https://issues.scala-lang.org/browse/SI-6649
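A minimal sketch of how those two suggestions fit together in Spark 1.6-style code; the object name ImplicitsSketch and the helper method are illustrative, not taken from the original project:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SQLContext}

// Suggestion 2: the case class that defines the DataFrame schema lives at the
// top level, outside of any method that uses it.
case class Record(url: String, status: Int, agent: String)

object ImplicitsSketch {
  def toDataFrame(sc: SparkContext, rdd: RDD[(String, Int, String)]): DataFrame = {
    // Suggestion 1: create the SQLContext first, then import its implicits so
    // that .toDF() becomes available on RDDs of case classes.
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._
    rdd.map(t => Record(t._1, t._2, t._3)).toDF()
  }
}
```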
09-27-2016
04:05 PM
I added this line but I am still getting the same error:
[info] Compiling 4 Scala sources to /root/weblogs/target/scala-2.11/classes...
[error] /root/weblogs/src/main/scala/LogSQL.scala:60: value createOrReplaceTempView is not a member of org.apache.spark.sql.DataFrame
[error] requestsDataFrame.createOrReplaceTempView("requests")
[error] ^
[error] one error found
[error] (compile:compile) Compilation failed
[error] Total time: 14 s, completed Sep 27, 2016 12:04:22 PM
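One hedged observation rather than a confirmed fix: createOrReplaceTempView was introduced with Spark 2.0, while the build.sbt posted below pins Spark 1.6.2, whose DataFrame API exposes registerTempTable instead. A minimal self-contained sketch of the 1.6-style call; the sample data, column names, and table name are illustrative:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

object TempTableSketch {
  def run(sc: SparkContext): Unit = {
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Illustrative data standing in for the streamed (url, status, agent) tuples.
    val df = sc.parallelize(Seq(("/index.html", 200, "curl"))).toDF("url", "status", "agent")

    // Spark 1.6.x: register the DataFrame as a temp table with registerTempTable;
    // createOrReplaceTempView only exists from Spark 2.0 onward.
    df.registerTempTable("requests")

    sqlContext.sql("select agent, count(*) as total from requests group by agent").show()
  }
}
```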
09-23-2016
10:02 PM
I am getting the following error during compilation; the build.sbt file and the source code are shown below.
[info] Done updating.
[info] Compiling 4 Scala sources to /root/weblogs/target/scala-2.11/classes...
[error] /root/weblogs/src/main/scala/LogSQL.scala:60: value createOrReplaceTempView is not a member of org.apache.spark.sql.DataFrame
[error] requestsDataFrame.createOrReplaceTempView("requests")
[error] ^
[error] one error found
[error] (compile:compile) Compilation failed
Scala code:
[root@hadoop1 scala]# more LogSQL.scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext, Time}
import org.apache.spark.storage.StorageLevel
import org.apache.spark.sql.SQLContext
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext
import java.util.regex.Pattern
import java.util.regex.Matcher
import Utilities._

/** Illustrates using SparkSQL with Spark Streaming, to issue queries on
 *  Apache log data extracted from a stream on port 9999.
 */
object LogSQL {
  def main(args: Array[String]) {
    // Create the context with a 1 second batch size
    val ssc = new StreamingContext("local[*]", "LogSQL", Seconds(1))
    setupLogging()

    // Construct a regular expression (regex) to extract fields from raw Apache log lines
    val pattern = apacheLogPattern()

    // Create a socket stream to read log data published via netcat on port 9999 locally
    val lines = ssc.socketTextStream("127.0.0.1", 9998, StorageLevel.MEMORY_AND_DISK_SER)

    // Extract the (URL, status, user agent) we want from each log line
    val requests = lines.map(x => {
      val matcher: Matcher = pattern.matcher(x)
      if (matcher.matches()) {
        val request = matcher.group(5)
        val requestFields = request.toString().split(" ")
        val url = util.Try(requestFields(1)) getOrElse "[error]"
        (url, matcher.group(6).toInt, matcher.group(9))
      } else {
        ("error", 0, "error")
      }
    })

    // Process each RDD from each batch as it comes in
    requests.foreachRDD((rdd, time) => {
      // So we'll demonstrate using SparkSQL in order to query each RDD
      // using SQL queries.

      // Get the singleton instance of SQLContext
      val sqlContext = SQLContextSingleton.getInstance(rdd.sparkContext)
      import sqlContext.implicits._

      // SparkSQL can automatically create DataFrames from Scala "case classes".
      // We created the Record case class for this purpose.
      // So we'll convert each RDD of tuple data into an RDD of "Record"
      // objects, which in turn we can convert to a DataFrame using toDF()
      val requestsDataFrame = rdd.map(w => Record(w._1, w._2, w._3)).toDF()

      // Create a SQL table from this DataFrame
      requestsDataFrame.createOrReplaceTempView("requests")

      // Count up occurrences of each user agent in this RDD and print the results.
      // The powerful thing is that you can do any SQL you want here!
      // But remember it's only querying the data in this RDD, from this batch.
      val wordCountsDataFrame =
        sqlContext.sql("select agent, count(*) as total from requests group by agent")
      println(s"========= $time =========")
      wordCountsDataFrame.show()

      // If you want to dump data into an external database instead, check out the
      // org.apache.spark.sql.DataFrameWriter class! It can write dataframes via
      // jdbc and many other formats! You can use the "append" save mode to keep
      // adding data from each batch.
    })

    // Kick it off
    ssc.checkpoint("C:/checkpoint/")
    ssc.start()
    ssc.awaitTermination()
  }
}

/** Case class for converting RDD to DataFrame */
case class Record(url: String, status: Int, agent: String)

/** Lazily instantiated singleton instance of SQLContext
 *  (Straight from included examples in Spark) */
object SQLContextSingleton {
  @transient private var instance: SQLContext = _

  def getInstance(sparkContext: SparkContext): SQLContext = {
    if (instance == null) {
      instance = new SQLContext(sparkContext)
    }
    instance
  }
}
build.sbt:
[root@hadoop1 weblogs]# more build.sbt
name := "weblogs"

version := "1.0"

scalaVersion := "2.11.6"

resolvers ++= Seq(
  "Apache HBase" at "http://repository.apache.org/content/repositories/releases",
  "Typesafe repository" at "http://repo.typesafe.com/typesafe/releases/"
)

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.2",
  "org.apache.spark" %% "spark-streaming" % "1.6.2",
  "org.apache.spark" %% "spark-sql" % "1.6.2",
  "org.apache.spark" %% "spark-mllib" % "1.6.2"
)
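A possible alternative direction, again only a sketch: if the goal is to keep createOrReplaceTempView (a Spark 2.x API) rather than switch to the 1.6-style registerTempTable sketched earlier, the build would have to target a Spark 2.x artifact set and the cluster would need a matching runtime. The version below is illustrative, not necessarily what any given HDP release ships:

```scala
// Hypothetical build.sbt targeting Spark 2.x; "2.0.0" is an illustrative version.
name := "weblogs"

version := "1.0"

scalaVersion := "2.11.8"

val sparkVersion = "2.0.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-streaming" % sparkVersion,
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  "org.apache.spark" %% "spark-mllib" % sparkVersion
)
```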
Labels:
- Apache Spark
09-19-2016
08:50 PM
I am getting this error no matter whether I run my Scala code via SBT or via spark-submit. I am on Scala 2.11.6 and Spark version 1.6.2; how can I fix this error?
Labels:
- Apache Hadoop
- Apache Spark
09-15-2016
07:51 PM
1 Kudo
This build.sbt fixed the issue and now it compiles the package fine:
[root@hadoop1 TwitterPopularTags]# more build.sbt
name := "TwitterPopularTags"

version := "1.0"

scalaVersion := "2.11.8"

val sparkVersion = "1.6.1"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-streaming" % sparkVersion,
  "org.apache.spark" %% "spark-streaming-twitter" % sparkVersion
)

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
[root@hadoop1 TwitterPopularTags]#
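For context, a minimal sketch of the kind of streaming job these three dependencies support, assuming the standard spark-streaming-twitter API and that the twitter4j.oauth.* system properties are already set; the app name, batch interval, and window length are illustrative:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

object TwitterSketch {
  def main(args: Array[String]): Unit = {
    // Local streaming context with a 1-second batch interval.
    val conf = new SparkConf().setMaster("local[*]").setAppName("TwitterSketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Requires the twitter4j.oauth.* system properties to be set beforehand.
    val tweets = TwitterUtils.createStream(ssc, None)

    // Count hashtags over a 60-second window and print each batch's counts.
    val hashtags = tweets.flatMap(_.getText.split(" ")).filter(_.startsWith("#"))
    hashtags.map(tag => (tag, 1)).reduceByKeyAndWindow(_ + _, Seconds(60)).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```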