Member since: 09-03-2015
Posts: 50 · Kudos Received: 8 · Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2909 | 09-12-2017 07:24 PM |
07-27-2017 04:56 PM
Hi, I was just wondering whether it is OK to perform window operations on DStreams with a window length of one week. Please let me know if there are any major concerns. Thanks
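For context, the windowing described above could be sketched as follows. This is a minimal example, assuming a one-week window sliding every hour and a hypothetical socket source emitting one key per line; note that window operations this long require checkpointing, and the cluster must retain roughly a week of windowed state:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}

object WeekWindowSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WeekWindowSketch").setMaster("local[*]")
    // Batch interval of 1 minute; the window below spans 7 days.
    val ssc = new StreamingContext(conf, Minutes(1))
    // Windowed reduces with an inverse function require a checkpoint directory.
    ssc.checkpoint("/tmp/week-window-checkpoint")

    // Hypothetical source: (sensorId, 1) pairs, for illustration only.
    val events = ssc.socketTextStream("localhost", 9999).map(line => (line, 1))

    // Count per key over a 7-day window, recomputed every hour.
    // Supplying an inverse reduce function keeps the update incremental
    // (add entering batches, subtract leaving ones), which matters for
    // windows this long.
    val weekly = events.reduceByKeyAndWindow(
      (a: Int, b: Int) => a + b,   // batches entering the window
      (a: Int, b: Int) => a - b,   // batches leaving the window
      Minutes(7 * 24 * 60),        // window length: 1 week
      Minutes(60)                  // slide interval: 1 hour
    )
    weekly.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The main concern with a week-long window is state size and recovery time, so the incremental (inverse-function) form of `reduceByKeyAndWindow` is generally preferable to the non-incremental one.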
Labels:
- Apache Spark
06-06-2017 09:59 PM
Hi, I am getting the error "Queries with streaming sources must be executed with writeStream.start();" while running the code shown below. Any help would be greatly appreciated.
package ca.twitter2
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.log4j._
import org.apache.spark._
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import java.util.HashMap
object kafkatest3 {
  def main(args: Array[String]) {
    val spark = SparkSession
      .builder()
      .appName("kafkatest3")
      .master("local[*]")
      .getOrCreate()
    val topics = Array("twitter")
    val ds1 = spark
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "siaihdf1a.coc.ca:6667")
      .option("subscribe", "twitter")
      .option("startingOffsets", "earliest")
      .load()
    import spark.implicits._
    val df = ds1.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)").as[(String, String)]
    ds1.printSchema()
    df.createOrReplaceTempView("df")
    val records = spark.sql("SELECT count(*) FROM df GROUP BY key")
    records.show()
    val query = records.writeStream.outputMode("complete").format("console").start()
    query.awaitTermination()
    spark.stop()
  }
}
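For later readers: this error is typically raised when an eager action such as `show()` is called on a DataFrame derived from a streaming source, since streaming queries may only be materialized through `writeStream.start()`. One way the query could be restructured (a sketch, not a definitive fix, reusing the same Kafka source) would be:

```scala
import org.apache.spark.sql.SparkSession

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaStreamSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val ds1 = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "siaihdf1a.coc.ca:6667")
      .option("subscribe", "twitter")
      .option("startingOffsets", "earliest")
      .load()

    val df = ds1.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)").as[(String, String)]
    df.createOrReplaceTempView("df")
    val records = spark.sql("SELECT key, count(*) FROM df GROUP BY key")

    // No records.show() here: calling show() on a streaming DataFrame is what
    // throws "Queries with streaming sources must be executed with
    // writeStream.start()". The console sink below prints results instead.
    val query = records.writeStream
      .outputMode("complete")
      .format("console")
      .start()
    query.awaitTermination()
  }
}
```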
Thanks, Chandra
Labels:
- Apache Kafka
- Apache Spark
10-04-2016 07:50 PM
1 Kudo
Hi, I am doing an IoT prototype in our organisation and am in the process of charting out the architecture. I would really appreciate help choosing a stream processor: Storm or Spark Streaming. Basically we are planning to record sensor events from a fleet, and we are OK with occasional message loss. We would also prefer whichever is easier to implement, though I am not sure which one that is. We are planning to use the lambda architecture: one path for batch and the other for real-time information. Thanks
Labels:
- Apache Spark
- Apache Storm
09-29-2016 07:34 PM
Hi, We are planning a prototype of Hadoop log analysis and are not sure which data ingestion tool to select: NiFi or Flume. Can anyone suggest which one we should choose and why (pros and cons)? Thanks, Chandra
Labels:
- Apache Flume
- Apache NiFi
09-02-2016 09:19 PM
Also, what is the need to run Hive queries through Spark SQL when Hive on Tez can run much faster?
09-02-2016 08:16 PM
Thanks for your valuable information. So your recommendation is to go with Hive on LLAP rather than Spark SQL. Please correct me if I am wrong.
09-02-2016 07:44 PM
1 Kudo
Hi, Can you please let me know which one is faster: Hive on Tez, or accessing Hive through Spark SQL? Thanks, Chandra
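For reference, accessing Hive through Spark SQL looks roughly like the following. This is a minimal sketch; the table name `web_logs` is hypothetical, and `enableHiveSupport()` assumes a reachable Hive metastore:

```scala
import org.apache.spark.sql.SparkSession

object HiveViaSparkSql {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() connects Spark SQL to the Hive metastore,
    // so existing Hive tables can be queried directly from Spark.
    val spark = SparkSession.builder()
      .appName("HiveViaSparkSql")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical table name, for illustration only.
    spark.sql("SELECT status, count(*) AS hits FROM web_logs GROUP BY status").show()

    spark.stop()
  }
}
```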
Labels:
- Apache Hive
- Apache Spark
06-15-2016 06:01 PM
Thanks for your answer. If we have multiple directories, will the HDFS files be stored multiple times in those directories? Sorry, I am a newbie, hence I need to get this clarified.