Member since: 09-03-2015
Posts: 50 · Kudos Received: 8 · Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2909 | 09-12-2017 07:24 PM |
07-27-2017 04:56 PM
Hi, I was just wondering whether it is OK to perform window operations on DStreams with a window length of one week. Please let me know if there are any major concerns. Thanks
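For context, the windowing described above could be sketched as follows. This is a minimal example, assuming a one-week window sliding every hour and a hypothetical socket source emitting one key per line; note that window operations this long require checkpointing, and the cluster must retain roughly a week of windowed state:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}

object WeekWindowSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WeekWindowSketch").setMaster("local[*]")
    // Batch interval of 1 minute; the window below spans 7 days.
    val ssc = new StreamingContext(conf, Minutes(1))
    // Windowed reduces with an inverse function require a checkpoint directory.
    ssc.checkpoint("/tmp/week-window-checkpoint")

    // Hypothetical source: (sensorId, 1) pairs, for illustration only.
    val events = ssc.socketTextStream("localhost", 9999).map(line => (line, 1))

    // Count per key over a 7-day window, recomputed every hour.
    // Supplying an inverse reduce function keeps the update incremental
    // (add entering batches, subtract leaving ones), which matters for
    // windows this long.
    val weekly = events.reduceByKeyAndWindow(
      (a: Int, b: Int) => a + b,   // batches entering the window
      (a: Int, b: Int) => a - b,   // batches leaving the window
      Minutes(7 * 24 * 60),        // window length: 1 week
      Minutes(60)                  // slide interval: 1 hour
    )
    weekly.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The main concern with a week-long window is state size and recovery time, so the incremental (inverse-function) form of `reduceByKeyAndWindow` is generally preferable to the non-incremental one.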
Labels:
- Apache Spark
06-06-2017 09:59 PM
Hi, I am getting the error "Queries with streaming sources must be executed with writeStream.start();" while running the code shown below. Any help would be greatly appreciated.
package ca.twitter2
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.log4j._
import org.apache.spark._
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import java.util.HashMap
object kafkatest3 {
  def main(args: Array[String]) {
    val spark = SparkSession
      .builder()
      .appName("kafkatest3")
      .master("local[*]")
      .getOrCreate()
    val topics = Array("twitter")
    val ds1 = spark
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "siaihdf1a.coc.ca:6667")
      .option("subscribe", "twitter")
      .option("startingOffsets", "earliest")
      .load()
    import spark.implicits._
    val df = ds1.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)").as[(String, String)]
    ds1.printSchema()
    df.createOrReplaceTempView("df")
    val records = spark.sql("SELECT count(*) FROM df GROUP BY key")
    records.show()
    val query = records.writeStream.outputMode("complete").format("console").start()
    query.awaitTermination()
    spark.stop()
  }
}
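For later readers: this error is typically raised when an eager action such as `show()` is called on a DataFrame derived from a streaming source, since streaming queries may only be materialized through `writeStream.start()`. One way the query could be restructured (a sketch, not a definitive fix, reusing the same Kafka source) would be:

```scala
import org.apache.spark.sql.SparkSession

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaStreamSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val ds1 = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "siaihdf1a.coc.ca:6667")
      .option("subscribe", "twitter")
      .option("startingOffsets", "earliest")
      .load()

    val df = ds1.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)").as[(String, String)]
    df.createOrReplaceTempView("df")
    val records = spark.sql("SELECT key, count(*) FROM df GROUP BY key")

    // No records.show() here: calling show() on a streaming DataFrame is what
    // throws "Queries with streaming sources must be executed with
    // writeStream.start()". The console sink below prints results instead.
    val query = records.writeStream
      .outputMode("complete")
      .format("console")
      .start()
    query.awaitTermination()
  }
}
```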
Thanks, Chandra
Labels:
- Apache Kafka
- Apache Spark
10-04-2016 07:50 PM
1 Kudo
Hi, I am doing an IoT prototype in our organisation and am in the process of charting out the architecture. I would really appreciate help choosing a stream processor: Storm or Spark Streaming. Basically we are planning to record sensor events from a fleet, and we are OK with occasional message loss. We would also prefer whichever is easier to implement, though I am not sure which one that is. We are planning to use the lambda architecture: one path for batch and the other for real-time information. Thanks
Labels:
- Apache Spark
- Apache Storm
09-29-2016 07:34 PM
Hi, We are planning a prototype of Hadoop log analysis and are not sure which data ingestion tool to select: NiFi or Flume. Can anyone suggest which one we should choose and why (pros and cons)? Thanks, Chandra
Labels:
- Apache Flume
- Apache NiFi
09-02-2016 09:19 PM
Also, what is the need to run Hive queries through Spark SQL when Hive on Tez can run much faster?
09-02-2016 08:16 PM
Thanks for your valuable information. So your recommendation is to go with Hive on LLAP rather than Spark SQL. Please correct me if I am wrong.
09-02-2016 07:44 PM
1 Kudo
Hi, Can you please let me know which one is faster: Hive on Tez, or accessing Hive through Spark SQL? Thanks, Chandra
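For reference, accessing Hive through Spark SQL looks roughly like the following. This is a minimal sketch; the table name `web_logs` is hypothetical, and `enableHiveSupport()` assumes a reachable Hive metastore:

```scala
import org.apache.spark.sql.SparkSession

object HiveViaSparkSql {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() connects Spark SQL to the Hive metastore,
    // so existing Hive tables can be queried directly from Spark.
    val spark = SparkSession.builder()
      .appName("HiveViaSparkSql")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical table name, for illustration only.
    spark.sql("SELECT status, count(*) AS hits FROM web_logs GROUP BY status").show()

    spark.stop()
  }
}
```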
Labels:
- Apache Hive
- Apache Spark
06-15-2016 06:01 PM
Thanks for your answer. If we have multiple directories, will the HDFS files be stored multiple times in those directories? Sorry, I am a newbie, hence I need to get this clarified.