Access streaming RDD or DStream via beeline


Hi,

Is it possible to access the cumulative RDD or DStream of a simple streaming application via JDBC or a beeline client?

For example, in this simple code (copied from Stack Overflow), I would like to connect via beeline or JDBC (from a BI tool) and report the cumulative word counts held in stateDstream. Ideally I would expose it to SQL with registerTempTable.

 

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // sparkConf, args, and updateFunc are assumed to be defined elsewhere in the application.

    // Create the context with a 1 second batch size
    val ssc = new StreamingContext(sparkConf, Seconds(1))
    // updateStateByKey needs a checkpoint directory to persist state between batches
    ssc.checkpoint(".")

    // Create a NetworkInputDStream on target ip:port and count the
    // words in an input stream of \n delimited text (e.g. generated by 'nc')
    val lines = ssc.socketTextStream(args(0), args(1).toInt)
    val words = lines.flatMap(_.split(" "))
    val wordDstream = words.map(x => (x, 1))

    // Update the cumulative count using updateStateByKey.
    // This gives a DStream of state (the cumulative count of each word).
    val stateDstream = wordDstream.updateStateByKey[Int](updateFunc)
    stateDstream.print()
    ssc.start()
    ssc.awaitTermination()
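
The snippet does not show updateFunc. A minimal sketch of a typical definition, assuming the state is simply the running total per word, can be exercised in plain Scala without Spark. The object name UpdateFuncDemo is hypothetical, and this is an illustration of the function's shape rather than the original poster's code:

```scala
object UpdateFuncDemo {
  // Same shape as the function updateStateByKey[Int] expects:
  // the new values for a key in this batch, plus the previous state (if any).
  val updateFunc: (Seq[Int], Option[Int]) => Option[Int] =
    (values, state) => Some(values.sum + state.getOrElse(0))

  def main(args: Array[String]): Unit = {
    // Batch 1: the word appears twice and has no previous state.
    val afterBatch1 = updateFunc(Seq(1, 1), None)
    // Batch 2: the word appears once more.
    val afterBatch2 = updateFunc(Seq(1), afterBatch1)
    // Batch 3: the word does not appear, so the total is carried over.
    val afterBatch3 = updateFunc(Seq.empty, afterBatch2)
    println(s"$afterBatch1 $afterBatch2 $afterBatch3") // prints "Some(2) Some(3) Some(3)"
  }
}
```

Spark calls the function once per key per batch, which is why the state survives batches in which a word does not occur.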

 

Thanks

Re: Access streaming RDD or DStream via beeline

Spark does have a Thrift server that can be run to access data in Spark applications via JDBC. It is not supported by Cloudera, though.

DataStax has documentation on it.
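
As a rough sketch of what this could look like for the word count in the question: the application can start the Thrift server in its own process and re-register the latest state as a temporary view on every batch. This assumes Spark 2.x or later with the spark-hive-thriftserver module on the classpath. The object name, the view name word_counts, and the column names are hypothetical, and the single-session setting is an assumption made so that JDBC connections can see the view registered by the application:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCountOverJdbc {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StreamingWordCountOverJdbc")
      // Assumption: share one session so JDBC clients see the app's temporary views.
      .config("spark.sql.hive.thriftServer.singleSession", "true")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    val ssc = new StreamingContext(spark.sparkContext, Seconds(1))
    ssc.checkpoint(".")

    // Assumed update function: running total per word.
    val updateFunc = (values: Seq[Int], state: Option[Int]) =>
      Some(values.sum + state.getOrElse(0))

    val stateDstream = ssc.socketTextStream(args(0), args(1).toInt)
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .updateStateByKey[Int](updateFunc)

    // On every batch, replace the view with the latest cumulative counts.
    stateDstream.foreachRDD { rdd =>
      rdd.toDF("word", "total").createOrReplaceTempView("word_counts")
    }

    // Start the Thrift server inside this application, backed by the same context.
    HiveThriftServer2.startWithContext(spark.sqlContext)

    ssc.start()
    ssc.awaitTermination()
  }
}
```

With the application running, a beeline client should be able to connect with something like !connect jdbc:hive2://localhost:10000 (the default Thrift port, unless configured otherwise) and run SELECT word, total FROM word_counts. On Spark 1.x the equivalent calls would be registerTempTable and a HiveContext. Either way, HiveThriftServer2.startWithContext is a developer API, so behavior may differ between versions.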