Member since 09-03-2015
50 Posts
8 Kudos Received
1 Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
2909 | 09-12-2017 07:24 PM |
11-03-2017 04:17 PM
Thanks very much. I see now what's going on. I tried both of your suggestions and they seem to work well.
09-12-2017 07:24 PM
Figured out that I had to use a Dataset to make sure checkpointing works...
08-02-2017 05:29 AM
It doesn't seem like streaming data directly to HDFS will make it very easy to find and aggregate records at the end of each window. What about creating a key/value store (with Redis, HBase, or Elasticsearch, for example) and using it to look up all the keys associated with each window?
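To make the idea concrete, here is a minimal sketch of a per-window key index. It assumes fixed-size (tumbling) windows and non-negative event times in milliseconds, and it uses an in-memory map purely as a stand-in for an external store such as Redis or HBase. The names WindowIndex, record, and keysFor are hypothetical and are not part of any library.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for an external key/value store.
// Assumes tumbling windows of a fixed size and non-negative event times (ms).
final class WindowIndex(windowMillis: Long) {
  private val keysByWindow = mutable.Map.empty[Long, mutable.Set[String]]

  // Start time of the tumbling window that contains the given event time.
  def windowStart(eventTimeMillis: Long): Long =
    eventTimeMillis - (eventTimeMillis % windowMillis)

  // Record that a key was seen in the window containing eventTimeMillis.
  def record(key: String, eventTimeMillis: Long): Unit = {
    val keys = keysByWindow.getOrElseUpdate(
      windowStart(eventTimeMillis), mutable.Set.empty[String])
    keys += key
    ()
  }

  // All keys recorded for the window that starts at the given time.
  def keysFor(start: Long): Set[String] =
    keysByWindow.get(start).map(_.toSet).getOrElse(Set.empty[String])
}
```

With one-minute windows, calling record("a", 5000L) and then keysFor(0L) would return the keys seen in the first minute. In a real pipeline the map would be replaced by calls to whichever store is chosen, and the end-of-window job would read the key list first and then fetch only those records.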
06-09-2017 10:57 PM
1 Kudo
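I wrote about this in my Spark Structured Streaming blog post here: https://www.linkedin.com/pulse/spark-21-structured-streaming-databricks-laurent-weichberger

See this sample:

// "inactive" is a streaming DataFrame/Dataset defined earlier in the job.
val query = inactive.writeStream
  .format("parquet")                                      // file sink, Parquet output
  .option("path", "/com/infotrellis/spark")               // output directory
  .option("checkpointLocation", "/com/infotrellis/check") // required by the file sink
  .start()
// Block until the streaming query stops or fails.
query.awaitTermination()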
11-03-2017 07:42 PM
Hi @Greg Keys, thanks for the post. Row filtering works when it is based on the value of a column that is not at the end, but I am not sure how to filter rows based on the value of the last column. Can you please let me know? Thanks.
10-01-2016 10:04 PM
Apache NiFi is more feature-rich and battle-tested, and it serves many purposes. Put simply, it has bidirectional flow, whereas Flume only moves data to HDFS. There's also a visual UI for real-time command and control, as opposed to Flume, where you only have configuration property files to deal with. If you are in the beginning stages, do yourself a favor and go with NiFi.
09-02-2016 09:19 PM
Also, what is the need to run Hive queries on Spark SQL when Hive on Tez can run much faster...