Member since: 12-14-2015
Posts: 8
Kudos Received: 6
Solutions: 0
04-29-2016
05:22 PM
It's a good question. Assuming the source of entropy is good, the chance of a duplicate is essentially zero: randomUUID draws from 2^122 possible values (roughly 5.3 × 10^36), so collisions are negligible at any realistic data volume.

There are other ways too, and I assume there are ready-made solutions out there, but here is one old-fashioned MapReduce approach. Assuming you create all the UUIDs in one go and the data is stored in a delimited format, you can build a unique key from the long offset that TextInputFormat provides for each line. TextInputFormat emits each line of text together with a long offset (bytes from the start of the file, derived from the split offsets), so you can add that offset to a starting number (for example, a batch id that is steadily increased) and obtain a unique number that way.

There are definitely other ways to do it too, for example combining the MapReduce job id + task id + row-in-split id.
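The offset-plus-batch-id idea above can be sketched outside of an actual MapReduce job. This is only illustrative: the 40-bit split between batch id and offset, and the class and method names, are my assumptions, not part of any Hadoop API.

```java
import java.util.UUID;

public class OffsetIdSketch {
    // Pack a batch id into the high bits and the per-line byte offset
    // (what TextInputFormat hands a mapper as its LongWritable key) into
    // the low bits. Assumes offsets stay below 2^40 bytes per batch.
    static long makeId(long batchId, long byteOffset) {
        return (batchId << 40) | byteOffset;
    }

    public static void main(String[] args) {
        // Two lines of the same batch at different offsets get distinct ids.
        System.out.println(makeId(7, 0));
        System.out.println(makeId(7, 128));
        // For comparison: a random UUID drawn from 2^122 possible values.
        System.out.println(UUID.randomUUID());
    }
}
```

In a real job the batch id would come from the job configuration and the offset from the mapper's input key; the 40-bit split is only a placeholder you would size for your data.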
01-18-2016
05:26 PM
@Raghavendran Chellappa the release notes for HDP are the source of truth; Jira is external to Hortonworks. If the release notes say we support it, then that's the way to go. What is it that you find not working? We ship Kafka 0.8.2.0 as stable, not beta, and Kafka 0.9 also works. We usually backport critical features. Also, have you tried NiFi? It has all the latest Kafka support, including Kerberos.
01-07-2016
04:10 PM
Actually, many BI vendors, including Tableau, have announced a Spark connector over JDBC, which should be able to leverage data loaded into RDDs in memory. If you load data via Spark Streaming into an RDD, then either schematize it (rdd.registerTempTable) or convert it to a DataFrame (rdd.toDF), you should be able to query that data over a JDBC connection and display it in a dashboard. Here is info on the Tableau connector, including a video at the bottom of the page: https://www.google.com/url?sa=t&rct=j&q=&esrc=s&so...
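A rough sketch of the JDBC path described above, from the client side. The host, port, and table name are assumptions for illustration; Spark's Thrift Server speaks the hive2 JDBC protocol, which is what BI tools connect through.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SparkJdbcSketch {
    // Build a Thrift Server JDBC URL (hive2 protocol).
    static String thriftUrl(String host, int port) {
        return "jdbc:hive2://" + host + ":" + port + "/default";
    }

    // A BI tool would issue queries like this against a table registered
    // in Spark via rdd.registerTempTable(...) or a DataFrame.
    static long countRows(Connection conn, String table) throws Exception {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT count(*) FROM " + table)) {
            rs.next();
            return rs.getLong(1);
        }
    }

    public static void main(String[] args) {
        // 10015 is a port commonly used for the Thrift Server on HDP;
        // adjust for your cluster. Actually connecting also requires a
        // running Thrift Server and the Hive JDBC driver on the classpath:
        //   Connection conn = DriverManager.getConnection(thriftUrl("localhost", 10015));
        System.out.println(thriftUrl("localhost", 10015));
    }
}
```

The point is that once the streaming data is registered as a temp table or DataFrame, it looks like any other SQL source to a JDBC client such as Tableau.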