Member since
02-24-2016
03-24-2016
06:03 PM
1 Kudo
As Simon mentioned, RDDs don't have a schema attached. A DataFrame (conceptually similar to a database table) does have a schema attached (column names, column types, etc.), so you can easily filter on a column. You can also create a DataFrame from an RDD and then filter that. See http://hortonworks.com/hadoop-tutorial/a-lap-around-apache-spark/ for details. The section on programmatically specifying the schema shows how to attach a schema to an RDD to get a DataFrame, and the section "Additional DataFrame API Example" has a DataFrame filter example.
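For a rough idea of what attaching a schema to an RDD and then filtering looks like, here is a minimal PySpark sketch. It is not taken from the linked tutorial: the `name`/`age` columns, the `parse_person` and `adults` helpers, and the age threshold are all made up for illustration, and it assumes Spark 2.0 or later, where `spark` is a `SparkSession` (on Spark 1.x you would call `createDataFrame` on a `SQLContext` instead).

```python
def parse_person(line):
    """Split a 'name,age' text line into a (str, int) tuple matching the schema below."""
    name, age = line.split(",")
    return (name.strip(), int(age))


def adults(spark, lines):
    """Attach a schema to an RDD of tuples, then filter the DataFrame on a column.

    `spark` is assumed to be an existing SparkSession; `lines` is a list of
    'name,age' strings (hypothetical sample data).
    """
    # Imported lazily so parse_person stays usable without Spark installed.
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    # The schema is what a plain RDD lacks: column names and column types.
    schema = StructType([
        StructField("name", StringType(), True),
        StructField("age", IntegerType(), True),
    ])

    # Plain RDD of tuples, no schema yet.
    rdd = spark.sparkContext.parallelize(lines).map(parse_person)

    # RDD + schema = DataFrame, which can be filtered by column.
    df = spark.createDataFrame(rdd, schema)
    return df.filter(df["age"] >= 18)
```

With a running session, something like `adults(spark, ["Ann, 34", "Bob, 12"]).show()` would display the rows that pass the filter.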
02-24-2016
11:00 AM
1 Kudo
Thanks !!!
02-24-2016
09:00 AM
Not really. You mean as a persistent storage layer under Hibernate and EJBs, correct? Hive wouldn't work well for this since it's not an OLTP database; it is a warehouse. So that would most likely leave HBase with Apache Phoenix. I Googled it a bit and focused on Hibernate because that seems to be the most popular recently, and did not find a connector for Phoenix. That doesn't mean it's not possible to write one. I Googled a bit more and there is also Hibernate OGM for NoSQL stores. Unfortunately it currently does not support HBase: http://hibernate.org/ogm/ So the two possibilities would be to write an HBase extension for OGM or to write a connector for Apache Phoenix. I wrote one for Netezza a while back and it should not be terribly difficult, although the Phoenix syntax has some differences from standard SQL (UPSERT instead of INSERT, ...).
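To make the UPSERT point concrete, here is a small sketch of the kind of statement a Phoenix connector would have to generate where a standard SQL one emits INSERT. The helper names and the `users` table are hypothetical, this is not Hibernate's dialect API, and identifiers are not quoted or validated.

```python
def standard_insert(table, columns):
    """Build a parameterized standard-SQL INSERT statement."""
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({cols}) VALUES ({marks})"


def phoenix_upsert(table, columns):
    """Build the Phoenix equivalent, which uses UPSERT instead of INSERT."""
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    return f"UPSERT INTO {table} ({cols}) VALUES ({marks})"


print(standard_insert("users", ["id", "name"]))  # INSERT INTO users (id, name) VALUES (?, ?)
print(phoenix_upsert("users", ["id", "name"]))   # UPSERT INTO users (id, name) VALUES (?, ?)
```

A real connector would also have to cover the other syntax differences, which this sketch ignores.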