Member since: 09-21-2015
Posts: 133
Kudos Received: 130
Solutions: 24
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6245 | 12-17-2016 09:21 PM |
| | 3840 | 11-01-2016 02:28 PM |
| | 1797 | 09-23-2016 09:50 PM |
| | 2833 | 09-21-2016 03:08 AM |
| | 1557 | 09-19-2016 06:41 PM |
09-27-2016
05:22 PM
He filters all cells by timestamp here. If I understand correctly, without the filter, the DataFrame would expose all cell versions that exist in the snapshot.
09-27-2016
05:07 PM
2 Kudos
@Artem Ervits See @Dan Zaratsian's examples reading cell versions and timestamps from a snapshot here.
09-23-2016
09:50 PM
1 Kudo
@Sunile Manjee I don't have benchmarks, but you need to use the Phoenix bulk load regardless: a raw HBase bulk load will not keep Phoenix's secondary indexes consistent, nor will it use the sign and byte-ordering encoding conventions that Phoenix expects.
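As a hedged sketch of what this looks like in practice, the snippet below builds an invocation of Phoenix's `CsvBulkLoadTool` (the jar path, table name, input path, and ZooKeeper quorum are all assumptions to adapt to your cluster):

```python
# Hypothetical sketch: building a Phoenix CSV bulk-load command.
# Unlike a raw HBase bulk load, this path keeps Phoenix's secondary
# indexes consistent and writes Phoenix's own byte encodings.
import subprocess

def phoenix_bulk_load_cmd(table, input_path, zk_quorum,
                          phoenix_jar="/usr/hdp/current/phoenix-client/phoenix-client.jar"):
    """Build the hadoop-jar command line for Phoenix's CsvBulkLoadTool."""
    return [
        "hadoop", "jar", phoenix_jar,
        "org.apache.phoenix.mapreduce.CsvBulkLoadTool",
        "--table", table,
        "--input", input_path,
        "--zookeeper", zk_quorum,
    ]

cmd = phoenix_bulk_load_cmd("MY_TABLE", "/data/my_table.csv", "zk1:2181")
# subprocess.check_call(cmd)  # uncomment to actually run on a cluster
```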
09-23-2016
08:34 PM
Livy is not "hidden". If you have started the Livy server, you can interact with its REST API from any application.
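To make "interact with its REST API" concrete, here is a minimal standard-library sketch, assuming a Livy server at a hypothetical `livy-host` on its default port 8998; the session kind and submitted code are illustrative:

```python
# Hedged sketch of talking to Livy's REST API. Host name is an assumption;
# 8998 is Livy's default port. Only the standard library is used.
import json
from urllib import request

LIVY = "http://livy-host:8998"

def livy_request(path, payload):
    """Build a JSON POST request against the Livy REST API."""
    return request.Request(
        LIVY + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# 1. Create an interactive session: POST /sessions
create = livy_request("/sessions", {"kind": "pyspark"})
# 2. Submit code to that session:   POST /sessions/{id}/statements
stmt = livy_request("/sessions/0/statements",
                    {"code": "sc.parallelize(range(10)).count()"})
# request.urlopen(create)  # uncomment against a live Livy server
```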
09-22-2016
07:33 PM
*Edit*
Realistically, questions about shared SparkContexts usually come down to one of two concerns:
1. Sharing cached DataFrames/Datasets. Livy and the Spark Thrift JDBC/ODBC server are decent initial solutions; keep an eye on Spark-LLAP integration, which will be better all around (security, efficiency, etc.).
2. Spark applications consuming all of a cluster's resources. Spark's ability to spin executor instances up and down dynamically based on utilization is probably a better solution to this problem than sharing a single SparkContext.
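For the second concern, the dynamic-allocation settings look roughly like the sketch below, expressed as `spark-submit --conf` pairs. The specific values are illustrative examples, not recommendations for any particular cluster:

```python
# Hedged sketch of Spark dynamic-allocation configuration.
# Values are examples only; tune min/max/timeout for your workload.
dynamic_allocation_conf = {
    # Let Spark grow and shrink the executor pool with load
    "spark.dynamicAllocation.enabled": "true",
    # Needed on YARN so executors can be released without losing shuffle data
    "spark.shuffle.service.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.maxExecutors": "20",
    # Release an executor after it has been idle this long
    "spark.dynamicAllocation.executorIdleTimeout": "60s",
}

# Rendered as spark-submit arguments:
args = [arg for k, v in sorted(dynamic_allocation_conf.items())
        for arg in ("--conf", f"{k}={v}")]
```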
09-22-2016
06:27 PM
1 Kudo
@Sunile Manjee Consider how Spark applications run: a driver runs either on the client or in a YARN container. If multiple users will ask the same Spark application instance to do multiple things, they need an interface for communicating with that driver. Livy is the out-of-the-box REST interface that shares a single Spark application by exposing its control interface to external users. If you do not want to use Livy but still want to share a SparkContext, you need to build your own means of communicating with the shared driver. One solution is to have the driver periodically poll for new queries in a database or in files on disk. This functionality is not built into Spark, but it can be implemented with a while loop and a sleep statement.
09-21-2016
03:08 AM
3 Kudos
@hduraiswamy - in order of preference:
1. SyncSort.
2. Use the mainframe's native JDBC services - often unacceptable, because the mainframe must consume additional MIPS to convert to JDBC types before sending data over the network.
3. Use this open serde, which unfortunately skips everything except fixed-length fields, severely limiting its usefulness.
4. I've heard of LegStar being used for similar projects, but I'm not sure how.
09-19-2016
06:41 PM
2 Kudos
@Andrew Watson You can set cell-level ACLs via the HBase shell or via HBase's Java API. This type of policy is not exposed or controlled via Ranger. If possible, I would implement row-level policies in a client-side application, because HBase's cell ACLs are expensive (additional metadata must be stored and read with every cell). My favorite solution is to create a Phoenix view that exposes only specific rows. As noted above, your client-side app would still have to decide whether to allow access to a given view.
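As a hedged sketch of the Phoenix-view approach, the snippet below generates per-tenant view DDL over a hypothetical `events` table with a `tenant_id` column; all names are illustrative, and the client-side app decides which view a user may query:

```python
# Hypothetical sketch: one Phoenix view per tenant, exposing only that
# tenant's rows. Table and column names are assumptions for illustration.
def tenant_view_ddl(tenant):
    """Build DDL for a Phoenix view restricted to one tenant's rows.

    `tenant` should be validated against a whitelist before being
    interpolated into DDL (views can't take bind parameters).
    """
    return (
        f"CREATE VIEW events_{tenant} AS "
        f"SELECT * FROM events WHERE tenant_id = '{tenant}'"
    )

ddl = tenant_view_ddl("acme")
# A real app would execute `ddl` once through a Phoenix connection, then
# route each user's queries only to the view their role permits.
```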
09-14-2016
06:29 PM
3 Kudos
@Carlos Barichello Livy isn't hidden. If you've started Livy, you can use its REST API to launch Spark jobs from Zeppelin or from elsewhere.