Member since: 09-21-2015
Posts: 133
Kudos Received: 130
Solutions: 24
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 4871 | 12-17-2016 09:21 PM |
| | 2922 | 11-01-2016 02:28 PM |
| | 1191 | 09-23-2016 09:50 PM |
| | 2068 | 09-21-2016 03:08 AM |
| | 1136 | 09-19-2016 06:41 PM |
06-13-2016
01:52 PM
Do I just use '/' as the directory separator?
06-13-2016
09:05 AM
Labels:
- Apache NiFi
04-25-2016
03:54 PM
Great solution for schema inference, @Simon Elliston Ball, but I still have the question about launching Spark (or other YARN) jobs from NiFi.
04-21-2016
03:14 AM
5 Kudos
@Sunile Manjee, have a look at the Phoenix Query Server. It's a beta feature in HDP, but it is installable via Ambari: when you pick which nodes should be clients, DataNodes, NodeManagers, etc., you can check the Phoenix Query Server box.
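For reference, a minimal sketch of talking to the query server from Java via Phoenix's thin JDBC driver (the host name is a placeholder; this assumes the phoenix-queryserver-client jar is on the classpath, and 8765 is the query server's default port):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PqsThinClientExample {
    public static void main(String[] args) throws Exception {
        // Register the Phoenix thin driver (ships in the
        // phoenix-queryserver-client jar).
        Class.forName("org.apache.phoenix.queryserver.client.Driver");

        // pqs-host.example.com is a placeholder host name.
        String url = "jdbc:phoenix:thin:url=http://pqs-host.example.com:8765;"
                   + "serialization=PROTOBUF";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // Smoke test: list a few tables from the Phoenix catalog.
             ResultSet rs = stmt.executeQuery(
                     "SELECT TABLE_NAME FROM SYSTEM.CATALOG LIMIT 5")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```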
04-20-2016
05:17 PM
1 Kudo
Labels:
- Apache Spark
04-14-2016
11:40 PM
I see that MergeContent can merge multiple flowfiles into one with specified size or flowfile count semantics. Is there a processor that does this based on elapsed time instead?
Labels:
- Apache NiFi
04-14-2016
11:03 PM
This looks interesting, but it would require an already-running Spark application and the ability to communicate with the correct Hadoop worker node, which doesn't seem straightforward. Your idea did make me think about the YARN ResourceManager's REST API, so have an upvote. I still want to see if there's a more straightforward suggestion, so I'll leave the question open.
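To make the REST idea concrete, here is a rough sketch of hitting the ResourceManager's Cluster Applications endpoint from Java (rm-host.example.com is a placeholder, 8088 is the RM web port's usual default, and the states/applicationTypes filters are part of the documented API):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class YarnRmAppsExample {
    public static void main(String[] args) throws Exception {
        // Ask the RM for currently running Spark applications.
        URL url = new URL("http://rm-host.example.com:8088"
                + "/ws/v1/cluster/apps?states=RUNNING&applicationTypes=SPARK");

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept", "application/json");

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Raw JSON describing the matching applications.
                System.out.println(line);
            }
        }
    }
}
```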
04-14-2016
10:48 PM
2 Kudos
I'm aware of ExecuteProcess, which could invoke spark-submit, but I'm not running NiFi on an HDP node. I receive lots of arbitrary CSV and JSON files that I don't have pre-existing tables for. Instead of trying to script DDL creation inside NiFi, it would be nice to invoke a Spark job that infers schema and creates tables from data already loaded to HDFS.
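As an illustration of the kind of Spark job I mean, here is a rough sketch, assuming Spark 2.x's SparkSession API with Hive support; the paths and table name are placeholders:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class InferAndRegister {
    public static void main(String[] args) {
        // Both arguments are placeholders supplied at submit time,
        // e.g. hdfs:///landing/some_feed.csv and some_feed.
        String inputPath = args[0];
        String tableName = args[1];

        SparkSession spark = SparkSession.builder()
                .appName("infer-schema-and-create-table")
                .enableHiveSupport()
                .getOrCreate();

        // Let Spark infer column names and types from the CSV itself,
        // so no DDL has to be scripted up front.
        Dataset<Row> df = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv(inputPath);

        // Persist the inferred schema as a Hive table.
        df.write().saveAsTable(tableName);

        spark.stop();
    }
}
```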
Labels:
- Apache Hive
- Apache NiFi
- Apache Spark
04-13-2016
04:02 AM
2 Kudos
@Divya Gehlot, as @Sunile Manjee noted, HBase is an indexed lookup system that can also perform scans. This forces you to think about your data access and query patterns before you can create an optimal table design. In general, you want to design your rowkeys around those access patterns: ensure the highest-order rowkey bits can always be known to your application at HBase read time, or your access will be a full scan instead of a range scan. Users of the raw HBase API often find themselves performing logic in application code instead of server-side within HBase's RegionServer processes. A simple but powerful way to avoid both writing large amounts of client application code and pulling significant chunks of data back is to use Apache Phoenix on top of HBase. It makes it easy to perform a more selective HBase query via SQL, which also:

1. Lends itself more naturally to thinking about how data is laid out in your tables.
2. Lets you define secondary indices on the data your queries access, regardless of whether your application knows a specific rowkey (or range) it needs to access.
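To illustrate point 2, a sketch of defining and using a covered secondary index through the Phoenix JDBC driver (the connection string, table, and column names here are hypothetical):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixIndexExample {
    public static void main(String[] args) throws Exception {
        // zk-host and the SENSOR_READINGS schema are hypothetical.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:phoenix:zk-host:2181:/hbase-unsecure")) {

            try (Statement stmt = conn.createStatement()) {
                // A covered index on a non-rowkey column lets the query
                // below be served from the index table instead of a
                // full scan of the data table.
                stmt.execute("CREATE INDEX IF NOT EXISTS SENSOR_TYPE_IDX "
                        + "ON SENSOR_READINGS (SENSOR_TYPE) "
                        + "INCLUDE (READING_VALUE)");
            }

            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT SENSOR_TYPE, READING_VALUE FROM SENSOR_READINGS "
                            + "WHERE SENSOR_TYPE = ?")) {
                ps.setString(1, "temperature");
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("SENSOR_TYPE") + " "
                                + rs.getString("READING_VALUE"));
                    }
                }
            }
        }
    }
}
```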
04-11-2016
03:33 PM
1 Kudo
Hi @nejm hadjmbarek, can you post the error you're seeing? From the above, it looks like you've forgotten to include the ZooKeeper znode for HBase: "/hbase-unsecure". Try the following connection string instead: "jdbc:phoenix:195.154.55.93:2181:/hbase-unsecure". If you need a simple Java application (with Maven POM file) that embeds and uses the Phoenix JDBC driver, take a look here.
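In the meantime, a minimal sketch of embedding the thick Phoenix JDBC driver with that connection string (the SYSTEM.CATALOG query is just a smoke test; the phoenix-client jar is assumed to be on the classpath):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixJdbcExample {
    public static void main(String[] args) throws Exception {
        // Thick-driver URL format: jdbc:phoenix:<zk-quorum>:<port>:<znode>.
        // The znode (/hbase-unsecure on default HDP installs) must match
        // what HBase registered in ZooKeeper.
        String url = "jdbc:phoenix:195.154.55.93:2181:/hbase-unsecure";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // List a few tables to prove the connection works.
             ResultSet rs = stmt.executeQuery(
                     "SELECT TABLE_NAME FROM SYSTEM.CATALOG LIMIT 5")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```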