Member since: 09-21-2015
Posts: 133
Kudos Received: 130
Solutions: 24
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 4871 | 12-17-2016 09:21 PM |
| | 2922 | 11-01-2016 02:28 PM |
| | 1191 | 09-23-2016 09:50 PM |
| | 2068 | 09-21-2016 03:08 AM |
| | 1136 | 09-19-2016 06:41 PM |
06-13-2016
01:52 PM
Do I just use '/' as the directory separator?
06-13-2016
09:05 AM
Labels:
- Apache NiFi
04-25-2016
03:54 PM
Great solution for schema inference, @Simon Elliston Ball, but I still have the question about launching Spark (or other YARN) jobs from NiFi.
04-21-2016
03:14 AM
5 Kudos
@Sunile Manjee, have a look at the Phoenix Query Server. It's a beta feature in HDP, but it is installable via Ambari: when you pick which nodes should be clients, DataNodes, NodeManagers, etc., you can check the Phoenix Query Server box.
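For reference, a minimal sketch of talking to the query server from Java via Phoenix's thin JDBC driver (the host name is a placeholder; this assumes the phoenix-queryserver-client jar is on the classpath, and 8765 is the query server's default port):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PqsThinClientExample {
    public static void main(String[] args) throws Exception {
        // Register the Phoenix thin driver (ships in the
        // phoenix-queryserver-client jar).
        Class.forName("org.apache.phoenix.queryserver.client.Driver");

        // pqs-host.example.com is a placeholder host name.
        String url = "jdbc:phoenix:thin:url=http://pqs-host.example.com:8765;"
                   + "serialization=PROTOBUF";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // Smoke test: list a few tables from the Phoenix catalog.
             ResultSet rs = stmt.executeQuery(
                     "SELECT TABLE_NAME FROM SYSTEM.CATALOG LIMIT 5")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```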
04-20-2016
05:17 PM
1 Kudo
Labels:
- Apache Spark
04-14-2016
11:40 PM
I see that MergeContent can merge multiple flowfiles into one with specified size or flowfile count semantics. Is there a processor that does this based on elapsed time instead?
Labels:
- Apache NiFi
04-14-2016
11:03 PM
This looks interesting, but it would require an already-running Spark application and the ability to communicate with the correct Hadoop worker node, which doesn't seem straightforward. Your idea did make me think about the YARN ResourceManager's REST API, so have an upvote. I still want to see if there's a more straightforward suggestion, so I'll leave the question open.
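To make the REST idea concrete, here is a rough sketch of hitting the ResourceManager's Cluster Applications endpoint from Java (rm-host.example.com is a placeholder, 8088 is the RM web port's usual default, and the states/applicationTypes filters are part of the documented API):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class YarnRmAppsExample {
    public static void main(String[] args) throws Exception {
        // Ask the RM for currently running Spark applications.
        URL url = new URL("http://rm-host.example.com:8088"
                + "/ws/v1/cluster/apps?states=RUNNING&applicationTypes=SPARK");

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept", "application/json");

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Raw JSON describing the matching applications.
                System.out.println(line);
            }
        }
    }
}
```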
04-14-2016
10:48 PM
2 Kudos
I'm aware of ExecuteProcess, which could invoke spark-submit, but I'm not running NiFi on an HDP node. I receive lots of arbitrary CSV and JSON files that I don't have pre-existing tables for. Instead of trying to script DDL creation inside NiFi, it would be nice to invoke a Spark job that infers schema and creates tables from data already loaded to HDFS.
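As an illustration of the kind of Spark job I mean, here is a rough sketch, assuming Spark 2.x's SparkSession API with Hive support; the paths and table name are placeholders:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class InferAndRegister {
    public static void main(String[] args) {
        // Both arguments are placeholders supplied at submit time,
        // e.g. hdfs:///landing/some_feed.csv and some_feed.
        String inputPath = args[0];
        String tableName = args[1];

        SparkSession spark = SparkSession.builder()
                .appName("infer-schema-and-create-table")
                .enableHiveSupport()
                .getOrCreate();

        // Let Spark infer column names and types from the CSV itself,
        // so no DDL has to be scripted up front.
        Dataset<Row> df = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv(inputPath);

        // Persist the inferred schema as a Hive table.
        df.write().saveAsTable(tableName);

        spark.stop();
    }
}
```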
Labels:
- Apache Hive
- Apache NiFi
- Apache Spark
04-13-2016
04:02 AM
2 Kudos
@Divya Gehlot, as @Sunile Manjee noted, HBase is an indexed lookup system that can also perform scans. This forces you to think about your data access and query patterns before you can create an optimal table design. In general, you want to design your rowkeys around those access patterns: ensure the highest-order rowkey bits can always be known to your application at HBase read time, or your access will be a full scan instead of a range scan. Users of the raw HBase API often find themselves performing logic in application code instead of server-side within HBase's RegionServer processes. A simple but powerful way to avoid both writing large amounts of client application code and pulling significant chunks of data back is to use Apache Phoenix on top of HBase. It makes it easy to perform a more selective HBase query via SQL, which also:

1. Lends itself more naturally to thinking about how data is laid out in your tables.
2. Lets you define secondary indices on the data your queries access, regardless of whether your application knows a specific rowkey (or range) it needs to access.
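To illustrate point 2, a sketch of defining and using a covered secondary index through the Phoenix JDBC driver (the connection string, table, and column names here are hypothetical):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixIndexExample {
    public static void main(String[] args) throws Exception {
        // zk-host and the SENSOR_READINGS schema are hypothetical.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:phoenix:zk-host:2181:/hbase-unsecure")) {

            try (Statement stmt = conn.createStatement()) {
                // A covered index on a non-rowkey column lets the query
                // below be served from the index table instead of a
                // full scan of the data table.
                stmt.execute("CREATE INDEX IF NOT EXISTS SENSOR_TYPE_IDX "
                        + "ON SENSOR_READINGS (SENSOR_TYPE) "
                        + "INCLUDE (READING_VALUE)");
            }

            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT SENSOR_TYPE, READING_VALUE FROM SENSOR_READINGS "
                            + "WHERE SENSOR_TYPE = ?")) {
                ps.setString(1, "temperature");
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("SENSOR_TYPE") + " "
                                + rs.getString("READING_VALUE"));
                    }
                }
            }
        }
    }
}
```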
04-11-2016
03:33 PM
1 Kudo
Hi @nejm hadjmbarek, can you post the error you're seeing? From the above, it looks like you've forgotten to include the ZooKeeper znode for HBase: "/hbase-unsecure". Try the following connection string instead: "jdbc:phoenix:195.154.55.93:2181:/hbase-unsecure". If you need a simple Java application (with Maven POM file) that embeds and uses the Phoenix JDBC driver, take a look here.
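In the meantime, a minimal sketch of embedding the thick Phoenix JDBC driver with that connection string (the SYSTEM.CATALOG query is just a smoke test; the phoenix-client jar is assumed to be on the classpath):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixJdbcExample {
    public static void main(String[] args) throws Exception {
        // Thick-driver URL format: jdbc:phoenix:<zk-quorum>:<port>:<znode>.
        // The znode (/hbase-unsecure on default HDP installs) must match
        // what HBase registered in ZooKeeper.
        String url = "jdbc:phoenix:195.154.55.93:2181:/hbase-unsecure";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // List a few tables to prove the connection works.
             ResultSet rs = stmt.executeQuery(
                     "SELECT TABLE_NAME FROM SYSTEM.CATALOG LIMIT 5")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```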