Member since: 09-21-2015
Posts: 133
Kudos Received: 130
Solutions: 24

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 4871 | 12-17-2016 09:21 PM |
|  | 2922 | 11-01-2016 02:28 PM |
|  | 1191 | 09-23-2016 09:50 PM |
|  | 2068 | 09-21-2016 03:08 AM |
|  | 1136 | 09-19-2016 06:41 PM |
04-11-2016
01:22 AM
@jfrazee If I match on multiple entries in the dictionary, will this processor emit one FlowFile for every matched entry, or a single FlowFile with attributes for all of the matched entries?
04-10-2016
10:55 PM
I have flat files of metadata (updated every few minutes throughout the day) and another stream that I need to join to this metadata in real time. I know I can accomplish this in Storm or Spark Streaming with some code. Can NiFi help me do this without writing code? For example, I have a list of malicious websites (the metadata) and I'm streaming in HTTP requests. I need to join the domains on those requests against the list of malicious websites and emit an alert if there is a match. A slightly more complex version of the same requirement: how would I incorporate regular updates to the metadata?
Labels:
- Apache NiFi
04-07-2016
07:39 PM
2 Kudos
Can I add a secondary index to a dynamic column defined as part of a view?
Labels:
- Apache HBase
- Apache Phoenix
04-06-2016
08:47 PM
@Jeremy Dyer I updated the question with additional items. Any comments on those?
04-06-2016
08:15 PM
2 Kudos
How do upserts of new records impact the number of pre-split regions? How do updates of existing records impact the number of pre-split regions?
Labels:
- Apache HBase
- Apache Phoenix
04-04-2016
03:05 AM
Thanks, manually specifying the full path worked. Is there a way to set a default directory? I've tried "set hive.metastore.warehouse.dir=my_dir" but it had no effect.
04-02-2016
09:26 PM
2 Kudos
I'm using SparkSQL (local mode) in Zeppelin for development work. As I am not running on a cluster, I do not have /user/hive/warehouse directories. If I'm using strictly SQL, is there a way I can specify the directory of my newly created tables? How about setting the default output directories?

Failing example:

%sql
create table pings as
select
split(time, " ")[0] as month,
split(time, " ")[2] as year,
split(split(time, " ")[3], ":")[0] as hour,
split(split(time, " ")[3], ":")[1] as minute,
split(split(split(time, " ")[3], ":")[2], "\\.")[0] as second,
substr(split(split(split(time, " ")[3], ":")[2], "\\.")[1],0, 3) as ms,
*
from pings_raw
The error:

MetaException(message:file:/user/hive/warehouse/pings is not a directory or unable to create one)
set zeppelin.spark.sql.stacktrace = true to see full stacktrace
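As noted in the 04-04-2016 reply above, manually specifying the full path worked. A minimal sketch of what that could look like, assuming Zeppelin's %pyspark interpreter with the HiveContext exposed as sqlContext; the local path is a placeholder and the select is simplified:

```python
# Illustrative only: the path below is a placeholder and the column list from
# the failing example is simplified to select *.
# Giving the table an explicit LOCATION avoids the missing
# /user/hive/warehouse default when running SparkSQL in local mode.
sqlContext.sql("""
    create table pings
    location 'file:///tmp/warehouse/pings'
    as select * from pings_raw
""")
```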
Labels:
- Apache Spark
03-30-2016
07:12 PM
5 Kudos
What factors should inform whether I use NiFi or Sqoop for ingesting my data?
Labels:
- Apache NiFi
- Apache Sqoop
03-21-2016
04:20 AM
2 Kudos
In case you haven't already created your Hive table, this will help you do so. Assuming your cluster is running on Linux VMs, Python is already installed. I can't comment on tf-idf, but the below should help you understand a generic approach to integrating Python functions with Hive queries.

Once you have a Hive table, running Python scripts against records is straightforward. You'll need to ssh into your "edge" node, one that has the Hive CLI installed. To start it, type "hive" at the command prompt, then:

hive> add file my_script.py;
hive> select transform(col1, col2) using 'my_script.py' as result1, result2 from my_table;

See the transform docs, but essentially this will run the equivalent of a MapReduce streaming job against every record in my_table. Records are passed to your script delimited by newlines, and fields are delimited by tabs. Anything you print to standard out will be interpreted the same way (print one line per output record, fields separated by tabs). Here's an example that uses a Python script to validate datatypes. Feel free to ping back if you need additional help.
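For reference, a minimal sketch of what such a script could look like; the script name matches the example above, but the column handling is purely illustrative. The only contract that matters is the one described above: read tab-delimited fields from each line on standard input and print tab-delimited fields, one output record per line.

```python
#!/usr/bin/env python
# my_script.py -- illustrative transform script for the example above.
# Hive streams one tab-delimited record per line on stdin; every line printed
# to stdout becomes one output record (fields separated by tabs).
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    col1, col2 = fields[0], fields[1]

    # Illustrative "datatype validation": flag whether col2 parses as an int.
    try:
        int(col2)
        col2_is_int = "true"
    except ValueError:
        col2_is_int = "false"

    # Two output fields map to result1, result2 in the query above.
    print("\t".join([col1, col2_is_int]))
```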
03-17-2016
10:08 PM
2 Kudos
Hi @Amar ch, I didn't time it, but it takes somewhere between 15 and 30 minutes to fully start the NiFi process. One of the first things NiFi does on startup is unpack all the NAR files in the lib directory to make them available as processors. If you want the service to start faster, you can remove NARs for processors you don't intend to use.

Even once the service is started, I've found the NCM will be very slow. For this reason, I would plan on developing a template on your laptop and importing it into your Raspberry Pi NiFi instance rather than trying to build out the flow directly on the Pi.

The stumbling block above aside, once the flow is defined and started, I have had zero problems with it. I've been monitoring WiFi traffic for several months without a blip, even after a few power outages; NiFi started back up and resumed working just fine.

Edit: The number of standard NARs has grown since my comment in December, hence the increase in startup time from 10 to 30 minutes 😃
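A minimal sketch of the NAR pruning mentioned above; the paths and keep-list are placeholders, and NiFi needs its framework and other core bundles to boot, so only move NARs you are sure your flow does not use, and keep them somewhere you can restore from:

```python
# Illustrative only: moves NAR bundles you don't plan to use out of NiFi's lib
# directory into a sibling backup folder so startup has fewer NARs to unpack.
# NIFI_LIB, BACKUP and KEEP are assumptions -- adjust them for your install.
import glob
import os
import shutil

NIFI_LIB = "/opt/nifi/lib"                      # assumed install location
BACKUP = "/opt/nifi/lib-unused"                 # NARs are moved here, not deleted
KEEP = ("nifi-framework", "nifi-standard", "nifi-jetty")  # illustrative prefixes

if not os.path.isdir(BACKUP):
    os.makedirs(BACKUP)

for nar in glob.glob(os.path.join(NIFI_LIB, "*.nar")):
    name = os.path.basename(nar)
    if not name.startswith(KEEP):               # keep anything matching KEEP
        shutil.move(nar, os.path.join(BACKUP, name))
        print("moved " + name)
```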