Member since: 06-29-2016
Posts: 13
Kudos Received: 8
Solutions: 2

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 289 | 07-07-2016 04:34 PM
 | 205 | 07-07-2016 02:27 PM
05-12-2017
02:29 PM
Correct; the correlation would happen in the merge process, and the merged result would then be written out, although you may be able to use both if your batch sizes are going to be fairly large.
05-12-2017
12:53 PM
Without seeing the full data flow, my initial thought would be to try a 'MergeContent' processor, using a variant of your timestamp attribute as the correlation attribute. All flow files with the same correlation attribute should be grouped together; then just write the resulting set of merged flow files out to HDFS.
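For illustration, a minimal configuration sketch (the 'merge.key' attribute name, the 'event.timestamp' source attribute, and the hour-level grouping are assumptions, not something from your flow):

UpdateAttribute (upstream, hypothetical): merge.key = ${event.timestamp:substring(0, 13)}   (truncates a 'yyyy-MM-dd HH:mm:ss'-style timestamp to the hour)

MergeContent:
Merge Strategy = Bin-Packing Algorithm
Correlation Attribute Name = merge.key
Merge Format = Binary Concatenation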
02-27-2017
07:58 PM
1 Kudo
Here's a working example using separate signature files in the filesystem (e.g. 'filename.tgz' is the file and 'filename.sig' is a file containing the hash), with the hashes stored in a distributed map cache: verify-sha256-hash.xml
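Outside of NiFi, the equivalent check looks roughly like this (a shell sketch, assuming 'filename.sig' contains only the hex digest):

computed=$(sha256sum filename.tgz | awk '{print $1}')   # hash of the payload
expected=$(cat filename.sig)                            # stored signature
if [ "$computed" = "$expected" ]; then echo "hash matches"; else echo "hash mismatch" >&2; fi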
02-27-2017
05:34 PM
1 Kudo
I believe NiFi uses the Quartz scheduler to handle cron scheduling. The syntax is documented here, but according to the examples listed, the following three expressions should do what you want (one of them is the one you used above):
0 30 1 ? * *
0 30 1 * * ?
0 30 1 * * ? *
Try the first expression (using '?' for the day-of-month field) and see if that works.
02-27-2017
05:19 PM
Looks like a permissions issue with the encrypt-config tool. I'm guessing that you are running the script as a user who doesn't have the appropriate permissions to write to or modify the /usr/hdf/current/nifi/conf directory (see here for an overview of the encrypt-config tool). A few things to try:
1) Take a look at the /usr/hdf/current/nifi/conf directory and make sure it can be written to by the user you're running the tool as (ambari? make sure that user is in the appropriate group to read and write this directory).
2) Try running the encrypt-config.sh tool on its own (see the link above; run it directly rather than through Ambari) and see if you hit the same issue; that should get you further in diagnosing the problem.
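For step 1, a quick check from the shell (a sketch; the group name 'nifi' is an assumption, use whatever group your tool user actually belongs to):

ls -ld /usr/hdf/current/nifi/conf                 # who owns the conf directory, and can your user write to it?
ls -l /usr/hdf/current/nifi/conf/nifi.properties
sudo chgrp -R nifi /usr/hdf/current/nifi/conf     # grant group access if needed
sudo chmod -R g+rw /usr/hdf/current/nifi/conf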
01-11-2017
05:17 PM
Have you taken a look at the 'DetectDuplicate' processor? It caches a value based on flow file attributes, and routes to a separate output ('Duplicate') if the value has been seen previously. It also lets you set a time interval to age off cached flow file values, which seems to be what you're trying to do.
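As a rough sketch, the relevant DetectDuplicate properties would be something like the following (the attribute used as the identifier and the age-off value are assumptions):

Cache Entry Identifier = ${filename}
Age Off Duration = 24 hours
Distributed Cache Service = DistributedMapCacheClientService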
01-11-2017
05:10 PM
Did you add the http proxy to the ambari-env.sh script as recommended here (this is different from just setting the http_proxy environment variable): https://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.1/bk_ambari_reference_guide/content/_how_to_set_up_an_internet_proxy_server_for_ambari.html My guess is that Ambari cannot reach the repo locations because no http proxy is set.
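For reference, the change described in that doc amounts to roughly the following in /var/lib/ambari-server/ambari-env.sh (the proxy host and port below are placeholders):

export AMBARI_JVM_ARGS="$AMBARI_JVM_ARGS -Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=3128"

Then restart the server ('ambari-server restart') so the new JVM arguments take effect.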
12-19-2016
08:32 PM
I did something almost identical to what you want using the 'MergeContent' processor. I have a bunch of filenames that I retrieve with a 'ListFile' processor; after they're processed, each set of file attributes goes into its own flow file via an 'AttributesToJSON' processor. Finally, I use MergeContent to merge all of these separate flow files into one big flow file using 'Binary Concatenation' (with a CRLF as the demarcator, so each filename ends up on its own line), which gives me a single list of all the files that were processed. For your use case, I imagine you'd do almost exactly the same thing: use AttributesToJSON to put the filename attributes into individual flow files (after the original files were successfully written by the PutSFTP processor, of course), and then merge them into a single flow file that can be emailed. If you don't need the flow file formatted as JSON, you can throw an 'EvaluateJsonPath' processor in there to clean up the content to your specific needs.
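As a rough outline (a sketch, not a tested flow; PutEmail is one option for the final emailing step), the path for your case would look something like:

(existing flow) -> PutSFTP -> AttributesToJSON -> MergeContent (Binary Concatenation, CRLF demarcator) -> PutEmail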
08-10-2016
06:00 PM
Are you following the sandbox installation instructions from here? The steps you're describing above don't seem to be correct. Did you download the sandbox from here? Once the sandbox boots up, you should see a console with the login instructions, which should point to your loopback address (127.0.0.1), not to a 'sandbox' hostname, which doesn't sound like what you're currently doing. Specifically, you should be trying to ssh into the sandbox using the command: ssh root@127.0.0.1 -p2222
or using a web browser: http://127.0.0.1:8888
08-09-2016
03:22 PM
1 Kudo
The error you're getting (ImportError: No module named rpm) is usually due to not having rpm-python installed. See the thread here for instructions on how to check and see if it is installed, and how to force a re-installation if necessary.
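A quick way to check from the shell (a sketch; the package name assumes a RHEL/CentOS-style system):

python -c "import rpm" && echo "rpm module present"   # does the interpreter find the module?
rpm -q rpm-python                                     # is the package installed?
sudo yum reinstall -y rpm-python                      # force the re-installation if needed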
07-08-2016
05:19 PM
1 Kudo
You should be able to do this using subqueries (caveat: I haven't tried this directly in Hive, but it works in Postgres): select tbl1.id,
       tbl2.age_count,
       tbl3.rating_count
from users as tbl1,
     (select i1.key, count(i1.age) as age_count from age as i1 group by i1.key) as tbl2,
     (select i2.id, count(i2.rating) as rating_count from rating as i2 group by i2.id) as tbl3
where tbl1.id = tbl2.key
  and tbl1.id = tbl3.id
07-07-2016
04:34 PM
1 Kudo
Take a look at this thread and see if the answer is relevant.
07-07-2016
02:27 PM
3 Kudos
Are you logged into the sandbox as "maria_dev" or "admin"? You need to be logged in as "admin" to start and stop services. Follow the instructions here on how to configure the initial admin password for the sandbox. Once you have reset the password and logged in to Ambari as admin, you should see a 'Service Actions' pull-down in the upper right-hand corner, and under that should be an option to start the service.
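On the sandbox, setting the admin password is done from an SSH session into the VM (a sketch; the reset script name may vary between sandbox versions):

ssh root@127.0.0.1 -p2222
ambari-admin-password-reset   # prompts for a new Ambari admin password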