Member since
05-12-2017
19
Posts
0
Kudos Received
0
Solutions
08-11-2017
11:25 AM
This worked for me. Running the startup script also succeeded. Thanks!
08-07-2017
10:30 PM
I have the same issue on Windows 10 with Docker version 17.06.0-ce-win19 (12801), channel: stable.
I ran the following command:
docker load -i "HDF_3.0_docker_12_6_2017.tar.gz"
After some time a similar error is shown:
open /var/lib/docker/tmp/docker-import-807691524/apps/json: no such file or directory
Have you found a solution to this?
06-23-2017
01:15 PM
Thanks for the quick input - my question got resolved by the link you added. @Matt Clarke writes: "A move operation typically does not update the timestamp, but a copy will." On my system (Ubuntu 14.04), I need to change the content of the files (e.g. with gedit). When I only copy the file to a different folder, rename it, and then copy and paste it into the input directory with ctrl+c and ctrl+v, the ListFile processor does not list the added file. It works when the file content is changed. I guess this behavior only occurs when testing the processor with the same static file, like I did...
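The timestamp difference can be checked outside NiFi. Below is a minimal sketch, assuming plain Python on a local filesystem; the file names are made up for illustration. It back-dates a file by one hour and then compares the modification time after a move, after a metadata-preserving copy (my guess is that the file manager's copy and paste behaves like this, which would explain what I saw), and after a plain copy.

```python
import os
import shutil
import tempfile
import time


def mtime_experiment():
    """Return the mtimes observed after a move, a preserving copy, and a plain copy."""
    workdir = tempfile.mkdtemp()
    original = os.path.join(workdir, "sample.json")  # hypothetical file name
    with open(original, "w") as handle:
        handle.write('{"value": 1}\n')

    # Back-date the file by one hour so a refreshed timestamp is easy to spot.
    old_time = time.time() - 3600
    os.utime(original, (old_time, old_time))
    backdated = os.path.getmtime(original)

    # copy2 copies the metadata too, so the old mtime is carried over.
    preserved = shutil.copy2(original, os.path.join(workdir, "preserved.json"))
    # copy writes a new file without copying the timestamps, so mtime is "now".
    plain = shutil.copy(original, os.path.join(workdir, "plain.json"))
    # A move inside one filesystem is a rename and keeps the old mtime.
    moved = shutil.move(original, os.path.join(workdir, "moved.json"))

    result = {
        "backdated": backdated,
        "preserved_copy": os.path.getmtime(preserved),
        "plain_copy": os.path.getmtime(plain),
        "moved": os.path.getmtime(moved),
    }
    shutil.rmtree(workdir)
    return result


if __name__ == "__main__":
    for label, mtime in mtime_experiment().items():
        print(label, time.ctime(mtime))
```

Only the plain copy ends up with a fresh timestamp. For a static test file, refreshing the timestamp with `os.utime(path, None)` (or `touch` on the shell) before dropping it into the input directory should have the same effect as editing it.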
06-23-2017
12:38 PM
I have set up a simple flow in NiFi 1.1.2:
ListFile -> (success) -> FetchFile
- ListFile reads the /vagrant/testData folder and runs on the primary NiFi node only. I have not changed any settings other than these two (directory to observe, execution scheduling).
- FetchFile moves the file to /vagrant/testData2.
That's all. To reproduce the bug(?):
1. Start the flow with no files in the input directory.
2. Put a file into the input directory.
3. The file will be moved as expected (this is correct behavior so far).
4. Move another file into the directory (different file name).
5. You can wait 5 minutes, but the file will not be found by the ListFile processor anymore.
6. Stop the ListFile processor.
7. Change a setting, for example "Recurse Subdirectories" to false.
8. Start the ListFile processor again.
9. The second file you put into the input directory will now be moved to the output directory.
10. Put a third file into the input folder --> nothing will happen again.
Is this a known issue? How to fix it?
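The steps above are consistent with a lister that remembers the newest modification time it has already emitted and ignores anything older. The following is only a toy model of that idea, not NiFi's actual ListFile code; the class name, the `reset()` method (standing in for whatever happens to the stored state when the processor configuration changes) and the timestamps are assumptions for illustration.

```python
class TimestampLister:
    """Toy model: emit only entries newer than the newest entry already emitted."""

    def __init__(self):
        self.latest_listed = None  # newest mtime emitted so far

    def list_new(self, files):
        """files maps a file name to its mtime in seconds; returns names to emit."""
        fresh = sorted(
            name
            for name, mtime in files.items()
            if self.latest_listed is None or mtime > self.latest_listed
        )
        if fresh:
            self.latest_listed = max(files[name] for name in fresh)
        return fresh

    def reset(self):
        """Forget the stored timestamp (assumed effect of a configuration change)."""
        self.latest_listed = None


if __name__ == "__main__":
    lister = TimestampLister()
    print(lister.list_new({"first.json": 1000.0}))   # emitted, then fetched and moved away
    print(lister.list_new({"second.json": 900.0}))   # older mtime, so it is skipped
    lister.reset()
    print(lister.list_new({"second.json": 900.0}))   # picked up after the reset
    print(lister.list_new({"third.json": 800.0}))    # older again, skipped again
```

In this model a file that was moved into the directory keeps its old modification time and is therefore never emitted until the stored timestamp is cleared.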
Labels: Apache NiFi
06-23-2017
09:25 AM
One more thing that I just read here:
"Note: that would be an ideal case in terms of balancing but, for efficiency purpose, the Site-to-Site mechanism might send batch of flow files to the remote node. In the above example, with only 3 flow files, I would probably not end up with one flow file per node."
Is this a problem when I need the entire content of each file present in one flowfile? For example, I have JSON-formatted files that will be converted to Avro using the NiFi ConvertJSONToAvro processor. If a JSON file gets split by the Site-to-Site mechanism, I would get more than one output Avro file for each JSON input file, right? Is it possible to merge the content into one single Avro file again, for example with the MergeContent processor?
For more information: I might need each Avro file as one big file to process it with a Python script. The Python script will export the Avro file to another scientific format. Thanks again!
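To picture what the quoted note describes, here is a toy model of batched distribution. It is a sketch under assumptions, not how Site-to-Site actually selects nodes: the function name, the node names and the batch size are all made up. As far as I understand it, the unit being handed out is a whole flow file, so batching changes how many files a node receives, not whether a file's content stays in one piece.

```python
from itertools import islice


def distribute_in_batches(flow_files, nodes, batch_size):
    """Assign whole flow files to nodes, handing one batch to each node in turn."""
    assignment = {node: [] for node in nodes}
    remaining = iter(flow_files)
    turn = 0
    while True:
        batch = list(islice(remaining, batch_size))
        if not batch:
            break
        assignment[nodes[turn % len(nodes)]].extend(batch)
        turn += 1
    return assignment


if __name__ == "__main__":
    files = ["a.json", "b.json", "c.json"]        # hypothetical flow files
    cluster = ["node1", "node2", "node3"]         # hypothetical node names
    print(distribute_in_batches(files, cluster, batch_size=2))
```

With three files and a batch size of two, one node receives two files, one receives a single file, and one receives nothing, which matches the "not one flow file per node" remark in the note.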
06-23-2017
09:02 AM
Two more questions came to my mind:
1. Is it better to put my input files into HDFS first, as shown in your link, instead of on a traditional shared network drive?
2. Can I use Site-to-Site transfer in the following flow? Network latency might be an issue here. Is this option not viable at all?
   - GetFile on the primary NiFi cluster node -> receives new files on the primary node
   - RPG to input ports -> pushes flowfiles from the primary node to all other nodes in the cluster
   - After that, proceed with my original data flow, as the data is now distributed in the NiFi cluster
Thanks Bryan!
06-21-2017
08:49 AM
I have a NiFi cluster and want files that I store in a directory on one node to be distributed across the cluster for processing. To keep the question short:
- Input files: around 20 files arrive in a batch each time.
- The data flow works fine when I run it on one node (not distributed).
- The output will be stored on HDFS, so that's not a problem. I just want my input files to be distributed evenly across the NiFi nodes so that processing is fast.
How can I have the input files split somewhat evenly across the NiFi nodes, with each node running the entire flow for each of its files one by one? What I mean is that one of my processors needs the entire file to be present on the node, not just parts of the file.
Should I upload all my input files to the primary node? How do files get distributed? A downside of this would be that all files need to be transferred between nodes (this is doable though, the files are not that big).
Another idea would be to store the input files in a specific directory on each node in the cluster. But then the "import" script that puts the files there would need to know all nodes in the cluster and know about downtimes of the nodes as well...
Thanks in advance
Labels: Apache NiFi
06-20-2017
11:14 AM
This would be the perfect solution for our cluster setup. How stable is this?
06-20-2017
09:34 AM
We are running NiFi instances "next to" Hortonworks HDP. This means we manually installed NiFi on our cluster nodes and let them connect to our ZooKeeper. What are the benefits of installing HDF next to HDP on our nodes when we only use NiFi and no other components such as MiNiFi or Kafka?
At the moment we use NiFi to:
- process incoming files (files that get stored in a specific folder),
- transform these files,
- put them into HDFS,
- create logs about the transformation process,
- finish the data flow.
After the transformed files are stored in HDFS, they get analyzed in some way.
Should we consider using HDF instead of "plain" NiFi? Is it okay to run NiFi instances next to HDP without using HDF? Which problems can we run into? Thanks!
06-08-2017
02:56 PM
Thanks for this in-depth answer, it resolved all my questions. Loving this community so far! For anyone running into the same problem, I just stumbled upon the following link which explains the use of ExecuteStreamCommand with examples: https://pierrevillard.com/2016/03/09/transform-data-with-apache-nifi/
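For reference, the kind of script that fits ExecuteStreamCommand is one that reads the flowfile content from standard input and writes the new content to standard output. Below is a minimal sketch, assuming the flowfile holds a single JSON object; the added "processed" field is a made-up transformation used only for illustration.

```python
import json
import sys


def transform_text(text):
    """Parse one JSON object, mark it as processed, and return it as JSON text."""
    record = json.loads(text)
    record["processed"] = True  # hypothetical change, stands in for the real work
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    # ExecuteStreamCommand pipes the flowfile content to stdin and takes
    # whatever the command prints to stdout as the outgoing content.
    incoming = sys.stdin.read()
    if incoming.strip():
        sys.stdout.write(transform_text(incoming) + "\n")
```

The same script can be tried on the shell without NiFi by piping a JSON file into it, which makes it easy to test before wiring it into the processor.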