Member since: 09-29-2015
Posts: 871
Kudos Received: 723
Solutions: 255
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3353 | 12-03-2018 02:26 PM
 | 2308 | 10-16-2018 01:37 PM
 | 3628 | 10-03-2018 06:34 PM
 | 2397 | 09-05-2018 07:44 PM
 | 1819 | 09-05-2018 07:31 PM
06-02-2016
03:18 PM
1 Kudo
Can you retry all these tests, and during the second cat, instead of "cat 'record02'", cat something longer like "cat 'record123456789'"? I'd like to see whether tracking the file size is the issue, because record01 and record02 would be the same file size.
05-25-2016
12:52 PM
This most likely means there is another JAR you need to add. If you look at the pom file for the hadoop-azure JAR (http://central.maven.org/maven2/org/apache/hadoop/hadoop-azure/2.7.0/hadoop-azure-2.7.0.pom), you can see all the dependencies it needs:

<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>com.microsoft.azure</groupId>
<artifactId>azure-storage</artifactId>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<scope>compile</scope>
</dependency>
My guess would be that the azure-storage JAR is missing. This becomes a slippery slope, though, because azure-storage might have transitive dependencies of its own.
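If it helps, a quick way to see which of those dependency JARs you actually have is a small shell check. This is just a sketch: the `check_jars` helper is a made-up name, and the directory argument would be wherever you have placed the extra JARs.

```shell
#!/bin/sh
# check_jars DIR NAME... : for each NAME, report whether a versioned JAR
# (e.g. azure-storage-4.2.0.jar) exists under DIR.
check_jars() {
  dir="$1"; shift
  for name in "$@"; do
    if ls "$dir/$name"-*.jar >/dev/null 2>&1; then
      echo "found: $name"
    else
      echo "missing: $name"
    fi
  done
}

# Example: check the five dependencies listed in the hadoop-azure pom.
# check_jars /path/to/jars hadoop-common jackson-core httpclient azure-storage guava
```

Any JAR reported missing is a candidate for the ClassNotFoundException you are seeing.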
05-25-2016
12:47 PM
1 Kudo
Generally, all processors execute within a single OS process started by a single user. The only case I can think of where one processor could execute with elevated privileges would be when using the ExecuteProcess/ExecuteStreamCommand processors: the command can be "sudo" and the args can be the command to execute. This assumes the user that started NiFi has sudo privileges.
05-24-2016
01:58 PM
2 Kudos
The DistributedMapCache is a NiFi concept used to store information for later retrieval, either by the current processor or by another processor. There are two components: the DistributedMapCacheServer, which runs on one node if you are in a cluster, and the DistributedMapCacheClientService, which runs on all nodes if in a cluster and communicates with the server. Both of these are Controller Services, configured in NiFi through the controller section in the top-right toolbar. Processors use the client service to store and retrieve data from the cache server. In this case, DetectDuplicate uses the cache to store information about what it has seen and determine if an incoming FlowFile is a duplicate.
05-20-2016
05:05 PM
1 Kudo
I'm not totally sure if this is the problem, but given that NiFi has NARs with isolated class loading, adding something to the classpath usually isn't as simple as dropping it in the lib directory. The hadoop libraries NAR would be unpacked to this location: work/nar/extensions/nifi-hadoop-libraries-nar-<VERSION>.nar-unpacked/META-INF/bundled-dependencies/ You could try putting the hadoop-azure.jar there, keeping in mind that if the work directory is removed, NiFi will unpack the original NAR again without your added jar. Some have had success creating a custom version of the hadoop libraries NAR to switch to other libraries: https://github.com/bbukacek/nifi-hadoop-libraries-bundle Right now Apache NiFi is based on Apache Hadoop 2.6.2.
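As a sketch of that workaround, something like the following could copy a jar into the unpacked NAR. The `add_jar_to_nar` helper and the paths are illustrative, assuming a default NiFi layout; adjust for your install.

```shell
#!/bin/sh
# add_jar_to_nar JAR WORK_DIR : copy JAR into the unpacked hadoop-libraries
# NAR's bundled-dependencies directory. Note: this is undone whenever the
# work directory is deleted and NiFi re-unpacks the original NAR.
add_jar_to_nar() {
  jar="$1"; work="$2"
  dest=$(ls -d "$work"/nar/extensions/nifi-hadoop-libraries-nar-*.nar-unpacked/META-INF/bundled-dependencies 2>/dev/null | head -n 1)
  if [ -z "$dest" ]; then
    echo "bundled-dependencies not found under $work" >&2
    return 1
  fi
  cp "$jar" "$dest/" && echo "copied $(basename "$jar")"
}

# Example (paths are hypothetical):
# add_jar_to_nar hadoop-azure-2.7.0.jar /opt/nifi/work
```

NiFi would need a restart afterward for the processors to pick up the new classes.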
05-19-2016
11:46 AM
I'm not 100% sure how LZO works, but in a lot of cases the codec ends up needing a native library. On a Unix system you would set LD_LIBRARY_PATH to include the location of the .so files for the LZO codec, or put them in the JAVA_HOME/jre/lib/native directory. You could do something like:

export LD_LIBRARY_PATH=/usr/hdp/2.2.0.0-1084/hadoop/lib/native
bin/nifi.sh start

That should let PutHDFS know about the appropriate libraries.
05-18-2016
08:11 PM
1 Kudo
Have you seen the Kerberos section of the NiFi admin guide? https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#kerberos_login_identity_provide
05-18-2016
01:19 PM
Can you try what Matt suggested above and remove "io.compression.codecs" from core-site.xml? I agree with him that this is likely related to the compression codecs; you can see in the stack trace that the relevant lines are:

org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2058) ~[na:na]
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:128) ~[na:na]
at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:175) ~[na:na]
at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor.getCompressionCodec(AbstractHadoopProcessor.java:375) ~[na:na]
at org.apache.nifi.processors.hadoop.PutHDFS.onTrigger(PutHDFS.java:220) ~[na:na]
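For reference, the entry to remove usually looks something like this in core-site.xml; the codec class list shown here is only an example, yours may differ.

```xml
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,
         org.apache.hadoop.io.compress.DefaultCodec,
         com.hadoop.compression.lzo.LzoCodec</value>
</property>
```

If any class in that list isn't on NiFi's classpath, CompressionCodecFactory will fail to load it, which matches the stack trace above.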
05-18-2016
01:14 AM
1 Kudo
@bschofield Another idea for transferring large files over a high-latency network might be the following: on the sending side, use a SegmentContent processor to break a large FlowFile into many smaller segments, followed by a PostHTTP processor with Concurrent Tasks set higher than 1. This lets the sending side better utilize the network by sending segments concurrently. On the receiving side, use a ListenHTTP processor to receive the segmented FlowFiles, followed by a MergeContent processor with a Merge Strategy of Defragment. Defragment mode will merge all the segments back together to recreate the original FlowFile.
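The segment-and-defragment idea is analogous to what you could do with plain shell tools; here is a toy sketch, where `segment_file` and `defragment` are made-up helper names standing in for SegmentContent and MergeContent's Defragment strategy.

```shell
#!/bin/sh
# segment_file FILE SIZE : split FILE into SIZE-byte segments named
# FILE.seg.00, FILE.seg.01, ... (roughly what SegmentContent does
# to a FlowFile's content)
segment_file() { split -b "$2" -d "$1" "$1.seg."; }

# defragment FILE OUT : concatenate the segments back in numeric order
# to rebuild the original content (roughly MergeContent's Defragment mode)
defragment() { cat "$1".seg.* > "$2"; }
```

In NiFi the segments additionally carry fragment index/count attributes, which is how Defragment knows the correct order and when all segments have arrived.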
05-17-2016
05:01 PM
What version of NiFi is this?