Member since: 08-17-2016
Posts: 45
Kudos Received: 21
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1161 | 09-05-2018 09:20 PM |
| | 1025 | 06-29-2017 06:50 PM |
| | 6569 | 02-28-2017 07:12 PM |
| | 1395 | 11-11-2016 01:57 AM |
10-16-2018
07:11 PM
@Lenu K
Are you using a keytab with PutHDFS? You could set the permissions on the directory to which PutHDFS is writing to allow group/other read access, so that Hive can read that directory, and also set the umask in the PutHDFS processor so that it writes files that are readable by group/other.
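If it helps, here's a minimal sketch of opening up that directory so group/other can list and read it. My assumptions, not part of your setup: the Hadoop client libraries are on the classpath, and "/landing/from-nifi" stands in for whatever directory PutHDFS writes to.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class OpenUpDirectory {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up core-site.xml / hdfs-site.xml from the classpath
        try (FileSystem fs = FileSystem.get(conf)) {
            // rwxr-xr-x: group/other (e.g. the hive user) can list the directory and traverse into it
            // "/landing/from-nifi" is an example path; substitute the directory PutHDFS writes to
            fs.setPermission(new Path("/landing/from-nifi"), new FsPermission((short) 0755));
        }
    }
}
The matching effect for new files comes from setting the PutHDFS umask to 022, which leaves group/other read permission in place.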
... View more
10-11-2018
10:59 PM
@Daniel Niguse Going by the tags on this question, it looks like you may be using the HDP 2.6.0 version of the Hortonworks Sandbox? You'll want to check whether Docker is forwarding port 9090 to the container; the HDP 2.6.5 version of the Hortonworks Sandbox looks like it forwards that port by default (see Sandbox Port Forwards - HDP 2.6.5). I would also suggest making modifications to the NiFi configuration through Ambari, rather than directly modifying the configuration files. The "nifi-ambari-config" config-site contains the properties for the HTTP(S) ports.
... View more
10-08-2018
09:45 PM
@Zhen Zeng To create an HDF cluster with Cloudbreak, a KDC must be configured, unless you have registered an LDAP in Cloudbreak and selected it when creating the cluster. During cluster creation, did you use a test KDC or an existing KDC? For configuring Cloudbreak to create a cluster that uses a KDC, please refer to the Enable Kerberos documentation for Cloudbreak 2.7. For complete instructions on creating an HDF cluster with Cloudbreak, please refer to the Cloudbreak 2.7 documentation for Creating HDF Clusters.
... View more
09-06-2018
09:03 PM
I apologize that I couldn't think of a workaround, and that you'll have to set "Permission umask" for each processor. After NIFI-5575 is resolved, it will be included in a future HDF release, and you should be able to update your flow to remove the specific settings in each processor.
... View more
09-05-2018
09:33 PM
@Alaa Nabil
From the information you've provided, it looks like PutHDFS should work. Without seeing your nifi-app.log, core-site.xml, and hdfs-site.xml files, I am not sure what is keeping PutHDFS from being able to write files to HDFS. Does this happen for every file sent to PutHDFS? You could run through the checklist in this StackOverflow post, as well.
... View more
09-05-2018
09:20 PM
@Kei Miyauchi
With core-site.xml and hdfs-site.xml provided in the "Hadoop Configuration Resources" property, that config is passed to the Hadoop client that PutHDFS uses to send data to HDFS. However, in the code it looks like if the "Permissions umask" property is not set, then PutHDFS uses a default umask of "18" (the decimal value of octal 022), which is pulled from FsPermission.java in hadoop-common. Unfortunately, I don't think there's a workaround. The "Permissions umask" property doesn't support Expression Language, so for now you would have to set the umask explicitly via the property. I created bug NIFI-5575 to track the issue.
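To illustrate how that default umask is applied (a standalone sketch of the arithmetic only; none of this is NiFi or Hadoop code):
public class UmaskDemo {
    public static void main(String[] args) {
        int umask = 18;                        // hadoop-common's default, i.e. octal 022
        int defaultFileMode = 0666;            // rw-rw-rw-
        int resulting = defaultFileMode & ~umask;
        System.out.printf("%o%n", resulting);  // prints 644 -> rw-r--r--
    }
}
So with the default, new files still end up readable by group/other; a stricter value such as 077 would restrict them to the owner only.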
... View more
09-04-2018
07:04 PM
@Alaa Nabil A few questions for you:
Are there 0-byte files in HDFS that correspond to the files you're trying to send with PutHDFS? This would mean that PutHDFS was able to create the file when contacting the namenode, but may not be able to reach the datanode.
Are you able to use the HDFS command-line client to send files to HDFS from the same node on which NiFi is running?
Are you running the HDP cluster in a VM, or the Hortonworks Sandbox? There are ports that need to be open on the hosts that are datanodes. Port 50010 may not be open, making the datanode unreachable by NiFi (see the sketch below for a quick way to check). You can see the default ports here: https://ambari.apache.org/1.2.3/installing-hadoop-using-ambari/content/reference_chap2_1.html
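As a quick connectivity check from the NiFi host, here is a minimal sketch; the hostname is a placeholder for one of your datanodes:
import java.net.InetSocketAddress;
import java.net.Socket;

public class DatanodePortCheck {
    public static void main(String[] args) throws Exception {
        // "datanode.example.com" is a placeholder; use one of your datanode hostnames
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress("datanode.example.com", 50010), 5000);
            System.out.println("Port 50010 is reachable");
        }
    }
}
If the connect call times out or is refused, NiFi won't be able to stream data to that datanode either.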
... View more
07-12-2018
06:07 PM
@Bob T Is the /usr/lib/hdinsight-datalake directory itself readable/executable by the user running NiFi? Without a specific FACL for the hdinsight-datalake directory, the user running NiFi needs read/execute permission on each directory in the path and read permission on the files in that directory to be able to access the JARs. I see the permissions on the JARs are wide open, but can you confirm read/execute on the directories?
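If it's easier, here is a small standalone sketch you could run as the user that runs NiFi; it reports read/execute access on /usr/lib/hdinsight-datalake and each of its parent directories:
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathAccessCheck {
    public static void main(String[] args) {
        // Walk from the JAR directory up to the root, reporting what this user can do
        Path path = Paths.get("/usr/lib/hdinsight-datalake");
        while (path != null) {
            System.out.printf("%s readable=%b executable=%b%n",
                    path, Files.isReadable(path), Files.isExecutable(path));
            path = path.getParent();
        }
    }
}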
... View more
07-12-2018
05:00 PM
@Bob T
I think HdlAdiFileSystem was renamed in the version of hadoop-azure-datalake-2.7.3.2.6.5.8-7.jar you are using. Try updating the fs.adl.impl and fs.AbstractFileSystem.adl.impl values in core-site.xml:
<property>
  <name>fs.adl.impl</name>
  <value>org.apache.hadoop.fs.adl.AdlFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.adl.impl</name>
  <value>org.apache.hadoop.fs.adl.Adl</value>
</property>
... View more
07-11-2018
09:01 PM
@Bob T
Could you please put stack traces inside of code blocks to make them a bit easier to read? It looks like you are still having classpath problems. Assuming that NiFi's lib directory is now restored to how it is from a "vanilla" install, I would check to make sure that you have the proper versions of the additional jars you're adding that work with Hadoop 2.7.3. That's the version of hadoop-client that is used by NiFi 1.5. It might help if you also (using code blocks) comment with a listing of the nifi/lib dir, the /usr/lib/hdinsight-datalake dir, and the contents of (or a link to) the xml files you've listed in "Hadoop Configuration Resources", sanitized of any information you don't want to post publicly. 🙂
... View more
07-11-2018
08:02 PM
@Bob T The link you included has instructions to put those jars in /usr/lib/hdinsight-datalake, and then in the processor configuration for FetchHDFS, set the property "Additional Classpath Resources" to "/usr/lib/hdinsight-datalake". You don't have to use those specific directories, but they must be in a directory that NiFi can read, and NiFi needs to have read permissions on each jar. Also, please remove the jars you added to NiFi's lib directory. Adding jars directly to NiFi's lib directory can break NiFi itself, or some of its components. It has to do with how classloaders are created so that different NiFi components can use the versions of dependencies that they need. If jars are placed directly in NiFi's lib directory, it may override a dependency for a component and cause it to fail. Could you please perform those steps and try running the flow again?
... View more
06-28-2018
06:33 PM
java.lang.NoClassDefFoundError: org/aoache/http/client/HttpClient
The "o" in "aoache" (instead of "apache") looks pretty indicative of the cause of that error. Was this error copied and pasted from a log, or was it retyped by hand?
... View more
05-29-2018
06:44 PM
1 Kudo
@vishal dutt Issues with HiveConnectionPool obtaining a new Kerberos ticket after the current one has expired should be resolved with the next maintenance release of HDF.
... View more
03-02-2018
06:19 PM
3 Kudos
@Mahendra Hegde It looks like your ZooKeeper server(s) may be down. Can you check the status of that service in Ambari? Also, if you can find the point where these errors are occurring in nifi-app.log on the NiFi node on which this is happening, there may be more information. Can you attach that log here so I can take a look at it?
... View more
02-22-2018
05:17 PM
@Tarun Kumar To follow up on your question: with the release of HDF 3.1, the issues with promptForName/stuck threads in Hadoop components should be resolved. In addition to the property that @jwitt mentioned (javax.security.auth.useSubjectCredsOnly=true), several code changes were made to how the HDFS/HBase/Hive components in NiFi acquire a UGI (UserGroupInformation).
... View more
12-19-2017
05:07 PM
@dhieru singh No worries. I wanted to let you know that we are actively working this issue and any debug logging or details you can provide to consistently reproduce the issue (NiFi settings, environment settings, JAAS configs, etc) will be helpful.
... View more
12-19-2017
02:51 PM
@dhieru singh The issue you're running into is probably the same thing that @Tarun Kumar is experiencing. Please see my reply to that question [1] and follow the steps outlined there to provide some more information on what's happening with your cluster. I am happy to take a look at the information you provide. [1] https://community.hortonworks.com/questions/155101/nifi-puthdfs-processor-stuck-and-not-able-to-relea.html?childToView=155116#answer-155116
... View more
12-19-2017
12:22 AM
4 Kudos
@Tarun Kumar
I have been investigating this issue for more than a few days, but so far I have not been able to reproduce it in any environment to which I have access. What you're seeing is the end result of a set of conditions that cause the JAAS configuration and previously authenticated principal to be contextually lost, causing Krb5LoginModule.promptForName to interactively prompt for the principal name. If you wouldn't mind sharing details about your configuration, we can work together to diagnose what's causing PutHDFS to fail. Could you please answer the following questions:
What values are set for the ticket lifetime and ticket renewal lifetime in the KDC for the principal you have set in PutHDFS?
What values are set for the ticket lifetime and ticket renewal lifetime in the krb5.conf that you have set for NiFi in Ambari?
Does this issue occur consistently? Does it seem to happen around the time that the principal's Kerberos ticket would be getting renewed, some time between 80% and 100% of the ticket lifetime?
How often are files sent to PutHDFS' incoming queue? On a regular interval, or is it sporadic?
What is the "Relogin Period" property set to in PutHDFS' configuration?
There are also a few settings you can add/change to help provide more information to debug the issue:
Add the following line to NiFi's Advanced nifi-env config site in Ambari to allow Hadoop to log debug info regarding JAAS:
export HADOOP_JAAS_DEBUG=true
Add the following line to NiFi's logback config in Ambari to allow Hadoop to log debug info regarding Hadoop privileged operations:
<logger name="org.apache.hadoop.security" level="DEBUG"/>
Add the following line to NiFi's Advanced nifi-bootstrap config site to enable krb5 debug on the cluster (you can change java.arg.100 to java.arg.somenumber, as long as there's no other entry with that number elsewhere in the bootstrap config):
java.arg.100=-Dsun.security.krb5.debug=true
Please provide the nifi-app and nifi-bootstrap logs after restarting NiFi and observing the stuck threads, and I'll take a look at them.
... View more
10-18-2017
05:21 PM
@Saikrishna Tarapareddy, you may want to create a process group that contains the instantiation of your template, and then create connections from the areas of your flow to that process group. That way, you have one "instance" of the template created, and you'll only need to do your modifications once. You can always save a new template (for other instantiations, exporting, etc) with your modifications. I do admit that making connections across process groups in order to reuse a specific group may make the flow a bit harder to read, but eventually NiFi will support some improvements to make this easier/cleaner to do in a flow.
... View more
07-11-2017
06:47 PM
@Ian Neethling Could you provide a bit more about how you've installed NiFi? Are you running HDF, or just NiFi by itself? Did you install from an RPM, or did you install from a tar/gzip?
... View more
06-29-2017
06:50 PM
1 Kudo
@dhieru singh You'll need run-nifi.bat to be run as a service, or to be able to run when the user is not logged on. This answer from a user on serverfault.com has some in-depth instructions on how to set up running a batch file with the Task Scheduler.
... View more
06-29-2017
06:41 PM
1 Kudo
@Ilya Li I agree with @Bryan Bende that the best approach is to refactor things such that shared classes are moved to something under nifi-extension-utils. I did this mainly for ListAzureBlobStorage, since it used the AbstractListProcessor code. You can take a look at https://github.com/apache/nifi/pull/1719, specifically the last four commits before the PR was merged to master, for an example of the refactoring.
... View more
05-30-2017
04:07 PM
@Alejandro A. Did this answer end up solving your use case?
... View more
05-24-2017
08:51 PM
3 Kudos
@Alejandro A. Are you saying you would like, at the end of this particular portion of your flow, to have the original content in one flowfile and a second flowfile with the output generated by your external jar? If so, there are a few ways you could do this.
One of them would be to use a DuplicateFlowFile processor to create a second copy of your flowfile, and then use a ReplaceText processor on that flowfile with the attribute value as content. You can use the Wait and Notify processors to wait for the processing of that flowfile; an example usage of the Wait/Notify processors can be found here. For the Release Signal Identifier, you can use ${filename} as the example suggests, but if your filenames aren't unique, you could use an UpdateAttribute processor to capture the original UUID of the flowfile before the DuplicateFlowFile processor. This is probably the easiest way to know when that second flowfile has been processed.
You could then use MergeContent with Correlation Attribute Name set to the same value as the Release Signal Identifier (and Max Number of Entries set to 2), and make sure the original flowfile gets routed from its Wait processor's success relationship to the MergeContent processor, along with the success relationship of the second flowfile. If you're processing many different files concurrently, make sure that Maximum Number of Bins is equal to or greater than the number of concurrent files.
I could probably create a sample flow of this, if you have trouble putting it together.
... View more
05-10-2017
09:08 PM
Is it possible to provide your flow and/or code? Does the user configuring the processors have read permissions on the controller services? How are you specifying the controller service in the property? Are you creating the controller service from a Process Group's Configuration page and then going into the processor's configuration and selecting it from the dropdown, or are you clicking on "Create new service" in the dropdown?
... View more
05-10-2017
08:01 PM
1 Kudo
@Patrick Sharkey How are you configuring your processor to use multiple services? Do you have a separate property for each controller service to which it can be assigned? In the property descriptor for each property, has the proper type been passed in to the identifiesControllerService method on the descriptor? Looking at the code for the PostHTTP processor:
public static final PropertyDescriptor SSL_CONTEXT_SERVICE = new PropertyDescriptor.Builder()
.name("SSL Context Service")
.description("The Controller Service to use in order to obtain an SSL Context")
.required(false)
.identifiesControllerService(SSLContextService.class)
.build();
If a second property descriptor was added to allow another type of controller service (let's say a custom serialization controller service) to be used in PostHTTP, the type of the service would have to be specified.
public static final PropertyDescriptor CUSTOM_SERIALIZATION_SERVICE =
new PropertyDescriptor.Builder()
.name("Serialization Service")
.description("The Serialization Service to use to serialize data")
.required(false)
.identifiesControllerService(CustomSerializationService.class)
.build();
In the NiFi UI, once SSL Context Service and Custom Serialization Service instances have been created in the appropriate Process Group(s), they can be assigned to the properties of the processor.
... View more
05-04-2017
07:04 PM
@Jatin Kheradiya In nifi.properties on each node of your cluster, is nifi.state.management.embedded.zookeeper.start set to false?
... View more
04-29-2017
03:45 PM
@Raphaël MARY Are you splitting the JSON into separate flowfiles, so that each JSON element (row of your data) is being processed individually by the ReplaceText processor? Take a look at two StackOverflow answers that I found dealing with handling multiline data: http://stackoverflow.com/a/3652392 http://stackoverflow.com/a/17825571 It looks like you may still need to use (?s) to turn on DOTALL mode so that EOL characters are matched/consumed by your regex expression. If ReplaceText is looking at a single JSON element (row of data), DOTALL mode will make the "." character consume EOL characters. For further information, here's a link that contains documentation on the match flags (such as the "s" flag in (?s)): https://docs.oracle.com/javase/tutorial/essential/regex/pattern.html I'm not a regex expert by any means, but hopefully some of this information helps you!
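As a small illustration of what the DOTALL flag changes (a standalone sketch, not tied to your flow or data):
import java.util.regex.Pattern;

public class DotallDemo {
    public static void main(String[] args) {
        String input = "{\"name\":\n\"value\"}";
        // Without DOTALL, '.' does not match end-of-line characters, so the match fails
        System.out.println(Pattern.matches("\\{.*\\}", input));      // false
        // With (?s), '.' also consumes EOL characters, so the whole element matches
        System.out.println(Pattern.matches("(?s)\\{.*\\}", input));  // true
    }
}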
... View more
03-16-2017
08:31 PM
@Kibrom Gebrehiwot The keytab and principal you are trying to use in the PutHDFS processor were created using kadmin.local on the KDC running on your HDP cluster, correct? That, along with NiFi being configured (via the nifi.kerberos.krb5.file property in nifi.properties) to use a krb5.conf file that defines the realm for your HDP cluster, should be all you need for PutHDFS to talk to HDFS on your HDP cluster.
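If you want to verify the keytab, principal, and krb5.conf outside of NiFi, here is a minimal sketch using the Hadoop client API; the krb5.conf path, principal, and keytab path below are placeholders, so substitute the values configured for NiFi and PutHDFS:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KeytabLoginCheck {
    public static void main(String[] args) throws Exception {
        // Point the JVM at the same krb5.conf that NiFi uses (placeholder path)
        System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        // Placeholder principal and keytab; use the ones set in the PutHDFS processor
        UserGroupInformation.loginUserFromKeytab("nifi@EXAMPLE.COM", "/etc/security/keytabs/nifi.keytab");
        System.out.println("Logged in as: " + UserGroupInformation.getCurrentUser());
    }
}
If this logs in successfully but PutHDFS still fails, the problem is more likely in how NiFi picks up the configuration than in the keytab itself.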
... View more
03-13-2017
03:06 PM
@Mohammed El Moumni Are other, smaller files merging? I notice in both of your screenshots that the MergeContent processor is stopped, which will prevent files from being merged. Was the processor stopped just to take the screenshots?
... View more