Member since: 09-23-2016
Posts: 35
Kudos Received: 20
Solutions: 12
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 497 | 06-01-2017 11:21 AM |
| | 1289 | 05-15-2017 12:20 PM |
| | 2099 | 05-03-2017 08:53 AM |
| | 3473 | 05-03-2017 07:53 AM |
| | 1597 | 02-21-2017 08:27 AM |
06-01-2017
11:21 AM
1 Kudo
Simran, you can merge single JSON objects into a larger file before you put it into HDFS. There is a dedicated processor for this: MergeContent (https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.2.0/org.apache.nifi.processors.standard.MergeContent/index.html). The processor also lets you configure how many JSON objects should be merged into one single file via the 'Minimum Number of Entries' property. As a side note: once a processor is on your canvas, you can right-click it and choose 'Usage' to display its documentation. Hope that helps.
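A minimal property sketch, assuming you want the merged output to be one valid JSON array; the property names are MergeContent's own, while the entry count of 1000 is just an example value:

```
Merge Strategy            : Bin-Packing Algorithm
Merge Format              : Binary Concatenation
Minimum Number of Entries : 1000
Delimiter Strategy        : Text
Header                    : [
Footer                    : ]
Demarcator                : ,
```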
05-19-2017
11:51 AM
Actually, you do not need to hard-code the values. You can pass the file name and path dynamically to the next processors. Please check the documentation at https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html#flowfile For example, ${filename} will return the value of the "filename" attribute. Other attributes available in this context are:

- Filename ("filename"): The filename of the FlowFile. The filename should not contain any directory structure.
- UUID ("uuid"): A universally unique identifier (UUID) assigned to this FlowFile.
- Path ("path"): The FlowFile's path indicates the relative directory to which a FlowFile belongs and does not contain the filename.
- Absolute Path ("absolute.path"): The FlowFile's absolute path indicates the absolute directory to which a FlowFile belongs and does not contain the filename.
05-19-2017
10:01 AM
Hi, yes, this works as intended. GetFile is a flow-starting processor; you cannot connect to it from other processors - think of it like a process-instance trigger. Please use the FetchFile processor instead: GetFile -> PutFile -> FetchFile -> PutHDFS. Hope that helps.
05-18-2017
07:22 AM
Have you tried out: testkey=(\w+)
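For illustration (the input line is made up), the capture group grabs everything after the equals sign up to the next non-word character:

```
Input   : testkey=abc123
Pattern : testkey=(\w+)
Group 1 : abc123
```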
05-15-2017
03:59 PM
1 Kudo
Yes, Wing Lo. Switch the LDAP/Kerberos authentication off and you will be able to migrate and download the existing flows. This is an acceptable workaround here. As said, this is fixed in NiFi 1.1 and above. BR
05-15-2017
02:46 PM
2 Kudos
Unfortunately, there was a bug in NiFi 1.0 that prevented downloading (templates, FlowFile content, etc.) when authenticated via LDAP or Kerberos. Please upgrade to NiFi 1.1 or 1.2; that will solve your problem.
05-15-2017
12:20 PM
3 Kudos
Hi, as of today you cannot mix different NiFi versions within the same NiFi cluster (managed by ZooKeeper). However, you can set up separate clusters, since both NiFi and ZooKeeper can run multiple times on the same servers: you only have to copy them into different folders, separate the config files (nifi.properties, zoo.cfg, etc.), and set different data dirs, provenance dirs, and so on. I would recommend starting with two separate NiFi instances, /opt/nifi1 and /opt/nifi2, each with its own paths and ports configured in its copy of nifi.properties. Especially take care of the paths for:

- content_repository
- database_repository
- flowfile_repository
- provenance_repository
- work directory
- logs directory

These are often forgotten when copying NiFi instance 1 as the base for setting up a second one. Just have a look at the parameters of the properties file; a sketch follows below. Hope that helps.
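A minimal sketch of the second instance's nifi.properties - the property names are the standard ones, while the port number and directory paths are example values that just must not clash with the first instance:

```
# /opt/nifi2/conf/nifi.properties
nifi.web.http.port=8081
nifi.flowfile.repository.directory=/opt/nifi2/flowfile_repository
nifi.content.repository.directory.default=/opt/nifi2/content_repository
nifi.database.directory=/opt/nifi2/database_repository
nifi.provenance.repository.directory.default=/opt/nifi2/provenance_repository
```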
05-10-2017
08:48 AM
And please be aware that the WebUI/canvas is the IDE for the NiFi instance/cluster itself. You are working directly on the NiFi instance; it is not a remote modelling approach where you deploy to the server after finishing your development. Everything you model is applied directly to the NiFi instance.
05-10-2017
08:41 AM
4 Kudos
Yes. Although there is only one canvas when you open the WebUI, that canvas can host many logical dataflows. Typically you organize each logical dataflow into a process group, and then start and stop the whole process group. You can also grant or limit permissions on the process groups for different users/groups, allowing multiple teams to share the same NiFi instance/cluster. Here is the link to the docs: https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#process_group_anatomy There is a very good article about how to use process groups as reusable components: https://community.hortonworks.com/articles/16461/nifi-understanding-how-to-use-process-groups-and-r.html
05-09-2017
07:35 AM
@Amol Kulkarni - does that answer your question? Solved?
05-03-2017
08:53 AM
That is always a nightmare in Java-based tools. Hive relies on Java (plus SQL), so it follows the IEEE 754 standard for floating-point semantics. That means in particular that NaN (not a number) values in float columns are a tricky thing.
First of all: have you tested what is returned for the '#N/A' columns when you do a SELECT? I guess it is 'NaN' rather than '#N/A'.
After checking the return value, I would suggest testing two approaches. Either use cast(): cast(dollar as string) <> 'NaN' (because all possible NaN values are displayed as "NaN", even if they are not strictly "equal" in the arithmetical sense), or use the old trick of testing whether the column value survives a mathematical comparison, e.g. dollar + 1.0 > dollar (which is true for any real number but false for NaN).
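As a sketch with a made-up table name (sales) and the dollar column from above - both filters keep only the non-NaN rows:

```sql
-- string-compare approach: every NaN value renders as the literal 'NaN'
SELECT * FROM sales WHERE cast(dollar AS string) <> 'NaN';

-- arithmetic approach: NaN fails every comparison, so NaN rows drop out
SELECT * FROM sales WHERE dollar + 1.0 > dollar;
```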
05-03-2017
07:53 AM
1 Kudo
It is not supposed to generate unique values. The hash() function works with ranges: it is supposed to index different ranges with integer values. Think of grouping similar ranges of values in a large data set into smaller subsets and having an index to find the respective subset. A good explanation can be found here: http://preshing.com/20110504/hash-collision-probabilities/ If you want to generate unique values, have a look at the reflect UDF: reflect("java.util.UUID", "randomUUID")
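A minimal sketch with a hypothetical table name (my_table), using the built-in reflect UDF mentioned above to produce one random UUID per row:

```sql
SELECT reflect("java.util.UUID", "randomUUID") AS row_id,
       t.*
FROM my_table t;
```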
02-21-2017
10:34 AM
Glad that it helped. Could you please click to accept the above answer, so that others see that this is the solution? Thanks! 🙂
02-21-2017
08:27 AM
1 Kudo
Hi,
the PutEmail processor supports the NiFi Expression Language for the Subject parameter. That means you can access all the attributes of your FlowFile as well as any custom attributes or variables that you defined within the flow.
To have a custom subject in your PutEmail processor for the error-handling case, connect the failure relationship of the PutHDFS (or GetFile, or both) processor to the PutEmail processor and configure PutEmail accordingly.
An example of a custom Subject could be: Hello from ${hostname()}, the file ${filename} caused an error at ${now()}. More examples and guidelines for the NiFi Expression Language are listed here: https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html
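A sketch of the full Subject property value, assuming you also want a formatted timestamp (format() is a standard Expression Language function on dates):

```
Subject : Error on ${hostname()}: file ${filename} failed at ${now():format('yyyy-MM-dd HH:mm:ss')}
```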
02-06-2017
09:54 AM
The two values were just examples. Try changing them to something that fits your system environment: either go below 512 (that might do the job) or increase the RAM assigned to the container:

1. Increase the VirtualBox memory from (I guess) 4096 to (e.g.) 8192.
2. Log into Ambari at http://my.local.host:8080.
3. Change the values of yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb from the defaults to 4096.
4. Save and restart (at least YARN, Oozie, Spark).
02-06-2017
08:41 AM
Ah, sorry :) Yes, here you can't specify driver-related parameters using <spark-opts>--driver-memory 10g</spark-opts>, because your driver (the Oozie launcher job) is already launched before that point. It's the Oozie launcher (a MapReduce job) that launches your actual Spark job, so spark-opts is not relevant there. But the Oozie Spark action doc says: "The configuration element, if present, contains configuration properties that are passed to the Spark job." This shouldn't be Spark configuration; it should be MapReduce configuration for the launcher job. So, please try to add the following:

  <configuration>
    <property>
      <name>oozie.launcher.mapreduce.map.memory.mb</name>
      <value>4096</value>
    </property>
    <property>
      <name>oozie.launcher.mapreduce.map.java.opts</name>
      <value>-Xmx3072m</value>
    </property>
  </configuration>
02-06-2017
08:12 AM
1 Kudo
It seems your Spark driver is running with a very small heap size. Please try increasing the driver memory and see if it helps. Use this parameter (e.g.) when submitting the job: --driver-memory 1g
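For illustration, with a made-up application class and jar name:

```
spark-submit --driver-memory 1g --class com.example.MyApp my-app.jar
```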
02-06-2017
07:57 AM
1 Kudo
Do you have another application that reads from the Event Hub using EventProcessorHost? EventProcessorHost sets an epoch on the receiver to ensure that only one reader is active for a given consumer group and Event Hub partition. You can try it with a different consumer group. Another scenario where this could happen is if you turn on checkpointing in EventProcessorHost. Here are some guidelines from MS on how to use the epoch settings for asynchronous receivers: https://blogs.msdn.microsoft.com/gyan/2014/09/02/event-hubs-receiver-epoch/
01-03-2017
08:59 AM
This error message indicates that Hive cannot find the file under the given path. I assume you are using the Sandbox, right? So there should be no permission issues for user admin in Hive and HDFS. Can you please check that the path you entered does not contain leading or trailing whitespace: '/tmp/data/geolocation.csv'
11-30-2016
09:53 AM
Very good, glad to help. I converted the comment that finally helped into an answer and would be happy if you accept it 😉 Thanks.
11-30-2016
09:13 AM
You can also use the commandline: https://community.hortonworks.com/questions/49338/using-the-nifi-rest-api-to-execute-a-data-flow-cre.html
11-30-2016
09:08 AM
Thanks Bhanu, the error comes from Hive. Can you please also have a look at the Hive metastore log file and share any related error messages?
11-30-2016
09:08 AM
1 Kudo
Ah, Hive has reached the maximum number of open transactions. The parameter hive.max.open.txns limits this number (compare https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions). The default value is 100000 - this should be high enough, and I guess you did not change it, but maybe you want to check. Not sure what is going on in your environment, but you may want to look at Chapter 6 of http://hortonworks.com/hadoop-tutorial/using-hive-acid-transactions-insert-update-delete-data/ to see how to manage your open transactions. Hope this helps.
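To inspect what is currently open, two standard Hive ACID commands can help:

```sql
-- list all currently open or aborted transactions
SHOW TRANSACTIONS;

-- list locks, which often points at the sessions holding transactions open
SHOW LOCKS;
```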
11-30-2016
08:35 AM
Avijeet, yes, this is possible. You will have to use the "Update Processor" API of NiFi. There is a great description by Andrew Grande about how to update NiFi flows on the fly: https://community.hortonworks.com/articles/3160/update-nifi-flow-on-the-fly-via-api.html
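A sketch of the REST calls involved; the host, port, and processor id are placeholders, and the JSON body you PUT must carry the processor's current revision:

```
# fetch the current state (including the revision) of a processor
curl http://nifi-host:8080/nifi-api/processors/<processor-id>

# push the modified configuration back
curl -X PUT -H 'Content-Type: application/json' \
     -d @processor.json \
     http://nifi-host:8080/nifi-api/processors/<processor-id>
```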
11-30-2016
08:06 AM
2 Kudos
Actually, the NiFi canvas can be used by multiple sessions in parallel - just test it in your environment: open it once in web browser A and then in B (these should be different browsers, e.g. Firefox and Chrome).
However, you are right - the features still need some improvement to better support this kind of management.
The good news is that this is already considered important; have a look at https://cwiki.apache.org/confluence/display/NIFI/Configuration+Management+of+Flows Right now, a good approach is to group the different dataflows into process groups for management and version control (via templates).
11-30-2016
07:36 AM
Bhanu, can you please share the nifi-app.log entry for the error, so we can see what exactly happened?
11-30-2016
07:19 AM
Hi Mud, well, this is not so simple. The reason is that MS SQL does not support UPSERT out of the box. If it did, you could simply create an INSERT statement in NiFi and replace the INSERT with UPSERT using the ReplaceText processor. (This approach is used when interacting with Apache Phoenix, which supports the UPSERT SQL verb.) However, there are two options. First, you could work with triggers and simply catch your INSERT to check whether it should be changed into an UPDATE.
This costs performance on the DB side, and it really depends on a) the number of inserts and b) who else is interacting with your table. The second option is to use the ExecuteProcess / ExecuteScript processor to invoke a shell or Groovy script that does the UPSERT for you. This approach also costs some performance / additional I/O, but it will do the job. It also allows you to do the magic either on the DB layer (e.g. with a stored procedure) or in the script itself. Here is an example of the stored-procedure approach: http://www.sergeyv.com/blog/archive/2010/09/10/sql-server-upsert-equivalent.aspx There is also an entry here in the community where calling a stored procedure is explained: https://community.hortonworks.com/questions/26170/does-executesql-processor-allow-to-execute-stored.html HTH. PS: The same approach works for MERGE, and is even simpler since MERGE is completely MS SQL Server based - you first populate a source table in MS SQL Server and then invoke the MERGE script, as sketched below.
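A minimal T-SQL sketch of that MERGE step, with hypothetical table and column names (staging holds the rows NiFi just loaded; target is the table to upsert into):

```sql
MERGE target AS t
USING staging AS s
    ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET t.val = s.val
WHEN NOT MATCHED THEN
    INSERT (id, val) VALUES (s.id, s.val);
```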
11-29-2016
08:37 AM
Great. Then please drop the table and create it again using the STORED AS TEXTFILE clause, or use the procedure I described above to import the data via a temp table in between, if you really need it stored as ORC.
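As a sketch, with a made-up table name and schema:

```sql
DROP TABLE my_table;

CREATE TABLE my_table (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
```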
11-29-2016
08:24 AM
2 Kudos
To shortcut the procedure, you can also set the other Java version for the whole server: /usr/sbin/alternatives --config java Note that this changes the Java version for the whole system, not just for Ambari and the HDP components.