Member since 09-23-2016, 35 posts, 20 kudos received, 12 solutions
My Accepted Solutions
Views | Posted
---|---
972 | 06-01-2017 11:21 AM
2663 | 05-15-2017 12:20 PM
3246 | 05-03-2017 08:53 AM
5319 | 05-03-2017 07:53 AM
3038 | 02-21-2017 08:27 AM
06-01-2017
11:21 AM
1 Kudo
Simran, you can merge the single JSON objects into a larger file before you put it into HDFS. There is a dedicated processor for this, MergeContent: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.2.0/org.apache.nifi.processors.standard.MergeContent/index.html The processor lets you configure how many JSON objects should be merged into one file via the 'Minimum Number of Entries' property. As a side note, when you have a processor on your canvas, you can right-click it and choose 'Usage' to display its documentation. Hope that helps.
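The following Python sketch only illustrates the batching idea; it is not NiFi code. It groups single JSON objects into bins of a minimum size and wraps each bin as a JSON array, roughly what MergeContent produces if you also set its header, demarcator and footer to `[`, `,` and `]` (check the exact property names, such as Delimiter Strategy, on the Usage page of your NiFi version). The function name `merge_json_records` is hypothetical.

```python
import json
from typing import Iterable, List


def merge_json_records(records: Iterable[str], min_entries: int) -> List[str]:
    """Group single-object JSON strings into bins of `min_entries` records
    and join each bin into one JSON array (header "[", demarcator ",",
    footer "]"). Hypothetical helper, not part of NiFi."""
    bins: List[str] = []
    current: List[str] = []
    for record in records:
        json.loads(record)  # fail fast on malformed input
        current.append(record)
        if len(current) >= min_entries:
            bins.append("[" + ",".join(current) + "]")
            current = []
    # MergeContent would normally keep an incomplete bin waiting (e.g. until a
    # configured Max Bin Age expires); this sketch simply flushes it.
    if current:
        bins.append("[" + ",".join(current) + "]")
    return bins


records = [json.dumps({"id": i}) for i in range(5)]
for merged_file in merge_json_records(records, min_entries=2):
    print(merged_file)
```

Without the header, demarcator and footer, the merged file would just be concatenated JSON objects, which most JSON parsers do not accept as a single document.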
05-19-2017
11:51 AM
Actually, you do not need to assign fixed values. You can pass the file name and path dynamically to the next processors. Please check out the documentation at https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html#flowfile For example, ${filename} returns the value of the "filename" attribute. Other attributes available in this context are:
- Filename ("filename"): the filename of the FlowFile. The filename should not contain any directory structure.
- UUID ("uuid"): a universally unique identifier (UUID) assigned to this FlowFile.
- Path ("path"): the FlowFile's path indicates the relative directory to which the FlowFile belongs and does not contain the filename.
- Absolute Path ("absolute.path"): the FlowFile's absolute path indicates the absolute directory to which the FlowFile belongs and does not contain the filename.
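As a rough illustration of how such attribute references are resolved, here is a minimal Python sketch. The `resolve` helper is hypothetical and handles only bare `${attribute}` references, substituting an empty string for unknown attributes; NiFi's real Expression Language also supports functions and chained calls, which this sketch ignores. The attribute values in the example are made up.

```python
import re
from typing import Dict

# Matches bare attribute references such as ${filename} or ${absolute.path}.
# Functions like ${filename:toUpper()} are deliberately not supported here.
_ATTR_REF = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def resolve(expression: str, attributes: Dict[str, str]) -> str:
    """Replace each ${name} with the value of the FlowFile attribute `name`."""
    return _ATTR_REF.sub(lambda m: attributes.get(m.group(1), ""), expression)


# Hypothetical attributes of one FlowFile.
attrs = {"filename": "orders.json", "absolute.path": "/data/incoming"}
print(resolve("${absolute.path}/${filename}", attrs))
```

With these made-up attributes, the call returns `/data/incoming/orders.json`, which is the kind of value a downstream processor property can use instead of a fixed path.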
05-19-2017
10:01 AM
Hi, yes, this works as intended. GetFile is a flow-starting processor: it accepts no incoming connections, so you cannot connect other processors to it. Think of it as the trigger that starts a process instance. Please use the FetchFile processor instead: GetFile -> PutFile -> FetchFile -> PutHDFS. Hope that helps.
05-15-2017
12:20 PM
3 Kudos
Hi, as of today you cannot run different NiFi versions within the same NiFi cluster (managed by ZooKeeper). However, you can set up separate clusters, since NiFi as well as ZooKeeper can run multiple times on the same servers: you only have to copy them into different folders, keep the config files separate (nifi.properties, zoo.cfg, etc.), and set different data directories, provenance directories, etc. I would recommend starting with two separate NiFi instances, /opt/nifi1 and /opt/nifi2, each with its own paths and ports configured in the nifi.properties file of that copy. Take particular care of the paths for:
- content_repository
- database_repository
- flowfile_repository
- provenance_repository
- work directory
- logs directory
These are often forgotten when a first NiFi instance is copied as the base for setting up a second one.
Just have a look at the parameters in the properties file. Hope that helps.
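Before starting the second copy, a small check can catch values that both instances share by mistake. The Python sketch below compares two nifi.properties files. The key names in `MUST_DIFFER` are an assumption based on NiFi 1.x (verify them against your own file), the list is not exhaustive (the work and logs directories are not included, for example), and the helper functions are hypothetical.

```python
import posixpath
from typing import Dict, List

# Assumed NiFi 1.x keys whose values must differ between two instances
# on the same host. Verify against your own nifi.properties.
MUST_DIFFER = [
    "nifi.web.http.port",
    "nifi.content.repository.directory.default",
    "nifi.database.directory",
    "nifi.flowfile.repository.directory",
    "nifi.provenance.repository.directory.default",
]


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def _effective(key: str, value: str, home: str) -> str:
    """Ports are compared as-is; paths are resolved against the instance home."""
    if key.endswith("port"):
        return value
    return posixpath.normpath(posixpath.join(home, value))


def find_conflicts(first: Dict[str, str], first_home: str,
                   second: Dict[str, str], second_home: str) -> List[str]:
    """Return the MUST_DIFFER keys that both instances effectively share."""
    return [
        key for key in MUST_DIFFER
        if key in first and key in second
        and _effective(key, first[key], first_home)
        == _effective(key, second[key], second_home)
    ]
```

Relative entries such as `./content_repository` are resolved against each instance's home directory here, on the assumption that each NiFi is started from its own folder. Two copies with identical relative paths then do not collide, while identical absolute paths and identical ports are reported.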
05-09-2017
07:35 AM
@Amol Kulkarni - does that answer your question? Solved?
05-03-2017
08:53 AM
That is always a nightmare in Java-based tools. Hive is built on Java, so its floating-point numbers follow the IEEE 754 standard. In particular, NaN (not a number) values in float and double columns are tricky.
First of all: have you tested what is actually returned for the '#N/A' values when you run a SELECT? I suspect it is 'NaN' rather than '#N/A'.
After checking the returned value, I would suggest testing two approaches. Either use cast(): cast(dollar as string) <> 'NaN'
(all NaN values are displayed as "NaN", even though they are never "equal" in the arithmetical sense; under IEEE 754, NaN is not even equal to itself)
or use the old trick of testing whether the value behaves under a mathematical operation, e.g. dollar + 1.0 > dollar
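The IEEE 754 behaviour behind both approaches can be tried outside Hive. Python floats are IEEE 754 doubles like Java's, so this sketch mirrors the `dollar + 1.0 > dollar` filter with a hypothetical `is_regular_number` helper. Note that Python prints NaN as `nan`, whereas Java's `Double.toString` gives `NaN`, so the string comparison itself is Hive-specific.

```python
import math


def is_regular_number(x: float) -> bool:
    """Mirror of the Hive filter `dollar + 1.0 > dollar`.

    Any comparison involving NaN is False under IEEE 754, so NaN fails.
    Caveat: infinities and very large magnitudes also fail, because
    adding 1.0 does not change them.
    """
    return x + 1.0 > x


nan = float("nan")
print(nan == nan)                    # False: NaN is not equal even to itself
print(math.isnan(nan))               # True: the direct check, where available
print(is_regular_number(12.5))       # True
print(is_regular_number(nan))        # False
print(is_regular_number(math.inf))   # False
print(is_regular_number(2.0 ** 53))  # False: 2**53 + 1.0 rounds back to 2**53
```

The last two lines show a caveat of the arithmetic trick: it also filters out infinities and very large magnitudes, which may or may not be what you want for a dollar column.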
05-03-2017
07:53 AM
1 Kudo
It is not supposed to generate unique values. A hash function maps values from a large input domain onto a much smaller range of integers (Hive's hash() returns an int), so different inputs can end up with the same hash value. Think of it as grouping the values of a large data set into smaller subsets and using the hash as an index to find the respective subset. A good explanation of how likely such collisions are can be found here: http://preshing.com/20110504/hash-collision-probabilities/ If you want to generate unique values, have a look at the reflect UDF: reflect("java.util.UUID", "randomUUID")
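To get a feeling for how quickly collisions appear, the birthday bound discussed in the linked article can be computed directly. This Python sketch assumes the hash values behave like uniformly random 32-bit integers, which Hive's hash() does not guarantee; `collision_probability` is a hypothetical helper using the common approximation p ≈ 1 - exp(-k(k-1)/(2N)).

```python
import math

N_32_BIT = 2 ** 32  # number of distinct values a 32-bit hash can take


def collision_probability(k: int, n: int) -> float:
    """Approximate probability that at least two of k uniformly random
    hash values (out of n possible values) are equal."""
    return 1.0 - math.exp(-k * (k - 1) / (2.0 * n))


for k in (1_000, 10_000, 77_163, 100_000):
    print(k, round(collision_probability(k, N_32_BIT), 4))
```

With N = 2**32, about 77,000 values already give roughly even odds of at least one collision, and 100,000 values push it well above 50%, which is why a hash is not a substitute for a unique ID.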
02-21-2017
10:34 AM
Glad that it helped. Could you please click to accept the above answer, so that others can see that this is the solution? Thanks! 🙂
02-21-2017
08:27 AM
1 Kudo
Hi,
the PutEmail processor supports the NiFi Expression Language for the Subject property. That means you can access all the attributes of your FlowFile, including your custom attributes and any variables you defined within the flow.
To have a custom subject in your PutEmail processor for the error-handling case, connect the failure relationship of PutHDFS (and of any other processor in the flow that has a failure relationship) to the PutEmail processor, and configure the PutEmail processor.
An example of a custom Subject could be: Hello from ${hostname()}: the file ${filename} caused an error at ${now()} More examples and some guidelines for the NiFi Expression Language are listed here: https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html
02-06-2017
09:54 AM
The two values were just examples. Change them to something that fits your system environment: either go below 512 (that might do the job), or increase the memory available to the containers:
1. Increase the VirtualBox memory from (I guess) 4096 to, e.g., 8192.
2. Log into Ambari at http://my.local.host:8080
3. Change yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb from their defaults to 4096.
4. Save and restart (at least YARN, Oozie and Spark).
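The reason these values matter is simple arithmetic: each container request must fit under yarn.scheduler.maximum-allocation-mb, and the node's yarn.nodemanager.resource.memory-mb is divided among the running containers. The Python sketch below models this in a simplified way, assuming requests are rounded up to a multiple of yarn.scheduler.minimum-allocation-mb (typical scheduler behaviour, but check your scheduler's configuration); the helper names and the 512 MB minimum in the example are hypothetical.

```python
import math


def normalize_request(request_mb: int, min_alloc_mb: int, max_alloc_mb: int) -> int:
    """Simplified model: reject requests above the maximum allocation and
    round the rest up to a multiple of the minimum allocation."""
    if request_mb > max_alloc_mb:
        raise ValueError(
            f"request of {request_mb} MB exceeds maximum allocation {max_alloc_mb} MB"
        )
    return math.ceil(request_mb / min_alloc_mb) * min_alloc_mb


def containers_per_node(node_memory_mb: int, request_mb: int,
                        min_alloc_mb: int, max_alloc_mb: int) -> int:
    """How many containers of this size fit into the NodeManager's memory."""
    return node_memory_mb // normalize_request(request_mb, min_alloc_mb, max_alloc_mb)


print(containers_per_node(4096, 512, 512, 4096))
print(containers_per_node(4096, 1000, 512, 4096))
```

For example, with 4096 MB per node, a 512 MB minimum and a 4096 MB maximum, 512 MB requests give 8 containers, a 1000 MB request is rounded up to 1024 MB (4 containers), and a 5000 MB request is rejected.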