Member since
06-08-2017
1049 Posts
518 Kudos Received
312 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11194 | 04-15-2020 05:01 PM |
| | 7092 | 10-15-2019 08:12 PM |
| | 3088 | 10-12-2019 08:29 PM |
| | 11406 | 09-21-2019 10:04 AM |
| | 4302 | 09-19-2019 07:11 AM |
01-25-2018
02:32 PM
2 Kudos
@Salda Murrah

Flow:
ListFTP -> FetchFTP -> ConvertRecord (CSV to JSON)
-> SplitJson (split the JSON array into individual messages)
-> EvaluateJsonPath (extract the date value and keep it as an attribute)
-> RouteOnAttribute (check whether the date attribute contains yesterday's date)
-> ReplaceText (prepare the INSERT INTO statement)
-> PutCassandraQL

SplitJson configs:
JsonPath Expression: $.*

EvaluateJsonPath configs:
Destination: flowfile-attribute
date: $.date
If you want, you can add more properties; all of those values will be added as flowfile attributes.

RouteOnAttribute configs:
Routing Strategy: Route to Property name
yesterday: ${date:contains("${now():toNumber():minus(86400000):format('yyyyMMdd')}")}

Connect the yesterday relationship to the next ReplaceText processor, so RouteOnAttribute only passes on flowfiles whose date attribute contains yesterday's date (we compare using the contains() function, i.e. the expression checks whether the date attribute contains 20180124 or not). Then prepare your INSERT INTO statement in the ReplaceText processor and use the PutCassandraQL processor.

If the answer helped to resolve your issue, click on the Accept button below to accept the answer. That would be a great help to community users trying to find solutions quickly for these kinds of errors.
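As an addendum, here is a minimal Python sketch of the date logic the yesterday property performs (assuming the incoming date attribute carries a yyyyMMdd-style value, as the contains() comparison above requires):

```python
from datetime import datetime, timedelta

# What the "yesterday" property computes: now() minus 86400000 ms (one day),
# formatted as yyyyMMdd, then a contains() check on the date attribute.
yesterday = (datetime.now() - timedelta(milliseconds=86400000)).strftime("%Y%m%d")

def routes_to_yesterday(date_attribute: str) -> bool:
    """Mirror of ${date:contains("<yyyyMMdd of yesterday>")}."""
    return yesterday in date_attribute

print(routes_to_yesterday(yesterday))     # True by construction
print(routes_to_yesterday("1999-01-01"))  # False: no yyyyMMdd match
```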
01-24-2018
11:42 PM
1 Kudo
@Anil Reddy
If you are using Replacement Strategy "Regex Replace" in the ReplaceText processor, it will add \ before the $ sign, because $ is a special character (end of line) in regular expressions. The "Always Replace" strategy won't add any \ before the $ sign, but I'm not sure you can achieve your desired result with it.

What you can do to overcome the \ issue is to add another ReplaceText processor after your first one (the one that adds \ before $), with the configs below:

Search Value: \\ (regex for a literal backslash; a lone \ is not a valid regex)
Replacement Value: ${literal('')} (replace \ with an empty string; alternatively, check the "Set empty string" checkbox shown below the property. Both methods give the same empty string as the Replacement Value.)
Maximum Buffer Size: 1 MB
Replacement Strategy: Regex Replace
Evaluation Mode: Entire text

Example:
Input:  \$\$\$\$\$\$\$\$hi\$\$\$\$\$\$\$\$
Output: $$$$$$$$hi$$$$$$$$

By using two ReplaceText processors in series, the second one removes the \ that was added by the first.
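As an addendum, a small Python sketch of the second stage (re.sub stands in for the second ReplaceText processor's regex replace):

```python
import re

# First ReplaceText (Regex Replace) has left the content with \ before every $:
content = r"\$\$\$\$\$\$\$\$hi\$\$\$\$\$\$\$\$"

# Second ReplaceText: Search Value \\ , Replacement Value empty string.
cleaned = re.sub(r"\\", "", content)
print(cleaned)  # $$$$$$$$hi$$$$$$$$
```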
01-24-2018
02:59 PM
3 Kudos
@Anil Reddy
Use 9 $ signs instead of 3 in ${myattr:prepend('$$$$$$$$$'):append('$$$$$$$$$')}. In Expression Language, 3 $ signs come out as 1, so you need 9 $ signs if you want 3 $ signs to be added to the attribute value.

Input:
"myattr" value is "This is Test"

Expression:
${myattr:prepend('$$$$$$$$$'):append('$$$$$$$$$')}

Output:
$$$This is Test$$$
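For reference, the intended result outside NiFi's $-escaping, as plain Python:

```python
# The transformation the expression aims for: wrap the attribute value
# with three literal $ signs on each side.
myattr = "This is Test"
print("$$$" + myattr + "$$$")  # -> $$$This is Test$$$
```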
01-19-2018
02:22 AM
3 Kudos
@Martin Mucha By using the RouteOnContent processor we can check the contents of a flowfile, e.g. whether the message contains an array or not. If it contains an array, you can route the flowfile to some processors; if not, route it to other processors.

Example:
Let's say I have an array of JSON messages and I'm splitting it into individual messages.

Input:
[{"Id": "1","name":"HDP"},
{"Id": "2","name":"HDF"}]

I have 2 JSON messages in an array and I need to add the Id and name values as attributes, so I need to split the JSON array into individual flowfiles.

SplitJson configs:
JsonPath Expression: $.*

This property splits the JSON array into individual flowfiles. For our input the output will be 2 flowfiles; I then use EvaluateJsonPath to extract the attributes and add them to each flowfile.

But I'm not sure I will always get a JSON array; sometimes I get only a single JSON message like {"Id": "1","name":"HDP"}, and for that input the SplitJson processor won't give the expected results. So I use the RouteOnContent processor to check the contents of the flowfile and route accordingly.

Case 1: If I get a JSON array message, I need to route the flowfile to the SplitJson processor. Add a new property:
Array: ^\[\{
We check whether the content starts with [ followed by {, which means it's an array of JSON messages. In this case I transfer the Array relationship to the SplitJson processor.

Case 2: If there is no data in the JSON message, I check with the property below:
NoData: ^\{\}
This checks whether the content is an empty message. I auto-terminate this relationship, since we don't care when there is no data.

The unmatched relationship will then carry single JSON messages, i.e. {"Id": "1","name":"HDP"}. I route these to other processors (like EvaluateJsonPath) to extract the attribute values (see the Python sketch below).

In this way you can check whether the contents of a flowfile contain arrays or not. In the example above I'm only checking the starting characters of the content, but you can check the whole contents of flowfiles by adding a new property with a matching regex and changing the buffer size if needed.

If the answer helped to resolve your issue, click on the Accept button below to accept the answer. That would be a great help to community users trying to find solutions quickly for these kinds of issues.
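Here is the promised sketch: a plain-Python approximation of the three RouteOnContent routes (the regexes are the same as above; Python's re module stands in for the processor's content matching):

```python
import re

def route(content: str) -> str:
    """Sketch of the two RouteOnContent properties described above."""
    if re.match(r"^\[\{", content):   # starts with [{ -> array of JSON messages
        return "Array"                # route to SplitJson
    if re.match(r"^\{\}", content):   # empty JSON object -> no data
        return "NoData"               # auto-terminated
    return "unmatched"                # single JSON message -> EvaluateJsonPath

print(route('[{"Id": "1","name":"HDP"},{"Id": "2","name":"HDF"}]'))  # Array
print(route('{}'))                                                   # NoData
print(route('{"Id": "1","name":"HDP"}'))                             # unmatched
```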
01-17-2018
12:02 PM
2 Kudos
@Richard Scott Yes, it returns True as the attribute value.

${anyDelineatedValue("${number_list}", ","):startsWith("1")}

This splits the number_list string on the , delimiter and checks whether any of the values starts with 1 or not. Since the number_list string contains 1000, the expression finds a value starting with 1, so the result is True.

Example, for ${anyDelineatedValue("${number_list}", ","):startsWith("1")}:

Case 1:
number_list: 1000,2000,3000
result: True (the expression checks every delineated value, and 1000 starts with 1)

Case 2:
number_list: 2000,2000,3000
result: False (none of the number_list values starts with 1)

A plain-Python equivalent of this check is sketched after this post. In addition, you can evaluate the Expression Language by using a GenerateFlowFile processor:
1. Drag and drop a GenerateFlowFile processor.
2. Right-click and click on Configure.
3. In the Properties tab, click on the + sign and add a new property:
number_list: 1000
4. In the Scheduling tab, keep Timer driven and set Run Schedule to 11111111110 (or some other big number), so the processor triggers 1 flowfile only after that many seconds.

UpdateAttribute configs:
Follow the same steps as above and add a new property:
ad (your property name): ${anyDelineatedValue("${number_list}", ","):startsWith("1")}

The UpdateAttribute processor adds the ad attribute to the flowfile with the result of the expression, i.e. the value true.

In this way you can test Expression Language using NiFi.

If the answer helped to resolve your issue, click on the Accept button below to accept the answer. That would be a great help to community users trying to find solutions quickly for these kinds of errors.
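And the sketch mentioned above, a plain-Python equivalent of the expression (assuming number_list holds a comma-separated string):

```python
def any_delineated_value_starts_with(subject: str, delimiter: str, prefix: str) -> bool:
    """Plain-Python equivalent of
    ${anyDelineatedValue("${number_list}", ","):startsWith("1")}."""
    return any(value.startswith(prefix) for value in subject.split(delimiter))

print(any_delineated_value_starts_with("1000,2000,3000", ",", "1"))  # True
print(any_delineated_value_starts_with("2000,2000,3000", ",", "1"))  # False
```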
01-17-2018
11:03 AM
3 Kudos
@Murat Menteşe
In the ExtractText processor, add new properties in the Configure menu by clicking on the + sign at the right corner.

Configs:
Attribute_1: (.*?);
Attribute_2: .*?;(.*?);
Attribute_3: ;.*?;(.*?);
Attribute_4: ^.*;.*;(.*);.*;
Attribute_5: ^.*;.*;.*;(.*);
Attribute_6: ^.*;.*;.*;.*;(.*)

Or add properties with the regex matches below:
Attribute_1: (.*?);
Attribute_2: .*?;(.*?);
Attribute_3: ;.*?;(.*?);
Attribute_4: .*?;.*?;.*?;(.*?);
Attribute_5: .*?;.*?;.*?;.*?;(.*?);
Attribute_6: .*?;.*?;.*?;.*?;.*?;(.*)

Both regex sets match and give the same results. Once you have extracted all the contents as attributes, you can use Expression Language like ${Attribute_1} to get the 1096 value, etc.

**Note** Keep no spaces in attribute names, i.e. Attribute_1 instead of Attribute 1; that makes it easy to retrieve the attribute value inside the NiFi flow.

If the answer helped to resolve your issue, click on the Accept button below to accept the answer. That would be a great help to community users trying to find solutions quickly for these kinds of errors.
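As an addendum, a quick Python check of the second regex set against a hypothetical semicolon-delimited line (1096 as the first field comes from the post; the remaining field values are made up for illustration):

```python
import re

line = "1096;John;Doe;Chicago;IL;60601"  # hypothetical sample input

patterns = {
    "Attribute_1": r"(.*?);",
    "Attribute_2": r".*?;(.*?);",
    "Attribute_3": r";.*?;(.*?);",
    "Attribute_4": r".*?;.*?;.*?;(.*?);",
    "Attribute_5": r".*?;.*?;.*?;.*?;(.*?);",
    "Attribute_6": r".*?;.*?;.*?;.*?;.*?;(.*)",
}
for name, pattern in patterns.items():
    match = re.search(pattern, line)  # ExtractText-style "find" semantics
    print(name, "=", match.group(1) if match else None)
# Prints: 1096, John, Doe, Chicago, IL, 60601
```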
01-16-2018
05:43 PM
2 Kudos
@JAROB Try with the below Record Schema property.

ConvertJSONToAvro configs:
Record schema:
{
  "type": "record",
  "name": "NifiHeartBeat_v3",
  "fields": [
    { "name": "MeasureId", "type": "string", "default": "null" },
    { "name": "Value", "type": "string", "default": "null" },
    { "name": "AuditDateTime", "type": "string", "default": "null" },
    { "name": "DATA_SOURCE", "type": "string", "default": "null" }
  ]
}

And if you are testing with one JSON message, don't enclose the whole message in [] (square brackets), because when the message is enclosed in [], ConvertJSONToAvro won't parse the array message; we need to split the array even if it has only one message in it. Try with this JSON message:

{
  "MeasureId": "nifiHeartBeat",
  "Value": "1",
  "AuditDateTime": "Tue Jan 16 13:48:58 CET 2018",
  "DATA_SOURCE": "20083"
}

Or, if you need to enclose the message in [], use a SplitJson processor to split the message:

[{
  "MeasureId": "nifiHeartBeat",
  "Value": "1",
  "AuditDateTime": "Tue Jan 16 13:48:58 CET 2018",
  "DATA_SOURCE": "20083"
}]

SplitJson configs:
Because the ConvertJSONToAvro processor won't parse an array of JSON messages, we need to split them individually (even when the array has only one message in it) before sending them to the processor. Add the property:
JsonPath Expression: $.*

ConvertJSONToAvro configs: same as mentioned above.

If you are using the ConvertRecord processor instead, you don't need the SplitJson processor, as ConvertRecord works with arrays of JSON messages.

In addition, for your reference I have attached NiFi flow templates that convert a JSON message to Avro using ConvertJSONToAvro and using ConvertRecord, so you can save and reuse the templates:
json-to-avro-using-convertrecord.xml
json-to-avro-conversion-using-convertjsontoavro-pr.xml

If the answer helped to resolve your issue, click on the Accept button below to accept the answer. That would be a great help to community users trying to find solutions quickly for these kinds of errors.
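As an addendum, a small Python sketch of the normalization SplitJson performs with $.*: an array (even a one-element one) becomes individual messages that ConvertJSONToAvro can parse:

```python
import json

# A one-element JSON array, like the bracketed message above.
raw = '[{"MeasureId": "nifiHeartBeat", "Value": "1", "AuditDateTime": "Tue Jan 16 13:48:58 CET 2018", "DATA_SOURCE": "20083"}]'

parsed = json.loads(raw)
# Treat a bare object and an array uniformly: always a list of records.
records = parsed if isinstance(parsed, list) else [parsed]
for record in records:
    print(json.dumps(record))  # each line is one message ConvertJSONToAvro can handle
```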
01-15-2018
03:33 PM
1 Kudo
@Mark I have tried the same ExtractText and ReplaceText configurations that you mentioned in the question, and the output flowfile content from the ReplaceText processor does include the \ in it.

Here are my configurations:

ExtractText: the output from the ExtractText processor, the abc attribute value, has \ in it.

ReplaceText: with "Regex Replace" as the Replacement Strategy, or with "Always Replace" as the Replacement Strategy, both strategies give the same output as described above.

Check whether the abc attribute value after the ExtractText processor has \ in it or not. If you are still having the issue, please share your flow and the configs of the processors.
01-13-2018
11:02 PM
@Vignesh Asokan
1. Run hive# desc formatted <hive-external-partitioned-table>; get the Location details from the DESC FORMATTED output, then run bash$ hdfs dfs -ls <hdfs-location> and check whether any partitions were created or not.

2. In the pyspark shell, after executing the statement below:

df.write.mode("overwrite").partitionBy("col1","col2").insertInto("Hive external Partitioned Table")

the pyspark shell logs show where the partition directory is being created in HDFS.

Example:
18/01/13 17:47:52 INFO FileUtils: Creating directory if it doesn't exist: hdfs://******/apps/hive/warehouse/partition_table/daily=2017-12-23

As you can see, the pyspark shell logs show the directory being created at /apps/hive/warehouse/partition_table/daily=2017-12-23; my table name is partition_table with daily as the partition column, and Spark created the partition under the HDFS directory /apps/hive/warehouse/partition_table/.

If you are not able to figure out the issue, share more details (pyspark shell logs, table location details, and the statements you are executing in the pyspark shell).
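As an addendum, a hedged PySpark sketch of check 1 done from the shell instead of hive/hdfs (assumes a Hive-enabled SparkSession; partition_table is the table name from the example log line above):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Find the table's HDFS Location, then list the partitions that exist.
spark.sql("DESC FORMATTED partition_table").show(truncate=False)
spark.sql("SHOW PARTITIONS partition_table").show(truncate=False)
```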
01-13-2018
05:32 AM
@Surendra Shringi The question was addressed here: https://community.hortonworks.com/questions/159780/parse-file-in-nifi.html

You need to make some config changes to the processors used in the community link above so they work with your sample data.

Changes in the ExtractText processor:
Change the existing Age and Name property values:
Age: \|.*\|(.*?)\|
Name: \|(.*?)\|
Add a new id property:
id: ^(.*?)\|

Changes in ReplaceText:
Change the property below:
Replacement Value: ${id},${Name},${Age},${Address_city},${Address_state},${Address_zipcode}

I think these are the changes you need to make for the NiFi flow to work with your input data.
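As an addendum, a quick Python check of the three regexes against a hypothetical pipe-delimited line shaped like id|Name|Age| (the Address_* attributes in the Replacement Value come from the flow in the linked thread):

```python
import re

line = "1|John|25|"  # hypothetical sample input

print(re.search(r"^(.*?)\|", line).group(1))       # id   -> 1
print(re.search(r"\|(.*?)\|", line).group(1))      # Name -> John
print(re.search(r"\|.*\|(.*?)\|", line).group(1))  # Age  -> 25
```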