About Shu_ashu

Shu_ashu · ‎08-20-2018

@Lakshmana Maddineni The issue is caused by duplicate rows. When 'not matched' is combined with 'matched' in a Merge statement, the cardinality check is applied by default, and it fails when more than one source row matches the same target row. The cardinality check needs to be disabled when using both 'matched' and 'not matched'. Set the following property in your Hive shell and then execute the merge statement again:

set hive.merge.cardinality.check=false;

Refer to this support KB article for more details regarding this exact issue. - If the Answer helped to resolve your issue, Click on Accept button below to accept the answer, That would be great help to Community users to find solution quickly for these kind of issues.
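Before disabling the check, it can help to see which join keys are actually duplicated in the source data. The following is only a minimal Python sketch of that idea, run on rows already pulled out of the source table; the function name duplicate_keys, the sample rows, and the "id" column are hypothetical and stand in for whatever column your ON clause uses.

```python
from collections import Counter

def duplicate_keys(rows, key):
    """Return the join-key values that occur more than once in the source rows."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)

# Hypothetical source rows; "id" stands in for the column used in the ON clause.
source_rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 2, "name": "c"},
]
print(duplicate_keys(source_rows, "id"))  # [2]
```

Any key this reports would match one target row more than once, which is the condition the cardinality check rejects.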

Shu_ashu · ‎08-20-2018

@CHEH YIH LIM I think you are using the ExtractText processor to extract the content and keep it as an attribute of the flowfile. If so, change these two property values to suit your flowfile size:

Maximum Buffer Size (1 MB): Specifies the maximum amount of data to buffer (per file) in order to apply the regular expressions. Files larger than the specified maximum will not be fully evaluated.
Maximum Capture Group Length (1024): Specifies the maximum number of characters a given capture group value can have. Any characters beyond the max will be truncated.
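To illustrate why the attribute stops at 1024 characters, here is a small Python sketch that imitates the truncation described for Maximum Capture Group Length. It is not NiFi code; the function extract_attribute and its max_len parameter are hypothetical names used only for this illustration.

```python
import re

MAX_CAPTURE_GROUP_LENGTH = 1024  # mirrors the property value quoted above

def extract_attribute(content, pattern, max_len=MAX_CAPTURE_GROUP_LENGTH):
    """Return the first capture group, cut to max_len characters, or None."""
    match = re.search(pattern, content, re.DOTALL)
    if match is None:
        return None
    # Characters beyond max_len are dropped, as the property description says.
    return match.group(1)[:max_len]

content = "x" * 5000
print(len(extract_attribute(content, r"(.*)")))                # 1024
print(len(extract_attribute(content, r"(.*)", max_len=8192)))  # 5000
```

Raising the limit above the content size is what lets the whole value through.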

Shu_ashu · ‎08-18-2018

@Saravanan Subramanian Try the Jolt spec below:

[
  { "operation": "shift", "spec": { "*": "Data.&" } },
  { "operation": "default", "spec": { "Data": {} } }
]

Input:
{ "primary": { "value": 4 }, "quality": { "value": 3 } }

Output:
{ "Data" : { "primary" : { "value" : 4 }, "quality" : { "value" : 3 } } }

Another way of doing this is to use the ReplaceText processor, capturing all the content of the flowfile and wrapping it in the replacement value:

Search Value: (^.*$) (use (?s)(^.*$) if the content spans multiple lines, so that . also matches newlines)
Replacement Value: { "Data": $1 }
Character Set: UTF-8
Maximum Buffer Size: 1 MB
Replacement Strategy: Regex Replace
Evaluation Mode: Entire text
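Both approaches produce the same result, which can be checked outside NiFi. The Python sketch below only mirrors the effect of the two options (it does not run Jolt or NiFi); wrap_in_data and wrap_text are hypothetical helper names, and the regex uses Python's \1 in place of NiFi's $1.

```python
import json
import re

def wrap_in_data(record):
    """Mirror the Jolt spec: move every top-level key under "Data"."""
    return {"Data": dict(record)}

def wrap_text(text):
    """Mirror the ReplaceText option: capture the whole content and wrap it."""
    # (?s) lets "." match newlines so multi-line JSON is captured in one group.
    return re.sub(r"(?s)^(.*)$", r'{ "Data": \1 }', text, count=1)

record = {"primary": {"value": 4}, "quality": {"value": 3}}
print(wrap_in_data(record))
print(json.loads(wrap_text(json.dumps(record, indent=2))) == wrap_in_data(record))  # True
```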

Shu_ashu · ‎08-16-2018

@Parth Karkhanis Could you try adding a SplitAvro processor to your flow after the QueryDatabaseTable processor, configuring it to create small chunks of flowfiles instead of one big Avro file, and then running your commands again?

Shu_ashu · ‎08-15-2018

@Sai Krishna Makineni You can use either the ListHDFS or the GetHDFSFileInfo processor (GetHDFSFileInfo does not store state) and schedule it to run nightly. Once you have listed the files from HDFS, use the hdfs.lastModified attribute, or extract the timestamp from your filename with the substringAfter function, and check the timestamp value in a RouteOnAttribute processor. Once you have filtered out the files that are older than a specific time, feed them to a DeleteHDFS processor to delete them.

Note that the ListHDFS processor stores state and runs only incrementally, so if you want to clear the state, use the REST API with /processors/{id}/state/clear-requests and run the processor once the state is cleared.

Flow:
1. ListHDFS
2. RouteOnAttribute //check the filename (or) last modified time
3. DeleteHDFS //delete the files in HDFS

Flow:
1. GenerateFlowFile
2. GetHDFSFileInfo
3. RouteOnAttribute
4. DeleteHDFS

(or) You can use the GetHDFS processor (with Keep Source File set to true), which doesn't store state. However, this processor fetches the files from HDFS, so if the files are big it puts a lot of load on NiFi.
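The age check that RouteOnAttribute performs can be sketched in Python as follows. This is only an illustration of the logic, assuming the hdfs.lastModified value is a timestamp in milliseconds since the epoch; the function is_older_than and the 7-day threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def is_older_than(last_modified_ms, days, now=None):
    """True if a file last modified at last_modified_ms (epoch millis) is older than `days`."""
    now = now or datetime.now(timezone.utc)
    modified = datetime.fromtimestamp(last_modified_ms / 1000, tz=timezone.utc)
    return now - modified > timedelta(days=days)

# Fixed "now" so the example is reproducible.
now = datetime(2018, 8, 15, tzinfo=timezone.utc)
old_file = int(datetime(2018, 8, 1, tzinfo=timezone.utc).timestamp() * 1000)
new_file = int(datetime(2018, 8, 14, tzinfo=timezone.utc).timestamp() * 1000)
print(is_older_than(old_file, 7, now))  # True
print(is_older_than(new_file, 7, now))  # False
```

Only the files for which the check is true would be routed on to DeleteHDFS.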

Shu_ashu · ‎08-15-2018

@shraddha srivastav Stop the UpdateRecord processor, list the queue after the GetFile processor, and share some sample input records (about 10, as text rather than screenshots) together with the expected output. That would help us recreate and resolve the issue.

Shu_ashu · ‎08-15-2018

@shraddha srivastav Use the UpdateRecord processor with the configs below. In the CSVRecordSetWriter controller service, add a filename column of string type as the last field in the Avro schema.

UpdateRecord configs:
Add a new property in the UpdateRecord processor:
/filename concat(/UutId,/Test) //column names are case sensitive

As we are using Record Path Value as the Replacement Value Strategy, the UpdateRecord processor will concatenate the UutId and Test values into the filename column value. Refer to this link for more details regarding the UpdateRecord processor.

Example:

Input data:
UutId,Test
1,2

CsvRecordSetWriter Avro schema:
{
  "namespace": "nifi",
  "name": "balances",
  "type": "record",
  "fields": [
    {"name": "UutId", "type": ["null", "string"]},
    {"name": "Test", "type": ["null", "string"]},
    {"name": "filename", "type": ["null", "string"]}
  ]
}

Output:
UutId,Test,filename
1,2,12
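To check the expected output outside NiFi, the effect of concat(/UutId,/Test) can be imitated with a few lines of Python. This is only a sketch of the transformation, not of how UpdateRecord works internally; add_filename_column is a hypothetical helper name.

```python
import csv
import io

def add_filename_column(csv_text):
    """Append a filename column holding UutId and Test joined together."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames + ["filename"], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # Column names are case sensitive, as in the record path.
        row["filename"] = row["UutId"] + row["Test"]
        writer.writerow(row)
    return out.getvalue()

print(add_filename_column("UutId,Test\n1,2\n"))
# UutId,Test,filename
# 1,2,12
```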

Shu_ashu · ‎08-14-2018

@Gillu Varghese Both cron triggers in the screenshot are the same; you can use either of them for scheduling. With a single cron expression we cannot also trigger at exactly 3 AM; the latest time we can trigger is 2:59:59 AM.

Shu_ashu · ‎08-14-2018

@Gillu Varghese A Quartz cron expression needs at least 6 fields; the last (year) field is optional.

0 0/15 2 ? * * //no specific value for day of month; as we have scheduled at 2 AM, the cron triggers starting at 2 AM
0 0/15 2 1/1 ? //invalid, as the month field doesn't allow ? in it
0 0/15 2 1/1 * ? (or) 0 0/15 2 * * ? //starts on the first day of the month and executes every day at 2 AM

In this case both expressions are the same.

Please refer to this awesome explanation regarding Quartz cron:

0 0 0/1 1/1 * ? *
| | |   |   | | |
| | |   |   | | +-- Year (range: 1970-2099)
| | |   |   | +---- Day of the Week (range: 1-7 or SUN-SAT)
| | |   |   +------ Month of the Year (range: 1-12 or JAN-DEC)
| | |   +---------- Day of the Month (range: 1-31)
| | +-------------- Hour (range: 0-23)
| +---------------- Minute (range: 0-59)
+------------------ Second (range: 0-59)

* ("all values") used to select all values within a field. For example, "*" in the minute field means "every minute".

? ("no specific value") useful when you need to specify something in one of the two fields in which the character is allowed, but not the other. For example, if I want my trigger to fire on a particular day of the month (say, the 10th), but don't care what day of the week that happens to be, I would put "10" in the day-of-month field, and "?" in the day-of-week field.

/ used to specify increments. For example: "0/15" in the seconds field means "the seconds 0, 15, 30, and 45". And "5/15" in the seconds field means "the seconds 5, 20, 35, and 50". You can also specify '/' after the '*' character; in this case '*' is equivalent to having '0' before the '/'. '1/3' in the day-of-month field means "fire every 3 days starting on the first day of the month".

To explain the difference between ? and * in the expressions, first of all take a look at this table:

Field Name     Mandatory  Allowed Values     Allowed Special Characters
Seconds        YES        0-59               , - * /
Minutes        YES        0-59               , - * /
Hours          YES        0-23               , - * /
Day of month   YES        1-31               , - * ? / L W   //allowed '?'
Month          YES        1-12 or JAN-DEC    , - * /
Day of week    YES        1-7 or SUN-SAT     , - * ? / L #   //allowed '?'
Year           NO         empty, 1970-2099   , - * /
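The increment rule described above can be checked with a short Python sketch. It is a simplified illustration, not the Quartz parser: expand_field is a hypothetical function that handles only single values, comma lists, '*' and the 'start/step' form, and ignores '?', ranges, L, W and #.

```python
def expand_field(expr, low, high):
    """Expand a simplified cron field into the sorted list of values it selects."""
    values = set()
    for part in expr.split(","):
        if "/" in part:
            start, step = part.split("/")
            # '*' before '/' starts at the lowest allowed value of the field.
            first = low if start == "*" else int(start)
            values.update(range(first, high + 1, int(step)))
        elif part == "*":
            values.update(range(low, high + 1))
        else:
            values.add(int(part))
    return sorted(values)

print(expand_field("0/15", 0, 59))  # [0, 15, 30, 45]
print(expand_field("5/15", 0, 59))  # [5, 20, 35, 50]
print(expand_field("*/15", 0, 59))  # [0, 15, 30, 45]
print(expand_field("1/3", 1, 31))   # [1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31]
```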

Shu_ashu · ‎08-14-2018

@Gillu Varghese
> NiFi uses Quartz cron expressions. For your case, use the expression below to run the processor:
0 0/15 2 1/1 * ? * //runs at 2:00AM, 2:15AM, 2:30AM, 2:45AM
> If you want to run at 3:00 AM as well, another trigger processor needs to be scheduled separately with the cron expression below:
0 0 3 1/1 * ? *
Refer to this link to create/validate Quartz cron expressions and this for more details regarding cron scheduling in NiFi.
In addition, we can also list the minutes with comma separators:
59 0,15,30,45,59 2 1/1 * ? * //runs at 2:00:59AM, 2:15:59AM, 2:30:59AM, 2:45:59AM, 2:59:59AM
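The fire times for the comma-separated expression can be enumerated with a small Python sketch. It only combines the seconds, minutes and hours fields as plain comma lists and is not how Quartz or NiFi computes schedules; fire_times is a hypothetical helper name.

```python
from itertools import product

def fire_times(seconds, minutes, hours):
    """List the h:mm:ss times selected by comma-separated seconds, minutes and hours fields."""
    fields = [sorted(int(v) for v in f.split(",")) for f in (hours, minutes, seconds)]
    return [f"{h}:{m:02d}:{s:02d}" for h, m, s in product(*fields)]

print(fire_times("59", "0,15,30,45,59", "2"))
# ['2:00:59', '2:15:59', '2:30:59', '2:45:59', '2:59:59']
```

The last entry, 2:59:59, is the closest a single expression restricted to the 2 AM hour gets to 3:00 AM.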

Member Since	‎06-08-2017 08:15 PM
Last Visited	‎04-04-2021 06:38 PM
Posts	1,049
Kudos received	516

Cloudera Community

Re: Get column values in comma separated value

Re: nifi Json data using routeonattributeto to spl...

Re: HIVE MANAGED TABLE

Re: CSV file with Duplicate Headers

Re: NIFI - SQL Server Lookup

Re: Hive - Merge command throwing error message

Re: Flow file exceeded 1024 characters, how to get...

Re: Jolt transform for moving json message to sub...

Re: Getting error of avro runtime exception invali...

Re: Nifi processor that deletes the older day file...

Re: Extract values from CSV and place it in a new ...

Re: Extract values from CSV and place it in a new ...

Re: setting cron driven jobs in nifi

Re: setting cron driven jobs in nifi

Re: setting cron driven jobs in nifi