Member since 10-20-2017 | 63 Posts | 0 Kudos Received | 0 Solutions
09-22-2018
11:32 AM
Hello everyone, I'm trying to configure Log4j for the HDFS NameNode and security audit logs. With either size-based or daily rotation I should be able to rotate and delete the logs, but it is not working as expected. Can someone share recommended Log4j configurations for the HDFS NameNode/audit logs? Attaching my Log4j configuration: log4j.txt
Help me delete the logs once they are 10 days old; I would also like to gzip the older files. Please find below a screenshot of the disk space consumed. For the audit logs we would like to remove files based on size.
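For reference, this is the direction I am experimenting with for the audit appender: a minimal size-based sketch that assumes the stock hdfs-log4j appender name RFAAUDIT, with placeholder sizes and backup counts. As far as I know, log4j 1.x by itself neither gzips rolled files nor deletes by age, so that part may need log4j2 or an external logrotate job.
# size-based rotation for the NameNode audit log (log4j 1.x; values are examples to tune)
hdfs.audit.logger=INFO,RFAAUDIT
log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender
log4j.appender.RFAAUDIT.File=${hadoop.log.dir}/hdfs-audit.log
log4j.appender.RFAAUDIT.MaxFileSize=256MB
log4j.appender.RFAAUDIT.MaxBackupIndex=20
log4j.appender.RFAAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAAUDIT.layout.ConversionPattern=%m%n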
09-19-2018
06:58 PM
@Shu Could you please check on this?
09-19-2018
01:46 AM
Attaching the corrected flow: hbase-iud.xml
09-18-2018
09:10 PM
Hello, we are trying to handle inserts/updates/deletes in NiFi. The attached flow currently takes a long time to process the records and insert them into HBase. I need to use a composite primary key (ServerSerialno, ServerName); the inserts/updates/deletes are driven by that composite key. I need to boost performance by altering the flow or its configuration. Kindly help me find the right flow to achieve this; the rough shape I have in mind is sketched below. The flow is attached.
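Rough sketch of the shape I have in mind, not a tested template (processor names come from the NiFi docs; DeleteHBaseRow only exists in newer NiFi releases, and the rowkey format is an assumption):
source (ListHDFS/FetchHDFS or GetFile)
  -> EvaluateJsonPath     # pull eventType, ServerSerialno, ServerName into attributes
  -> RouteOnAttribute     # split the insert/update path from the delete path on eventType
  -> PutHBaseJson         # Row Identifier = ${ServerSerialno}_${ServerName}; an HBase put on an existing rowkey is already an update
  -> DeleteHBaseRow       # delete branch, keyed on the same composite rowkey (newer NiFi only)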
09-18-2018
07:40 PM
@Shu Is it possible to add two row identifiers / Row Identifier Field Names in PutHBaseJson, like ServerName,ServerNo or ${ServerName},${ServerNo}?
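What I am experimenting with so far, on the assumption that the Row Identifier property takes a single Expression Language value, so a composite key means concatenating the attributes (ServerName and ServerNo are assumed to exist as flowfile attributes):
Row Identifier              ${ServerName}_${ServerNo}
Row Identifier Field Name   (left empty when the key is built from attributes)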
09-18-2018
05:54 AM
Hello everybody, I'm using the RouteText processor to separate JSON records into different flows. Ultimately, Flow 1 should be inserted into the HBase table first, Flow 2 should execute second, and so on. How can I achieve this? Basically, inserts should go into HBase first, updates next, and deletes last. The routing side so far is sketched below. Attached the sample: test.xml
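Roughly what the routing currently looks like (a sketch only; the eventType values are assumptions from my sample data, and it does not answer the ordering question):
# RouteText, Routing Strategy = Route to each matching Property Name
Matching Strategy   Contains
inserts             "eventType":"insert"
updates             "eventType":"update"
deletes             "eventType":"delete"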
09-15-2018
06:25 PM
Hi All, good day. I'm testing the RouteText processor to route content: if it matches, route to one action; if it does not match, route to a different action. My sample flow is: GenerateFlowFile -> RouteText -> Flow 1 -> PutFile and Flow 2 -> PutFile. My text content is simple:
My car color is Blue
My car color is Yellow
If the color is Blue it goes to Flow 1; if the color is Yellow it goes to Flow 2. Now that I have started the processor, the queue from GenerateFlowFile builds up to 10,000, Flow 1 has about 6,000, and Flow 2 has around 1,000, with many duplicates. How can I restrict this to just the given content without any repetition? The expected output is only two text files on the respective flows, not N copies of the values. How can I configure it that way? Can you also give high-level information on why this is happening? I cannot terminate the relationship in GenerateFlowFile, as the connection is already wired to the next processor. One important thing: Flow 1 should execute first, and Flow 2 only after Flow 1 has completed. The settings I suspect are involved are sketched below.
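For reference, a sketch of the settings I am looking at (the scheduling value is something I would try, not a confirmed fix):
GenerateFlowFile   Run Schedule = 0 sec (the default, so it keeps emitting new copies while running)   # try a large Run Schedule, or start it once and stop it
RouteText          Matching Strategy = Contains, Flow1 = Blue, Flow2 = Yellow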
09-04-2018
06:22 PM
Hi, I need to load JSON data into HBase under the following conditions: if the record (composite primary key: servername + serverid) matches an existing row, update or delete it; if it does not match, insert it. How can I do this in NiFi with HBase for a large dataset? The forecast data size is roughly 50-100 TB over the next year. Optionally, if at all possible, I also need to discard invalid entries from the JSON in the flow using a regular expression. Discarding those entries leaves empty lines behind, so kindly let me know how to remove that empty space after removing the blank values, and, if possible, how to keep the JSON from breaking into bad values. I couldn't find a good tutorial for insert/update in HBase. Attached sample data: test.txt
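On the matched/not-matched part, my understanding (worth confirming) is that an HBase put keyed on the composite rowkey already behaves as an upsert, roughly like this hbase shell session with made-up table and column names:
put 'servers', '1001_web01', 'cf:state', 'active'     # first put inserts the row
put 'servers', '1001_web01', 'cf:state', 'retired'    # same rowkey: the put overwrites, i.e. an update
deleteall 'servers', '1001_web01'                     # delete by the same rowkey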
08-30-2018
07:09 PM
Hi All, I'm trying to upsert JSON data into Hive using this approach: https://goo.gl/J7chi3. With this approach I have two issues: 1. My input is 20k records and the output count is roughly 23.5k; some JSON records are breaking and creating duplicates. 2. My input is 10k records and the output count is 20k; per the link, it should update records that are already present in the table. Can anyone guide me on doing upserts in Hive? Apart from the methods above, a few other approaches have failed for me. I tried the MERGE option in Hive, per https://community.hortonworks.com/articles/97113/hive-acid-merge-by-example.html (the statement I run is sketched below); this merge is not suitable for 5 GB or more, taking hours to complete or never completing, or hitting heap memory errors even for 6 GB of data. Someone suggested MERGE with the source and destination partitioned, but we get an error when the destination is partitioned, since MERGE cannot update a partition key value. Cluster RAM is 250 GB. Can anyone help me with definitive steps? It has to support upserts (update when the record matches, insert otherwise) for datasets larger than 5 TB. None of the solutions out there on the Internet have worked for me in over a month. Could anyone share valid steps for large JSON datasets?
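For context, the MERGE I have been running looks roughly like this; the table and column names are placeholders, keyed on the composite key from my data, against an ACID (transactional, bucketed, ORC) target table:
MERGE INTO final_tbl AS t
USING staging_tbl AS s
ON t.serverserial = s.serverserial AND t.servername = s.servername
WHEN MATCHED AND s.eventtype = 'delete' THEN DELETE
WHEN MATCHED THEN UPDATE SET deletedat = s.deletedat
WHEN NOT MATCHED THEN INSERT VALUES (s.eventtype, s.serverserial, s.servername, s.deletedat);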
08-30-2018
06:15 PM
Accepted his answer, but it didn't work out well. :(
08-30-2018
04:57 AM
I'm currently trying to remove the blank lines left behind after discarding lines in NiFi. I am removing lines like the following, since the serial number is 0 and the ServerName is empty:
{"eventType":"delete","ServerSerial":"0","ServerName":"","deletedat":"2018-08-24 17:56:34.944"}
After removing them, a blank line is left in the final output file, and I want to get rid of that blank space. Sample input:
{"eventType":"delete","ServerSerial":"1556562030","ServerName":"XYZ_U_O","deletedat":"2018-08-24 17:56:39.974"}
{"eventType":"delete","ServerSerial":"0","ServerName":"","deletedat":"2018-08-24 17:56:34.944"}
{"eventType":"delete","ServerSerial":"0","ServerName":"","deletedat":"2018-08-24 17:56:34.944"}
So I'm currently using ReplaceText to search for the value {"eventType":"delete","ServerSerial":"0","ServerName":""[^}]+},? and replacing it with an empty string; it fails to remove the blank lines. I also tried a second pass searching for ^[ \t]*$\r?\n --> failed to remove, and for \r\n --> failed to remove; the replacement does not seem to honour the regex value. What should I do? How can I remove the blank lines left after deleting the lines matched by the Search Value?
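One direction I plan to try is doing it in a single pass, so the record's trailing newline goes with it and no blank line is left behind (a sketch; the \r?\n? at the end is the assumption that makes this work):
Replacement Strategy   Regex Replace
Evaluation Mode        Entire text
Search Value           \{"eventType":"delete","ServerSerial":"0","ServerName":""[^}]*\},?\r?\n?
Replacement Value      (empty string)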
08-24-2018
07:30 PM
@Shu When using this for a larger dataset, the merge takes a long time to complete. The final table grows by about 150 GB every day, so scanning the final table and applying the updates really does take more than an hour. Is there any alternative approach?
08-24-2018
07:02 PM
We are receiving hourly JSON data into HDFS, about 7 GB per hour. When a matching record is found in the final table, it should be updated (or deleted); when the record is not matched in the final dataset, it should be inserted.
What is the best way to do upserts (updates and inserts in Hadoop) for a large dataset: Hive, HBase, or NiFi? What would the flow be? Can anyone help us with the flow? updates.txt
08-23-2018
07:01 PM
Hi @Shu, I did try your steps and have a few queries. Answer 1: I cannot insert into the partition key when the source and target tables are partitioned; updating a partition is not supported --> Error: Error while compiling statement: FAILED: SemanticException [Error 10292]: Updating values of partition columns is not supported (state=42000,code=10292) Answer 2: 1. The raw data is JSON and lands in HDFS.
2. For merging we convert the data into ORC, transactional, bucketed tables. 3. With ORC, how can I use INPUT__FILE__NAME? It is not possible to merge against the raw JSON data files, right? Also, could you please elaborate on points 3 and 4?
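Two small checks from my side (the table and column names below are assumptions): INPUT__FILE__NAME is a Hive virtual column, so it should be selectable from the ORC staging table too, and the [Error 10292] seems tied to having the partition column in the SET list, which I am going to leave out and verify.
SELECT INPUT__FILE__NAME, serverserial FROM staging_orc LIMIT 5;   -- virtual column, works on ORC as well
-- in the MERGE, update only non-partition columns, e.g. WHEN MATCHED THEN UPDATE SET state = s.state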
08-21-2018
06:55 PM
Hi @Shu, I followed your tutorial here: https://community.hortonworks.com/articles/191760/create-dynamic-partitions-based-on-flowfile-conten.html and I'm stuck with the attached error. Also, is it possible to do update+insert in that flow, as described in the URL above, using PutHiveQL? I am able to run the MERGE option via HiveServer2; however, due to the large amount of data, MERGE is not working for a huge dataset. Using the URL above, can I achieve the use case below? nifi-predicateerror.jpg
When a matching record is found in the final table, take the chosen action, either update or delete; when the record is not matched in the final dataset, insert the record.
08-17-2018
09:04 AM
capture1.png @Jay Kumar SenSharma Everything is configured properly in Ambari, but it is not working as expected. Please refer to the attached screenshot and you will see what I mean; I am not sure why it is not removing the older logs that are more than 5 days old, as configured. capture.png
08-17-2018
08:01 AM
I need HiveServer2 Log4j settings for daily rotation that also delete logs older than 10 days. Can anyone share a config file that does that? I have already configured it (attached), but it is not working as expected, so I would like to know exactly where it needs to be configured (Advanced hive-exec-log4j, Advanced hive-log4j, Advanced hive-log4j2, or Advanced beeline-log4j2). log.txt We are using Beeline and HiveServer2, with HA enabled for HS2.
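The direction I am experimenting with in the Advanced hive-log4j2 block is sketched below; the appender name DRFA matches the stock config, but the Delete-action syntax is something I am still verifying against the log4j2 manual:
appender.DRFA.type = RollingRandomAccessFile
appender.DRFA.name = DRFA
appender.DRFA.fileName = ${sys:hive.log.dir}/${sys:hive.log.file}
appender.DRFA.filePattern = ${sys:hive.log.dir}/${sys:hive.log.file}.%d{yyyy-MM-dd}.gz
appender.DRFA.layout.type = PatternLayout
appender.DRFA.layout.pattern = %d{ISO8601} %5p [%t] %c{2}: %m%n
appender.DRFA.policies.type = Policies
appender.DRFA.policies.time.type = TimeBasedTriggeringPolicy
appender.DRFA.policies.time.interval = 1
appender.DRFA.strategy.type = DefaultRolloverStrategy
appender.DRFA.strategy.delete.type = Delete
appender.DRFA.strategy.delete.basePath = ${sys:hive.log.dir}
appender.DRFA.strategy.delete.ifFileName.type = IfFileName
appender.DRFA.strategy.delete.ifFileName.glob = ${sys:hive.log.file}.*
appender.DRFA.strategy.delete.ifLastModified.type = IfLastModified
appender.DRFA.strategy.delete.ifLastModified.age = 10d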
07-13-2018
02:16 PM
From GenerateTableFetch I'm able to get the flowfile, but upon running the flow the SELECT query fails to execute on DB2. On investigating, we found that the query generated by GenerateTableFetch looked like this:
select userid,timestamp from user11 where timestamp<='01-01-2018 12:00:00' order by timestamp limit 10000
So I used the NiFi Expression Language, as per https://community.hortonworks.com/articles/167733/using-generatetablefetch-against-db2.html, and created a query like this:
select ${generatetablefetch.columnnames} from ${generatetablefetch.tablename} where ${generatefetchtable.whereClause} order by ${generatetablefetch.maxColumnNames} fetch first ${generatetablefetch.limit} rows only
which should produce a query like:
select userid,timestamp from user11 where timestamp >= '01-01-2018 12:00:00' order by timestamp limit 1000
but what I'm getting is:
select userid,timestamp from user11 where order by timestamp limit 1000
In the result above the WHERE condition is not picking up a value. Please refer to the screenshots for my configuration: 80483-nififlow.png I think I have made it halfway through and am stuck here. What is missing?
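For reference, the attribute names that GenerateTableFetch writes (and that the linked article plugs into the template) are, as far as I can tell:
${generatetablefetch.tableName}
${generatetablefetch.columnNames}
${generatetablefetch.whereClause}
${generatetablefetch.maxColumnNames}
${generatetablefetch.limit}
${generatetablefetch.offset}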
07-12-2018
05:38 PM
I'm using GenerateTableFetch instead of QueryDatabaseTable, and I'm getting the error "error during database query or conversion of records to avro". How do I resolve this error and avoid duplicated records in my table?
07-12-2018
04:52 PM
nififlow.png Attached is the working flow with its configurations. Everything works except for the duplicates. Kindly assist.
07-12-2018
04:39 PM
@Matt Burgess I still cannot eliminate the duplicates. I changed Use Avro Logical Types to false, and that allowed me to insert data into Hive, but with DUPLICATES. Let's assume I have 100 records: if I insert 1 record I should get 101 records, but instead I get 201. Am I missing something? Kindly assist.
07-12-2018
01:55 PM
Output of the ExecuteSQL flow: "TIMESTAMP" : "2018-06-21T19:07:27.000Z", while the Hive data type is TIMESTAMP, which does not accept that format. Is it because the Maximum-value Columns aren't working?
07-12-2018
01:42 PM
@Matt Burgess I'm using Maximum-value Columns with a timestamp column (YYYY-MM-DD hh:mm:ss). Do you think NiFi is unable to take the maximum value as a timestamp in that format as-is (YYYY-MM-DD hh:mm:ss)?
07-12-2018
11:50 AM
@Matt Burgess Is that QueryDatabaseTable or GenerateTableFetch? Let's say I have 100 records in my source table (DB2). When I add one record at the source I should get 101 records, but instead I get around 201: I get duplicated records, and duplicates keep being added for every new record. @Shu or @Matt Burgess, kindly assist; I'm missing some small thing. Kindly clarify. nf11.png Refer to the attachment for the flow.
07-12-2018
02:36 AM
There was no error, but I can't see the data in Hive. Is my configuration for Flow 1 right? Without QueryFetchTable I'm able to put data into Hive, but then I can't do an incremental load into Hadoop. I'm using NiFi version 1.2.
07-11-2018
03:44 PM
Hi, I have 3 columns in DB2: Userid, Serial no, and a Timestamp column (YYYY-DD-MM HH:MM:SS). I would like to build an incremental load using NiFi. userid and serialno are random numbers, not increasing primary keys; TS is incremental. I have tried:
FLOW 1: QueryFetchTable -> ExecuteSQL -> PutHiveStreaming -> LogAttribute (to avoid duplicates) -- not working, and no error in the logs (refer to the attached pictures for the configuration: nf.png nf1.png nf2.png). I'm not able to load even a single record into Hive with this method, using TS as the max-value column. The Hive table is partitioned, bucketed, ORC, and transactional.
FLOW 2: ExecuteSQL -> PutHiveStreaming -> LogAttribute -- I'm able to get the data into Hive, but I cannot do an incremental load. The Hive table is partitioned, bucketed, ORC, and transactional.
Could you please help me set up a simple incremental load with any flow? Ultimately I would like an incremental load without duplicates into a Hive table that is partitioned, bucketed, ORC, and transactional. I'm open to any NiFi processors for an incremental load without duplicates; a rough sketch of what I am aiming for is below. Can you please create a sample workflow with sample config and sample data? @Shu, any input please? Thanks a lot in advance; expecting an awesome answer from you 🙂
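A rough sketch of the shape I am after, under the assumption (which I would like confirmed) that QueryDatabaseTable already applies the Maximum-value Columns filter and emits Avro, so a separate ExecuteSQL step may not be needed:
QueryDatabaseTable      # Table Name = user11, Maximum-value Columns = TS
  -> PutHiveStreaming   # target: the partitioned, bucketed, ORC, transactional table
  -> LogAttribute       # to inspect success/failure attributes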
07-04-2018
07:44 PM
Hi @Shu, I have seen your article below and tried everything suggested there. You have helped me a lot, and thanks for that. https://community.hortonworks.com/questions/177112/nifi-puthivestreaming-processor-error.html I'm stuck with the error shown in the attachment. I'm using:
QueryDatabaseTable -> PutHiveStreaming. The tables are created with the same column names as the source table, and are partitioned, bucketed, and ORC. Also, could you please help us understand how the flow reads the data and how it should be wired? We are trying to connect from DB2 to Hive. Do we really need to convert from DB2 to Avro in between, or anything of that sort?
- Tags:
- Data Ingestion & Streaming
- nifi-hive
- nifi-processor
- nifi-streaming
- puthivestreaming
- querydatabasetable
07-02-2018
08:37 AM
Kindly provide the step-by-step procedure for HDFS, NiFi, Hive, and ZooKeeper.
I need daily log rotation and deletion of backups older than a week.
06-26-2018
11:39 AM
I have an 8-node cluster, and NiFi is installed on only one node. I want to use a NiFi processor to trigger a shell/Python script on a remote machine. Example: NiFi is installed on Machine 1; the script has to be executed on Machine 4, and the files supporting the script are only available on Machine 4 (the script cannot be moved to the NiFi node). Please tell me: 1. What processors should I use, and what should the flow look like? 2. How do I trigger a shell script with NiFi on a remote machine, and how do I provide the SSH username and key/password? 3. If possible, how do I log the flow in case of any error/failure and trigger a mail (optional)? I need this scenario for many use cases. I have googled a lot; the ExecuteScript processor only lists Python, Ruby, Groovy, etc., not shell. One option I am considering is sketched below. @Shu
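One option I am considering (a sketch with made-up host and script paths; it assumes passwordless SSH keys are already set up from the NiFi service user on Machine 1 to Machine 4):
# ExecuteStreamCommand on the NiFi node, shelling out to ssh (Argument Delimiter is ';')
Command Path        /usr/bin/ssh
Command Arguments   -o;StrictHostKeyChecking=no;user@machine4;/opt/scripts/run_job.sh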