Member since: 10-20-2017
Posts: 63
Kudos Received: 0
Solutions: 0
09-22-2018
11:32 AM
Hello everyone, I'm trying to configure Log4j for the HDFS NameNode and security audit logs. With either size-based or daily rotation, I should be able to rotate and delete the logs, but it is not working as expected. Can someone share the recommended Log4j configuration for the HDFS NameNode/audit logs? Attaching my Log4j configuration: log4j.txt
Help me delete the logs once they are 10 days old; I would also like to gzip the older files. Please find below a screenshot of the disk space consumed. For the audit logs we would like to remove logs based on size.
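For reference, a minimal size-based sketch for the audit log, using the appender and logger names from the stock HDP hadoop log4j.properties template (the size and backup values are assumptions to adapt). Log4j 1.x's RollingFileAppender deletes the oldest backup automatically once MaxBackupIndex is exceeded; it cannot gzip rolled files, so compression would need log4j-extras rolling policies or an external cron job:

# HDFS audit log: size-based rotation; MaxBackupIndex caps how many rolled files are kept
hdfs.audit.logger=INFO,RFAAUDIT
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=${hdfs.audit.logger}
log4j.additivity.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=false
log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender
log4j.appender.RFAAUDIT.File=${hadoop.log.dir}/hdfs-audit.log
# roll at 256 MB and keep at most 20 backups (roughly 5 GB of audit history)
log4j.appender.RFAAUDIT.MaxFileSize=256MB
log4j.appender.RFAAUDIT.MaxBackupIndex=20
log4j.appender.RFAAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n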
09-19-2018
06:58 PM
@Shu Could you please check on this?
09-19-2018
01:46 AM
Attaching the correct flow: hbase-iud.xml
09-18-2018
09:10 PM
Hello, we are trying to handle inserts/updates/deletes in NiFi. The attached flow is taking a long time to process the records and insert them into HBase. I need to use a composite primary key (ServerSerialno, ServerName) and perform the inserts/updates/deletes based on that composite key. I need to improve performance by changing the flow or its configuration. Kindly help me find the right flow to achieve this. Attached the flow.
Labels:
- Apache NiFi
09-18-2018
07:40 PM
@Shu Is it possible to use two fields as the Row Identifier / Row Identifier Field Name in PutHBaseJSON, for example ServerName,ServerNo or ${ServerName},${ServerNo}?
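A possible workaround, sketched here on the assumption that ServerName and ServerNo have already been promoted to flow-file attributes (e.g. with EvaluateJsonPath): PutHBaseJSON takes a single row key, so the two fields can be concatenated into one composite key with Expression Language:

# PutHBaseJSON (sketch; the attribute names are assumptions)
Row Identifier Field Name :  (leave unset)
Row Identifier            :  ${ServerName}_${ServerNo}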
09-18-2018
05:54 AM
Hello everybody, I'm using the RouteText processor to separate JSON records into different flows. Ultimately, Flow 1 should be inserted into the HBase table first, Flow 2 should be executed second, and so on: inserts should be written to HBase first, updates next, and deletes last. How can I achieve this? Attached the sample: test.xml
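For reference, a minimal RouteText sketch for splitting the records by eventType (the match strings assume the JSON layout shown in the later posts). Sequencing the three branches so that inserts land before updates and deletes would still have to be handled downstream, for example with separate scheduled flows or a Wait/Notify pattern:

# RouteText (sketch)
Routing Strategy  : Route to each matching Property Name
Matching Strategy : Contains
inserts           : "eventType":"insert"
updates           : "eventType":"update"
deletes           : "eventType":"delete"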
Labels:
- Apache HBase
- Apache NiFi
09-04-2018
06:22 PM
Hi, I need to load JSON data into HBase under the following conditions: if the record's composite primary key (servername + serverid) matches an existing row, update or delete the record; if it does not match, insert the record. How can I do this in NiFi with HBase for a large dataset? The forecasted data size is roughly 50-100 TB over the next year. Optionally, if possible, I also need to discard invalid entries from the JSON in the flow using a regular expression; discarding those entries leaves blank lines behind, so please let me know how to remove that empty space after removing the bad values, and if possible how to avoid the JSON breaking into bad values. I couldn't find a good tutorial on insert/update with HBase. Attached sample data: test.txt
Labels:
- Apache HBase
- Apache NiFi
08-30-2018
06:15 PM
Accepted his answer, but it didn't work out well. :(
08-30-2018
04:57 AM
I'm currently trying to remove the blank lines left behind after discarding lines in NiFi. I am removing lines such as {"eventType":"delete","ServerSerial":"0","ServerName":"","deletedat":"2018-08-24 17:56:34.944"} because the serial number is 0 and the server name is empty. After removing them, a blank line remains in the final output file, and I want to get rid of that blank space.
Sample records:
{"eventType":"delete","ServerSerial":"1556562030","ServerName":"XYZ_U_O","deletedat":"2018-08-24 17:56:39.974"}
{"eventType":"delete","ServerSerial":"0","ServerName":"","deletedat":"2018-08-24 17:56:34.944"}
So I'm currently using ReplaceText to search for the value {"eventType":"delete","ServerSerial":"0","ServerName":""[^}]+},?
and replacing it with an empty string, but blank lines remain. I then tried searching for ^[ \t]*$\r?\n --> failed to remove them; tried \r\n --> failed to remove them; the Replacement Value is not treated as a regex. What should I do? How can I remove the blank lines that remain after removing the lines matched by the Search Value?
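One way to avoid the leftover blank lines in a single pass, sketched on the assumption that the records are newline-delimited: make the search regex consume the line's trailing newline as well, so nothing is left behind (Evaluation Mode set to Entire text so the regex can match across line boundaries):

Replacement Strategy : Regex Replace
Evaluation Mode      : Entire text
Search Value         : (?m)^\{"eventType":"delete","ServerSerial":"0","ServerName":""[^}]*\}\r?\n?
Replacement Value    : (set to empty string)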
Labels:
- Apache NiFi
08-24-2018
07:30 PM
@Shu When used on a larger dataset, the merge takes a long time to complete. The final table grows by about 150 GB every day, so scanning the final table and applying the updates takes more than an hour. Is there an alternative approach?
08-24-2018
07:02 PM
We are receiving hourly JSON data into HDFS, roughly 7 GB per hour. When a matching record is found in the final table it should be updated (or deleted); when the record is not matched in the final dataset it should be inserted.
What is the best way to do upserts (updates and inserts) in Hadoop for a large dataset: Hive, HBase, or NiFi? What should the flow look like? Can anyone help us with the flow? updates.txt
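For the Hive ACID route, a minimal MERGE sketch; the table and column names are assumptions based on the composite key mentioned in the other posts, and the target must be a transactional, bucketed ORC table:

MERGE INTO final_table AS t
USING hourly_staging AS s
  ON  t.servername = s.servername
  AND t.serverid   = s.serverid
WHEN MATCHED AND s.eventtype = 'delete' THEN DELETE
WHEN MATCHED THEN UPDATE SET deletedat = s.deletedat
WHEN NOT MATCHED THEN INSERT
  VALUES (s.servername, s.serverid, s.eventtype, s.deletedat);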
Labels:
- Apache HBase
- Apache Hive
- Apache NiFi
08-23-2018
07:01 PM
Hi @Shu, I did try your steps; I have a few queries.
Answer 1: I cannot update the partition key when the source and target tables are partitioned; updating a partition is not supported --> Error: Error while compiling statement: FAILED: SemanticException [Error 10292]: Updating values of partition columns is not supported (state=42000,code=10292)
Answer 2:
1. The raw data is in JSON and lands in HDFS.
2. For merging, we are converting the data into an ORC, transactional, bucketed table.
3. With ORC, how can I use input_file_name? It is not possible to merge against the raw JSON files directly, right?
Also, could you please elaborate on points 3 & 4?
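For reference on Error 10292, the usual workaround (the column names here are placeholders) is to leave the partition column out of the UPDATE SET list entirely, since MERGE/UPDATE cannot assign partition columns:

-- 'ingest_date' stands in for the partition column; it must not appear in SET
WHEN MATCHED THEN UPDATE SET eventtype = s.eventtype, deletedat = s.deletedat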
08-21-2018
06:55 PM
Hi @Shu, I followed your tutorial here: https://community.hortonworks.com/articles/191760/create-dynamic-partitions-based-on-flowfile-conten.html and I'm stuck with the attached error. Also, is it possible to do update + insert in the flow described in that article using PutHiveQL? I am able to execute the MERGE option in HiveServer2, but because of the large amount of data, MERGE is not working for a huge dataset. Using the article above, can I achieve the use case below? nifi-predicateerror.jpg
When a matching record is found in the final table, define which action to take, either update or delete; if the record is not matched in the final dataset, insert it.
Labels:
- Apache Hive
- Apache NiFi
08-17-2018
09:04 AM
capture1.png @Jay Kumar SenSharma Everything is configured properly in Ambari, but it is not working as expected. Please refer to the attached screenshot; I'm not sure why it is not removing the older logs that are more than 5 days old, as configured. capture.png
08-17-2018
08:01 AM
I need HiveServer2 Log4j settings for daily rotation and for deleting logs older than 10 days. Can anyone share a config file that does this? I have already configured it (attached) but it is not working as expected, so I would like to know exactly where to configure it (Advanced hive-exec-log4j, Advanced hive-log4j, Advanced hive-log4j2, Advanced beeline-log4j2). We are using Beeline and HA-enabled HiveServer2. log.txt
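A sketch of the usual approach in the Log4j 2 properties used by HiveServer2 (Advanced hive-log4j2 in Ambari): keep the daily TimeBasedTriggeringPolicy that is already in the stock template and add a Delete action to the rollover strategy. The appender name DRFA follows the stock template; the file-name glob is an assumption about your log file name:

# daily rotation (already present in the stock template; layout lines omitted)
appender.DRFA.type = RollingRandomAccessFile
appender.DRFA.name = DRFA
appender.DRFA.fileName = ${sys:hive.log.dir}/${sys:hive.log.file}
appender.DRFA.filePattern = ${sys:hive.log.dir}/${sys:hive.log.file}.%d{yyyy-MM-dd}
appender.DRFA.policies.type = Policies
appender.DRFA.policies.time.type = TimeBasedTriggeringPolicy
appender.DRFA.policies.time.interval = 1
appender.DRFA.policies.time.modulate = true

# delete rolled files older than 10 days on each rollover
appender.DRFA.strategy.type = DefaultRolloverStrategy
appender.DRFA.strategy.delete.type = Delete
appender.DRFA.strategy.delete.basePath = ${sys:hive.log.dir}
appender.DRFA.strategy.delete.maxDepth = 1
appender.DRFA.strategy.delete.ifFileName.type = IfFileName
appender.DRFA.strategy.delete.ifFileName.glob = hiveserver2.log.*
appender.DRFA.strategy.delete.ifLastModified.type = IfLastModified
appender.DRFA.strategy.delete.ifLastModified.age = 10d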
Labels:
- Apache Hive
07-13-2018
02:16 PM
From GenerateTableFetch I'm able to get the flow file, but upon running the flow the SELECT query failed to execute on DB2. On investigating, we found that the query generated by GenerateTableFetch looked like this: select userid,timestamp from user11 where timestamp<='01-01-2018 12:00:00' order by timestamp limit 10000. So I used NiFi Expression Language as per https://community.hortonworks.com/articles/167733/using-generatetablefetch-against-db2.html
and created a query like this: select ${generatetablefetch.columnnames} from ${generatetablefetch.tablename} where ${generatefetchtable.whereClause} order by ${generatetablefetch.maxColumnNames} fetch first ${generatetablefetch.limit} rows only
which should resolve to something like: select userid,timestamp from user11 where timestamp >= '01-01-2018 12:00:00' order by timestamp limit 1000
but I'm getting:
select userid,timestamp from user11 where order by timestamp limit 1000
In the query above the WHERE condition is not picking up its value. Please refer to the screenshot of my configuration: 80483-nififlow.png. I think I have made it about halfway and am stuck here. What is missing?
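One thing to compare against, offered only as a sketch: in the query above the where-clause attribute is spelled generatefetchtable rather than generatetablefetch, and a missing attribute evaluates to an empty string, which would produce exactly the empty WHERE shown. Using the attribute names from the referenced article (note the camelCase), the DB2-friendly query would be:

select ${generatetablefetch.columnNames}
from ${generatetablefetch.tableName}
where ${generatetablefetch.whereClause}
order by ${generatetablefetch.maxColumnNames}
fetch first ${generatetablefetch.limit} rows only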
07-12-2018
05:38 PM
I'm using GenerateTableFetch instead of QueryDatabaseTable and I'm getting the error "error during database query or conversion of records to avro". How do I resolve this error and avoid duplicated records in my table?
07-12-2018
04:52 PM
nififlow.png: attached the working flow with its configuration. Everything is working except for the duplicates. Kindly assist.
07-12-2018
04:39 PM
@Matt Burgess I still cannot eliminate the duplicates. I changed Use Avro Logical Types to false and it allowed me to insert data into Hive, but with DUPLICATES. Let's assume I have 100 records: if I insert 1 record I should end up with 101 records, but instead I get 201. Am I missing anything? Kindly assist.
07-12-2018
01:55 PM
Output of the ExecuteSQL flow: "TIMESTAMP" : "2018-06-21T19:07:27.000Z", while the Hive data type is TIMESTAMP and does not accept that format. Is that why Maximum-value Columns isn't working?
07-12-2018
01:42 PM
@Matt Burgess I'm using Maximum-value Columns with a timestamp column (YYYY-MM-DD hh:mm:ss). Do you think NiFi is unable to track the maximum value as a timestamp in that format (YYYY-MM-DD hh:mm:ss)?
07-12-2018
11:50 AM
@Matt Burgess Is that QueryDatabaseTable or GenerateTableFetch? Let's say I have 100 records in my source table (DB2). When I add one record at the source I should get 101 records, but instead I get around 201: I get duplicated records and keep adding duplicates for every new record. @Shu or @Matt Burgess, kindly assist; I'm missing something small. Kindly clarify. nf11.png Refer to the attachment for the flow.
07-12-2018
02:36 AM
There was no error, but I can't see the data in Hive. Is my configuration for Flow 1 right? Without QueryFetchTable I'm able to put data into Hive, but then I can't do an incremental load into Hadoop. I'm using NiFi version 1.2.
07-11-2018
03:44 PM
Hi, I have 3 columns in DB2: Userid, Serial no, and a Timestamp column (YYYY-DD-MM HH:MM:SS). I would like to build an incremental load using NiFi. userid and serialno are random numbers, not increasing primary keys; TS is incremental. I have tried:
FLOW 1: QueryFetchTable -> ExecuteSQL -> PutHiveStreaming -> LogAttribute (to avoid duplicates). Not working; no error in the logs (refer to the attached pictures for the configuration): nf.png nf1.png nf2.png. I'm not able to load even a single row into Hive using this method, with the max column set to TS. The Hive table is partitioned, bucketed, ORC, and transactional.
FLOW 2: ExecuteSQL -> PutHiveStreaming -> LogAttribute. I'm able to get the data into Hive, but I cannot load it incrementally. The Hive table is partitioned, bucketed, ORC, and transactional.
Could you please help me set up a simple incremental load with any flow? Ultimately, I would like an incremental load without duplicates into a partitioned, bucketed, ORC, transactional Hive table. I'm open to any NiFi processors for an incremental load without duplicates. Can you please create a sample workflow with sample config and sample data? @Shu, any input please. Thanks a lot in advance. Expecting an awesome answer from you 🙂
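For comparison, a minimal incremental-load sketch built around QueryDatabaseTable (processor and property names are stock NiFi; the table name is an assumption): the processor stores the last seen maximum value of the chosen column in its state, so only new rows are emitted and duplicates are avoided, and its Avro output can feed PutHiveStreaming directly:

# QueryDatabaseTable -> PutHiveStreaming (sketch)
Database Connection Pooling Service : <DB2 DBCPConnectionPool>
Table Name                          : user11
Columns to Return                   : userid, serialno, ts
Maximum-value Columns               : ts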
Labels:
- Apache NiFi
07-04-2018
07:44 PM
Hi @Shu, I have seen your article below and tried everything suggested there. You have helped me a lot, and thanks for that. https://community.hortonworks.com/questions/177112/nifi-puthivestreaming-processor-error.html I'm stuck with the attached error. I am using:
QueryDatabaseTable -> PutHiveStreaming. The tables are created with the same column names as the source table, and are partitioned, bucketed, and ORC. Also, could you please help us understand how the flow reads the data and how it should be fed? We are trying to connect from DB2 to Hive. Do we really need an intermediate conversion from DB2 to Avro, or anything of that sort?
Tags:
- Data Ingestion & Streaming
- nifi-hive
- nifi-processor
- nifi-streaming
- puthivestreaming
- querydatabasetable
Labels:
- Apache Hive
- Apache NiFi
07-02-2018
08:37 AM
Kindly give the step-by-step procedure for HDFS, NiFi, Hive, and ZooKeeper.
I need daily log rotation and deletion of backups older than a week.
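For the NiFi part, a minimal sketch: NiFi uses Logback (conf/logback.xml) rather than Log4j; the APP_FILE appender below follows the stock file, with a daily pattern and maxHistory as the retention in days:

<appender name="APP_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
        <!-- roll daily and keep 7 days of history -->
        <fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app_%d{yyyy-MM-dd}.%i.log</fileNamePattern>
        <timeBasedFileNamingAndTriggeringPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP">
            <maxFileSize>100MB</maxFileSize>
        </timeBasedFileNamingAndTriggeringPolicy>
        <maxHistory>7</maxHistory>
    </rollingPolicy>
    <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
        <pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
    </encoder>
</appender>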
Labels:
- Apache Ambari
- Apache Hadoop
- Apache Hive
06-26-2018
11:39 AM
I have an 8-node cluster, and NiFi is installed on only one node. I want to use a NiFi processor to trigger a shell/Python script on a remote machine. Example: NiFi is installed on Machine 1; the script has to be executed on Machine 4, where the files supporting the script are located (the script cannot be moved to the NiFi node). Please tell me: 1. Which processors should I use, and what should the flow look like? 2. How do I trigger a shell script on a remote machine using NiFi? 3. How do I log the flow, if possible, in case of errors/failures and trigger a mail (optional)? I need this scenario for many use cases. I have googled a lot; the ExecuteScript processor only offers Python, Ruby, Groovy, etc., not shell, in its list of options. How do I provide the SSH username and key/password? @Shu
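One common pattern, sketched under the assumption that key-based (passwordless) SSH is set up from the NiFi node to Machine 4 (the host, key path, and script path below are placeholders): use ExecuteStreamCommand (or ExecuteProcess when no incoming flow file is needed) to invoke ssh, and route the nonzero status relationship to a PutEmail processor for failure alerts:

# ExecuteStreamCommand (sketch)
Command Path       : /usr/bin/ssh
Command Arguments  : -i;/home/nifi/.ssh/id_rsa;-o;StrictHostKeyChecking=no;user@machine4;bash /opt/scripts/myscript.sh
Argument Delimiter : ;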
Labels:
- Apache NiFi
- Cloudera DataFlow (CDF)
06-25-2018
01:44 PM
Hi, I am unable to start the Spark2 Livy Server:
ExecutionFailed: Execution of 'curl -s -o /dev/null -w'%{http_code}' --negotiate -u: -k http://random_host.net:8999/sessions | grep 200 ' returned 1. curl: option --negotiate: the installed libcurl version doesn't support this
curl: try 'curl --help' or 'curl --manual' for more information
I tried to install libcurl, but the latest package is already there:
# yum install libcurl
Loaded plugins: fastestmirror, langpacks
Package matching libcurl-7.29.0-35.el7.centos.x86_64 already installed. Checking for update.
Nothing to do
Ambari says the service is started and running, but the error won't go away (refer to the attached image). The port is open and the process is listening in the background:
# netstat -plant | grep 8999
tcp6       0      0 :::8999      :::*      LISTEN      4677/java
I have the Spark2 Livy server on two machines. 1. On Machine 1 the Spark2 Livy server is running. 2. On Machine 2 the Spark2 Livy server is running according to both Ambari and the terminal, yet Ambari reports an error. I have checked the logs on both machines and the errors are similar (Hadoop folder missing: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset, and java.io.FileNotFoundException: File does not exist: /livy2-recovery/v1/batch/st ), but the error appears on the 2nd machine, not the 1st. I have disabled and re-enabled the alert; it still doesn't work. I have restarted the Ambari agent twice; it still doesn't work. sp.png Attached log: sp.txt
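A quick way to confirm whether the installed curl/libcurl was actually built with SPNEGO/Kerberos support (which --negotiate requires) is to check its feature list; a sketch:

# look for GSS-Negotiate / GSS-API / SPNEGO in the Features line
curl -V
curl -V | grep -iE 'gss|spnego'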
Labels:
- Apache Ambari
- Apache Spark
06-10-2018
04:09 AM
Thanks Sandeep. What about adding two instances through Ambari? On HiveServer2 Node 1 I have a few tables, let's say Table1 and Table2. After enabling HiveServer2 on Node 2, I can't find the HiveServer2 folder on the Node 2 machine to make the configuration edits in hive-site.xml, and I also can't view the Node 1 tables from the HiveServer2 on Node 2. Do I need to configure anything apart from adding HS2 on Node 2? I have restarted the services. 1) Do I need to configure a remote Hive metastore to view the tables from Node 1? If so, kindly provide me a link please.
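For reference, both HiveServer2 instances will only see the same databases and tables if they point at the same metastore; a sketch of the relevant hive-site.xml property (the host name is a placeholder):

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore-host.example.com:9083</value>
</property>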