Member since: 10-13-2019
Posts: 6
Kudos Received: 1
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1290 | 06-14-2020 01:06 AM |
| | 1418 | 06-12-2020 07:12 PM |
09-08-2021
04:25 AM
Hi All, I recently faced an issue. We wanted to dedicate one of the Zeppelin nodes to a different agency, but any notebooks they created were getting synced to everyone after a restart, because we were using HDFS as the central storage system and all Zeppelin nodes were picking up notebooks and interpreter settings from the same HDFS path. To avoid this, we isolated the settings and configurations by switching Zeppelin to local storage, so each node acts independently. The steps below may help anyone facing the same situation. Thanks.
Verify these procedures in a lower environment before executing them in production.
sudo -u <zeppelin-user> mkdir -p /var/lib/zeppelin/conf /var/lib/zeppelin/notebook
kinit <zeppelin-user>
hdfs dfs -get "/user/zeppelin/notebook/*" /var/lib/zeppelin/notebook
hdfs dfs -get /user/zeppelin/conf/notebook-authorization.json /var/lib/zeppelin/conf
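The local copies pulled from HDFS can end up owned by whichever OS user ran the commands rather than the Zeppelin service user. A minimal ownership fix, sketched under the assumption that the service user and group are both zeppelin (adjust to your environment):
# Ensure the Zeppelin service user owns the local conf and notebook copies
sudo chown -R zeppelin:zeppelin /var/lib/zeppelin/conf /var/lib/zeppelin/notebook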
In the Zeppelin configuration page, search for and verify the following settings:
zeppelin.notebook.storage = org.apache.zeppelin.notebook.repo.FileSystemNotebookRepo
zeppelin.config.fs.dir = file:///var/lib/zeppelin/conf
zeppelin.notebook.dir = file:///var/lib/zeppelin/notebook
If the settings are missing or incorrect, add or update them to the above values.
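To double-check what the service actually picked up after a restart, you can grep the rendered configuration on the Zeppelin host. A quick sketch, assuming the usual HDP config path /etc/zeppelin/conf/zeppelin-site.xml:
# Show the storage-related properties Zeppelin is actually using
grep -A1 -E 'zeppelin.notebook.storage|zeppelin.config.fs.dir|zeppelin.notebook.dir' /etc/zeppelin/conf/zeppelin-site.xml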
In the Zeppelin UI, check the Interpreter page for any duplicate entries. If any duplicates exist:
Back up and then delete the interpreter.json file from HDFS (/user/zeppelin/conf/interpreter.json) and from the local filesystem on the Zeppelin server host (/var/lib/zeppelin/conf/interpreter.json).
Restart the Zeppelin service in Ambari.
In the Zeppelin UI, confirm that the duplicate entries no longer exist.
If any custom interpreter settings are missing, add them again from the Interpreters page.
Verify that your existing notebooks are still available.
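As a quick sanity check that nothing was lost in the move, you can compare notebook counts between HDFS and the new local storage (same paths as used above; just a sketch):
# Count notebooks in HDFS vs. the local copies
hdfs dfs -ls /user/zeppelin/notebook | tail -n +2 | wc -l
ls -1 /var/lib/zeppelin/notebook | wc -l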
Disclaimer:
This article is contributed by an external user. The steps may not have been verified by Cloudera, may not be applicable to all use cases, and may be very specific to a particular distribution. Please follow with caution and at your own risk. If needed, raise a support case to get confirmation.
06-14-2020
01:06 AM
1 Kudo
I removed the unwanted fields using the ReplaceText processor and achieved what I wanted. Thanks!
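ReplaceText does a regex search-and-replace on the flowfile content, so the exact settings depend on the regex used. Purely to illustrate the end result (not the actual processor configuration), and assuming the unwanted fields were the preview/offset/lastrow wrappers around each record, the equivalent transformation with jq would be:
# Keep only the inner result objects, dropping the wrapper fields
jq '[ .[].result ]' records.json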
06-12-2020
07:12 PM
Hello All, after long research I found that with the cron-driven scheduling strategy the query itself was triggering 3 times. I changed it to the timer-driven strategy, and now the query triggers only once, so I get a single flow file and the duplicates are gone. Thanks!
04-16-2020
08:23 PM
Hi All, I am a newbie to NiFi! Today I ran into a problem with duplicate records. The scenario: we need to get data from Splunk into HDFS in Parquet format, so we built a data flow in NiFi. I pull data from Splunk in JSON format using GetSplunk and write it to HDFS using the PutParquet processor with a JsonTreeReader and an Avro schema. The flow works, but every record is written twice, so I am seeking help to fix this. Below is a sample JSON record, and the NiFi data flow screenshot is attached. Thanks in advance.
Example Data Record:
[ { "preview" : true, "offset" : 0, "result" : { "action" : "allowed", "app" : "", "dest" : "xxxx.xx.xxx.xxx", "dest_bunit" : "", "dest_category" : "", "dest_ip" : "xxx.xx.xxx.xxx", "dest_port" : "xxx", "dest_priority" : "", "direction" : "N/A", "duration" : "", "dvc" : "xxx.xx.xxx.xxx", "dvc_ip" : "xxx.xx.xxx.xxx", "protocol" : "HTTPS", "response_time" : "", "rule" : "/Common/ds_policy_2", "session_id" : "ad240f0634150d02", "src" : "xx.xxx.xxx.xx", "src_bunit" : "", "src_category" : "", "src_ip" : "xx.xxx.xxx.xx", "src_port" : "62858", "src_priority" : "", "tag" : "proxy,web", "usr" : "N/A", "user_bunit" : "", "user_category" : "", "user_priority" : "", "vendor_product" : "ASM", "vendor_product_uuid" : "", "ts" : "", "description" : "", "action_reason" : "", "severity" : "Informational", "user_type" : "", "service_type" : "", "dt" : "20200331", "hr" : "15" }, "lastrow" : null } ]
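One way to confirm whether the duplicates are exact copies already at the JSON stage (before the Parquet conversion) is to check a captured flowfile; a jq/sort sketch, with the file name as a placeholder:
# Print any records that occur more than once, one compact JSON object per line
jq -c '.[]' flowfile.json | sort | uniq -d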
Labels: Apache NiFi
04-02-2020
11:41 PM
Hi All,
I am facing a NiFi JSON-to-PutParquet Avro schema issue. Please find the schema and the actual incoming data format below.
With the schema below I am getting data, but the fields of the nested result record are getting missed and the table does not look right.
I want the fields of the nested record to come out column by column. Can anyone help me achieve that?
SCHEMA:
{ "name": "MyClass", "type": "record", "namespace": "com.acme.avro", "fields": [ { "name": "preview", "type": "boolean" }, { "name": "offset", "type": "int" }, { "name": "result", "type": { "name": "result", "type": "record", "fields": [ { "name": "action", "type": "string" }, { "name": "app", "type": "string" }, { "name": "dest", "type": "string" }, { "name": "dest_bunit", "type": "string" }, { "name": "dest_category", "type": "string" }, { "name": "dest_ip", "type": "string" }, { "name": "dest_port", "type": "string" }, { "name": "dest_priority", "type": "string" }, { "name": "direction", "type": "string" }, { "name": "duration", "type": "string" }, { "name": "dvc", "type": "string" }, { "name": "dvc_ip", "type": "string" }, { "name": "protocol", "type": "string" }, { "name": "response_time", "type": "string" }, { "name": "rule", "type": "string" }, { "name": "session_id", "type": "string" }, { "name": "src", "type": "string" }, { "name": "src_bunit", "type": "string" }, { "name": "src_category", "type": "string" }, { "name": "src_ip", "type": "string" }, { "name": "src_port", "type": "string" }, { "name": "src_priority", "type": "string" }, { "name": "tag", "type": "string" }, { "name": "usr", "type": "string" }, { "name": "user_bunit", "type": "string" }, { "name": "user_category", "type": "string" }, { "name": "user_priority", "type": "string" }, { "name": "vendor_product", "type": "string" }, { "name": "vendor_product_uuid", "type": "string" }, { "name": "ts", "type": "string" }, { "name": "description", "type": "string" }, { "name": "action_reason", "type": "string" }, { "name": "severity", "type": "string" }, { "name": "user_type", "type": "string" }, { "name": "service_type", "type": "string" }, { "name": "dt", "type": "string" }, { "name": "hr", "type": "string" } ] } }, { "name": "lastrow", "type": [ "string", "null" ] } ] }
DATA:
[ { "preview" : true, "offset" : 0, "result" : { "action" : "allowed", "app" : "", "dest" : "xx.xxx.xx.xx", "dest_bunit" : "", "dest_category" : "", "dest_ip" : "xx.xxx.xx.xx", "dest_port" : "443", "dest_priority" : "", "direction" : "N/A", "duration" : "", "dvc" : "xx.xxx.xx.xx",
"dvc_ip" : "xx.xxx.xx.xx", "protocol" : "HTTPS", "response_time" : "", "rule" : "/Common/ds_policy_2", "session_id" : "ad240f0634150d02", "src" : "xx.xxx.xx.xx", "src_bunit" : "", "src_category" : "", "src_ip" : "xx.xxx.xx.xx", "src_port" : "62858", "src_priority" : "", "tag" : "proxy,web", "usr" : "N/A", "user_bunit" : "", "user_category" : "", "user_priority" : "", "vendor_product" : "ASM", "vendor_product_uuid" : "", "ts" : "", "description" : "", "action_reason" : "", "severity" : "Informational", "user_type" : "", "service_type" : "", "dt" : "20200331", "hr" : "15" }, "lastrow" : null } ]
Thanks In Advance.
Labels: Apache NiFi