About sunile_manjee

sunile_manjee · 03-13-2017

I am using NiFi for my data flow, and then I kick off an ETL script that runs many Hive/Pig MR/Tez jobs. Is there an easy way to detect (i.e., trigger on) the completion of each job? Creating a trigger manually per job is not scalable, since there are many jobs, and changing each job so that it creates its own trigger is off the table.
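
One possible approach, not taken from the original thread and offered here only as an assumption, is to poll the YARN ResourceManager REST API for applications that have reached a terminal state instead of instrumenting each job. The sketch builds a `/ws/v1/cluster/apps` query and keeps only MapReduce and Tez applications; the ResourceManager address is hypothetical and the helper names are illustrative.

```python
import json
import urllib.request

# Hypothetical ResourceManager address; replace with your cluster's RM host and port.
RM_URL = "http://resourcemanager.example.com:8088"

# Terminal YARN application states: a job in any of these has stopped running.
TERMINAL_STATES = "FINISHED,FAILED,KILLED"


def finished_apps_url(rm_url: str, since_ms: int) -> str:
    """Query for apps that reached a terminal state from since_ms (epoch millis) onward."""
    return (f"{rm_url}/ws/v1/cluster/apps"
            f"?states={TERMINAL_STATES}&finishedTimeBegin={since_ms}")


def parse_finished_apps(payload: dict, app_types=("MAPREDUCE", "TEZ")) -> list:
    """Return (id, name, type, finalStatus, finishedTime) for MR/Tez apps in an /apps response."""
    apps = payload.get("apps") or {}  # the RM returns "apps": null when nothing matches
    return [
        (a["id"], a["name"], a["applicationType"], a["finalStatus"], a["finishedTime"])
        for a in apps.get("app", [])
        if a.get("applicationType") in app_types
    ]


def poll_once(rm_url: str, since_ms: int) -> list:
    """Fetch and filter finished MR/Tez apps since since_ms (performs a network call)."""
    with urllib.request.urlopen(finished_apps_url(rm_url, since_ms), timeout=30) as resp:
        return parse_finished_apps(json.load(resp))
```

Running `poll_once` on a schedule (for example from cron, or by pointing NiFi's InvokeHTTP processor at the same URL) and remembering the largest `finishedTime` seen would yield one completion event per job without a per-job trigger.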

sunile_manjee · 03-13-2017

Are you still getting the error?

sunile_manjee · 03-13-2017

Please let me know whether this has answered your question.

sunile_manjee · 03-13-2017

HDP service logs are available in Ambari Log Search. Its back end is Solr, so you can pull all of the log data or only the information relevant to your requirements. For service-level metrics, Ambari now makes these available in Grafana.
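
As a sketch of pulling only the relevant entries, the helper below builds a Solr `select` URL against the Log Search collection. The host and port (8886 is a common Ambari Infra Solr default), the collection name `hadoop_logs`, and the field names `type`, `level`, and `logtime` are assumptions to verify on your cluster; `hdfs_namenode` is a hypothetical component value.

```python
from urllib.parse import urlencode

# Assumed Ambari Infra Solr endpoint; verify host and port on your cluster.
SOLR_URL = "http://infra-solr.example.com:8886/solr"


def log_search_url(solr_url: str, component: str, levels=("ERROR", "FATAL"),
                   rows: int = 100, collection: str = "hadoop_logs") -> str:
    """Build a Solr select URL for recent log lines of one component at the given levels.

    The collection and field names (type, level, logtime) are assumptions about the
    Log Search schema, not confirmed by the original post.
    """
    params = {
        "q": "*:*",
        # Filter queries narrow the result set without affecting relevance scoring.
        "fq": [f"type:{component}", f"level:({' OR '.join(levels)})"],
        "sort": "logtime desc",
        "rows": rows,
        "wt": "json",
    }
    # doseq=True emits one fq=... pair per list element.
    return f"{solr_url}/{collection}/select?{urlencode(params, doseq=True)}"
```

For example, `log_search_url(SOLR_URL, "hdfs_namenode")` would request the 100 most recent ERROR/FATAL lines for that component.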

sunile_manjee · 03-13-2017

Which storage to use depends on the requirement:
- Simple storage and analytics on logs: HDFS.
- Low-latency reads/writes on log events: Phoenix/HBase.
- Cyber security: Metron + NiFi + HDFS.
- Searching on logs: Solr.
- Low-latency reads/writes plus searching: HBase + Solr (using the Lily indexer).
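
The rule of thumb above can be encoded as a small decision helper. This is only an illustration; the function name and the order in which requirements take precedence are this sketch's own choices, not part of the original answer.

```python
def suggest_log_store(low_latency: bool = False, search: bool = False,
                      cyber_security: bool = False) -> str:
    """Map log-storage requirements to the stack suggested in the answer (illustrative)."""
    if cyber_security:
        return "Metron + NiFi + HDFS"
    if low_latency and search:
        return "HBase + Solr (Lily indexer)"
    if low_latency:
        return "Phoenix/HBase"
    if search:
        return "Solr"
    # Default: simple storage and batch analytics.
    return "HDFS"
```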

sunile_manjee · 03-13-2017

Have you confirmed that the error messages appear on the bulletin board? Also, NiFi logs (including errors) are stored in the Ambari Infra Solr instance, so you can pull all errors/warnings from Ambari Infra and push them to Postgres.
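
A minimal sketch of the "pull from Ambari Infra, push to Postgres" step, assuming the Solr response has already been fetched as JSON. The field names (`logtime`, `host`, `type`, `level`, `log_message`) and the table name `nifi_errors` are hypothetical. `sqlite3` is used only as a runnable stand-in for PostgreSQL; with psycopg2 you would pass its connection and `placeholder="%s"`.

```python
import sqlite3


def docs_to_rows(solr_response: dict) -> list:
    """Extract (logtime, host, component, level, message) tuples from a Solr select response."""
    docs = solr_response.get("response", {}).get("docs", [])
    return [
        (d.get("logtime"), d.get("host"), d.get("type"), d.get("level"), d.get("log_message"))
        for d in docs
    ]


def store_rows(conn, rows: list, placeholder: str = "?") -> int:
    """Insert rows through any DB-API connection; returns the number of rows written."""
    marks = ", ".join([placeholder] * 5)
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS nifi_errors "
        "(logtime TEXT, host TEXT, component TEXT, level TEXT, message TEXT)"
    )
    cur.executemany(f"INSERT INTO nifi_errors VALUES ({marks})", rows)
    conn.commit()
    return len(rows)
```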

sunile_manjee · 03-13-2017

Please set Compression codec to NONE in the PutHDFS processor.
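
If you would rather change the property through NiFi's REST API than through the UI, the sketch below builds a body for `PUT /nifi-api/processors/{id}`. The base URL, processor ID, and revision version are placeholders that you would read first from `GET /nifi-api/processors/{id}`. The property key `Compression codec` is assumed to match the processor's property name. The processor generally must be stopped before its configuration can be changed.

```python
import json
import urllib.request


def compression_none_payload(processor_id: str, revision_version: int) -> dict:
    """Body for PUT /nifi-api/processors/{id} that sets the compression property to NONE."""
    return {
        # NiFi uses optimistic locking: the revision must match the processor's current one.
        "revision": {"version": revision_version},
        "component": {
            "id": processor_id,
            # Property key assumed to be "Compression codec"; verify in the processor's config.
            "config": {"properties": {"Compression codec": "NONE"}},
        },
    }


def update_processor(base_url: str, processor_id: str, revision_version: int) -> dict:
    """Send the update (performs a network call) and return NiFi's JSON response."""
    body = json.dumps(compression_none_payload(processor_id, revision_version)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/nifi-api/processors/{processor_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```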

sunile_manjee · 03-13-2017

The documentation for PutHiveStreaming says the flow file must be in Avro format. My understanding is that Hive streaming only supports the ORC format. When PutHiveStreaming is used, does it convert Avro to ORC before inserting into the Hive table? I am trying to understand the functionality.

sunile_manjee · 03-13-2017

The documentation for ListFile states: "If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data." How does the new primary node pick up where the previous node left off without duplicating flow files? I ask because the previous primary node may hold the flow file. When a new primary node is elected, how does it get that flow file without duplicating or cloning it?
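
A simplified model may help frame the question. It is a sketch under stated assumptions, not NiFi's implementation. In general, what passes between primary nodes is the listing state, which is kept in cluster-scoped state (ZooKeeper in a real cluster), not the flow files themselves. Flow files already emitted stay queued on the node that listed them. The new primary reads the shared state and lists only files newer than the last recorded timestamp. The class, function, and state key below are hypothetical. The real processor also handles edge cases this sketch ignores, such as files that share the last recorded timestamp.

```python
from dataclasses import dataclass, field


@dataclass
class SharedState:
    """Stand-in for NiFi cluster-scoped state (ZooKeeper in a real cluster)."""
    values: dict = field(default_factory=dict)


def list_new_files(node: str, files: list, state: SharedState) -> list:
    """Simplified listing: emit files newer than the shared timestamp, then advance it.

    files is a list of (path, mtime) tuples; "last_listed_mtime" is a hypothetical key.
    """
    last = state.values.get("last_listed_mtime", -1)
    new = sorted((f for f in files if f[1] > last), key=lambda f: f[1])
    if new:
        state.values["last_listed_mtime"] = new[-1][1]
    # Each emitted "flow file" carries only metadata and is owned by the listing node.
    return [(node, path) for path, _ in new]
```

In this model, node1 lists two files, node2 becomes primary, and node2 emits only the file that appeared afterwards. node1's earlier flow files are never cloned onto node2.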

sunile_manjee · 03-10-2017

I have a cluster that is NOT kerberized. Is it possible to enable user impersonation for Hive queries run from Zeppelin? All the documentation seems to require a user principal.

Last Visited: 05-25-2022 10:07 AM

Member Since: 05-30-2018 10:40 PM
Posts: 1,322
Kudos received: 713

Cloudera Community

Re: Iterate over ADLS files using spark?

Re: Install NiFi CA service post nifi cluster inst...

Re: Which storage format is optimum for training m...

Re: Ambari custom alert failing

Re: df.cache() is not working on jdbc table

Is there a way to easily detect when a MR/Tez job ...

Re: Nifi putHdfs Error Java.lang.illegalArgumentEx...

Re: mapred.output.committer.class

Re: Hadoop Log Monitoring

Re: How to decide storage to use for logs based on...

Re: Capture Error messages from NiFi bulletins

Re: Nifi putHdfs Error Java.lang.illegalArgumentEx...

NiFi PutHiveStreaming requires Avro?

ListFile primary node change

Is Kerberos required for zeppelin hive identity pr...