Member since: 05-30-2018
Posts: 1322
Kudos Received: 715
Solutions: 148
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4067 | 08-20-2018 08:26 PM |
| | 1962 | 08-15-2018 01:59 PM |
| | 2390 | 08-13-2018 02:20 PM |
| | 4138 | 07-23-2018 04:37 PM |
| | 5045 | 07-19-2018 12:52 PM |
03-03-2017
04:54 PM
1 Kudo
I have flow files running through a NiFi cluster. A remote process group is used to load balance the flow files. If I add a new node to the cluster (while flow files are running, with no stoppage), will NiFi automatically distribute the load to the new node? Does the cluster coordinator perform this function?
Labels:
- Apache NiFi
03-02-2017
04:24 AM
This is using Ambari 2.4.2.
03-02-2017
04:04 AM
I have tried to install HDP 2.5.3 on Red Hat 7 (AWS) and it failed with the error below:

2017-03-01 22:41:38,450 - Execution of '/usr/bin/yum -d 0 -e 0 -y install snappy-devel' returned 1.
Error: Package: snappy-devel-1.0.5-1.el6.x86_64 (HDP-UTILS-1.1.0.21)
       Requires: snappy(x86-64) = 1.0.5-1.el6
       Installed: snappy-1.1.0-3.el7.x86_64 (@anaconda/7.3)
           snappy(x86-64) = 1.1.0-3.el7
       Available: snappy-1.0.5-1.el6.x86_64 (HDP-UTILS-1.1.0.21)
           snappy(x86-64) = 1.0.5-1.el6
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest

The workaround is to uninstall snappy-1.1.0-3.el7.x86_64:

sudo yum remove snappy-1.1.0-3.el7.x86_64

and then install the correct package:

sudo yum install snappy-devel-1.0.5-1.el6.x86_64

then retry the service install (i.e., DataNode). Is this a known issue?
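For reference, a minimal shell sketch of the full workaround on one node (package versions are taken from the error above; the rpm -q checks are only there to confirm what is installed before and after):

rpm -q snappy snappy-devel                            # confirm the el7 build is what is installed
sudo yum remove -y snappy-1.1.0-3.el7.x86_64          # drop the conflicting el7 package
sudo yum install -y snappy-devel-1.0.5-1.el6.x86_64   # pull the el6 build HDP-UTILS expects
rpm -q snappy snappy-devel                            # verify the el6 versions are now in place

Then retry the failed component install from Ambari.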
Labels:
- Hortonworks Data Platform (HDP)
03-01-2017
03:56 AM
You can use the built-in DR capabilities of HBase. HBase supports active/active and active/passive replication. As metadata changes/adds are pushed to Atlas (HBase), those can be pushed to your Atlas DR site via HBase replication. For Solr you can use CDCR. More info here: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62687462
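As a rough sketch of the HBase side, run from the hbase shell on the primary cluster; the peer id, DR ZooKeeper quorum, table name, and column-family name below are all placeholders, so check which HBase tables and families your Atlas version actually uses first:

# register the DR cluster as a replication peer
add_peer '1', 'dr-zk1,dr-zk2,dr-zk3:2181:/hbase-unsecure'
# mark the Atlas table's column family for replication (names are assumptions)
disable 'atlas_titan'
alter 'atlas_titan', {NAME => 'e', REPLICATION_SCOPE => 1}
enable 'atlas_titan'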
03-01-2017
03:37 AM
Oh yes, you need to use the IP address of the sandbox. The hosts file does not need to be updated unless you plan to use a DNS name (i.e., sandbox.hortonworks.com) for it. For your test, use the sandbox IP. I assume all of this exists on the same box? If not, your Windows machine needs to be able to communicate with the sandbox; open up the firewalls for IP communication.
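If you later do want the friendly name, a minimal sketch of the entry to add to C:\Windows\System32\drivers\etc\hosts on the Windows machine (the IP shown is a placeholder for your sandbox's actual address):

192.168.56.101   sandbox.hortonworks.com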
03-01-2017
03:16 AM
I am not familiar with SAS, but most BI tools require the hostname of your HiveServer2 and the port, which is generally 10010, or 10500 for LLAP.
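As a quick sanity check outside of SAS, you could test the same host/port with beeline (the hostname and user below are placeholders):

beeline -u "jdbc:hive2://hs2-host.example.com:10010/default" -n hive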
02-28-2017
09:22 PM
1 Kudo
All suggestions above are good. Adding an article on tools to use to benchmark the hardware: https://community.hortonworks.com/content/kbentry/56158/benchmark-your-hardware-for-hadoop-spark.html
02-28-2017
02:42 PM
4 Kudos
I would go with QueryDatabaseTable. This will provide you state, and also another important feature: breaking the returned records up into flow files. For example, if 1000 records are expected as the output of a query, you can set Max Rows Per Flow File to x and process the data in smaller chunks. If you use SelectHiveQL, then build your query using UpdateAttribute and maintain state via a DistributedMapCache (DMC): keep the last state in the DMC and use that state in your UpdateAttribute to build the query SelectHiveQL runs. Flow: DMC fetch (state field) --> UpdateAttribute (build query) --> SelectHiveQL. You will have to seed the DMC with an initial state value, or set it in logic via UpdateAttribute. A sketch of the query construction follows below.
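A rough sketch of the UpdateAttribute step, assuming FetchDistributedMapCache put the cached value into an attribute named stored.state, and the table/column names are hypothetical; the property value uses NiFi Expression Language:

hive.query = SELECT * FROM orders WHERE last_modified > '${stored.state}'

SelectHiveQL's query property can then reference ${hive.query}.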
02-27-2017
05:49 PM
You can shut down all services via Ambari by going to the right-side toolbar and selecting Stop All. This is a clean shutdown. Info here: https://docs.hortonworks.com/HDPDocuments/Ambari-2.1.0.0/bk_Ambari_Users_Guide/content/_starting_and_stopping_all_services.html Via the API, info here: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=41812517
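For the API route, a minimal curl sketch (the Ambari host, credentials, and cluster name are placeholders); setting the desired state to INSTALLED stops all services:

curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Stop All Services"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
  http://ambari-host:8080/api/v1/clusters/MyCluster/services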
02-26-2017
04:08 PM
Yes, you can do this, but first you will need to install the Oozie client on the NiFi nodes. This will become easier once there is one Ambari managing both HDF and HDP. However, I would recommend using NiFi to ingest and stream data into Hive tables (using the Hive streaming processor), or just use PutHiveQL. Why? The operational capabilities in NiFi (back pressure, data lineage, event replay, stats on performance) you simply don't get with Oozie. Lastly, you can reuse this common processing logic, or isolate it in other terms, by using a NiFi process group.