Member since: 07-30-2019
Posts: 333
Kudos Received: 356
Solutions: 76
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 9632 | 02-17-2017 10:58 PM
 | 2193 | 02-16-2017 07:55 PM
 | 7779 | 12-21-2016 06:24 PM
 | 1695 | 12-20-2016 01:29 PM
 | 1202 | 12-16-2016 01:21 PM
12-30-2015
03:48 PM
Your wrapper script will generate a sqoop command, substituting values each time it invokes sqoop. Use any scripting language you are comfortable with that can access your RDBMS.
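Purely as an illustration, here is a minimal bash sketch of such a wrapper; the JDBC URL, credentials file, table list and target paths are placeholders, not anything from the original thread:

#!/usr/bin/env bash
# minimal wrapper sketch -- connection string, credentials and table list are placeholders
JDBC_URL="jdbc:mysql://dbhost:3306/sales"
DB_USER="etl_user"
TARGET_ROOT="/data/raw"

for TABLE in customers orders line_items; do
  sqoop import \
    --connect "${JDBC_URL}" \
    --username "${DB_USER}" \
    --password-file /user/etl/.db_password \
    --table "${TABLE}" \
    --target-dir "${TARGET_ROOT}/${TABLE}" \
    --num-mappers 4
done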
12-26-2015
06:56 PM
Vlad, I think there's a misunderstanding here. See if I've understood the problem below. In GetHTTP, the Filename is something you assign. You can use an expression, but that also means you must invoke the UUID() function to generate the id, because at that point it doesn't exist yet. The ${uuid} syntax reads an existing UUID; it will work further down the line in your data flow, but it will not generate any id for you. In your case, using the UUID() function is correct.
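A small illustration of the two forms as they might appear in processor properties (the .json suffix is only an example of mine, not from the question): the GetHTTP Filename property would be set to ${UUID()}.json to generate a fresh id at fetch time, while a downstream processor that needs the id of an already-existing FlowFile would read it back with ${uuid}.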
12-23-2015
03:20 PM
I would highly recommend chroot'ing the SolrCloud config; otherwise it dumps all its entries at the root of the ZooKeeper tree. See https://community.hortonworks.com/content/kbentry/7081/best-practice-chroot-your-solr-cloud-in-zookeeper.html for details.
12-22-2015
01:43 PM
2 Kudos
With EC2 your challenge is the IP addresses: they change when a node is stopped and restarted, and you need to make them all static. The question isn't 'how do I have core-site.xml update automatically', but rather 'how do I ensure my node doesn't move around the network on reboot'. Hope this helps.
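As a rough sketch of what "making them static" can mean in practice (everything below is illustrative, not from the original thread): fix the private IP at launch time inside a VPC, and pin the cluster hostnames on every node so the addresses in core-site.xml keep resolving.

# launch the node with a fixed private IP inside a VPC (all ids are placeholders)
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type m4.xlarge \
  --subnet-id subnet-0123456789abcdef0 \
  --private-ip-address 10.0.0.11

# pin the cluster hostnames on every node (entries are placeholders)
echo "10.0.0.11  master1.cluster.internal master1" >> /etc/hosts
echo "10.0.0.12  worker1.cluster.internal worker1" >> /etc/hosts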
12-21-2015
02:02 PM
1 Kudo
@Artur Bukowski - with large volumes of data in this classic RDBMS->HDFS setup you might be better off with sqoop, i.e. if your goal is to move those large datasets in parallel. If you are after data provenance for those datasets, then NiFi will be a better fit.
12-21-2015
01:24 PM
When connecting remotely via JDBC, the function library must be accessible to that environment; it's not the same as the Hive CLI. Which tier are you getting this error in? Are you using HiveServer2? You would need to ensure those middle tiers have your custom jar as well.
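For example, one way to make a UDF resolvable from HiveServer2 sessions is to register it as a permanent function backed by a jar in HDFS; this is only a sketch, and the host, jar path, class and function names below are placeholders:

# placeholders: adjust host, jar path, class and function name to your environment
beeline -u "jdbc:hive2://hs2-host:10000/default" -e "
CREATE FUNCTION my_udf AS 'com.example.MyUdf'
USING JAR 'hdfs:///apps/hive/udfs/my-udf.jar';
"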
12-21-2015
01:21 PM
Hi, everything is in the error: verify that you can ping the Ambari host (by name) from the node where you're installing the agent. Usually some edits in /etc/hosts help. If you are performing a manual install of the agent, verify the Ambari host it points to in its ini file.
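As a rough illustration, the hostnames and IPs below are placeholders:

# from the agent node: the Ambari server must resolve and respond by name
ping -c 3 ambari-server.example.com

# if name resolution is the problem, add an entry to /etc/hosts on the agent node, e.g.
# 10.0.0.10   ambari-server.example.com ambari-server

# for a manual agent install, check which server the agent points to
grep -A1 '^\[server\]' /etc/ambari-agent/conf/ambari-agent.ini
# [server]
# hostname=ambari-server.example.com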
12-18-2015
10:32 PM
list-and-fetch-sftp-templatexml.zip
Try the attached template. Also check out the repo our team is populating here: https://community.hortonworks.com/repos/6119/apache-nifi-template-rpo.html
12-17-2015
03:32 PM
3 Kudos
These instructions assume you don't need to preserve Solr indexes. If you do, modify ZK commands to move nodes instead of removing them.

Stop every SolrCloud node:

# on every SolrCloud node
su - solr
cd /opt/lucidworks-hdpsearch/solr/bin/
./solr stop -all

Connect to your ZooKeeper quorum and run zkCli shell:

su - zookeeper
cd /usr/hdp/current/zookeeper-client/bin/
# point it to a ZK quorum (or just a single ZK server is ok, e.g. localhost)
./zkCli.sh -server lake02:2181,lake03:2181,lake04:2181

If you followed best practices and chroot'ed your SolrCloud already, then things are easy (and you probably are done by now):

# in ZK cli shell
rmr /solr

More often this wasn't the case, so perform the following operations on your ZK tree:

# in ZK cli shell
rmr /clusterstate.json
rmr /clusterstate.json
rmr /aliases.json
rmr /live_nodes
rmr /overseer
rmr /overseer_elect
rmr /collections
Next, follow this best practices article, create a new ZK home for your SolrCloud cluster and start it up in chroot'ed mode.
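If you want to double-check the cleanup before re-deploying, a quick sanity check from the same session should no longer show any of the entries removed above:

# still in ZK cli shell
ls /
quit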
12-17-2015
03:05 PM
1 Kudo
When running Solr in clustered mode (SolrCloud), it has a runtime dependency on ZooKeeper, where it stores configs, coordinates leader election, tracks replica allocation, etc. All in all, a whole tree of ZK nodes with sub-nodes gets created. Deploying SolrCloud into a Hadoop cluster usually means re-using the centralized ZK quorum already maintained by HDP. Unfortunately, if not explicitly taken care of, SolrCloud will happily dump all its ZK content in the ZK root, which really complicates things for an admin down the line. If you need to clean up your ZK first, take a look at this how-to. The solution is to put all SolrCloud ZK entries under their own ZK node (e.g. /solr). Here's how one does it:

su - zookeeper
cd /usr/hdp/current/zookeeper-client/bin/
# point it to a ZK quorum (or just a single ZK server is ok, e.g. localhost)
./zkCli.sh -server lake02:2181,lake03:2181,lake04:2181
# in zk shell now
# note the empty brackets are _required_
create /solr []
# verify the zk node has been created, must not complain the node doesn't exist
ls /solr
quit
# back in the OS shell
# start SolrCloud and tell it which ZK node to use
su - solr
cd /opt/lucidworks-hdpsearch/solr/bin/
# note how we add '/solr' to a ZK quorum address.
# it must be added to the _last_ ZK node address
# this keeps things organized and doesn't pollute root ZK tree with Solr artifacts
./solr start -c -z lake02:2181,lake03:2181,lake04:2181/solr
# alternatively, if you have multiple IPs on your Hadoop nodes and have
# issues accessing Solr UI and dashboards, try binding it to an address explicitly:
./solr start -c -z lake02:2181,lake03:2181,lake04:2181/solr -h $HOSTNAME
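Optionally, once Solr is up, you can confirm it is writing under the chroot rather than the ZK root; a quick check using the same quorum as above:

# back in the ZK cli (from /usr/hdp/current/zookeeper-client/bin)
./zkCli.sh -server lake02:2181,lake03:2181,lake04:2181
ls /solr
# expect Solr runtime entries such as live_nodes, collections, overseer, ...
quit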