Member since: 07-30-2019
Posts: 333
Kudos Received: 356
Solutions: 76
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 9632 | 02-17-2017 10:58 PM
 | 2193 | 02-16-2017 07:55 PM
 | 7779 | 12-21-2016 06:24 PM
 | 1695 | 12-20-2016 01:29 PM
 | 1202 | 12-16-2016 01:21 PM
12-30-2015
03:48 PM
Your wrapper script will generate a sqoop command, substituting values each time it invokes sqoop. Use any scripting language you are comfortable with that can access your RDBMS.
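Purely as an illustration, here is a minimal bash sketch of such a wrapper; the JDBC URL, credentials file, table list and target paths are placeholders, not anything from the original thread:

#!/usr/bin/env bash
# minimal wrapper sketch -- connection string, credentials and table list are placeholders
JDBC_URL="jdbc:mysql://dbhost:3306/sales"
DB_USER="etl_user"
TARGET_ROOT="/data/raw"

for TABLE in customers orders line_items; do
  sqoop import \
    --connect "${JDBC_URL}" \
    --username "${DB_USER}" \
    --password-file /user/etl/.db_password \
    --table "${TABLE}" \
    --target-dir "${TARGET_ROOT}/${TABLE}" \
    --num-mappers 4
done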
12-26-2015
06:56 PM
Vlad, I think there's a misunderstanding here. See if I've understood the problem below. In GetHTTP, the Filename is something you assign. You can use an expression, but that also means you must invoke the UUID() function to generate the id, because at that point it doesn't exist yet. The ${uuid} syntax reads an existing UUID; it will work further down the line in your data flow, but it will not generate any id for you. In your case, using the UUID() function is correct.
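A small illustration of the two forms as they might appear in processor properties (the .json suffix is only an example of mine, not from the question): the GetHTTP Filename property would be set to ${UUID()}.json to generate a fresh id at fetch time, while a downstream processor that needs the id of an already-existing FlowFile would read it back with ${uuid}.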
12-23-2015
03:20 PM
I would highly recommend chroot'ing the SolrCloud config; otherwise it dumps all its entries at the root of the ZooKeeper tree. See https://community.hortonworks.com/content/kbentry/7081/best-practice-chroot-your-solr-cloud-in-zookeeper.html for details.
12-22-2015
01:43 PM
2 Kudos
With EC2 your challenge is the IP addresses: they change when a node is stopped and restarted, and you need to make them all static. The question isn't 'how do I have core-site.xml update automatically', but rather 'how do I ensure my node doesn't move around the network on reboot'. Hope this helps.
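As a rough sketch of what "making them static" can mean in practice (everything below is illustrative, not from the original thread): fix the private IP at launch time inside a VPC, and pin the cluster hostnames on every node so the addresses in core-site.xml keep resolving.

# launch the node with a fixed private IP inside a VPC (all ids are placeholders)
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type m4.xlarge \
  --subnet-id subnet-0123456789abcdef0 \
  --private-ip-address 10.0.0.11

# pin the cluster hostnames on every node (entries are placeholders)
echo "10.0.0.11  master1.cluster.internal master1" >> /etc/hosts
echo "10.0.0.12  worker1.cluster.internal worker1" >> /etc/hosts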
12-21-2015
02:02 PM
1 Kudo
@Artur Bukowski - with large volumes of data in this classic RDBMS->HDFS setup you might be better off with sqoop, i.e. if your goal is to move those large datasets in parallel. If you are after data provenance for those datasets, then NiFi will be a better fit.
12-21-2015
01:24 PM
When connecting remotely via JDBC, the function library must be accessible to that environment; it's not the same as the Hive CLI. Which tier are you getting this error in? Are you using HiveServer2? You would need to ensure those middle tiers have your custom jar as well.
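For example, one way to make a UDF resolvable from HiveServer2 sessions is to register it as a permanent function backed by a jar in HDFS; this is only a sketch, and the host, jar path, class and function names below are placeholders:

# placeholders: adjust host, jar path, class and function name to your environment
beeline -u "jdbc:hive2://hs2-host:10000/default" -e "
CREATE FUNCTION my_udf AS 'com.example.MyUdf'
USING JAR 'hdfs:///apps/hive/udfs/my-udf.jar';
"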
12-21-2015
01:21 PM
Hi, everything is in the error: verify that you can ping the Ambari host (by name) from the node where you're installing the agent. Usually some edits in /etc/hosts help. If you are performing a manual install of the agent, verify the Ambari host it points to in its ini file.
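As a rough illustration, the hostnames and IPs below are placeholders:

# from the agent node: the Ambari server must resolve and respond by name
ping -c 3 ambari-server.example.com

# if name resolution is the problem, add an entry to /etc/hosts on the agent node, e.g.
# 10.0.0.10   ambari-server.example.com ambari-server

# for a manual agent install, check which server the agent points to
grep -A1 '^\[server\]' /etc/ambari-agent/conf/ambari-agent.ini
# [server]
# hostname=ambari-server.example.com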
12-18-2015
10:32 PM
list-and-fetch-sftp-templatexml.zip
Try the attached template. Also check out the repo our team is populating here: https://community.hortonworks.com/repos/6119/apache-nifi-template-rpo.html
12-17-2015
03:32 PM
3 Kudos
These instructions assume you don't need to preserve Solr indexes. If you do, modify ZK commands to move nodes instead of removing them.

Stop every SolrCloud node:

# on every SolrCloud node
su - solr
cd /opt/lucidworks-hdpsearch/solr/bin/
./solr stop -all

Connect to your ZooKeeper quorum and run zkCli shell:

su - zookeeper
cd /usr/hdp/current/zookeeper-client/bin/
# point it to a ZK quorum (or just a single ZK server is ok, e.g. localhost)
./zkCli.sh -server lake02:2181,lake03:2181,lake04:2181

If you followed best practices and chroot'ed your SolrCloud already, then things are easy (and you probably are done by now):

# in ZK cli shell
rmr /solr

More often this wasn't the case, so perform the following operations on your ZK tree:

# in ZK cli shell
rmr /clusterstate.json
rmr /clusterstate.json
rmr /aliases.json
rmr /live_nodes
rmr /overseer
rmr /overseer_elect
rmr /collections
Next, follow this best practices article, create a new ZK home for your SolrCloud cluster and start it up in chroot'ed mode.
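If you want to double-check the cleanup before re-deploying, a quick sanity check from the same session should no longer show any of the entries removed above:

# still in ZK cli shell
ls /
quit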
12-17-2015
03:05 PM
1 Kudo
When running Solr in clustered mode (SolrCloud), it has a runtime dependency on ZooKeeper, where it stores configs, coordinates leader election, tracks replica allocation, etc. All in all, a whole tree of ZK nodes with sub-nodes gets created. Deploying SolrCloud into a Hadoop cluster usually means re-using the centralized ZK quorum already maintained by HDP. Unfortunately, if not explicitly taken care of, SolrCloud will happily dump all its ZK content in the ZK root, which really complicates things for an admin down the line. If you need to clean up your ZK first, take a look at this how-to. The solution is to put all SolrCloud ZK entries under their own ZK node (e.g. /solr). Here's how one does it:

su - zookeeper
cd /usr/hdp/current/zookeeper-client/bin/
# point it to a ZK quorum (or just a single ZK server is ok, e.g. localhost)
./zkCli.sh -server lake02:2181,lake03:2181,lake04:2181
# in zk shell now
# note the empty brackets are _required_
create /solr []
# verify the zk node has been created, must not complain the node doesn't exist
ls /solr
quit
# back in the OS shell
# start SolrCloud and tell it which ZK node to use
su - solr
cd /opt/lucidworks-hdpsearch/solr/bin/
# note how we add '/solr' to a ZK quorum address.
# it must be added to the _last_ ZK node address
# this keeps things organized and doesn't pollute root ZK tree with Solr artifacts
./solr start -c -z lake02:2181,lake03:2181,lake04:2181/solr
# alternatively, if you have multiple IPs on your Hadoop nodes and have
# issues accessing Solr UI and dashboards, try binding it to an address explicitly:
./solr start -c -z lake02:2181,lake03:2181,lake04:2181/solr -h $HOSTNAME
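Optionally, once Solr is up, you can confirm it is writing under the chroot rather than the ZK root; a quick check using the same quorum as above:

# back in the ZK cli (from /usr/hdp/current/zookeeper-client/bin)
./zkCli.sh -server lake02:2181,lake03:2181,lake04:2181
ls /solr
# expect Solr runtime entries such as live_nodes, collections, overseer, ...
quit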