Member since: 05-28-2019
Posts: 46
Kudos Received: 16
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 6728 | 01-04-2017 07:03 PM
 | 1396 | 01-04-2017 06:00 PM
02-24-2022
01:04 PM
@HaiderNaveed As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post. Thanks!
05-17-2020
08:41 PM
Hi @kettle
As this thread was marked 'Solved' in June of 2016, you would have a better chance of receiving a useful response by starting a new thread. A new thread will also give you the opportunity to provide details specific to your use of the PutSQL processor and/or Phoenix, which could help others give a more tailored answer to your question.
10-02-2017
07:52 PM
3 Kudos
The use case: pull updates from multiple databases using a timestamp column, apply simple transformations in NiFi, and stream the data into Hive transactional tables with the PutHiveStreaming processor. If not done carefully, this can easily lead to platform-wide instability for Hive.
ACID is designed for slowly changing tables, not for hundreds of concurrent queries trying to update the same partition. Lessons learned:
- ACID tables are bucketed tables. Choose the correct number of buckets and aim for uniform data distribution among them; a poor bucketing key easily leads to data skew, with only one CPU core writing to a single bucket (an example table DDL follows the query output below).
- The transaction manager and lock manager are stored in the Hive metastore: the transaction manager keeps the transaction state (open, commit, abort), and the lock manager maintains the necessary locks for transactional tables. The recommendation is to separate the Hive, Oozie, and Ambari databases and to configure high availability for them.
- NiFi can overwhelm Hive ACID tables. NiFi streams data using the Hive streaming API available with the PutHiveStreaming processor. The default value for timer-driven scheduling in NiFi processors is 0 sec, which hits the Hive metastore hard. The recommendation is to micro-batch the data from NiFi with a scheduling interval of around 1 minute or more (the higher, the better). Batching 5,000-10,000 records gave the best throughput.
- Compaction of the table is necessary for ACID read performance. Compaction can be automatic or on-demand.
- Enable email alerting on the Hive metastore database when the count of uncleaned aborted transactions reaches a threshold of around 100,000:
hive=# select count(*) from TXNS where txn_state='a' and TXN_ID not in (select tc_txnid from txn_components);
 count
-------
  3992
(1 row)

One of the data sources was overwhelming the metastore. After proper batching and scheduling, the metastore was able to clean up by itself. Also optimize the Hive metastore database itself for performance tuning.
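To make the bucketing and compaction recommendations above concrete, here is a minimal sketch. The table name, columns, and bucket count are illustrative only, not from the deployment described here.

-- Illustrative ACID table: bucketed, stored as ORC, transactional.
-- Pick the bucket count and bucketing key so rows distribute evenly;
-- a skewed key leaves one core doing all the writing for its bucket.
CREATE TABLE customer_updates (
  id         BIGINT,
  name       STRING,
  updated_at TIMESTAMP
)
CLUSTERED BY (id) INTO 16 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- On-demand (major) compaction, for when automatic compaction lags:
ALTER TABLE customer_updates COMPACT 'major';

-- Check compaction progress:
SHOW COMPACTIONS;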
10/02/2017 (7pm):
hive=# select txn_user, count(*) from txns where txn_state='a' group by txn_user;
 txn_user |  count
----------+---------
 hive     |   73763
 nifi     | 1241297
(2 rows)
10/02/2017 (9am):
hive=# select txn_user, count(*) from txns where txn_state='a' group by txn_user;
 txn_user | count
----------+-------
 hive     | 58794
 nifi     | 26962
(2 rows)
09-01-2017
01:43 AM
This article helps with configuring HDF (NiFi) processors to integrate with HDP components.
Integration for the PutHiveStreaming processor:
Prerequisites: In HDP Ambari, set the below properties for Hive.
hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.compactor.initiator.on = true
hive.compactor.worker.threads > 0
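As a quick sanity check (a sketch, assuming a Hive session such as beeline; SET <property>; simply echoes the current value):

-- Echo the effective values to confirm the ACID settings above took effect.
SET hive.txn.manager;
SET hive.compactor.initiator.on;
SET hive.compactor.worker.threads;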
BUG: The HiveStreaming processor does not pick up the updated Hive metastore principal from hive-site.xml.
Resolution:
1. Copy hive-site.xml, core-site.xml, and hdfs-site.xml to the conf directory of NiFi
2. Clear the Hive Configuration Resources property
3. Create an ExecuteScript processor on the canvas, scheduled to run as often as the ticket needs to be refreshed (every hour should do for most setups), with the following Groovy script, replacing the principal and keytab path with your own
Steps: Create /etc/hdp/tgt.groovy on all NiFi nodes and copy the code below into the file.
Note: Change the Kerberos principal and keytab location as necessary.
import org.apache.nifi.nar.NarClassLoader
import org.apache.nifi.nar.NarClassLoaders

// Find the Hive NAR's class loader so the login happens with the same
// classes (and the same static UserGroupInformation state) that the
// Hive processors use.
NarClassLoaders.instance.extensionClassLoaders.each { c ->
    if (c instanceof NarClassLoader && c.workingDirectory.absolutePath.contains('nifi-hive')) {
        def originalClassloader = Thread.currentThread().getContextClassLoader()
        Thread.currentThread().setContextClassLoader(c)
        try {
            // Load the Hadoop configuration through the NAR's class loader.
            def configClass = c.loadClass('org.apache.hadoop.conf.Configuration', true)
            def hiveConfigurator = c.loadClass('org.apache.nifi.util.hive.HiveConfigurator', true).newInstance()
            def config = hiveConfigurator.getConfigurationFromFiles('')
            hiveConfigurator.preload(config)
            c.loadClass('org.apache.hadoop.security.UserGroupInformation', true)
                .getMethod('setConfiguration', configClass).invoke(null, config)
            // Log in from the keytab; change the principal and keytab path as needed.
            c.loadClass('org.apache.hadoop.security.UserGroupInformation', true)
                .getMethod('loginUserFromKeytab', String.class, String.class)
                .invoke(null, 'nifi@HDF.NET', '/etc/security/keytabs/nifi.headless.keytab')
            log.info('Successfully logged in')
            session.transfer(session.create(), REL_SUCCESS)
        } catch (Exception e) {
            log.error('Unable to login with keytab', e)
            session.transfer(session.create(), REL_FAILURE)
        } finally {
            Thread.currentThread().setContextClassLoader(originalClassloader)
        }
    }
}
Then run the following on all NiFi nodes:
chown nifi:nifi /etc/hdp/tgt.groovy; chmod +x /etc/hdp/tgt.groovy;
cp /etc/hdp/hive-site.xml /etc/nifi/conf/; cp /etc/hdp/core-site.xml /etc/nifi/conf/; cp /etc/hdp/hdfs-site.xml /etc/nifi/conf/;
chown nifi:nifi /etc/nifi/conf/hive-site.xml /etc/nifi/conf/core-site.xml /etc/nifi/conf/hdfs-site.xml;
Integration for the HBase processor:
Add the IP address and hostname of all HDP nodes to the /etc/hosts file on all NiFi nodes:
<IpAddress1> <Hostname1>
<IpAddress2> <Hostname2>
Integration for the Kafka processor:
Create the file /etc/hdp/zookeeper-jaas.conf and copy the configuration below into it.
Note: Change the Kerberos principal and keytab location as necessary.
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/etc/security/keytabs/nifi.headless.keytab"
  storeKey=true
  useTicketCache=false
  principal="nifi@HDF.COM";
};

KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=true
  renewTicket=true
  serviceName="kafka"
  useKeyTab=true
  keyTab="/etc/security/keytabs/nifi.headless.keytab"
  principal="nifi@HDF.COM";
};
Add the below configuration to the advanced nifi-bootstrap config in HDF Ambari and restart the NiFi service.
java.arg.20=-Djava.security.auth.login.config=/etc/hdp/zookeeper-jaas.conf
10-12-2018
02:44 PM
This is complex. I believe your problem is that you need to forward the traffic to/from the KDC to your Mac. You can do this with SSH tunnelling. That alone is not enough, though, since SSH port forwarding only handles TCP traffic, and KDC traffic is UDP by default.
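One possible workaround (an assumption on my part, not something verified in your environment): Kerberos clients can be forced onto TCP, at which point a plain SSH tunnel can carry the KDC traffic. A sketch of the client-side krb5.conf, with placeholder realm and port:

# Hypothetical krb5.conf fragment on the Mac. udp_preference_limit = 1
# forces the client to use TCP, so a tunnel such as
#   ssh -L 8888:<kdc-host>:88 <gateway>
# can forward the KDC traffic; EXAMPLE.COM and port 8888 are placeholders.
[libdefaults]
    default_realm = EXAMPLE.COM
    udp_preference_limit = 1

[realms]
    EXAMPLE.COM = {
        kdc = localhost:8888
    }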
03-13-2017
04:55 AM
Worked. Thanks
11-27-2017
11:16 PM
Your character set is specified incorrectly in the URL. It should be given in the connection string as jdbc:teradata://<host>/database=<db>,CLIENT_CHARSET=ISO8859_1. The charset parameter is not processed the same way as "CLIENT_CHARSET"; see this reference for all character sets supported by Teradata: https://developer.teradata.com/doc/connectivity/jdbc/reference/current/jdbcug_chapter_1.html
10-31-2017
04:38 PM
Hi Balaji, what is the status of this Jira? Do you have a reference to the Jira so that I can follow it?
12-06-2017
08:58 AM
What are the Maven commands that need to be run before vagrant up? I use mvn clean compile -DskipTests in the top-level metron directory.
01-09-2016
08:46 AM
You can check @Randy Gelhausen's docker-ambari repo. Also check this webinar recording to see a demo: http://hortonworks.com/partners/learn/#dev