Member since: 05-28-2019
Posts: 46
Kudos Received: 16
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 6728 | 01-04-2017 07:03 PM
 | 1396 | 01-04-2017 06:00 PM
02-24-2022
01:04 PM
@HaiderNaveed As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post. Thanks!
05-17-2020
08:41 PM
Hi @kettle
As this thread was marked 'Solved' in June of 2016, you would have a better chance of receiving a useful response by starting a new thread. A new thread will also give you the opportunity to provide details specific to your use of the PutSQL processor and/or Phoenix, which could help others give a more tailored answer to your question.
10-02-2017
07:52 PM
3 Kudos
The use case: pull updates from multiple databases using a timestamp column, apply simple transformations in NiFi, and stream the data into Hive transactional tables with the PutHiveStreaming processor. If not done carefully, this can easily lead to platform-wide instability for Hive.
ACID is designed for slowly changing tables, not for hundreds of concurrent queries trying to update the same partition. Lessons learned:
- ACID tables are bucketed tables. Choose the correct number of buckets and aim for uniform data distribution among them; a poor bucketing key easily leads to data skew, with only one CPU core writing to a single bucket (an example table DDL follows the query output below).
- The transaction manager and lock manager are stored in the Hive metastore: the transaction manager keeps the transaction state (open, commit, abort), and the lock manager maintains the necessary locks for transactional tables. The recommendation is to separate the Hive, Oozie, and Ambari databases and to configure high availability for them.
- NiFi can overwhelm Hive ACID tables. NiFi streams data using the Hive streaming API available with the PutHiveStreaming processor. The default value for timer-driven scheduling in NiFi processors is 0 sec, which hits the Hive metastore hard. The recommendation is to micro-batch the data from NiFi with a scheduling interval of around 1 minute or more (the higher, the better). Batching 5,000-10,000 records gave the best throughput.
- Compaction of the table is necessary for ACID read performance. Compaction can be automatic or on-demand.
- Enable email alerting on the Hive metastore database when the count of uncleaned aborted transactions reaches a threshold of around 100,000:
hive=# select count(*) from TXNS where txn_state='a' and TXN_ID not in (select tc_txnid from txn_components);
 count
-------
  3992
(1 row)

One of the data sources was overwhelming the metastore. After proper batching and scheduling, the metastore was able to clean up by itself. Also optimize the Hive metastore database itself for performance tuning.
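To make the bucketing and compaction recommendations above concrete, here is a minimal sketch. The table name, columns, and bucket count are illustrative only, not from the deployment described here.

-- Illustrative ACID table: bucketed, stored as ORC, transactional.
-- Pick the bucket count and bucketing key so rows distribute evenly;
-- a skewed key leaves one core doing all the writing for its bucket.
CREATE TABLE customer_updates (
  id         BIGINT,
  name       STRING,
  updated_at TIMESTAMP
)
CLUSTERED BY (id) INTO 16 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- On-demand (major) compaction, for when automatic compaction lags:
ALTER TABLE customer_updates COMPACT 'major';

-- Check compaction progress:
SHOW COMPACTIONS;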
10/02/2017 (7pm):
hive=# select txn_user, count(*) from txns where txn_state='a' group by txn_user;
 txn_user |  count
----------+---------
 hive     |   73763
 nifi     | 1241297
(2 rows)
10/02/2017 (9am):
hive=# select txn_user, count(*) from txns where txn_state='a' group by txn_user;
 txn_user | count
----------+-------
 hive     | 58794
 nifi     | 26962
(2 rows)
09-01-2017
01:43 AM
This article helps with configuring HDF (NiFi) processors to integrate with HDP components.
Integration for the PutHiveStreaming processor:
Prerequisites: In HDP Ambari, set the below properties for Hive.
hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.compactor.initiator.on = true
hive.compactor.worker.threads > 0
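As a quick sanity check (a sketch, assuming a Hive session such as beeline; SET <property>; simply echoes the current value):

-- Echo the effective values to confirm the ACID settings above took effect.
SET hive.txn.manager;
SET hive.compactor.initiator.on;
SET hive.compactor.worker.threads;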
BUG: The HiveStreaming processor does not pick up the updated Hive metastore principal from hive-site.xml.
Resolution:
1. Copy hive-site.xml, core-site.xml, and hdfs-site.xml to the conf directory of NiFi
2. Clear the Hive Configuration Resources property
3. Create an ExecuteScript processor on the canvas, scheduled to run as often as the ticket needs to be refreshed (every hour should do for most setups), with the following Groovy script, replacing the principal and keytab path with your own
Steps: Create /etc/hdp/tgt.groovy on all NiFi nodes and copy the code below into the file.
Note: Change the Kerberos principal and keytab location as necessary.
import org.apache.nifi.nar.NarClassLoader
import org.apache.nifi.nar.NarClassLoaders

// Find the Hive NAR's class loader so the login happens with the same
// classes (and the same static UserGroupInformation state) that the
// Hive processors use.
NarClassLoaders.instance.extensionClassLoaders.each { c ->
    if (c instanceof NarClassLoader && c.workingDirectory.absolutePath.contains('nifi-hive')) {
        def originalClassloader = Thread.currentThread().getContextClassLoader()
        Thread.currentThread().setContextClassLoader(c)
        try {
            // Load the Hadoop configuration through the NAR's class loader.
            def configClass = c.loadClass('org.apache.hadoop.conf.Configuration', true)
            def hiveConfigurator = c.loadClass('org.apache.nifi.util.hive.HiveConfigurator', true).newInstance()
            def config = hiveConfigurator.getConfigurationFromFiles('')
            hiveConfigurator.preload(config)
            c.loadClass('org.apache.hadoop.security.UserGroupInformation', true)
                .getMethod('setConfiguration', configClass).invoke(null, config)
            // Log in from the keytab; change the principal and keytab path as needed.
            c.loadClass('org.apache.hadoop.security.UserGroupInformation', true)
                .getMethod('loginUserFromKeytab', String.class, String.class)
                .invoke(null, 'nifi@HDF.NET', '/etc/security/keytabs/nifi.headless.keytab')
            log.info('Successfully logged in')
            session.transfer(session.create(), REL_SUCCESS)
        } catch (Exception e) {
            log.error('Unable to login with keytab', e)
            session.transfer(session.create(), REL_FAILURE)
        } finally {
            Thread.currentThread().setContextClassLoader(originalClassloader)
        }
    }
}
Then run the following on all NiFi nodes:
chown nifi:nifi /etc/hdp/tgt.groovy; chmod +x /etc/hdp/tgt.groovy;
cp /etc/hdp/hive-site.xml /etc/nifi/conf/; cp /etc/hdp/core-site.xml /etc/nifi/conf/; cp /etc/hdp/hdfs-site.xml /etc/nifi/conf/;
chown nifi:nifi /etc/nifi/conf/hive-site.xml /etc/nifi/conf/core-site.xml /etc/nifi/conf/hdfs-site.xml;
Integration for the HBase processor:
Add the IP address and hostname of all HDP nodes to the /etc/hosts file on all NiFi nodes:
<IpAddress1> <Hostname1>
<IpAddress2> <Hostname2>
Integration for the Kafka processor:
Create the file /etc/hdp/zookeeper-jaas.conf and copy the configuration below into it.
Note: Change the Kerberos principal and keytab location as necessary.
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/etc/security/keytabs/nifi.headless.keytab"
  storeKey=true
  useTicketCache=false
  principal="nifi@HDF.COM";
};

KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=true
  renewTicket=true
  serviceName="kafka"
  useKeyTab=true
  keyTab="/etc/security/keytabs/nifi.headless.keytab"
  principal="nifi@HDF.COM";
};
Add the below configuration to the advanced nifi-bootstrap config in HDF Ambari and restart the NiFi service.
java.arg.20=-Djava.security.auth.login.config=/etc/hdp/zookeeper-jaas.conf
10-12-2018
02:44 PM
This is complex. I believe your problem is that you need to forward the traffic to/from the KDC to your Mac. You can do this with SSH tunnelling. That alone is not enough, though, since SSH port forwarding only handles TCP traffic, and KDC traffic is UDP by default.
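One possible workaround (an assumption on my part, not something verified in your environment): Kerberos clients can be forced onto TCP, at which point a plain SSH tunnel can carry the KDC traffic. A sketch of the client-side krb5.conf, with placeholder realm and port:

# Hypothetical krb5.conf fragment on the Mac. udp_preference_limit = 1
# forces the client to use TCP, so a tunnel such as
#   ssh -L 8888:<kdc-host>:88 <gateway>
# can forward the KDC traffic; EXAMPLE.COM and port 8888 are placeholders.
[libdefaults]
    default_realm = EXAMPLE.COM
    udp_preference_limit = 1

[realms]
    EXAMPLE.COM = {
        kdc = localhost:8888
    }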
03-13-2017
04:55 AM
Worked. Thanks
11-27-2017
11:16 PM
Your character set is specified incorrectly in the URL. It should be given in the connection string as jdbc:teradata://<host>/database=<db>,CLIENT_CHARSET=ISO8859_1. The charset parameter is not processed the same way as "CLIENT_CHARSET"; see this reference for all character sets supported by Teradata: https://developer.teradata.com/doc/connectivity/jdbc/reference/current/jdbcug_chapter_1.html
10-31-2017
04:38 PM
Hi Balaji, what is the status of this Jira? Do you have a reference to the Jira so that I can follow it?
12-06-2017
08:58 AM
What are the Maven commands that need to be run before vagrant up? I use mvn clean compile -DskipTests in the top-level metron directory.
01-09-2016
08:46 AM
You can check @Randy Gelhausen's docker-ambari repo. Also check this webinar recording to see a demo: http://hortonworks.com/partners/learn/#dev