07-19-2016
12:26 AM
8 Kudos
In Apache NiFi 1.2, there are processors for reading Hive data and storing to Hive via HiveQL: SelectHiveQL and PutHiveQL. Configuring a SelectHiveQL processor is simple: you enter your query and pick either AVRO or CSV as the output format. AVRO is the better fit; I am waiting for ORC support. You can enter any regular SQL that you would run in Hive. Most importantly, you need to set a connection pool to connect to your cluster. For Hive to work, you must set up a HiveConnectionPool controller service; after configuring it you will need to enable it, and then you can enable your processor(s). For connecting to Hive on the Sandbox, set the Database Connection URL to jdbc:hive2://localhost:10000/default. For Hive Configuration Resources, you point to the Hive configuration files. You can set the Database User and Password of a user that has the access you require for Hive. See the HiveConnectionPool documentation for details. For PutHiveQL, you just need to set a connection pool, a batch size for updates, and a character set; the defaults are fine. CAVEAT: Once you have it set up, make sure all the relationships are terminated somewhere, either in a sink or with auto-terminate.
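As a sketch, the HiveConnectionPool controller service settings described above might look like this on the Sandbox (the user, password, and hive-site.xml path are assumptions; adjust them for your cluster):

```
Database Connection URL: jdbc:hive2://localhost:10000/default
Hive Configuration Resources: /etc/hive/conf/hive-site.xml
Database User: hive
Password: <your password>
```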
07-17-2016
12:49 PM
4 Kudos
I just tried out the new PutSlack processor in NiFi 0.7.0. For a source I used Twitter, since it has some fun data and gives you a nice big stream. Sometimes Twitter feeds will be rate limited and Twitter will give you the 420 Enhance Your Calm message (https://httpstatusdogs.com/420-enhance-your-calm). Usually you can just wait 5-20 minutes and you will be served again. Sometimes you might need to use a different one of your apps, reset the tokens in your app, or create a new app (https://apps.twitter.com/). For processing, use a step that pulls key attributes from each tweet and keeps only real tweets (removing nulls). Then sink to Slack. For the PutSlack processor, set the Webhook URL to the URL generated by the incoming-webhook page on slack.com. Set the Webhook Text to ${twitter.msg}; this will send your Twitter message to Slack. Set the Channel to #general, or a channel of your choosing. I created a Slack board for receiving my messages, https://nifi-se.slack.com/messages/general/. You can easily create your own (or use your existing Slack board); just go to slack.com. You will need to create a webhook: in the #general channel just type "incoming webhook" and you will get a link to the screen to create one.
Apache NiFi 0.7.0 Final Flow
Now you can start seeing tweets turn into Slack messages. Apache NiFi 0.7.0 now has 155 processors! Let's explore some more.
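As a sketch, the PutSlack settings described above look like this (the webhook URL is a placeholder; use the one generated by your own incoming webhook):

```
Webhook URL: https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX
Webhook Text: ${twitter.msg}
Channel: #general
```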
07-15-2016
11:35 AM
1 Kudo
su hdfs
hadoop fs -mkdir /udf
hadoop fs -put urldetector-1.0-jar-with-dependencies.jar /udf/
hadoop fs -put libs/url-detector-0.1.15.jar /udf/
hadoop fs -chown -R hdfs /udf
hadoop fs -chgrp -R hdfs /udf
hadoop fs -chmod -R 775 /udf
Create the Hadoop directories and upload the two necessary libraries (commands above). Then create the Hive function with those HDFS-referenced JARs:
CREATE FUNCTION urldetector as 'com.dataflowdeveloper.detection.URLDetector' USING JAR 'hdfs:///udf/urldetector-1.0-jar-with-dependencies.jar', JAR 'hdfs:///udf/url-detector-0.1.15.jar';
Test the UDF via HiveQL:
select http_user_agent, urldetector(remote_host) as urls, remote_host from AccessLogs limit 100;
Java header for the UDF:
@Description(name="urldetector", value="_FUNC_(string) - detects urls")
public final class URLDetector extends UDF {}
You can test with a temporary function through the Hive CLI before making the function permanent:
set hive.cli.print.header=true;
add jar urldetector-1.0-jar-with-dependencies.jar;
CREATE TEMPORARY FUNCTION urldetector as 'com.dataflowdeveloper.detection.URLDetector';
select urldetector(description) from sample_07 limit 100;
Build the JAR file for deployment:
mvn compile assembly:single
The library from LinkedIn (https://github.com/linkedin/URL-Detector) must be compiled and its JAR used in your code and deployed to Hive.
References: See https://github.com/tspannhw/URLDetector for the full source code.
07-07-2016
11:23 PM
2 Kudos
Adding HDF (with Apache NiFi) to your HDP 2.5 Sandbox is quick, painless, and easy. Get the most recent Hortonworks DataFlow (download): wget http://d3d0kdwqv675cq.cloudfront.net/HDF/centos6/1.x/updates/1.2.0.1/HDF-1.2.0.1-1.tar.gz
tar -xvf HDF-1.2.0.1-1.tar.gz
cd HDF-1.2.0.1-1/nifi/
Then change the port used by NiFi in the conf/nifi.properties file to nifi.web.http.port=8090. Install NiFi as a Linux service and start it:
bin/nifi.sh install
sudo service nifi start
NiFi home: /opt/HDF-1.2.0.1-1/nifi
Bootstrap Config File: /opt/HDF-1.2.0.1-1/nifi/conf/bootstrap.conf
2016-07-04 02:18:00,005 INFO [main] org.apache.nifi.bootstrap.Command Starting Apache NiFi...
2016-07-04 02:18:00,006 INFO [main] org.apache.nifi.bootstrap.Command Working Directory: /opt/HDF-1.2.0.1-1/nifi
You can check the status of a single NiFi server via the status command: [root@sandbox nifi]# sudo service nifi status
nifi.sh: JAVA_HOME not set; results may vary
Java home:
NiFi home: /opt/HDF-1.2.0.1-1/nifi
Bootstrap Config File: /opt/HDF-1.2.0.1-1/nifi/conf/bootstrap.conf
2016-07-04 02:18:42,527 INFO [main] org.apache.nifi.bootstrap.Command Apache NiFi is currently running, listening to Bootstrap on port 43184, PID=4391
Make sure you add port 8090 to the sandbox networking. You are now ready to go. Now start flowing.
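The port change above can also be scripted rather than edited by hand. A minimal sketch, demonstrated on a scratch copy of conf/nifi.properties so it is safe to run anywhere (on the Sandbox you would run just the sed line from the NiFi home directory):

```shell
# Stand-in for the real conf/nifi.properties shipped with HDF
mkdir -p conf
printf 'nifi.web.http.port=8080\n' > conf/nifi.properties

# Switch NiFi's web UI from the default 8080 (which collides with Ambari) to 8090
sed -i 's/^nifi.web.http.port=.*/nifi.web.http.port=8090/' conf/nifi.properties

# Confirm the change
grep '^nifi.web.http.port=' conf/nifi.properties
```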
07-07-2016
07:50 PM
2 Kudos
Using Yahoo Kafka Manager. Git clone the project (you need Java 8 to build), then use SBT to do a clean distribution. This will take a while as it downloads a lot of JARs. The build will produce a Zip file (../kafka-manager/target/universal/kafka-manager-1.3.0.8.zip); unzip it, update the configuration file (conf/application.conf) to point at your ZooKeeper hosts, and then you can run it:
kafka-manager.zkhosts="sandbox.hortonworks.com:2181"
unzip ../kafka-manager/target/universal/kafka-manager-1.3.0.8.zip
cd kafka-manager-1.3.0.8
vi conf/application.conf
bin/kafka-manager -Dconfig.file=conf/application.conf
Access the Kafka Manager from Chrome at http://localhost:9000/
Running Kafka Manager
Resources:
https://github.com/yahoo/kafka-manager
http://edbaker.weebly.com/blog/install-and-evaluation-of-yahoos-kafka-manager
http://chennaihug.org/knowledgebase/yahoo-kafka-manager/
https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem
For testing Kafka with a command-line client producer/consumer: https://github.com/edenhill/kafkacat (brew install kafkacat). For external access you may need to set advertised.host.name: http://stackoverflow.com/questions/31476679/send-kafkaproducer-from-local-machine-to-hortonworks-sandbox-on-virtualbox
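For the external-access note above, a sketch of the relevant broker settings in Kafka's server.properties (the host name and port shown are the HDP Sandbox defaults; substitute whatever your external clients can actually resolve):

```
advertised.host.name=sandbox.hortonworks.com
advertised.port=6667
```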
07-07-2016
07:50 PM
1 Kudo
From the Sandbox as root, create the topic:
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --zookeeper sandbox.hortonworks.com:2181 --replication-factor 1 --partitions 1 --topic people
Test the topic:
[root@sandbox kafka]# /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --topic people --zookeeper sandbox.hortonworks.com:2181
{metadata.broker.list=sandbox.hortonworks.com:6667, request.timeout.ms=30000, client.id=console-consumer-10628, security.protocol=PLAINTEXT}
Resources:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_secure-kafka-ambari/content/ch_secure-kafka-create-topics.html
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_secure-kafka-ambari/content/ch_secure-kafka-produce-events.html
07-07-2016
06:04 PM
2 Kudos
In the HDP 2.5 Sandbox, I did a quick walkthrough. The first thing I liked was the visualization: the Data Explorer provides a nice query tool to view tables and graphs, and the Data Visualization tab provides some nice graphing capabilities. After you run your queries you can look at the Tez results to see how they ran; it's a nice way to see what you may need to optimize. The Hive Ambari View is getting to be a very solid tool for working with Hive, from DDL (creating tables is easy) to viewing data, to updates and inserts.
07-07-2016
03:01 PM
Do you have the code available?
07-04-2016
05:11 PM
3 Kudos
This tutorial is great: https://github.com/hortonworks-gallery/ambari-vnc-service
Eclipse plugin: https://github.com/winghc/hadoop2x-eclipse-plugin
JDK 7 is best for most use cases, along with Scala 2.10. Maven and SBT are necessary as well.
Setup your environment:
https://dzone.com/articles/spark-and-scala-resources
https://dzone.com/articles/whats-on-your-laptop
Lots of options. This is an Eclipse project for an HBase coprocessor: https://github.com/tspannhw/hbasecoprocessor. Artem has a great project for testing: https://github.com/dbist/HBaseUnitTest. Once all the ports are open and not firewalled, it's usually straightforward.
Eclipse to Spark:
https://community.hortonworks.com/questions/36354/eclipse-to-sandbox-1.html
https://community.hortonworks.com/questions/32567/scala-with-hive-in-ecplipse-scala.html
Hadoop Eclipse plugin: https://community.hortonworks.com/questions/10404/hadoop-eclipse-plugin.html
IntelliJ project for Spark:
https://github.com/agilemobiledev/sparkworkshop
https://community.hortonworks.com/questions/31077/how-to-setup-intellij-idea-16-to-run-hortonworks-s.html
IntelliJ settings: https://community.hortonworks.com/questions/37410/recommended-idea-intellij-vmoptions-setting-for-de.html
These configuration files must be in the project or class path: core-site.xml, hdfs-site.xml, yarn-site.xml.
Add JARs for access: http://nivemaham.com/index.php/technical/22-java/hadoop/40-how-to-use-ide-for-hadoop-development-with-hortonworks-sandbox
For Apache Kylin development: http://kylin.apache.org/development/dev_env.html
Remote debugging Spark: https://nicolasmaillard.com/2016/02/06/remote-debugging-201-spark/
Testing with Hadoop mini-clusters: https://github.com/sakserv/hadoop-mini-clusters
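The client configuration files listed above can be copied onto the project classpath with something like this sketch (the hadoop-conf source directory here is a stand-in; on an HDP cluster the files usually live in /etc/hadoop/conf, and src/main/resources assumes a Maven project layout):

```shell
# Stand-in for the cluster's client config directory (normally /etc/hadoop/conf)
mkdir -p hadoop-conf src/main/resources
touch hadoop-conf/core-site.xml hadoop-conf/hdfs-site.xml hadoop-conf/yarn-site.xml

# Copy the three client configs onto the project's classpath
cp hadoop-conf/core-site.xml hadoop-conf/hdfs-site.xml hadoop-conf/yarn-site.xml \
   src/main/resources/
ls src/main/resources/
```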
07-01-2016
08:32 PM
1 Kudo
I followed the examples from the 3 Pillar Global post and the Apache HBase blog post, and then updated them for newer versions. To write an HBase coprocessor you need Google Protocol Buffers. On a Mac you need to install v2.5.0, as that is the version that works with HBase:
brew tap homebrew/versions
brew install protobuf250
Check your version:
protoc --version
Source code with the Maven build (pom.xml) is here. You will need Maven and a Java 7 (or newer) JDK for compilation.
Testing on Hadoop:
export HADOOP_CLASSPATH=`hbase classpath`
hadoop jar hbasecoprocessor-1.0.jar com.dataflowdeveloper.hbasecoprocessor.SumEndPoint
Upload your JAR to HDFS:
hadoop fs -mkdir /user/tspann
hadoop fs -ls /user/tspann
hadoop fs -put hbasecoprocessor-1.0.jar /user/tspann
hadoop fs -chmod 777 /user/tspann/hbasecoprocessor-1.0.jar
Install dynamically from the HBase shell:
disable 'stocks'
alter 'stocks', 'coprocessor'=>'hdfs://sandbox.hortonworks.com/user/tspann/hbasecoprocessor-1.0.jar|com.dataflowdeveloper.hbasecoprocessor.SumEndPoint|1001|arg1=1'
enable 'stocks'
describe 'stocks'
Testing locally:
java -classpath `hbase classpath`:hbasecoprocessor-1.0.jar com.dataflowdeveloper.hbasecoprocessor.SumEndPoint
Checking the table after installation:
[root@sandbox demo]# hbase shell
HBase Shell
Version 1.1.2.2.4.0.0-169, r61dfb2b344f424a11f93b3f086eab815c1eb0b6a, Wed Feb 10 07:08:51 UTC 2016
hbase(main):001:0> describe 'stocks'
Table stocks is ENABLED
stocks, {TABLE_ATTRIBUTES => {coprocessor$1 => 'hdfs://sandbox.hortonworks.com/user/tspann/hbasecoprocessor-1.0.jar|com.dataflowdeveloper.hbasecoprocessor.SumEndPoint|1001|arg1=1'}
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => 'FOREVER', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
1 row(s) in 0.3270 seconds
You can see the coprocessor has been added and is enabled.
References
https://github.com/larsgeorge/hbase-book/blob/master/ch04/src/main/java/coprocessor/RowCountEndpoint.java
https://github.com/Huawei-Hadoop/hindex
https://github.com/apache/hbase/tree/branch-1.0/hbase-examples
https://github.com/apache/hbase/blob/branch-1.0/hbase-examples/src/main/java/org/apache/hadoop/hbase/coprocessor/example/RowCountEndpoint.java
http://bigdatazone.blogspot.com/2015/05/hbase-coprocessor-using-protobuf-250.html
http://hbase.apache.org/book.html#cp
http://hbase.apache.org/book.html#cp_loading
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/package-summary.html
http://hbase.apache.org/book.html#cp_example
http://hbase.apache.org/xref/org/apache/hadoop/hbase/coprocessor/example/RowCountEndpoint.html
https://github.com/apache/hbase/blob/branch-1.1/hbase-examples/pom.xml
https://community.hortonworks.com/questions/2577/hbase-coprocessor-and-security.html
https://www.3pillarglobal.com/insights/hbase-coprocessors#(endpoints-coprocessor)
https://blogs.apache.org/hbase/entry/coprocessor_introduction
https://github.com/dbist/HBaseUnitTest