07-21-2016
02:18 AM
8 Kudos
In Apache NiFi 1.2, there are processors to Get and Put data to an MQTT broker. MQTT is popular in IoT because of its small footprint and speed, and it is supported by Eclipse and IBM. I created an example on HDP 2.6, where I downloaded and installed the latest Apache NiFi 1.2 as well as an example MQTT broker, Mosquitto (http://mosquitto.org/).
To Install Mosquitto on HDP 2.6 (CentOS 7.x)
sudo wget http://download.opensuse.org/repositories/home:/oojah:/mqtt/CentOS_CentOS-6/home:oojah:mqtt.repo
sudo cp *.repo /etc/yum.repos.d/
sudo yum -y update
sudo yum -y install mosquitto
To Verify the Settings and Prepare Logs
[root@sandbox opt]# cat /etc/mosquitto/mosquitto.conf
# Place your local configuration in /etc/mosquitto/conf.d/
pid_file /var/run/mosquitto.pid
persistence true
persistence_location /var/lib/mosquitto/
#log_dest file /var/log/mosquitto/mosquitto.log
include_dir /etc/mosquitto/conf.d
[root@sandbox opt]# vi /etc/mosquitto/mosquitto.conf
[root@sandbox opt]# mkdir -p /var/log/mosquitto
[root@sandbox opt]# chmod 777 /var/log/mosquitto/
[root@sandbox opt]# touch /var/log/mosquitto/mosquitto.log
[root@sandbox opt]# chmod 777 /var/log/mosquitto/
Run MQTT Broker Server
mosquitto -d
The default port for MQTT and Mosquitto is 1883. Make sure that port is not blocked by firewalls or antivirus software and, if you are on the sandbox, that it is exposed.
Running Mosquitto on Sandbox
NiFi PublishMQTT
NiFi ConsumeMQTT
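A quick way to confirm that port 1883 is reachable before wiring up the NiFi processors is to attempt a plain TCP connection. The snippet below is only a sketch: the function name `port_is_open` and the `localhost` host are assumptions for illustration, and a successful connection shows only that something is listening, not that it is speaking MQTT.

```python
import socket


def port_is_open(host: str, port: int = 1883, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


if __name__ == "__main__":
    # "localhost" is an assumption; substitute your sandbox hostname or IP.
    print("MQTT port reachable:", port_is_open("localhost", 1883))
```

If this reports that the port is not reachable from your laptop but it is reachable from inside the sandbox, the port forwarding or firewall is the likely culprit.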
After Running
[root@sandbox demo]# hdfs dfs -ls /mqtt
root hdfs 2783 2016-07-20 14:56 /mqtt/37115929161818
root hdfs 2805 2016-07-20 14:56 /mqtt/37115930927495
ConsumeMQTT
PublishMQTT
Resources:
http://mosquitto.org/man/mosquitto-8.html
http://ceit.uq.edu.au/content/mqtt-and-growl
http://growl.info/
http://www.eclipse.org/paho/
07-19-2016
12:26 AM
8 Kudos
In Apache NiFi 1.2, there are processors for reading Hive data via HiveQL and storing to Hive via HiveQL. These processors are SelectHiveQL and PutHiveQL.
Configuring a HiveQL processor is simple: enter your query and pick either Avro or CSV as the format. Avro is a better fit; I am waiting for ORC. Most importantly, you need to set a connection pool to connect to your cluster. The query can be any regular SQL that you would run in Hive.
For Hive to work, you must set up a HiveConnectionPool Controller Service. After configuring it, you need to enable it, and then you can enable your processor(s). For connecting to Hive on the Sandbox, set the Database Connection URL to jdbc:hive2://localhost:10000/default. For Hive Configuration Resources, list the Hive configuration files. Set the Database User and Password to those of a user that has the Hive access you require. For details, see the documentation on the HiveConnectionPool.
For PutHiveQL, you just need to set a connection pool, a batch size for updates and a character set. The defaults for these are fine.
CAVEAT: Once you have it set up, make sure all the relationships are terminated somewhere, either in a sink or with auto-terminate.
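HiveServer2 JDBC URLs take the form jdbc:hive2://host:port/database, and a stray slash or colon is an easy mistake when typing the Database Connection URL by hand. As a rough sketch, the helper below assembles the URL from its parts; the function name `hive_jdbc_url` is made up for illustration and is not part of NiFi or Hive.

```python
def hive_jdbc_url(host: str = "localhost", port: int = 10000,
                  database: str = "default") -> str:
    """Build a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/database."""
    if not host:
        raise ValueError("host must not be empty")
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return f"jdbc:hive2://{host}:{port}/{database}"


if __name__ == "__main__":
    # The defaults match the Sandbox values used in this post.
    print(hive_jdbc_url())
```

The resulting string is what goes into the Database Connection URL property of the HiveConnectionPool.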
07-17-2016
12:49 PM
4 Kudos
I just tried out the new Slack processor in NiFi 0.7.0.
Source
I used Twitter, since it has some fun data and gets you a nice big stream. Sometimes Twitter feeds will be rate limited and Twitter will give you the 420 Enhance Your Calm message (https://httpstatusdogs.com/420-enhance-your-calm). Usually you can just wait 5-20 minutes and you will be serving again. Sometimes you might need to use a different one of your apps, reset the tokens in your app, or create a new app (https://apps.twitter.com/).
Processing
Use Pull Key Attributes, then Find Only Tweets (remove null).
Sink to Slack
For the PutSlack processor:
Set the Webhook URL to the URL generated by the incoming webhook page in slack.com.
Set the Webhook Text to ${twitter.msg}; this will send your Twitter message to Slack.
Set the Channel to #general, or a channel of your choosing.
I created a Slack board for receiving my messages, https://nifi-se.slack.com/messages/general/. You can easily create your own (or use your existing Slack board). Just go to slack.com. You will need to create a webhook. To set it up, just type incoming webhook in the #general channel. You will get a link on the screen to create one.
Apache NiFi 0.7.0 Final Flow
Now you can start seeing tweets turn into Slack messages. Apache NiFi 0.7.0 now has 155 processors! Let's explore some more.
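To test the webhook on its own before pointing NiFi at it, you can post a message by hand. The sketch below shows roughly the kind of HTTP call an incoming webhook expects (a JSON body with a text field and, optionally, a channel); it is not how the NiFi processor is implemented. The function names are made up for illustration, the webhook URL is a placeholder you must replace with your own, and newer app-based webhooks may ignore the channel override.

```python
import json
import urllib.request


def build_slack_payload(text: str, channel: str = "#general") -> bytes:
    """Encode a Slack incoming-webhook message as a UTF-8 JSON body."""
    return json.dumps({"channel": channel, "text": text}).encode("utf-8")


def post_to_slack(webhook_url: str, text: str, channel: str = "#general") -> int:
    """POST the message to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=build_slack_payload(text, channel),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Example call (placeholder URL, replace with the one Slack generated for you):
# post_to_slack("https://hooks.slack.com/services/...", "Hello from my laptop")
```

If the manual post shows up in the channel, any remaining problem is in the flow rather than in the Slack setup.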
07-15-2016
11:35 AM
1 Kudo
su hdfs
hadoop fs -mkdir /udf
hadoop fs -put urldetector-1.0-jar-with-dependencies.jar /udf/
hadoop fs -put libs/url-detector-0.1.15.jar /udf/
hadoop fs -chown -R hdfs /udf
hadoop fs -chgrp -R hdfs /udf
hadoop fs -chmod -R 775 /udf
Create Hadoop directories and upload the two necessary libraries.
CREATE FUNCTION urldetector as 'com.dataflowdeveloper.detection.URLDetector' USING JAR 'hdfs:///udf/urldetector-1.0-jar-with-dependencies.jar', JAR 'hdfs:///udf/url-detector-0.1.15.jar';
Create the Hive function with those HDFS-referenced JARs.
select http_user_agent, urldetector(remote_host) as urls, remote_host from AccessLogs limit 100;
Test the UDF via HiveQL.
@Description(name="urldetector", value="_FUNC_(string) - detects urls")
public final class URLDetector extends UDF{}
Java header for the UDF.
set hive.cli.print.header=true;
add jar urldetector-1.0-jar-with-dependencies.jar;
CREATE TEMPORARY FUNCTION urldetector as 'com.dataflowdeveloper.detection.URLDetector';
select urldetector(description) from sample_07 limit 100;
You can test with a temporary function through the Hive CLI before making the function permanent.
mvn compile assembly:single
Build the JAR file for deployment.
The library from LinkedIn (https://github.com/linkedin/URL-Detector) must be compiled and the JAR used in your code and deployed to Hive.
References
See https://github.com/tspannhw/URLDetector for full source code.
07-07-2016
11:23 PM
2 Kudos
Adding HDF (with Apache NiFi) to your HDP 2.5 Sandbox is quick, painless and easy.
Get the most recent Hortonworks DataFlow (download):
wget http://d3d0kdwqv675cq.cloudfront.net/HDF/centos6/1.x/updates/1.2.0.1/HDF-1.2.0.1-1.tar.gz
tar -xvf HDF-1.2.0.1-1.tar.gz
cd HDF-1.2.0.1-1/nifi/
Then change the port used by NiFi in the conf/nifi.properties file to:
nifi.web.http.port=8090
Install NiFi as a Linux Service
bin/nifi.sh install
sudo service nifi start
NiFi home: /opt/HDF-1.2.0.1-1/nifi
Bootstrap Config File: /opt/HDF-1.2.0.1-1/nifi/conf/bootstrap.conf
2016-07-04 02:18:00,005 INFO [main] org.apache.nifi.bootstrap.Command Starting Apache NiFi...
2016-07-04 02:18:00,006 INFO [main] org.apache.nifi.bootstrap.Command Working Directory: /opt/HDF-1.2.0.1-1/nifi
You can check the status of a single NiFi server via the status command:
[root@sandbox nifi]# sudo service nifi status
nifi.sh: JAVA_HOME not set; results may vary
Java home:
NiFi home: /opt/HDF-1.2.0.1-1/nifi
Bootstrap Config File: /opt/HDF-1.2.0.1-1/nifi/conf/bootstrap.conf
2016-07-04 02:18:42,527 INFO [main] org.apache.nifi.bootstrap.Command Apache NiFi is currently running, listening to Bootstrap on port 43184, PID=4391
Make sure you add port 8090 to the sandbox networking. You are now ready to go. Now start flowing.
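If you would rather script the port change than edit conf/nifi.properties by hand, a small text substitution is enough. This is a minimal sketch, assuming the file uses plain key=value lines; the helper name `set_property` is hypothetical and not part of NiFi.

```python
import re


def set_property(properties_text: str, key: str, value: str) -> str:
    """Return properties_text with key set to value, appending the key if absent."""
    pattern = re.compile(rf"^{re.escape(key)}=.*$", re.MULTILINE)
    line = f"{key}={value}"
    if pattern.search(properties_text):
        # Use a function as the replacement so the value is taken literally.
        return pattern.sub(lambda _match: line, properties_text)
    separator = "" if properties_text.endswith("\n") or not properties_text else "\n"
    return f"{properties_text}{separator}{line}\n"


# Example usage (path relative to the NiFi home directory):
# from pathlib import Path
# conf = Path("conf/nifi.properties")
# conf.write_text(set_property(conf.read_text(), "nifi.web.http.port", "8090"))
```

Restart the NiFi service after changing the file so the new port takes effect.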
07-07-2016
07:50 PM
2 Kudos
Using Yahoo Kafka Manager
Git clone the project (you need Java 8 to build). Then use SBT to build a clean distribution. This will take a while, as it downloads a lot of JARs.
The build will produce a zip file. Unzip it, update the configuration file (conf/application.conf) with your ZooKeeper hosts, and then you can run it.
kafka-manager.zkhosts="sandbox.hortonworks.com:2181"
../kafka-manager/target/universal/kafka-manager-1.3.0.8.zip
unzip ../kafka-manager/target/universal/kafka-manager-1.3.0.8.zip
kafka-manager-1.3.0.8 git:(master) ✗ vi conf/application.conf
kafka-manager-1.3.0.8 git:(master) ✗ bin/kafka-manager -Dconfig.file=conf/application.conf
Access the Kafka Manager from Chrome: http://localhost:9000/
Running Kafka Manager
Resources
https://github.com/yahoo/kafka-manager
http://edbaker.weebly.com/blog/install-and-evaluation-of-yahoos-kafka-manager
http://chennaihug.org/knowledgebase/yahoo-kafka-manager/
https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem
Tools
For testing Kafka with a command-line producer/consumer client: https://github.com/edenhill/kafkacat (brew install kafkacat)
For External Access
You may need to set advertised.host.name: http://stackoverflow.com/questions/31476679/send-kafkaproducer-from-local-machine-to-hortonworks-sandbox-on-virtualbox
07-07-2016
07:50 PM
1 Kudo
From the Sandbox as Root
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh \
--create --zookeeper sandbox.hortonworks.com:2181 --replication-factor 1 \
--partitions 1 --topic people
Test The Topic
[root@sandbox kafka]# /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --topic people --zookeeper sandbox.hortonworks.com:2181
{metadata.broker.list=sandbox.hortonworks.com:6667, request.timeout.ms=30000, client.id=console-consumer-10628, security.protocol=PLAINTEXT}
Resources:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_secure-kafka-ambari/content/ch_secure-kafka-create-topics.html
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_secure-kafka-ambari/content/ch_secure-kafka-produce-events.html
07-07-2016
06:04 PM
2 Kudos
I did a quick walkthrough of the HDP 2.5 Sandbox. The first thing I liked was the visualization: the Data Explorer provides a nice query tool to view tables and graphs, and the Data Visualization tab provides some nice graphing capabilities. After you run your queries, you can look at the Tez results to see how they ran; it's a nice way to see what you may need to optimize. The Hive Ambari View is becoming a very solid tool for working with Hive, from DDL (creating tables is easy) to viewing data, to updates and inserts.
07-07-2016
03:01 PM
do you have the code available?
07-04-2016
05:11 PM
3 Kudos
This tutorial is great: https://github.com/hortonworks-gallery/ambari-vnc-service
Eclipse Plugin
https://github.com/winghc/hadoop2x-eclipse-plugin
JDK 7 and Scala 2.10 are best for most use cases. Maven and SBT are necessary as well.
Set Up Your Environment
https://dzone.com/articles/spark-and-scala-resources
https://dzone.com/articles/whats-on-your-laptop
Lots of options:
This is an Eclipse project for an HBase Coprocessor: https://github.com/tspannhw/hbasecoprocessor
Artem has a great project for testing: https://github.com/dbist/HBaseUnitTest
Once all the ports are open and not firewalled, it’s usually straightforward.
Eclipse to Spark
https://community.hortonworks.com/questions/36354/eclipse-to-sandbox-1.html
https://community.hortonworks.com/questions/32567/scala-with-hive-in-ecplipse-scala.html
Hadoop Eclipse Plugin
https://community.hortonworks.com/questions/10404/hadoop-eclipse-plugin.html
IntelliJ Project for Spark
https://github.com/agilemobiledev/sparkworkshop
https://community.hortonworks.com/questions/31077/how-to-setup-intellij-idea-16-to-run-hortonworks-s.html
IntelliJ Settings
https://community.hortonworks.com/questions/37410/recommended-idea-intellij-vmoptions-setting-for-de.html
These configuration files must be in the project or on the class path:
core-site.xml
hdfs-site.xml
yarn-site.xml
Add Jars for Access
http://nivemaham.com/index.php/technical/22-java/hadoop/40-how-to-use-ide-for-hadoop-development-with-hortonworks-sandbox
For Apache Kylin development
http://kylin.apache.org/development/dev_env.html
Remote Debugging Spark
https://nicolasmaillard.com/2016/02/06/remote-debugging-201-spark/
Testing with Hadoop MiniClusters
https://github.com/sakserv/hadoop-mini-clusters
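Since core-site.xml, hdfs-site.xml and yarn-site.xml must be in the project or on the class path, a quick check of the resources directory can save a confusing connection error later. This is a minimal sketch; the function name `missing_config_files` and the example directory src/main/resources are assumptions for illustration.

```python
from pathlib import Path

# The three Hadoop client configuration files named above.
REQUIRED_FILES = ("core-site.xml", "hdfs-site.xml", "yarn-site.xml")


def missing_config_files(directory: str) -> list:
    """Return the required Hadoop config files not present in directory."""
    base = Path(directory)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    # "src/main/resources" is an assumed location; adjust to your project layout.
    missing = missing_config_files("src/main/resources")
    print("Missing:", ", ".join(missing) if missing else "none")
```

Copy the missing files from the sandbox's Hadoop configuration directory into the project before running from the IDE.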