07-19-2016
12:26 AM
8 Kudos
In Apache NiFi 1.2, there are processors for reading Hive data and storing to Hive via HiveQL: SelectHiveQL and PutHiveQL. Configuring a SelectHiveQL processor is simple: you enter your query and pick either AVRO or CSV as the output format. AVRO is the better fit; I am waiting for ORC support. You can enter any regular SQL that you would run in Hive. Most importantly, you need to set a connection pool to connect to your cluster. For Hive to work, you must set up a HiveConnectionPool controller service; after configuring it you will need to enable it, and then you can enable your processor(s). For connecting to Hive on the Sandbox, set the Database Connection URL to jdbc:hive2://localhost:10000/default. For Hive Configuration Resources, you point to the Hive configuration files. You can set the Database User and Password of a user that has the access you require for Hive. See the HiveConnectionPool documentation for details. For PutHiveQL, you just need to set a connection pool, a batch size for updates, and a character set; the defaults are fine. CAVEAT: Once you have it set up, make sure all the relationships are terminated somewhere, either in a sink or with auto-terminate.
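As a sketch, the HiveConnectionPool controller service settings described above might look like this on the Sandbox (the user, password, and hive-site.xml path are assumptions; adjust them for your cluster):

```
Database Connection URL: jdbc:hive2://localhost:10000/default
Hive Configuration Resources: /etc/hive/conf/hive-site.xml
Database User: hive
Password: <your password>
```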
07-17-2016
12:49 PM
4 Kudos
I just tried out the new PutSlack processor in NiFi 0.7.0. For a source I used Twitter, since it has some fun data and gives you a nice big stream. Sometimes Twitter feeds will be rate limited and Twitter will give you the 420 Enhance Your Calm message (https://httpstatusdogs.com/420-enhance-your-calm). Usually you can just wait 5-20 minutes and you will be served again. Sometimes you might need to use a different one of your apps, reset the tokens in your app, or create a new app (https://apps.twitter.com/). For processing, use a step that pulls key attributes from each tweet and keeps only real tweets (removing nulls). Then sink to Slack. For the PutSlack processor, set the Webhook URL to the URL generated by the incoming-webhook page on slack.com. Set the Webhook Text to ${twitter.msg}; this will send your Twitter message to Slack. Set the Channel to #general, or a channel of your choosing. I created a Slack board for receiving my messages, https://nifi-se.slack.com/messages/general/. You can easily create your own (or use your existing Slack board); just go to slack.com. You will need to create a webhook: in the #general channel just type "incoming webhook" and you will get a link to the screen to create one.
Apache NiFi 0.7.0 Final Flow
Now you can start seeing tweets turn into Slack messages. Apache NiFi 0.7.0 now has 155 processors! Let's explore some more.
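As a sketch, the PutSlack settings described above look like this (the webhook URL is a placeholder; use the one generated by your own incoming webhook):

```
Webhook URL: https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX
Webhook Text: ${twitter.msg}
Channel: #general
```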
07-15-2016
11:35 AM
1 Kudo
su hdfs
hadoop fs -mkdir /udf
hadoop fs -put urldetector-1.0-jar-with-dependencies.jar /udf/
hadoop fs -put libs/url-detector-0.1.15.jar /udf/
hadoop fs -chown -R hdfs /udf
hadoop fs -chgrp -R hdfs /udf
hadoop fs -chmod -R 775 /udf
Create the Hadoop directories and upload the two necessary libraries (commands above). Then create the Hive function with those HDFS-referenced JARs:
CREATE FUNCTION urldetector as 'com.dataflowdeveloper.detection.URLDetector' USING JAR 'hdfs:///udf/urldetector-1.0-jar-with-dependencies.jar', JAR 'hdfs:///udf/url-detector-0.1.15.jar';
Test the UDF via HiveQL:
select http_user_agent, urldetector(remote_host) as urls, remote_host from AccessLogs limit 100;
Java header for the UDF:
@Description(name="urldetector", value="_FUNC_(string) - detects urls")
public final class URLDetector extends UDF {}
You can test with a temporary function through the Hive CLI before making the function permanent:
set hive.cli.print.header=true;
add jar urldetector-1.0-jar-with-dependencies.jar;
CREATE TEMPORARY FUNCTION urldetector as 'com.dataflowdeveloper.detection.URLDetector';
select urldetector(description) from sample_07 limit 100;
Build the JAR file for deployment:
mvn compile assembly:single
The library from LinkedIn (https://github.com/linkedin/URL-Detector) must be compiled and its JAR used in your code and deployed to Hive.
References: See https://github.com/tspannhw/URLDetector for the full source code.
07-07-2016
11:23 PM
2 Kudos
Adding HDF (with Apache NiFi) to your HDP 2.5 Sandbox is quick, painless, and easy. Get the most recent Hortonworks DataFlow (download): wget http://d3d0kdwqv675cq.cloudfront.net/HDF/centos6/1.x/updates/1.2.0.1/HDF-1.2.0.1-1.tar.gz
tar -xvf HDF-1.2.0.1-1.tar.gz
cd HDF-1.2.0.1-1/nifi/
Then change the port used by NiFi in the conf/nifi.properties file to nifi.web.http.port=8090. Install NiFi as a Linux service and start it:
bin/nifi.sh install
sudo service nifi start
NiFi home: /opt/HDF-1.2.0.1-1/nifi
Bootstrap Config File: /opt/HDF-1.2.0.1-1/nifi/conf/bootstrap.conf
2016-07-04 02:18:00,005 INFO [main] org.apache.nifi.bootstrap.Command Starting Apache NiFi...
2016-07-04 02:18:00,006 INFO [main] org.apache.nifi.bootstrap.Command Working Directory: /opt/HDF-1.2.0.1-1/nifi
You can check the status of a single NiFi server via the status command: [root@sandbox nifi]# sudo service nifi status
nifi.sh: JAVA_HOME not set; results may vary
Java home:
NiFi home: /opt/HDF-1.2.0.1-1/nifi
Bootstrap Config File: /opt/HDF-1.2.0.1-1/nifi/conf/bootstrap.conf
2016-07-04 02:18:42,527 INFO [main] org.apache.nifi.bootstrap.Command Apache NiFi is currently running, listening to Bootstrap on port 43184, PID=4391
Make sure you add port 8090 to the sandbox networking. You are now ready to go. Now start flowing.
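The port change above can also be scripted rather than edited by hand. A minimal sketch, demonstrated on a scratch copy of conf/nifi.properties so it is safe to run anywhere (on the Sandbox you would run just the sed line from the NiFi home directory):

```shell
# Stand-in for the real conf/nifi.properties shipped with HDF
mkdir -p conf
printf 'nifi.web.http.port=8080\n' > conf/nifi.properties

# Switch NiFi's web UI from the default 8080 (which collides with Ambari) to 8090
sed -i 's/^nifi.web.http.port=.*/nifi.web.http.port=8090/' conf/nifi.properties

# Confirm the change
grep '^nifi.web.http.port=' conf/nifi.properties
```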
07-07-2016
07:50 PM
2 Kudos
Using Yahoo Kafka Manager. Git clone the project (you need Java 8 to build), then use SBT to do a clean distribution. This will take a while as it downloads a lot of JARs. The build will produce a Zip file (../kafka-manager/target/universal/kafka-manager-1.3.0.8.zip); unzip it, update the configuration file (conf/application.conf) to point at your ZooKeeper hosts, and then you can run it:
kafka-manager.zkhosts="sandbox.hortonworks.com:2181"
unzip ../kafka-manager/target/universal/kafka-manager-1.3.0.8.zip
cd kafka-manager-1.3.0.8
vi conf/application.conf
bin/kafka-manager -Dconfig.file=conf/application.conf
Access the Kafka Manager from Chrome at http://localhost:9000/
Running Kafka Manager
Resources:
https://github.com/yahoo/kafka-manager
http://edbaker.weebly.com/blog/install-and-evaluation-of-yahoos-kafka-manager
http://chennaihug.org/knowledgebase/yahoo-kafka-manager/
https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem
For testing Kafka with a command-line client producer/consumer: https://github.com/edenhill/kafkacat (brew install kafkacat). For external access you may need to set advertised.host.name: http://stackoverflow.com/questions/31476679/send-kafkaproducer-from-local-machine-to-hortonworks-sandbox-on-virtualbox
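For the external-access note above, a sketch of the relevant broker settings in Kafka's server.properties (the host name and port shown are the HDP Sandbox defaults; substitute whatever your external clients can actually resolve):

```
advertised.host.name=sandbox.hortonworks.com
advertised.port=6667
```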
07-07-2016
07:50 PM
1 Kudo
From the Sandbox as root, create the topic:
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --zookeeper sandbox.hortonworks.com:2181 --replication-factor 1 --partitions 1 --topic people
Test the topic:
[root@sandbox kafka]# /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --topic people --zookeeper sandbox.hortonworks.com:2181
{metadata.broker.list=sandbox.hortonworks.com:6667, request.timeout.ms=30000, client.id=console-consumer-10628, security.protocol=PLAINTEXT}
Resources:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_secure-kafka-ambari/content/ch_secure-kafka-create-topics.html
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_secure-kafka-ambari/content/ch_secure-kafka-produce-events.html
07-07-2016
06:04 PM
2 Kudos
In the HDP 2.5 Sandbox, I did a quick walkthrough. The first thing I liked was the visualization: the Data Explorer provides a nice query tool to view tables and graphs, and the Data Visualization tab provides some nice graphing capabilities. After you run your queries you can look at the Tez results to see how they ran; it's a nice way to see what you may need to optimize. The Hive Ambari View is getting to be a very solid tool for working with Hive, from DDL (creating tables is easy) to viewing data, to updates and inserts.
07-07-2016
03:01 PM
Do you have the code available?
07-04-2016
05:11 PM
3 Kudos
This tutorial is great: https://github.com/hortonworks-gallery/ambari-vnc-service
Eclipse plugin: https://github.com/winghc/hadoop2x-eclipse-plugin
JDK 7 is best for most use cases, along with Scala 2.10. Maven and SBT are necessary as well.
Setup your environment:
https://dzone.com/articles/spark-and-scala-resources
https://dzone.com/articles/whats-on-your-laptop
Lots of options. This is an Eclipse project for an HBase coprocessor: https://github.com/tspannhw/hbasecoprocessor. Artem has a great project for testing: https://github.com/dbist/HBaseUnitTest. Once all the ports are open and not firewalled, it's usually straightforward.
Eclipse to Spark:
https://community.hortonworks.com/questions/36354/eclipse-to-sandbox-1.html
https://community.hortonworks.com/questions/32567/scala-with-hive-in-ecplipse-scala.html
Hadoop Eclipse plugin: https://community.hortonworks.com/questions/10404/hadoop-eclipse-plugin.html
IntelliJ project for Spark:
https://github.com/agilemobiledev/sparkworkshop
https://community.hortonworks.com/questions/31077/how-to-setup-intellij-idea-16-to-run-hortonworks-s.html
IntelliJ settings: https://community.hortonworks.com/questions/37410/recommended-idea-intellij-vmoptions-setting-for-de.html
These configuration files must be in the project or class path: core-site.xml, hdfs-site.xml, yarn-site.xml.
Add JARs for access: http://nivemaham.com/index.php/technical/22-java/hadoop/40-how-to-use-ide-for-hadoop-development-with-hortonworks-sandbox
For Apache Kylin development: http://kylin.apache.org/development/dev_env.html
Remote debugging Spark: https://nicolasmaillard.com/2016/02/06/remote-debugging-201-spark/
Testing with Hadoop mini-clusters: https://github.com/sakserv/hadoop-mini-clusters
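The client configuration files listed above can be copied onto the project classpath with something like this sketch (the hadoop-conf source directory here is a stand-in; on an HDP cluster the files usually live in /etc/hadoop/conf, and src/main/resources assumes a Maven project layout):

```shell
# Stand-in for the cluster's client config directory (normally /etc/hadoop/conf)
mkdir -p hadoop-conf src/main/resources
touch hadoop-conf/core-site.xml hadoop-conf/hdfs-site.xml hadoop-conf/yarn-site.xml

# Copy the three client configs onto the project's classpath
cp hadoop-conf/core-site.xml hadoop-conf/hdfs-site.xml hadoop-conf/yarn-site.xml \
   src/main/resources/
ls src/main/resources/
```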
07-01-2016
08:32 PM
1 Kudo
I followed the examples from the 3 Pillar Global post and the Apache HBase blog post, and then updated them for newer versions. To write an HBase coprocessor you need Google Protocol Buffers. On a Mac you need to install v2.5.0, as that is the version that works with HBase:
brew tap homebrew/versions
brew install protobuf250
Check your version:
protoc --version
Source code with the Maven build (pom.xml) is here. You will need Maven and a Java 7 (or newer) JDK for compilation.
Testing on Hadoop:
export HADOOP_CLASSPATH=`hbase classpath`
hadoop jar hbasecoprocessor-1.0.jar com.dataflowdeveloper.hbasecoprocessor.SumEndPoint
Upload your JAR to HDFS:
hadoop fs -mkdir /user/tspann
hadoop fs -ls /user/tspann
hadoop fs -put hbasecoprocessor-1.0.jar /user/tspann
hadoop fs -chmod 777 /user/tspann/hbasecoprocessor-1.0.jar
Install dynamically from the HBase shell:
disable 'stocks'
alter 'stocks', 'coprocessor'=>'hdfs://sandbox.hortonworks.com/user/tspann/hbasecoprocessor-1.0.jar|com.dataflowdeveloper.hbasecoprocessor.SumEndPoint|1001|arg1=1'
enable 'stocks'
describe 'stocks'
Testing locally:
java -classpath `hbase classpath`:hbasecoprocessor-1.0.jar com.dataflowdeveloper.hbasecoprocessor.SumEndPoint
Checking the table after installation:
[root@sandbox demo]# hbase shell
HBase Shell
Version 1.1.2.2.4.0.0-169, r61dfb2b344f424a11f93b3f086eab815c1eb0b6a, Wed Feb 10 07:08:51 UTC 2016
hbase(main):001:0> describe 'stocks'
Table stocks is ENABLED
stocks, {TABLE_ATTRIBUTES => {coprocessor$1 => 'hdfs://sandbox.hortonworks.com/user/tspann/hbasecoprocessor-1.0.jar|com.dataflowdeveloper.hbasecoprocessor.SumEndPoint|1001|arg1=1'}
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => 'FOREVER', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
1 row(s) in 0.3270 seconds
You can see the coprocessor has been added and is enabled.
References
https://github.com/larsgeorge/hbase-book/blob/master/ch04/src/main/java/coprocessor/RowCountEndpoint.java
https://github.com/Huawei-Hadoop/hindex
https://github.com/apache/hbase/tree/branch-1.0/hbase-examples
https://github.com/apache/hbase/blob/branch-1.0/hbase-examples/src/main/java/org/apache/hadoop/hbase/coprocessor/example/RowCountEndpoint.java
http://bigdatazone.blogspot.com/2015/05/hbase-coprocessor-using-protobuf-250.html
http://hbase.apache.org/book.html#cp
http://hbase.apache.org/book.html#cp_loading
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/package-summary.html
http://hbase.apache.org/book.html#cp_example
http://hbase.apache.org/xref/org/apache/hadoop/hbase/coprocessor/example/RowCountEndpoint.html
https://github.com/apache/hbase/blob/branch-1.1/hbase-examples/pom.xml
https://community.hortonworks.com/questions/2577/hbase-coprocessor-and-security.html
https://www.3pillarglobal.com/insights/hbase-coprocessors#(endpoints-coprocessor)
https://blogs.apache.org/hbase/entry/coprocessor_introduction
https://github.com/dbist/HBaseUnitTest