Posts: 1973
Kudos Received: 1225
Solutions: 124

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2458 | 04-03-2024 06:39 AM |
| | 3806 | 01-12-2024 08:19 AM |
| | 2053 | 12-07-2023 01:49 PM |
| | 3037 | 08-02-2023 07:30 AM |
| | 4156 | 03-29-2023 01:22 PM |
01-24-2018
07:56 PM
If you do a PutHDFS, it generates a hive.ddl attribute that can be used to create a Hive table. You can also build the DDL yourself with UpdateAttribute, using an expression like: ${hive.ddl} LOCATION '${absolute.hdfs.path}'
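As a sketch of what the assembled statement might look like when executed (the table name, column, and path below are made-up examples for illustration, not values from the original flow):

```shell
# Hypothetical illustration only: run a generated CREATE TABLE statement
# with a LOCATION clause pointing at the directory PutHDFS wrote to.
# 'web_logs' and '/landing/logs' are placeholder names for this sketch.
hive -e "CREATE EXTERNAL TABLE web_logs (line STRING) LOCATION '/landing/logs'"
```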
11-30-2016
06:38 PM
To get started with the HDCloud for AWS general availability version, visit http://docs.hortonworks.com/HDPDocuments/HDCloudAWS/HDCloudAWS-1.8.0/bk_hdcloud-aws/content/index.html
07-15-2016
11:35 AM
1 Kudo
su hdfs
hadoop fs -mkdir /udf
hadoop fs -put urldetector-1.0-jar-with-dependencies.jar /udf/
hadoop fs -put libs/url-detector-0.1.15.jar /udf/
hadoop fs -chown -R hdfs /udf
hadoop fs -chgrp -R hdfs /udf
hadoop fs -chmod -R 775 /udf
The commands above create the Hadoop directories and upload the two necessary libraries. Next, create the Hive function with those HDFS-referenced JARs:

CREATE FUNCTION urldetector AS 'com.dataflowdeveloper.detection.URLDetector' USING JAR 'hdfs:///udf/urldetector-1.0-jar-with-dependencies.jar', JAR 'hdfs:///udf/url-detector-0.1.15.jar';

Test the UDF via Hive QL:

select http_user_agent, urldetector(remote_host) as urls, remote_host from AccessLogs limit 100;

The Java header for the UDF:

@Description(name = "urldetector", value = "_FUNC_(string) - detects urls")
public final class URLDetector extends UDF {}

You can test with a temporary function through the Hive CLI before making the function permanent:

set hive.cli.print.header=true;
add jar urldetector-1.0-jar-with-dependencies.jar;
CREATE TEMPORARY FUNCTION urldetector AS 'com.dataflowdeveloper.detection.URLDetector';
select urldetector(description) from sample_07 limit 100;

Build the JAR file for deployment:

mvn compile assembly:single

The library from LinkedIn (https://github.com/linkedin/URL-Detector) must be compiled, the resulting JAR referenced in your code, and deployed to Hive. References: see https://github.com/tspannhw/URLDetector for the full source code.
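As a quick sanity check once the permanent function exists, you can also call it over JDBC from Beeline. This is a sketch; the connection URL assumes a default sandbox HiveServer2, and the table/column are the sample ones used above:

```shell
# Sketch: invoke the permanent UDF via Beeline (URL assumes default sandbox HiveServer2)
beeline -u jdbc:hive2://localhost:10000 \
  -e "SELECT urldetector(description) AS urls FROM sample_07 LIMIT 10;"
```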
07-07-2016
11:23 PM
2 Kudos
Adding HDF (with Apache NiFi) to your HDP 2.5 Sandbox is quick, painless, and easy. Download the most recent Hortonworks DataFlow: wget http://d3d0kdwqv675cq.cloudfront.net/HDF/centos6/1.x/updates/1.2.0.1/HDF-1.2.0.1-1.tar.gz
tar -xvf HDF-1.2.0.1-1.tar.gz
cd HDF-1.2.0.1-1/nifi/

Then change the port used by NiFi in the conf/nifi.properties file to:

nifi.web.http.port=8090

Install NiFi as a Linux service:

bin/nifi.sh install
sudo service nifi start
NiFi home: /opt/HDF-1.2.0.1-1/nifi
Bootstrap Config File: /opt/HDF-1.2.0.1-1/nifi/conf/bootstrap.conf
2016-07-04 02:18:00,005 INFO [main] org.apache.nifi.bootstrap.Command Starting Apache NiFi...
2016-07-04 02:18:00,006 INFO [main] org.apache.nifi.bootstrap.Command Working Directory: /opt/HDF-1.2.0.1-1/nifi
You can check the status of a single NiFi server via the status command:

[root@sandbox nifi]# sudo service nifi status
nifi.sh: JAVA_HOME not set; results may vary
Java home:
NiFi home: /opt/HDF-1.2.0.1-1/nifi
Bootstrap Config File: /opt/HDF-1.2.0.1-1/nifi/conf/bootstrap.conf
2016-07-04 02:18:42,527 INFO [main] org.apache.nifi.bootstrap.Command Apache NiFi is currently running, listening to Bootstrap on port 43184, PID=4391
Make sure you add port 8090 to the sandbox's network/port-forwarding settings. You are now ready to go. Now start flowing.
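The port change above can also be scripted. This minimal sketch demonstrates the edit against a stand-in copy of the properties file; on the sandbox you would run the same sed line against conf/nifi.properties under the NiFi install directory:

```shell
# Demonstrate the port change against a stand-in file; on the sandbox,
# target conf/nifi.properties instead (GNU sed assumed for -i).
printf 'nifi.web.http.port=8080\n' > nifi.properties
sed -i 's/^nifi.web.http.port=.*/nifi.web.http.port=8090/' nifi.properties
grep '^nifi.web.http.port' nifi.properties
```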
07-07-2016
07:50 PM
2 Kudos
Using Yahoo Kafka Manager: git clone the project (you need Java 8 to build), then use SBT to do a clean dist. This will take a while, as it downloads a lot of JARs. The build produces a zip file (../kafka-manager/target/universal/kafka-manager-1.3.0.8.zip); unzip it, update the configuration file (conf/application.conf) to point at your ZooKeeper hosts, and then run it:

kafka-manager.zkhosts="sandbox.hortonworks.com:2181"
unzip ../kafka-manager/target/universal/kafka-manager-1.3.0.8.zip
kafka-manager-1.3.0.8 git:(master) ✗ vi conf/application.conf
kafka-manager-1.3.0.8 git:(master) ✗ bin/kafka-manager -Dconfig.file=conf/application.conf
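The git clone and SBT build described at the start can be sketched as follows (assuming git and a Java 8 JDK on the PATH; the `sbt` launcher is assumed available):

```shell
# Sketch of the clone-and-build steps for Kafka Manager described above
git clone https://github.com/yahoo/kafka-manager
cd kafka-manager
sbt clean dist   # the zip ends up under target/universal/
```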
Access the Kafka Manager from Chrome http://localhost:9000/
Running Kafka Manager
Resources:
- https://github.com/yahoo/kafka-manager
- http://edbaker.weebly.com/blog/install-and-evaluation-of-yahoos-kafka-manager
- http://chennaihug.org/knowledgebase/yahoo-kafka-manager/
- https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem

Tools: for testing Kafka with a command-line producer/consumer, see https://github.com/edenhill/kafkacat (brew install kafkacat). For external access, you may need to set advertised.host.name: http://stackoverflow.com/questions/31476679/send-kafkaproducer-from-local-machine-to-hortonworks-sandbox-on-virtualbox
07-07-2016
07:50 PM
1 Kudo
From the sandbox, as root, create the topic:

/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --zookeeper sandbox.hortonworks.com:2181 --replication-factor 1 --partitions 1 --topic people

Test the topic:

[root@sandbox kafka]# /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --topic people --zookeeper sandbox.hortonworks.com:2181
{metadata.broker.list=sandbox.hortonworks.com:6667, request.timeout.ms=30000, client.id=console-consumer-10628, security.protocol=PLAINTEXT}
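With the console consumer running, you can push a test message through the topic from another terminal using the console producer. The broker host:port below is the sandbox default that appears in the consumer's metadata output above:

```shell
# Sketch: send one test message into the 'people' topic; the consumer
# in the other terminal should print it.
echo 'hello people' | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh \
  --broker-list sandbox.hortonworks.com:6667 --topic people
```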
Resources:
- https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_secure-kafka-ambari/content/ch_secure-kafka-create-topics.html
- https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_secure-kafka-ambari/content/ch_secure-kafka-produce-events.html
07-07-2016
06:04 PM
2 Kudos
In the HDP 2.5 Sandbox, I did a quick walkthrough. The first thing I liked was the visualization: the Data Explorer provides a nice query tool for viewing tables and graphs, and the Data Visualization tab provides some nice graphing capabilities. After you run your queries, you can look at the Tez results to see how they ran; it's a nice way to see what you may need to optimize. The Hive Ambari View is becoming a very solid tool for working with Hive, from DDL (creating tables is easy) to viewing data, updates, and inserts.
07-04-2016
05:11 PM
3 Kudos
This tutorial is great: https://github.com/hortonworks-gallery/ambari-vnc-service

Eclipse plugin: https://github.com/winghc/hadoop2x-eclipse-plugin. JDK 7 is best for most use cases, along with Scala 2.10; Maven and SBT are necessary as well.

Set up your environment:
- https://dzone.com/articles/spark-and-scala-resources
- https://dzone.com/articles/whats-on-your-laptop

Lots of options. An Eclipse project for an HBase coprocessor: https://github.com/tspannhw/hbasecoprocessor. Artem has a great project for testing: https://github.com/dbist/HBaseUnitTest. Once all the ports are open and not firewalled, it's usually straightforward.

Eclipse to Spark:
- https://community.hortonworks.com/questions/36354/eclipse-to-sandbox-1.html
- https://community.hortonworks.com/questions/32567/scala-with-hive-in-ecplipse-scala.html

Hadoop Eclipse plugin: https://community.hortonworks.com/questions/10404/hadoop-eclipse-plugin.html

IntelliJ project for Spark:
- https://github.com/agilemobiledev/sparkworkshop
- https://community.hortonworks.com/questions/31077/how-to-setup-intellij-idea-16-to-run-hortonworks-s.html

IntelliJ settings: https://community.hortonworks.com/questions/37410/recommended-idea-intellij-vmoptions-setting-for-de.html

These configuration files must be on the project or class path: core-site.xml, hdfs-site.xml, yarn-site.xml.

Add JARs for access: http://nivemaham.com/index.php/technical/22-java/hadoop/40-how-to-use-ide-for-hadoop-development-with-hortonworks-sandbox

For Apache Kylin development: http://kylin.apache.org/development/dev_env.html

Remote debugging Spark: https://nicolasmaillard.com/2016/02/06/remote-debugging-201-spark/

Testing with Hadoop mini-clusters: https://github.com/sakserv/hadoop-mini-clusters
07-26-2018
02:26 PM
My YARN UI is Kerberos-enabled, and GetHTTP is complaining about a 401 authentication error. Is there any workaround for this?
07-01-2016
08:32 PM
1 Kudo
I followed the examples from the 3 Pillar Global post and the Apache HBase blog post, then updated them for newer versions. To write an HBase coprocessor you need Google Protocol Buffers; on a Mac, install v2.5.0, as that is the version that works with HBase: brew tap homebrew/versions
brew install protobuf250
Check your version:

protoc --version

The source code with a Maven build (pom.xml) is here. You will need Maven and a Java 7 (or newer) JDK for compilation. Testing on Hadoop:

export HADOOP_CLASSPATH=`hbase classpath`
hadoop jar hbasecoprocessor-1.0.jar com.dataflowdeveloper.hbasecoprocessor.SumEndPoint
Upload your JAR to HDFS:

hadoop fs -mkdir /user/tspann
hadoop fs -ls /user/tspann
hadoop fs -put hbasecoprocessor-1.0.jar /user/tspann
hadoop fs -chmod 777 /user/tspann/hbasecoprocessor-1.0.jar

Install dynamically from the HBase shell:

disable 'stocks'
alter 'stocks',
'coprocessor'=>'hdfs://sandbox.hortonworks.com/user/tspann/hbasecoprocessor-1.0.jar|com.dataflowdeveloper.hbasecoprocessor.SumEndPoint|1001|arg1=1'
enable 'stocks'
describe 'stocks'
Testing locally:

java -classpath `hbase classpath`:hbasecoprocessor-1.0.jar com.dataflowdeveloper.hbasecoprocessor.SumEndPoint

Checking the table after installation:

[root@sandbox demo]# hbase shell
HBase Shell
Version 1.1.2.2.4.0.0-169, r61dfb2b344f424a11f93b3f086eab815c1eb0b6a, Wed Feb 10 07:08:51 UTC 2016
hbase(main):001:0> describe 'stocks'
Table stocks is ENABLED
stocks, {TABLE_ATTRIBUTES => {coprocessor$1 => 'hdfs://sandbox.hortonworks.com/user/tspann/hbasecoprocessor-1.0.jar|com.dataflowdeveloper.hbasecoprocessor.SumEndPoint|1001|arg1=1'}
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => 'FOREVER', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
1 row(s) in 0.3270 seconds
You can see the coprocessor has been added and is enabled.
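If you ever need to back the coprocessor out, the table attribute can be unset the same way it was set. This sketch uses the HBase shell's documented table_att_unset method, with the coprocessor$1 attribute name shown by describe above:

```shell
# Sketch: remove the dynamically loaded coprocessor via the HBase shell
# (the attribute name must match what describe 'stocks' reports)
echo "disable 'stocks'
alter 'stocks', METHOD => 'table_att_unset', NAME => 'coprocessor\$1'
enable 'stocks'" | hbase shell
```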
References
- https://github.com/larsgeorge/hbase-book/blob/master/ch04/src/main/java/coprocessor/RowCountEndpoint.java
- https://github.com/Huawei-Hadoop/hindex
- https://github.com/apache/hbase/tree/branch-1.0/hbase-examples
- https://github.com/apache/hbase/blob/branch-1.0/hbase-examples/src/main/java/org/apache/hadoop/hbase/coprocessor/example/RowCountEndpoint.java
- http://bigdatazone.blogspot.com/2015/05/hbase-coprocessor-using-protobuf-250.html
- http://hbase.apache.org/book.html#cp
- http://hbase.apache.org/book.html#cp_loading
- https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/package-summary.html
- http://hbase.apache.org/book.html#cp_example
- http://hbase.apache.org/xref/org/apache/hadoop/hbase/coprocessor/example/RowCountEndpoint.html
- https://github.com/apache/hbase/blob/branch-1.1/hbase-examples/pom.xml
- https://community.hortonworks.com/questions/2577/hbase-coprocessor-and-security.html
- https://www.3pillarglobal.com/insights/hbase-coprocessors#(endpoints-coprocessor)
- https://blogs.apache.org/hbase/entry/coprocessor_introduction
- https://github.com/dbist/HBaseUnitTest