Member since
09-17-2015
436
Posts
736
Kudos Received
81
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3843 | 01-14-2017 01:52 AM
 | 5742 | 12-07-2016 06:41 PM
 | 6619 | 11-02-2016 06:56 PM
 | 2175 | 10-19-2016 08:10 PM
 | 5683 | 10-19-2016 08:05 AM
11-23-2015
05:57 AM
6 Kudos
Use OpenTSDB Ambari service to store/visualize stock data on HDP sandbox Goal:
OpenTSDB (Scalable Time Series DB) allows you to store and serve massive amounts of time series data without losing granularity (more details here). In this tutorial we will install it on HBase on the HDP sandbox using the Ambari service, then use it to import and visualize stock data. Steps: Setup VM and install Ambari service
Download the latest HDP sandbox VM image (.ova file) from the Hortonworks website. Import the .ova file into VMware and ensure the VM memory size is set to at least 8 GB. Start the VM. After it boots up, find the IP address of the VM and add an entry to your machine's hosts file, e.g. 192.168.191.241 sandbox.hortonworks.com sandbox
Connect to the VM via SSH (password hadoop) ssh root@sandbox.hortonworks.com
Start the HBase service from Ambari and ensure HBase is up and that root has authority to create tables. You can check this by trying to create a test table: hbase shell
create 't1', 'f1', 'f2', 'f3'
If this fails with the error below, you will need to grant appropriate access via Ranger (http://sandbox.hortonworks.com:6080):
ERROR: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user 'root (auth:SIMPLE)' (global, action=CREATE)
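If Ranger is not managing HBase authorization on your sandbox, an alternative is to grant the permission directly from the hbase shell; a minimal sketch, assuming the HBase superuser is 'hbase':
#open the shell as the hbase superuser
sudo -u hbase hbase shell
#inside the shell, grant root global read/write/create/admin rights
grant 'root', 'RWXCA'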
To deploy the OpenTSDB service, run the below:
VERSION=`hdp-select status hadoop-client | sed 's/hadoop-client - \([0-9]\.[0-9]\).*/\1/'`
sudo git clone https://github.com/hortonworks-gallery/ambari-opentsdb-service.git /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/OPENTSDB
Restart Ambari #on sandbox
sudo service ambari restart
#on non-sandbox clusters
sudo service ambari-server restart
sudo service ambari-agent restart
Then you can click on 'Add Service' from the 'Actions' dropdown menu in the bottom left of the Ambari dashboard: On bottom left -> Actions -> Add Service -> check OpenTSDB server -> Next -> Next -> Customize as needed -> Next -> Deploy. You can customize the port, ZK quorum and ZK dir in the start command. Note that HBase must be started if the option to automatically create the OpenTSDB schema is selected. On successful deployment you will see the OpenTSDB service as part of the Ambari stack and will be able to start/stop the service from there. You can see the parameters you configured under the 'Configs' tab. One benefit of wrapping the component in an Ambari service is that you can now automate its deployment via Ambari blueprints and monitor/manage the service remotely via the REST API:
export SERVICE=OPENTSDB
export PASSWORD=admin
export AMBARI_HOST=sandbox.hortonworks.com
export CLUSTER=Sandbox
#get service status
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X GET http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/$SERVICE
#start service
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Start $SERVICE via REST"}, "Body": {"ServiceInfo": {"state": "STARTED"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/$SERVICE
#stop service
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Stop $SERVICE via REST"}, "Body": {"ServiceInfo": {"state": "INSTALLED"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/$SERVICE
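As a quick sanity check after issuing a start or stop, you can pull just the state fields out of the GET response (reusing the exports above):
#show the state fields from the service resource
curl -s -u admin:$PASSWORD -H 'X-Requested-By: ambari' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/$SERVICE | grep '"state"'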
To remove the OpenTSDB service:
Stop the service via Ambari
Delete the service:
#Ambari password
export PASSWORD=admin
#Ambari host
export AMBARI_HOST=localhost
export SERVICE=OPENTSDB
#detect name of cluster
output=`curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' http://$AMBARI_HOST:8080/api/v1/clusters`
CLUSTER=`echo $output | sed -n 's/.*"cluster_name" : "\([^\"]*\)".*/\1/p'`
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Stop $SERVICE via REST"}, "Body": {"ServiceInfo": {"state": "INSTALLED"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/$SERVICE
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X DELETE http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/$SERVICE
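To confirm the delete went through, a GET on the same endpoint should now return HTTP 404; a minimal check:
#expect 404 once the service has been deleted
curl -s -o /dev/null -w "%{http_code}\n" -u admin:$PASSWORD -H 'X-Requested-By: ambari' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/$SERVICE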
Remove artifacts:
rm -rf /root/opentsdb
rm -rf /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/OPENTSDB
Import stock data
Use the sample code below (taken from here) to pull 30-day intraday stock prices for a few securities in both OpenTSDB and CSV formats:
cd
/bin/rm -f prices.csv
/bin/rm -f opentsd.input
wget https://raw.githubusercontent.com/abajwa-hw/opentsdb-service/master/scripts/google_intraday.py
python google_intraday.py AAPL > prices.csv
python google_intraday.py GOOG >> prices.csv
python google_intraday.py HDP >> prices.csv
python google_intraday.py ORCL >> prices.csv
python google_intraday.py MSFT >> prices.csv
Review opentsd.input, which contains the stock prices in OpenTSDB-compatible format: tail opentsd.input
Import data from opentsd.input into OpenTSDB /root/opentsdb/build/tsdb import opentsd.input --zkbasedir=/hbase-unsecure --zkquorum=localhost:2181 --auto-metric
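The bulk importer above is the fastest way to load the file, but OpenTSDB 2.x also accepts individual writes over HTTP; a hedged sketch (the 'TEST' symbol is made up, and the metric must already exist unless auto-metric creation is enabled):
#write one hypothetical data point via the HTTP API
curl -X POST http://sandbox.hortonworks.com:9999/api/put -H 'Content-Type: application/json' -d "{\"metric\":\"volume\",\"timestamp\":$(date +%s),\"value\":1000,\"tags\":{\"symbol\":\"TEST\"}}"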
Open the web UI and query the stock data
The OpenTSDB web UI should be available at the link below (or whichever port you configured):
http://sandbox.hortonworks.com:9999
Query the data in the OpenTSDB web UI by entering values for:
From: pick a date from 3 weeks ago
To: pick today's date
Check Autoreload
Metric: (e.g. volume)
Tags: (e.g. symbol GOOG)
You can similarly create multiple tabs, e.g. Tags: symbol ORCL, Tags: symbol AAPL. To make the charts smoother:
Under the Style tab, check the 'Smooth' checkbox. Under the Axes tab, check the 'Log scale' checkbox. You can also open the UI from within Ambari via the iFrame view.
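The same kind of query can also be issued against OpenTSDB's HTTP API, which is handy for scripting; a minimal sketch assuming OpenTSDB 2.x and the port configured above:
#sum of the 'volume' metric for symbol GOOG over the last 3 weeks
curl "http://sandbox.hortonworks.com:9999/api/query?start=3w-ago&m=sum:volume{symbol=GOOG}"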
11-19-2015
09:40 PM
17 Kudos
Getting started with Nifi expression language and custom Nifi processors on HDP sandbox This tutorial is part of a webinar for partners on Hortonworks DataFlow. The recording will be made available at
http://hortonworks.com/partners/learn/ Background
For a primer on HDF, you can refer to the materials here to get a basic background. A basic tutorial on using Nifi on the HDP sandbox is also available here. Goals
Build a Nifi flow to analyze Nifi's network traffic using tcpdump, and use Expression Language to extract the source/target IPs and ports. Then build and use a custom tcpdump processor to filter Nifi's source/target IPs and ports on the HDP sandbox. Note that:
Nifi can be installed independently of HDP. The custom processor can also be built on any machine where Java and Eclipse are installed. The sandbox is used for demo purposes, to have everything in one place. Pre-requisites: Install Nifi on the sandbox
The lab is designed for the HDP Sandbox. Download the HDP Sandbox here, import it into VMware Fusion and start the VM. After it boots up, find the IP address of the VM and add an entry to your machine's hosts file, e.g. 192.168.191.241 sandbox.hortonworks.com sandbox
Connect to the VM via SSH (root/hadoop), correct the /etc/hosts entry ssh root@sandbox.hortonworks.com
Deploy the Nifi Ambari service on the sandbox by running the below:
VERSION=`hdp-select status hadoop-client | sed 's/hadoop-client - \([0-9]\.[0-9]\).*/\1/'`
sudo git clone https://github.com/abajwa-hw/ambari-nifi-service.git /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/NIFI
#sandbox
service ambari restart
#non sandbox
service ambari-server restart
To install Nifi, start the 'Install Wizard': Open Ambari (http://sandbox.hortonworks.com:8080) then:
On bottom left -> Actions -> Add Service -> check NiFi server -> Next -> Next -> Change any config you like (e.g. install dir, port, setup_prebuilt or values in nifi.properties) -> Next -> Deploy. This will kick off the install, which will run for 5-10 min. Once installed, launch Nifi by opening http://sandbox.hortonworks.com:9090/nifi
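If you want to confirm from the shell that the UI is up before opening a browser, a quick check (assuming the default port above):
#expect HTTP 200 once Nifi has finished starting
curl -s -o /dev/null -w "%{http_code}\n" http://sandbox.hortonworks.com:9090/nifi/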
Steps
Explore tcpdump
Tcpdump is a common packet analyzer that runs from the command line. It allows the user to display TCP/IP and other packets being transmitted or received over a network to which the computer is attached. Full details can be found here. To install tcpdump on the sandbox:
yum install -y tcpdump
Here is a common usage: tcpdump -n -nn
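As an aside, tcpdump can also filter at capture time, which is handy if you only care about the Nifi port; for example:
#capture only traffic to or from port 9090 (the Nifi web port)
tcpdump -n -nn port 9090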
On the sandbox, the unfiltered command will output something like the below for each network connection being made, showing which socket (i.e. IP/port) was the source (to the left of >) and which was the target (to the right of >):
08:16:15.878652 IP 192.168.191.1.49270 > 192.168.191.144.9090: Flags [.], ack 2255, win 8174, options [nop,nop,TS val 1176961367 ecr 32747195], length 0
In the example above, the source machine was 192.168.191.1 (port 49270) and the target machine was 192.168.191.144 (port 9090). Note that since Nifi is running on port 9090, by monitoring traffic to port 9090 we will be able to capture connections made by Nifi.
Build tcpdump flow using ExecuteProcess and EL
Download to your local laptop (not the sandbox) the xml template for the flow that uses ExecuteProcess/EL to parse tcpdump output from https://raw.githubusercontent.com/abajwa-hw/nifi-network-processor/master/templates/TCPDump_EL_Example.xml On the Nifi web UI, import the flow template:
Import the template by clicking on Templates (third icon from the right), which will launch the 'Nifi Flow templates' popup. Browse and navigate to wherever you downloaded TCPDump_EL_Example.xml on your local machine. Click Import; the template should now appear in the 'Nifi Flow templates' popup window. Close the popup window. Instantiate the 'TCPDump EL Example' template:
Drag/drop the Template icon (7th icon from the left) onto the canvas so that a picklist popup appears. Select 'TCPDump EL Example' and click Add. Run the flow; after a few seconds you should see all the counters increase. Overview of the flow:
ExecuteProcess: runs tcpdump -n -nn
SplitText: splits the output into lines
ExtractText: extracts the src/dest sockets using regex:
src.socket will store the socket before the > : (\d+\.\d+\.\d+\.\d+\.\d+)\s+>
dest.socket will store the socket after the > : >\s+(\d+\.\d+\.\d+\.\d+\.\d+)
RouteOnAttribute: filters by destination socket where the port is 9090, using Expression Language:
web.server.dest = ${dest.socket:endsWith(".9090")}
LogAttribute: logs the attributes (a quick sanity check of the regexes is sketched below)
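To sanity-check those regexes outside Nifi, you can run them against the sample tcpdump line with GNU grep (the -P flag is needed for the \d and \s shorthands); the match includes the > delimiter:
line='08:16:15.878652 IP 192.168.191.1.49270 > 192.168.191.144.9090: Flags [.], ack 2255'
echo "$line" | grep -oP '(\d+\.\d+\.\d+\.\d+\.\d+)\s+>'   #prints the source socket and '>'
echo "$line" | grep -oP '>\s+(\d+\.\d+\.\d+\.\d+\.\d+)'   #prints '>' and the destination socket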
Check details of what events were logged:
Open the Provenance window (5th icon from the top right). In the top right, filter by component type: LogAttribute, and click on the 'Show lineage' icon of the first record (near the top right). Right click on Route > View details. Click the Content tab and click View. Notice that the destination socket for the event shows port 9090. For more details on Nifi Expression Language see the Nifi docs. Stop the flow using the stop button.
Build custom processor for tcpdump
Set up your sandbox for development by using the VNC Ambari service to install VNC/Eclipse/Maven
Download Ambari service for VNC (details below) VERSION=`hdp-select status hadoop-client | sed 's/hadoop-client - \([0-9]\.[0-9]\).*/\1/'`
sudo git clone https://github.com/hortonworks-gallery/ambari-vnc-service.git /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/VNCSERVER
service ambari restart
Once the status of HDFS/YARN has changed from a yellow question mark to a green check mark... Set up Eclipse on the sandbox VM and remote desktop into it using an Ambari service for VNC. In Ambari, open the Admin > Stacks and Services tab. You can access this via http://sandbox.hortonworks.com:8080/#/main/admin/stack/services Deploy the service by selecting:
VNC Server -> Add Service -> Next -> Next -> Enter password (e.g. hadoop) -> Next -> Proceed Anyway -> Deploy. Make sure the password is at least 6 characters or the install will fail. Connect to VNC from your local laptop using VNC viewer software (e.g. Tight VNC viewer, Chicken of the VNC, or just your browser). Detailed steps here. (Optional) To install Maven manually instead: curl -o /etc/yum.repos.d/epel-apache-maven.repo https://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo
yum -y install apache-maven-3.2*
In general, when starting a new project you would use the mvn archetype to create a custom processor. Details here: https://cwiki.apache.org/confluence/display/NIFI/Maven+Projects+for+Extensions
Command to run the wizard: cd /tmp
mvn archetype:generate -DarchetypeGroupId=org.apache.nifi -DarchetypeArtifactId=nifi-processor-bundle-archetype -DarchetypeVersion=0.2.1 -DnifiVersion=0.2.1
Sample inputs to generate the maven archetype project skeleton:
Define value for property 'groupId': : com.hortonworks
Define value for property 'artifactId': : nifi-network-processors
Define value for property 'version': 1.0-SNAPSHOT: :
Define value for property 'artifactBaseName': : network
Define value for property 'package': com.hortonworks.processors.network: :
This will create an archetype maven project for a custom processor with the package name, artifactId, etc. specified above. In this case we will download a previously built sample and walk through the changes you would need to make to the archetype to create a basic custom processor:
cd
sudo git clone https://github.com/abajwa-hw/nifi-network-processor.git
Open Eclipse using the shortcut on the Desktop. Import the project into Eclipse:
File > Import > Maven > Existing Maven Projects. Browse > root > nifi-network-processor > OK > Finish. Here is a summary of code changes made to the generated archetype to create the sample tcpdump processor:
pom.xml: add the commons-io dependency (for utils) here. In org.apache.nifi.processor.Processor, add the class name here. In GetTcpDumpAttributes.java:
Define the tags and description using @Tags and @CapabilityDescription here e.g. //Define the processor tags and description which will be displayed on Nifi UI
@Tags({"fetch","tcpdump","tcp", "network"})
@CapabilityDescription("Reads output of tcpdump and outputs the results as a Flowfile")
These would get displayed on the 'Add processor' screen of Nifi UI
Define properties for the processor here e.g. //Define properties for the processor
public static final PropertyDescriptor MY_PROPERTY = new PropertyDescriptor
.Builder().name("My Property")
.description("Example Property")
.required(true)
.addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
.build();
These would get displayed on the 'Properties' tab of the GetTcpDumpAttributes processor:
Define relationships for the processor here e.g. //Define relationships for the processor
public static final Relationship SUCCESS_RELATIONSHIP = new Relationship.Builder()
.name("success")
.description("Success relationship")
.build();
These would get displayed on the 'Settings' tab of the GetTcpDumpAttributes processor:
Any initialization to be done when Nifi starts would go in init() here. onTrigger() is the main method to override to define the logic when a flow file is passed to our processor; this is where we parse a line of tcpdump output and store the source and destination sockets here. In GetTcpDumpAttributesTest.java, you can define a JUnit test to check that the processor is working correctly. To run the maven compile:
In Eclipse, under 'Package Explorer' select 'network-analysis' and then click:
Run > Run Configurations. Then double click 'Maven Build'. It will prompt you for the configuration. Enter the below:
Name: nifi-network
Base dir: /root/nifi-network-processor
Under 'Goals': clean package
Under 'Maven Runtime' (scroll down to see this option): we will add the location of the existing mvn install using the steps below, as it runs faster than the embedded one:
Configure > Add > click 'Directory' and navigate to the mvn install: /usr/share/apache-maven > OK > Finish > Select apache-maven > Apply > OK. Your maven run configuration should now look as below. Click Apply > Run to start the compile.
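If you prefer the command line to Eclipse (or want to script the build), the same compile and tests can be run with the Maven installed above; a minimal sketch:
#build the processor bundle from the shell instead of Eclipse
cd /root/nifi-network-processor
mvn clean package
#run just the unit tests
mvn test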
To run JUnit to confirm the processor is working correctly:
In Eclipse, under Package Explorer select nifi-network-processors and then click:
Run > Run as > JUnit test. After a few seconds the test should pass and you should see the below (in green):
To see what happens if the test does not pass, try changing the value of dest.socket by prefixing the values with random digits (as highlighted below), save your changes and re-run JUnit
This time you will see the test fail (in red below)
Press Control-Z to undo your changes. Confirm the nar file (the Nifi library file for your processor) was built by the maven build: ls -la ~/nifi-network-processor/nifi-network-nar/target/nifi-network-nar-1.0-SNAPSHOT.nar
Deploy the nar into Nifi: copy the compiled nar file into the Nifi lib dir and correct the permissions:
cp ~/nifi-network-processor/nifi-network-nar/target/nifi-network-nar-1.0-SNAPSHOT.nar /opt/nifi-1.0.0.0-7/lib/
chown nifi:hadoop /opt/nifi-1.0.0.0-7/lib/nifi-network-nar-1.0-SNAPSHOT.nar
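Before restarting, you can double-check that the nar landed with the right ownership, and once you restart Nifi (next step) you can tail the application log to watch it come back up; a sketch assuming the install dir above (the log location can vary with your nifi.properties):
#verify the nar is in place
ls -la /opt/nifi-1.0.0.0-7/lib/nifi-network-nar-1.0-SNAPSHOT.nar
#after restarting Nifi from Ambari, watch it come back up
tail -f /opt/nifi-1.0.0.0-7/logs/nifi-app.log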
Restart Nifi from Ambari. Download to your local laptop (not the sandbox) the xml template for the flow that uses the custom processor to parse tcpdump output from https://github.com/abajwa-hw/nifi-network-processor/raw/master/templates/TCPDump_Custom_Processor_Example.xml Open the Nifi UI and delete the existing flow by:
Control-A to select all the components, then right click on any processor and select Delete. Import the custom processor flow template into Nifi:
Import the template by clicking on Templates (third icon from the right), which will launch the 'Nifi Flow templates' popup. Browse and navigate to wherever you downloaded TCPDump_Custom_Processor_Example.xml on your local machine. Click Import; the template should now appear in the 'Nifi Flow templates' popup window. Close the popup window. Instantiate the 'TCPDump_Custom_Processor_Example' template:
Drag/drop the Template icon (7th icon from the left) onto the canvas so that a picklist popup appears. Select 'TCPDump_Custom_Processor_Example' and click Add. Run the flow; after a few seconds you should see all the counters increase. Overview of the flow:
ExecuteProcess: runs tcpdump -n -nn
SplitText: splits the output into lines
GetTcpDumpAttributes: extracts the src/dest sockets using the custom processor we built:
src.socket will store the socket before the > : (\d+\.\d+\.\d+\.\d+\.\d+)\s+>
dest.socket will store the socket after the > : >\s+(\d+\.\d+\.\d+\.\d+\.\d+)
RouteOnAttribute: filters by destination socket where the port is 9090
web.server.dest = ${dest.socket:endsWith(".9090")}
LogAttribute: logs the attributes
Open the Provenance window and repeat the previous steps to confirm that the destination socket for the events shows port 9090. You have successfully created flows to analyze network traffic using both Expression Language and a basic custom processor.
Further reading
https://nifi.apache.org/developer-guide.html https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html http://www.nifi.rocks/developing-a-custom-apache-nifi-processor-json/
http://bryanbende.com/development/2015/02/04/custom-processors-for-apache-nifi/
11-12-2015
08:31 AM
1 Kudo
This used to work in older versions of Ambari (see p. 201 onwards in the white paper here, where we had it working). In Ambari 2.1.0 I know it was broken (see JIRA), but according to the bug details it was fixed in 2.1.1. + @jeff@hortonworks.com in case he has more on this
11-12-2015
12:47 AM
1 Kudo
@sdutta@hortonworks.com may know
11-11-2015
06:17 PM
Yup I have done similar with pyspark in Zeppelin as well so should work
11-10-2015
11:38 PM
1 Kudo
Yes, the Ambari team has fixed this. Just open Ambari in a new incognito window in Chrome and reload the page; it should go away
11-07-2015
10:32 PM
I would not recommend creating these scripts manually because they are specific to the Ambari version: you would need to change the script every release, as new services are added and the start order may change. Ambari knows the right order (check the role_command_order.json files under /var/lib/ambari-server). I would recommend just using Ambari's APIs to start all and stop all services so you don't have to worry about it every release. @yusaku@hortonworks.com what do you think?
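For reference, stopping or starting everything is one REST call against the cluster's /services endpoint; a sketch with placeholder host, cluster name and credentials:
#stop all services
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo":{"context":"Stop all services"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' http://sandbox.hortonworks.com:8080/api/v1/clusters/Sandbox/services
#start all services
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo":{"context":"Start all services"},"Body":{"ServiceInfo":{"state":"STARTED"}}}' http://sandbox.hortonworks.com:8080/api/v1/clusters/Sandbox/services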
11-06-2015
08:16 PM
2 Kudos
Yes, I had tried this on a cluster where NSLCD was set up so the cluster recognizes LDAP users; it would be the same for AD/SSSD. sh-4.1$ whoami
ali
sh-4.1$ hadoop fs -ls /tmp/hive/zeppelin
ls: Permission denied: user=ali, access=READ_EXECUTE, inode="/tmp/hive/zeppelin":zeppelin:hdfs:drwx------
sh-4.1$ export HADOOP_USER_NAME=hdfs
sh-4.1$ hadoop fs -ls /tmp/hive/zeppelin
Found 4 items
drwx------ - zeppelin hdfs 0 2015-09-26 17:51 /tmp/hive/zeppelin/037f5062-56ba-4efc-b438-6f349cab51e4
11-06-2015
08:03 PM
10 Kudos
There is no security without Kerberos. Before anyone goes down that road, just show them this first to make sure they are OK with it: # su yarn
$ whoami
yarn
$ hadoop fs -ls /tmp/hive
ls: Permission denied: user=yarn, access=READ_EXECUTE, inode="/tmp/hive":ambari-qa:hdfs:drwx-wx-wx
$ export HADOOP_USER_NAME=hdfs
$ hadoop fs -ls /tmp/hive
Found 3 items
drwx------ - ambari-qa hdfs 0 2015-11-04 13:31 /tmp/hive/ambari-qa
drwx------ - anonymous hdfs 0 2015-11-04 13:31 /tmp/hive/anonymous
drwx------ - hive hdfs 0 2015-11-02 11:15 /tmp/hive/hive
11-06-2015
07:40 PM
+ @bganesan@hortonworks.com the recommendation now is to always enable Kerberos/Ranger. If someone is unwilling to do Kerberos, show them what happens when you set HADOOP_USER_NAME and I'm sure they will come running