Member since: 09-17-2015
Posts: 436
Kudos Received: 736
Solutions: 81
12-11-2015
12:56 AM
7 Kudos
Is your Ambari throwing a weird error and you need to troubleshoot? No problem... just install this view and browse this forum from Ambari itself! 1. Install the view: wget https://github.com/hortonworks-gallery/ambari-iframe-view/raw/master/samples/answerhub-view-1.0-SNAPSHOT.jar -P /var/lib/ambari-server/resources/views/
2. Then restart Ambari: #on sandbox
service ambari restart
#on non-sandbox
service ambari-server restart 3. Now open the view:
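As an optional sanity check (assuming the default admin/admin credentials and Ambari on localhost:8080; adjust for your environment), the views REST API should list the new view once Ambari is back up. A sketch:

```shell
# Build the views API URL (hypothetical host; adjust for your env)
AMBARI_HOST=localhost
VIEWS_URL="http://$AMBARI_HOST:8080/api/v1/views"
echo "$VIEWS_URL"
# Then query it; the new view should appear in the response:
# curl -u admin:admin -H X-Requested-By:ambari "$VIEWS_URL"
```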
12-04-2015
05:43 PM
@Dhruv Kumar Instead of pasting the raw .md file into AH, try copying/pasting the content from the rendered README.md into AH: it will preserve your github formatting
11-30-2015
07:26 AM
2 Kudos
Automated deployment of a fresh HDP cluster that includes Zeppelin (via blueprints)
Background: The Zeppelin Ambari service has been updated and now supports installing the latest Zeppelin version (0.5.5) on HDP using blueprints to automate creation of a 'data science' cluster. The SequenceIQ team has a datascientist blueprint that installs Zeppelin, but based on my conversations with @lpapp, it is based on an older version of the Ambari service, so it does not install the latest version or support as many options. Below is a writeup of how to deploy Zeppelin via blueprints. Note that if you already have a cluster running, you should just use the Add Service wizard in Ambari to deploy Zeppelin using the steps on the github.
Purpose: Sample steps below for installing a 4-node HDP cluster that includes Zeppelin, using Ambari blueprints and Ambari bootstrap scripts (by @Sean Roberts)
Pre-reqs: Bring up 4 VMs imaged with RHEL/CentOS 6.5 or later (e.g. called node1-4 in this case). Note that the VMs should not already have HDP-related software installed on them at this point.
Steps:
On the non-Ambari nodes (e.g. nodes 2-4), use the Ambari bootstrap script to run pre-reqs, install ambari-agents and point them to the Ambari node (e.g. node1 in this case) export ambari_server=node1
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh
On the Ambari node (e.g. node1), use the bootstrap script to run pre-reqs and install ambari-server export install_ambari_server=true
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh
yum install -y git
git clone https://github.com/hortonworks-gallery/ambari-zeppelin-service.git /var/lib/ambari-server/resources/stacks/HDP/2.3/services/ZEPPELIN
Edit the /var/lib/ambari-server/resources/stacks/HDP/2.3/role_command_order.json file to include below: "ZEPPELIN_MASTER-START": ["NAMENODE-START", "DATANODE-START"],
Note the comma at the end: if you insert the above as the last entry, you need to remove the comma.
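Since a stray comma is the usual mistake here, a quick check after editing is to run the file through Python's json.tool, which exits non-zero on malformed JSON. Shown below on a scratch file (an assumption for illustration); point it at the real role_command_order.json on your Ambari server:

```shell
# Scratch JSON fragment standing in for role_command_order.json (hypothetical)
echo '{"ZEPPELIN_MASTER-START": ["NAMENODE-START", "DATANODE-START"]}' > /tmp/rco-check.json
# json.tool exits non-zero if the file is not well-formed JSON
if python3 -m json.tool /tmp/rco-check.json > /dev/null 2>&1; then
  echo "valid JSON"
else
  echo "malformed JSON"
fi
```

On older hosts without python3, `python -m json.tool` works the same way.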
Restart Ambari service ambari-server restart
service ambari-agent restart
Confirm that all 4 agents were registered and the agents remained up curl -u admin:admin -H X-Requested-By:ambari http://localhost:8080/api/v1/hosts
service ambari-agent status
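The hosts API returns one "host_name" entry per registered agent, so a quick count confirms all 4 came up. Demonstrated below on a canned response (the node names are this example's assumptions); pipe the real curl output from above through the same grep:

```shell
# Canned sample of the /api/v1/hosts response shape for a 4-node cluster
cat <<'EOF' > /tmp/hosts.json
{ "items" : [
  { "Hosts" : { "host_name" : "node1" } },
  { "Hosts" : { "host_name" : "node2" } },
  { "Hosts" : { "host_name" : "node3" } },
  { "Hosts" : { "host_name" : "node4" } } ] }
EOF
# One match per registered host; expect 4
grep -c '"host_name"' /tmp/hosts.json
```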
(Optional) - In general, you can generate a BP and cluster file for your cluster via the Ambari recommendations API using these steps. However in this example we are providing some sample blueprints which you can edit, so this is not needed. These are for reference only. For more details on the bootstrap scripts, see the bootstrap script git yum install -y python-argparse
git clone https://github.com/seanorama/ambari-bootstrap.git
#optional - limit the services for faster deployment
#for minimal services
export ambari_services="HDFS MAPREDUCE2 YARN ZOOKEEPER HIVE ZEPPELIN"
#for most services
#export ambari_services="ACCUMULO FALCON FLUME HBASE HDFS HIVE KAFKA KNOX MAHOUT OOZIE PIG SLIDER SPARK SQOOP MAPREDUCE2 STORM TEZ YARN ZOOKEEPER ZEPPELIN"
export deploy=false
cd ambari-bootstrap/deploy
bash ./deploy-recommended-cluster.bash
cd tmpdir*
#edit the blueprint to customize as needed. You can use sample blueprints provided below to see how to add the custom services.
vi blueprint.json
#edit cluster file if needed
vi cluster.json
Download either the minimal or the full blueprint for a 4-node setup #Pick one of the below blueprints
#for minimal services download this one
wget https://raw.githubusercontent.com/hortonworks-gallery/ambari-zeppelin-service/master/blueprint-4node-zeppelin-minimal.json -O blueprint-zeppelin.json
#for most services download this one
wget https://raw.githubusercontent.com/hortonworks-gallery/ambari-zeppelin-service/master/blueprint-4node-zeppelin-all.json -O blueprint-zeppelin.json
(optional) If running on a single node, download the minimal blueprint for a 1-node setup
#for minimal services download this one
wget https://raw.githubusercontent.com/hortonworks-gallery/ambari-zeppelin-service/master/blueprint-1node-zeppelin-minimal.json -O blueprint-zeppelin.json
(optional) If needed, change the Zeppelin configs based on your setup by modifying these lines vi blueprint-zeppelin.json
if deploying on a public cloud, you will want to add "zeppelin.host.publicname":"<public IP or hostname of zeppelin node>" so the Zeppelin Ambari view points to the external hostname (instead of the internal name, which is the default). Upload the selected blueprint and download a sample cluster.json that provides your host FQDNs. Modify the host FQDNs in the cluster.json file for your own env. Finally, deploy the cluster and call it zeppelinCluster #upload the blueprint to Ambari
curl -u admin:admin -H X-Requested-By:ambari http://localhost:8080/api/v1/blueprints/zeppelinBP -d @blueprint-zeppelin.json
Download the sample cluster.json #for 4-node setup
wget https://raw.githubusercontent.com/hortonworks-gallery/ambari-zeppelin-service/master/cluster-4node.json -O cluster.json
#for single node setup
wget https://raw.githubusercontent.com/hortonworks-gallery/ambari-zeppelin-service/master/cluster-1node.json -O cluster.json
Modify the host FQDNs in the cluster.json file with your own. Also change the default_password to set the password for Hive vi cluster.json
Deploy the cluster curl -u admin:admin -H X-Requested-By:ambari http://localhost:8080/api/v1/clusters/zeppelinCluster -d @cluster.json
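Deployment progress can also be polled over REST; a hedged sketch (request id 1 is an assumption, typical for the first deploy on a fresh Ambari):

```shell
# Hypothetical host/cluster names; adjust for your env
AMBARI_HOST=localhost
CLUSTER=zeppelinCluster
# Request ids are assigned sequentially; the blueprint deploy is usually id 1
REQ_URL="http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/requests/1"
echo "$REQ_URL"
# curl -u admin:admin -H X-Requested-By:ambari "$REQ_URL"
```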
You can monitor the progress of the deployment via Ambari (e.g. http://node1:8080). Once the install completes, you will have a 4-node HDP cluster including Zeppelin, along with some starter demo Zeppelin notebooks from the gallery github. More details are available on the github README here. Similar steps are available here to deploy a 'security ready' cluster including demo KDC, OpenLDAP, and NSLCD services.
11-25-2015
07:46 PM
2 Kudos
Yup, very useful - if you go to http://sandbox.hortonworks.com:8888 and click Advanced you will see this and other useful URL links
11-25-2015
03:36 PM
2 Kudos
Some interesting talks for Hadoop Summit 2016 (EMEA). Please vote: the deadline is Dec 15 and you can vote for as many talks as you want.
Hadoop governance, security, deployment, operations
1. Rolling & Express Upgrades of the Stack in Ambari https://hadoopsummit.uservoice.com/forums/332061-hadoop-governance-security-deployment-and-operat/suggestions/10848390-rolling-express-upgrades-of-the-stack-in-ambari
Data science
0. Analyzing Hollywood with Spark and Hadoop: https://hadoopsummit.uservoice.com/forums/332055-data-science-applications-for-hadoop/suggestions/10846959-analyzing-hollywood-with-spark-and-hadoop
1. Magellan (how to scale geospatial joins using Magellan): https://hadoopsummit.uservoice.com/forums/332079-the-future-of-apache-hadoop/suggestions/10848510-magellan-spark-as-a-geospatial-analytics-engine
2. Online Learning (how to do machine learning online and fast while being accurate): https://hadoopsummit.uservoice.com/forums/332082-hadoop-and-the-internet-of-things/suggestions/10847973-fast-distributed-online-classification-and-cluster
3. MLLeap (joint work with Truecar on migrating machine learning pipelines from offline modeling to online scoring): https://hadoopsummit.uservoice.com/forums/332055-data-science-applications-for-hadoop/suggestions/10847055-mlleap-or-how-to-productionize-data-science-workf
4. Entity Disambiguation (how to apply machine learning techniques to resolve entities): https://hadoopsummit.uservoice.com/forums/332055-data-science-applications-for-hadoop/suggestions/10847028-entity-disambiguation-scalably-mixing-public-and
5. Monte Carlo simulations using Spark: https://hadoopsummit.uservoice.com/forums/332055-data-science-applications-for-hadoop/suggestions/10847025-monte-carlo-simulation-for-ad-lift-measurement-usi
6. Finding Outliers with Spark and Storm: Guide to Keeping Your Sanity: https://hadoopsummit.uservoice.com/forums/332055-data-science-applications-for-hadoop/suggestions/10846992-finding-outliers-with-spark-and-storm-guide-to-ke
Committer Track
1. Running Storm with 5 9's availability https://hadoopsummit.uservoice.com/forums/332931-apache-committer-insights/suggestions/10845702-running-storm-with-5-9-s-availability
2. The Future of Apache Storm https://hadoopsummit.uservoice.com/forums/332931-apache-committer-insights/suggestions/10845747-the-future-of-apache-storm
Application Track
1. DataCube - Yahoo!'s Next-Generation Near Real-Time Ads Targeting Platform https://hadoopsummit.uservoice.com/forums/332076-applications-of-hadoop-and-the-data-driven-busines/suggestions/10846542-datacube-yahoo-s-next-generation-near-real-time
2. Practical Complex Event Processing with Storm https://hadoopsummit.uservoice.com/forums/332076-applications-of-hadoop-and-the-data-driven-busines/suggestions/10846485-practical-complex-event-processing-with-storm
Hadoop and the Internet of Things
1. Streaming SQL on Storm https://hadoopsummit.uservoice.com/forums/332082-hadoop-and-the-internet-of-things/suggestions/10847967-streaming-sql-on-storm
2. From Device to Data Center to Insights: Architectural Considerations for the Internet of Anything https://hadoopsummit.uservoice.com/forums/332082-hadoop-and-the-internet-of-things/suggestions/10847958-from-device-to-data-center-to-insights-architectu
11-23-2015
05:57 AM
6 Kudos
Use the OpenTSDB Ambari service to store/visualize stock data on HDP sandbox Goal:
OpenTSDB (Scalable Time Series DB) allows you to store and serve massive amounts of time series data without losing granularity (more details here). In this tutorial we will install it on HBase on the HDP sandbox using the Ambari service and use it to import and visualize stock data. Steps: Setup VM and install Ambari service
Download the latest HDP sandbox VM image (.ova file) from the Hortonworks website. Import the ova file into VMWare and ensure the VM memory size is set to at least 8GB. Now start the VM. After it boots up, find the IP address of the VM and add an entry into your machine's hosts file e.g. 192.168.191.241 sandbox.hortonworks.com sandbox
Connect to the VM via SSH (password hadoop) ssh root@sandbox.hortonworks.com
Start the HBase service from Ambari and ensure HBase is up and root has authority to create tables. You can do this by trying to create a test table hbase shell
create 't1', 'f1', 'f2', 'f3'
If this fails with the below, you will need to provide appropriate access via Ranger (http://sandbox.hortonworks.com:6080) ERROR: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user 'root (auth:SIMPLE)' (global, action=CREATE) To deploy the OpenTSDB service, run below VERSION=`hdp-select status hadoop-client | sed 's/hadoop-client - \([0-9]\.[0-9]\).*/\1/'`
sudo git clone https://github.com/hortonworks-gallery/ambari-opentsdb-service.git /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/OPENTSDB
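The VERSION one-liner above just strips hdp-select output down to the major.minor stack version used in the stacks/HDP/<version> path; applied to a sample line:

```shell
# hdp-select prints e.g. "hadoop-client - 2.3.2.0-2950"; keep only "2.3"
echo "hadoop-client - 2.3.2.0-2950" | sed 's/hadoop-client - \([0-9]\.[0-9]\).*/\1/'
```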
Restart Ambari #on sandbox
sudo service ambari restart
#on non-sandbox clusters
sudo service ambari-server restart
sudo service ambari-agent restart
Then you can click on 'Add Service' from the 'Actions' dropdown menu in the bottom left of the Ambari dashboard: On bottom left -> Actions -> Add service -> check OpenTSDB server -> Next -> Next -> Customize as needed -> Next -> Deploy. You can customize the port, ZK quorum, and ZK dir in the start command. Note that HBase must be started if the option to automatically create the OpenTSDB schema is selected. On successful deployment you will see the OpenTSDB service as part of the Ambari stack and will be able to start/stop the service from here: You can see the parameters you configured under the 'Configs' tab. One benefit of wrapping the component in an Ambari service is that you can now automate its deployment via Ambari blueprints or monitor/manage this service remotely via REST API export SERVICE=OPENTSDB
export PASSWORD=admin
export AMBARI_HOST=sandbox.hortonworks.com
export CLUSTER=Sandbox
#get service status
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X GET http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/$SERVICE
#start service
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Start $SERVICE via REST"}, "Body": {"ServiceInfo": {"state": "STARTED"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/$SERVICE
#stop service
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Stop $SERVICE via REST"}, "Body": {"ServiceInfo": {"state": "INSTALLED"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/$SERVICE
To remove the OpenTSDB service:
Stop the service via Ambari
Delete the service #Ambari password
export PASSWORD=admin
#Ambari host
export AMBARI_HOST=localhost
export SERVICE=OPENTSDB
#detect name of cluster
output=`curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' http://$AMBARI_HOST:8080/api/v1/clusters`
CLUSTER=`echo $output | sed -n 's/.*"cluster_name" : "\([^\"]*\)".*/\1/p'`
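The CLUSTER one-liner above pulls the cluster_name field out of the clusters API response; demonstrated on a canned response (the cluster name Sandbox is this example's assumption):

```shell
# Canned /api/v1/clusters response for a cluster named Sandbox
output='{ "items" : [ { "Clusters" : { "cluster_name" : "Sandbox", "version" : "HDP-2.3" } } ] }'
CLUSTER=`echo $output | sed -n 's/.*"cluster_name" : "\([^\"]*\)".*/\1/p'`
echo "$CLUSTER"
```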
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Stop $SERVICE via REST"}, "Body": {"ServiceInfo": {"state": "INSTALLED"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/$SERVICE
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X DELETE http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/$SERVICE
Remove artifacts rm -rf /root/opentsdb
rm -rf /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/OPENTSDB/
Import stock data Use the below sample code (taken from here) to pull 30-day intraday stock prices for a few securities in both OpenTSDB and CSV formats cd
/bin/rm -f prices.csv
/bin/rm -f opentsd.input
wget https://raw.githubusercontent.com/abajwa-hw/opentsdb-service/master/scripts/google_intraday.py
python google_intraday.py AAPL > prices.csv
python google_intraday.py GOOG >> prices.csv
python google_intraday.py HDP >> prices.csv
python google_intraday.py ORCL >> prices.csv
python google_intraday.py MSFT >> prices.csv
Review opentsd.input, which contains the stock prices in OpenTSDB-compatible format tail opentsd.input
Import data from opentsd.input into OpenTSDB /root/opentsdb/build/tsdb import opentsd.input --zkbasedir=/hbase-unsecure --zkquorum=localhost:2181 --auto-metric
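For reference, tsdb import expects one data point per line in the form "metric timestamp value tag=value". A couple of rows in that shape (the metric names and values here are illustrative assumptions, not necessarily what google_intraday.py emits):

```shell
# Hypothetical OpenTSDB import lines: metric, epoch seconds, value, tags
cat <<'EOF' > /tmp/opentsdb-sample.input
stock.price 1448236800 120.50 symbol=AAPL
stock.volume 1448236800 48210000 symbol=AAPL
EOF
cat /tmp/opentsdb-sample.input
```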
Open WebUI and import stock data The OpenTSDB webUI login page should be at the below link (or whichever port you configured) http://sandbox.hortonworks.com:9999 Query the data in OpenTSDB webUI by entering values for:
From: pick a date from 3 weeks ago To: pick today's date Check Autoreload Metric: (e.g. volume) Tags: (e.g. symbol GOOG) You can similarly create multiple tabs Tags: symbol ORCL Tags: symbol AAPL To make the charts smoother:
Under Style tab, check the 'Smooth' checkbox Under Axes tab, check the 'Log scale' checkbox You can also open it from within Ambari via iFrame view
11-19-2015
09:40 PM
17 Kudos
Getting started with Nifi expression language and custom Nifi processors on HDP sandbox This tutorial is part of a webinar for partners on Hortonworks DataFlow. The recording will be made available at
http://hortonworks.com/partners/learn/ Background
For a primer on HDF, you can refer to the materials here to get a basic background A basic tutorial on using Nifi on HDP sandbox is also available here Goals
Build a Nifi flow to analyze Nifi's network traffic using tcpdump. Use Expression Language to extract the source/target IPs/ports. Build and use a custom tcpdump processor to filter Nifi's source/target IPs/ports on HDP sandbox Note that:
Nifi can be installed independently of HDP. The custom processor can also be built on any machine where Java and Eclipse are installed. The sandbox is being used for demo purposes, to have everything in one place Pre-Requisites: Install Nifi on sandbox
The lab is designed for the HDP Sandbox. Download the HDP Sandbox here, import into VMWare Fusion and start the VM. After it boots up, find the IP address of the VM and add an entry into your machine's hosts file e.g. 192.168.191.241 sandbox.hortonworks.com sandbox
Connect to the VM via SSH (root/hadoop), correct the /etc/hosts entry ssh root@sandbox.hortonworks.com
Deploy Nifi Ambari service on sandbox by running below VERSION=`hdp-select status hadoop-client | sed 's/hadoop-client - \([0-9]\.[0-9]\).*/\1/'`
sudo git clone https://github.com/abajwa-hw/ambari-nifi-service.git /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/NIFI
#sandbox
service ambari restart
#non sandbox
service ambari-server restart
To install Nifi, start the 'Install Wizard': Open Ambari (http://sandbox.hortonworks.com:8080) then:
On bottom left -> Actions -> Add service -> check NiFi server -> Next -> Next -> Change any config you like (e.g. install dir, port, setup_prebuilt or values in nifi.properties) -> Next -> Deploy. This will kick off the install which will run for 5-10min. Once installed, launch Nifi by opening http://sandbox.hortonworks.com:9090/nifi Steps Explore tcpdump
Tcpdump is a common packet analyzer that runs under the command line. It allows the user to display TCP/IP and other packets being transmitted or received over a network to which the computer is attached. Full details can be found here To install tcpdump on the sandbox: yum install -y tcpdump
Here is a common usage: tcpdump -n -nn
On sandbox, this will output something like below for each network connection being made, showing:
which socket (i.e. IP/port) was the source (to the left of >) and which was the target (to the right of >) 08:16:15.878652 IP 192.168.191.1.49270 > 192.168.191.144.9090: Flags [.], ack 2255, win 8174, options [nop,nop,TS val 1176961367 ecr 32747195], length 0
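The socket extraction the flow will perform can be sanity-checked from the shell against this sample line (the \d/\s classes rewritten as POSIX ranges for grep; a sketch, not part of the flow itself):

```shell
# Sample tcpdump line from above
line='08:16:15.878652 IP 192.168.191.1.49270 > 192.168.191.144.9090: Flags [.], ack 2255'
# source socket: five dot-joined numbers before the ">"
echo "$line" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+ >'
# destination socket: five dot-joined numbers after the ">"
echo "$line" | grep -oE '> [0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+'
```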
In the example above:
the source machine was 192.168.191.1 (port 49270) and the target machine was 192.168.191.144 (port 9090). Note that since Nifi is running on port 9090, by monitoring traffic to port 9090 we will be able to capture connections made by Nifi.
Build tcpdump flow using ExecuteProcess and EL
Download to local laptop (not sandbox) xml template for flow that uses ExecuteProcess/EL to parse tcpdump flow from https://raw.githubusercontent.com/abajwa-hw/nifi-network-processor/master/templates/TCPDump_EL_Example.xml On the Nifi webui, import flow template:
Import the template by clicking on Templates (third icon from right) which will launch the 'Nifi Flow templates' popup. Browse and navigate to wherever you downloaded TCPDump_EL_Example.xml on your local machine. Click Import. Now the template should appear in the 'Nifi Flow templates' popup window. Close the popup window. Instantiate the 'TCPDump EL Example' dashboard template:
Drag/drop the Template icon (7th icon from left) onto the canvas so that a picklist popup appears. Select 'TCPDump EL Example' and click Add. Run the flow. After a few seconds you should see all the counters increase. Overview of flow:
ExecuteProcess: Runs tcpdump -n -nn SplitText: split output into lines ExtractText: extract the src/dest sockets using regex Expression Language
src.socket will store the socket before the > : (\d+\.\d+\.\d+\.\d+\.\d+)\s+> dest.socket will store the socket after the > : >\s+(\d+\.\d+\.\d+\.\d+\.\d+) RouteOnAttribute: filter by destination socket where port is 9090
web.server.dest = ${dest.socket:endsWith(".9090")} Logattribute: log attribute Check details of what events were logged:
Open the Provenance window (5th icon from top right). In the top right, filter by component type: LogAttribute and click on the 'Show lineage' icon of the first record (near top right). Right click on Route > View details. Click the Content tab and click View. Notice that the destination socket for the event shows port 9090. For more details on Nifi Expression Language see the Nifi docs. Stop the flow using the stop button.
Build custom processor for tcpdump
Setup your sandbox for development by using VNC Ambari service to install VNC/eclipse/maven
Download Ambari service for VNC (details below) VERSION=`hdp-select status hadoop-client | sed 's/hadoop-client - \([0-9]\.[0-9]\).*/\1/'`
sudo git clone https://github.com/hortonworks-gallery/ambari-vnc-service.git /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/VNCSERVER
service ambari restart
Once the status of HDFS/YARN has changed from a yellow question mark to a green check mark... Setup Eclipse on the sandbox VM and remote desktop into it using an Ambari service for VNC. In Ambari, open the Admin > Stacks and Services tab. You can access this via http://sandbox.hortonworks.com:8080/#/main/admin/stack/services Deploy the service by selecting:
VNC Server -> Add service -> Next -> Next -> Enter password (e.g. hadoop) -> Next -> Proceed Anyway -> Deploy Make sure the password is at least 6 characters or install will fail Connect to VNC from local laptop using a VNC viewer software (e.g. Tight VNC viewer or Chicken of the VNC or just your browser). Detailed steps here (Optional): To install maven manually instead: curl -o /etc/yum.repos.d/epel-apache-maven.repo https://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo
yum -y install apache-maven-3.2*
In general, when starting a new project you would use the mvn archetype to create a custom processor. Details here: https://cwiki.apache.org/confluence/display/NIFI/Maven+Projects+for+Extensions
Command to run the wizard: cd /tmp
mvn archetype:generate -DarchetypeGroupId=org.apache.nifi -DarchetypeArtifactId=nifi-processor-bundle-archetype -DarchetypeVersion=0.2.1 -DnifiVersion=0.2.1
Sample inputs to generate a maven project archetype skeleton. Define value for property 'groupId': : com.hortonworks
Define value for property 'artifactId': : nifi-network-processors
Define value for property 'version': 1.0-SNAPSHOT: :
Define value for property 'artifactBaseName': : network
Define value for property 'package': com.hortonworks.processors.network: :
This will create an archetype maven project for a custom processor with the package name, artifactId, etc specified above. In this case we will download a previously built sample and walk through what changes you would need to make to the archetype to create a basic custom processor cd
sudo git clone https://github.com/abajwa-hw/nifi-network-processor.git
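For orientation, the cloned sample (like the archetype output) is a two-module Maven build; the layout looks roughly like this (a sketch reconstructed from the module paths used later in this post):

```shell
# Rough layout of the sample project (module names from the repo paths)
cat <<'EOF' > /tmp/nifi-layout.txt
nifi-network-processor/
  pom.xml                   (parent aggregator)
  nifi-network-processors/  (processor classes + JUnit tests)
  nifi-network-nar/         (packages the processors into a deployable .nar)
EOF
cat /tmp/nifi-layout.txt
```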
Open Eclipse using the shortcut on the Desktop Import to Eclipse
File > Import > Maven > Existing Maven projects Browse > root > nifi-network-processor > OK > Finish Here is a summary of code changes made to the generated archetype to create the sample tcpdump processor:
pom.xml: add commons-io dependency (for utils) here In org.apache.nifi.processor.Processor, add the class name here In GetTcpDumpAttributes.java:
Define the tags and description using @Tags and @CapabilityDescription here e.g. //Define the processor tags and description which will be displayed on Nifi UI
@Tags({"fetch","tcpdump","tcp", "network"})
@CapabilityDescription("Reads output of tcpdump and outputs the results as a Flowfile")
These would get displayed on the 'Add processor' screen of Nifi UI
Define properties for the processor here e.g. //Define properties for the processor
public static final PropertyDescriptor MY_PROPERTY = new PropertyDescriptor
.Builder().name("My Property")
.description("Example Property")
.required(true)
.addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
.build();
These would get displayed on the 'Properties' tab of the GetTcpDumpAttributes processor:
Define relationships for the processor here e.g. //Define relationships for the processor
public static final Relationship SUCCESS_RELATIONSHIP = new Relationship.Builder()
.name("success")
.description("Success relationship")
.build();
These would get displayed on the 'Settings' tab of the GetTcpDumpAttributes processor:
Any initializations to be done when Nifi starts would be done in init() here onTrigger() is the main method to override to define the logic when a flow file is passed to our processor. This is where we parse a line of tcpdump output and store the src and destination sockets here In GetTcpDumpAttributesTest.java, you can define a Junit to test that the processor is working correctly To run maven compile:
In Eclipse, under 'Package Explorer' select 'network-analysis' and then click:
Run > Run Configurations Then double click 'Maven Build'. It will prompt you for the configuration. Enter the below:
Name: nifi-network Base dir: /root/nifi-network-processor Under 'Goals': clean package Under Maven Runtime: (scroll down to see this option). We will be adding the location of the existing mvn install using the steps below, as it runs faster than using the embedded one:
Configure > Add > click ‘Directory’ and navigate to mvn install: /usr/share/apache-maven > OK > Finish > Select apache-maven > Apply > OK So your maven run configuration should look as below Click Apply > Run to start compile To run Junit to confirm processor is working correctly
In Eclipse, under Package Explorer select nifi-network-processors and then click:
Run > Run as > JUnit test After a few seconds the test should pass and you should see below (in green):
To see what happens if the test does not pass, try changing the value of dest.socket by prefixing the values with random digits (as highlighted below), save your changes and re-run JUnit
This time you will see the test fail (in red below)
Press Control-Z to undo your changes Confirm the nar file (the Nifi library file for your processor) got built by running the maven build ls -la ~/nifi-network-processor/nifi-network-nar/target/nifi-network-nar-1.0-SNAPSHOT.nar
Deploy the nar into Nifi: copy the compiled nar file into Nifi lib dir and correct permissions cp ~/nifi-network-processor/nifi-network-nar/target/nifi-network-nar-1.0-SNAPSHOT.nar /opt/nifi-1.0.0.0-7/lib/
chown nifi:hadoop /opt/nifi-1.0.0.0-7/lib/nifi-network-nar-1.0-SNAPSHOT.nar
Restart Nifi from Ambari. Download the xml template for the flow that uses the custom processor to parse tcpdump output to your local laptop (not the sandbox) from https://github.com/abajwa-hw/nifi-network-processor/raw/master/templates/TCPDump_Custom_Processor_Example.xml Open the Nifi UI and delete the existing flow by:
Control-A to select all the components, then right click on any processor and select Delete Import the custom processor flow template into Nifi:
Import the template by clicking on Templates (third icon from right) which will launch the 'Nifi Flow templates' popup. Browse and navigate to wherever you downloaded TCPDump_Custom_Processor_Example.xml on your local machine. Click Import. Now the template should appear in the 'Nifi Flow templates' popup window. Close the popup window. Instantiate the 'TCPDump_Custom_Processor_Example' dashboard template:
Drag/drop the Template icon (7th icon from left) onto the canvas so that a picklist popup appears. Select 'TCPDump_Custom_Processor_Example' and click Add. Run the flow. After a few seconds you should see all the counters increase. Overview of flow:
ExecuteProcess: Runs tcpdump -n -nn SplitText: split output into lines GetTcpDumpAttributes: extract the src/dest sockets using the custom processor we built
src.socket will store the socket before the > : (\d+\.\d+\.\d+\.\d+\.\d+)\s+> dest.socket will store the socket after the > : >\s+(\d+\.\d+\.\d+\.\d+\.\d+) RouteOnAttribute: filter by destination socket where port is 9090
web.server.dest = ${dest.socket:endsWith(".9090")} LogAttribute: log the attributes. Open the Provenance window and repeat the previous steps to confirm that the destination socket for the events shows port 9090. You have successfully created flows to analyze network traffic using both Expression Language and a basic custom processor Further reading
https://nifi.apache.org/developer-guide.html https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html http://www.nifi.rocks/developing-a-custom-apache-nifi-processor-json/
http://bryanbende.com/development/2015/02/04/custom-processors-for-apache-nifi/
11-06-2015
03:55 PM
@Andrew Grande thanks updated. @Jonas Straub no haven't tested it on larger datasets yet - almost afraid to 😉
11-06-2015
04:47 AM
14 Kudos
There are new Data Visualization and Data Explorer tabs that come as part of the Hive view which are really nice. I found this by accident so thought there may be others who are probably not aware of this either. Environment details: Should work in a vanilla cluster/sandbox as well but in my case the env is as below (setup using steps here):
Kerberized HDP 2.3.2 w/ Ranger installed Secure Ambari 2.1.2 (authenticating to users in IPA LDAP) The sample salary data from sandbox has also been imported and was used for visualization 1. After logging in to Ambari 2.1.2, open the Hive view (in this case I had to create a new instance of the view configured for kerberos). 2. Click the Data Visualization tab on the right and drag/drop the description and salary fields onto the x, y fields 3. Select from the various chart options to change the chart 4. Click the transpose button: 5. Navigate to the Data Explorer tab to explore your data using the different fields So all in all: visualizations look good, are responsive (at least on this dataset), and seem to work without issue on a kerberized cluster!
11-03-2015
04:19 AM
14 Kudos
Exploring Apache Flink with HDP Apache Flink is an open source platform for distributed stream and batch data processing. More details on Flink and how it is being used in the industry today are available here: http://flink-forward.org/?post_type=session. There are a few ways you can explore Flink on HDP 2.3: 1. Compilation on HDP 2.3.2 To compile Flink from source on HDP 2.3 you can use these commands: curl -o /etc/yum.repos.d/epel-apache-maven.repo https://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo
yum -y install apache-maven-3.2*
git clone https://github.com/apache/flink.git
cd flink
mvn clean install -DskipTests -Dhadoop.version=2.7.1.2.3.2.0-2950 -Pvendor-repos Note that with this option I ran into a classpath bug and raised it here: https://issues.apache.org/jira/browse/FLINK-3032 2. Run using precompiled tarball wget http://www.gtlib.gatech.edu/pub/apache/flink/flink-0.9.1/flink-0.9.1-bin-hadoop27.tgz
tar xvzf flink-0.9.1-bin-hadoop27.tgz
cd flink-0.9.1
export HADOOP_CONF_DIR=/etc/hadoop/conf
./bin/yarn-session.sh -n 1 -jm 768 -tm 1024 3. Using Ambari service (demo purposes only for now) The Ambari service lets you easily install/compile Flink on HDP 2.3
Features:
By default, downloads prebuilt package of Flink 0.9.1, but also gives option to build the latest Flink from source instead Exposes flink-conf.yaml in Ambari UI Setup
Download the HDP 2.3 sandbox VM image (Sandbox_HDP_2.3_1_VMware.ova) from the Hortonworks website. Import Sandbox_HDP_2.3_1_VMware.ova into VMWare and set the VM memory size to 8GB. Now start the VM. After it boots up, find the IP address of the VM and add an entry into your machine's hosts file. For example: 192.168.191.241 sandbox.hortonworks.com sandbox
Note that you will need to replace the above with the IP for your own VM
Connect to the VM via SSH (password hadoop) ssh root@sandbox.hortonworks.com
To download the Flink service folder, run below VERSION=`hdp-select status hadoop-client | sed 's/hadoop-client - \([0-9]\.[0-9]\).*/\1/'`
sudo git clone https://github.com/abajwa-hw/ambari-flink-service.git /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/FLINK
Restart Ambari #sandbox
service ambari restart
#non sandbox
sudo service ambari-server restart
Then you can click on 'Add Service' from the 'Actions' dropdown menu in the bottom left of the Ambari dashboard: On bottom left -> Actions -> Add service -> check Flink server -> Next -> Next -> Change any config you like (e.g. install dir, memory sizes, num containers or values in flink-conf.yaml) -> Next -> Deploy
By default:
Container memory is 1024 MB, Job Manager memory is 768 MB, and the number of YARN containers is 1
On successful deployment you will see the Flink service as part of the Ambari stack and will be able to start/stop the service from here: You can see the parameters you configured under the 'Configs' tab. One benefit of wrapping the component in an Ambari service is that you can now monitor/manage this service remotely via REST API export SERVICE=FLINK
export PASSWORD=admin
export AMBARI_HOST=localhost
#detect name of cluster
output=`curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' http://$AMBARI_HOST:8080/api/v1/clusters`
CLUSTER=`echo $output | sed -n 's/.*"cluster_name" : "\([^\"]*\)".*/\1/p'`
#get service status
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X GET http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/$SERVICE
#start service
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Start $SERVICE via REST"}, "Body": {"ServiceInfo": {"state": "STARTED"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/$SERVICE
#stop service
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Stop $SERVICE via REST"}, "Body": {"ServiceInfo": {"state": "INSTALLED"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/$SERVICE
...and also install via Blueprint. See example here on how to deploy custom services via Blueprints Use Flink
Run word count job su flink
export HADOOP_CONF_DIR=/etc/hadoop/conf
cd /opt/flink
./bin/flink run ./examples/flink-java-examples-0.9.1-WordCount.jar
This should generate a series of word counts Open the YARN ResourceManager UI. Notice Flink is running on YARN Click the ApplicationMaster link to access Flink webUI Use the History tab to review details of the job that ran: View metrics in the Task Manager tab: Other things to try
Apache Zeppelin now also supports Flink. You can also install it via the Zeppelin Ambari service for visualization. More details on Flink and how it is being used in the industry today are available here: http://flink-forward.org/?post_type=session Remove service
To remove the Flink service:
Stop the service via Ambari Unregister the service export SERVICE=FLINK
export PASSWORD=admin
export AMBARI_HOST=localhost
#detect name of cluster
output=`curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' http://$AMBARI_HOST:8080/api/v1/clusters`
CLUSTER=`echo $output | sed -n 's/.*"cluster_name" : "\([^\"]*\)".*/\1/p'`
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X DELETE http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/$SERVICE
#if above errors out, run below first to fully stop the service
#curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Stop $SERVICE via REST"}, "Body": {"ServiceInfo": {"state": "INSTALLED"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/$SERVICE
Remove artifacts rm -rf /opt/flink*
rm /tmp/flink.tgz