Member since 09-17-2015
436 Posts | 736 Kudos Received | 81 Solutions
        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 5082 | 01-14-2017 01:52 AM |
|  | 7348 | 12-07-2016 06:41 PM |
|  | 8705 | 11-02-2016 06:56 PM |
|  | 2806 | 10-19-2016 08:10 PM |
|  | 7067 | 10-19-2016 08:05 AM |
10-23-2015 04:52 PM
This has been updated recently; see my comment below.
10-23-2015 04:51 PM
3 Kudos
So we worked with LW on an official Tech Preview for Ambari-managed Solr, but this has not been released yet. Someone from the PM org probably has a better idea of when the official TP will be out (maybe @Tim Hall can comment). For demo scenarios only, you can try the one I built here. Last weekend, as part of the Ambari hack-fest, I updated it to support HDPsearch, although the default is still Apache Solr. To use HDPsearch instead, set the below property while installing the service:

solr.download.location = HDPSEARCH

Having said that, there wasn't a whole lot of testing done, so feel free to send me feedback offline if you have any issues.
10-20-2015 06:19 PM
1 Kudo
Just delete the files mentioned in /root/kylin/storage/target/rat.txt. Usually they are log files or files the user created/copied.
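If it helps, here is a rough sketch of scripting that cleanup. It assumes rat.txt lists one flagged file path per line (the exact report format may differ), so review the list before deleting anything:

# Sketch only: assumes each flagged file appears on its own line in rat.txt -- review before deleting
while read -r f; do
  [ -f "$f" ] && rm -v "$f"
done < /root/kylin/storage/target/rat.txt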
10-16-2015 04:47 PM
							 @Paul Codding @Piotr Pruski any ideas on this? 
						
					
10-15-2015 03:43 AM
@Guilherme Braccialli here is the config we used for IPA. When providing the Ambari admin user/pass, this user needs to exist in your IPA, so in my example I passed in admin/hortonworks. If you are passing in admin/admin it probably won't work: https://github.com/abajwa-hw/security-workshops/blob/master/Setup-Ambari.md#authentication-via-ldap

Note that group memberships won't work since IPA uses openLDAP, which does not expose the DN. @Paul Codding, @David Streever, @Sean Roberts and I found this out the hard way. See BUG-45536 for more info (and upvote!).
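If you want to see for yourself which attributes your directory exposes for a user before wiring up Ambari, something along these lines can help; the hostname, base DN and filter below are placeholders for your environment:

# Placeholder host/base DN -- adjust for your IPA install; shows whether memberOf is exposed
ldapsearch -x -H ldap://ipa.example.com \
  -b "cn=users,cn=accounts,dc=example,dc=com" \
  "uid=admin" dn memberOf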
10-14-2015 06:29 PM
Did not upgrade the existing Spark on the sandbox, but installed it in a separate location while playing with Zeppelin, and it worked fine. Below is the script I used to set it up (see the readme for the Ambari service for Zeppelin for more info):

sudo useradd zeppelin
sudo su zeppelin
cd /home/zeppelin
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.0-bin-hadoop2.6.tgz -O spark-1.5.0.tgz
tar -xzvf spark-1.5.0.tgz
export HDP_VER=`hdp-select status hadoop-client | sed 's/hadoop-client - \(.*\)/\1/'`
echo "spark.driver.extraJavaOptions -Dhdp.version=$HDP_VER" >> spark-1.5.0-bin-hadoop2.6/conf/spark-defaults.conf
echo "spark.yarn.am.extraJavaOptions -Dhdp.version=$HDP_VER" >> spark-1.5.0-bin-hadoop2.6/conf/spark-defaults.conf
exit 
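As a quick sanity check (not part of the original steps), you can run the bundled SparkPi example from the new install against YARN; the paths and the yarn-client master below assume the Spark 1.5.0 layout set up above:

# Sanity check: run SparkPi on YARN from the side-by-side Spark 1.5.0 install
cd /home/zeppelin/spark-1.5.0-bin-hadoop2.6
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn-client lib/spark-examples-*.jar 10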
						
					
10-12-2015 07:56 PM
1 Kudo
That is a supported upgrade, so I believe the direct upgrade from 2.1 to 2.3.2 would be the optimal path.
10-12-2015 06:18 PM
The wiki and our docs are probably a good start:
https://cwiki.apache.org/confluence/display/AMBARI/AMS+-+distributed+mode
http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.0.0/bk_ambari_reference_guide/content/ch_amb_ref_configuring_ambari_metrics.html
10-12-2015 08:41 AM
46 Kudos
Build flow in HDF/Nifi to push tweets to HDP

In this tutorial, we will learn how to use HDF to create a simple event processing flow by:
- Installing HDF/Nifi on the sandbox using the Ambari service
- Setting up Solr/Banana and a Hive table
- Importing/instantiating a prebuilt Nifi template
- Verifying tweets got pushed to HDFS and Hive using Ambari views
- Visualizing tweets in Solr using the Banana dashboard
- Exploring the provenance features of Nifi

Change log
- 9/30: An automation script to deploy HDP clusters (on any cloud) with this demo already set up is available here.
- 9/15: Updated: the demo Ambari service for Nifi was updated to support the HDP 2.5 sandbox and Nifi 1.0. The steps to manually install the demo artifacts remain unchanged (but the Nifi screenshots below need to be updated).

References
- For a primer on HDF, you can refer to the materials here for basic background.
- Thanks to @bbende@hortonworks.com for his earlier blog post that helped make this tutorial possible.
Pre-Requisites
- The lab is designed for the HDP Sandbox VM. To run on the Azure sandbox, Azure-specific pre-req steps are provided here.
- Download the HDP Sandbox here, import it into VMware Fusion, and start the VM. If running on VirtualBox you will need to forward port 9090; see here for detailed steps.
- After it boots up, find the IP address of the VM and add an entry to your machine's hosts file, e.g.:
  192.168.191.241 sandbox.hortonworks.com sandbox
- Connect to the VM via SSH (root/hadoop) and correct the /etc/hosts entry:
  ssh root@sandbox.hortonworks.com
- If using the HDP 2.5 sandbox, you will also need to SSH into the Docker-based sandbox container:
  ssh root@127.0.0.1 -p 2222
Deploy/update the Nifi Ambari service on the sandbox by running the below.

Note: on the HDP 2.5 sandbox, the Nifi service definition is already installed, so you can skip this step and proceed to installing Nifi via the 'Install Wizard'.

VERSION=`hdp-select status hadoop-client | sed 's/hadoop-client - \([0-9]\.[0-9]\).*/\1/'`
rm -rf /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/NIFI
sudo git clone https://github.com/abajwa-hw/ambari-nifi-service.git   /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/NIFI   
#sandbox
service ambari restart
#non sandbox
service ambari-server restart
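Optionally (this is not in the original steps), you can confirm Ambari picked up the new service definition after the restart via its REST API; adjust the credentials and port if you changed them:

# Optional check: the NIFI service definition should now be listed in the stack
curl -u admin:admin "http://localhost:8080/api/v1/stacks/HDP/versions/$VERSION/services/NIFI"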
To install Nifi, start the 'Install Wizard': open Ambari (http://sandbox.hortonworks.com:8080), then on the bottom left -> Actions -> Add service -> check NiFi server -> Next -> Next -> change any config you like (e.g. install dir, port, setup_prebuilt or values in nifi.properties) -> Next -> Deploy. This will kick off the install, which will run for 5-10 min.

Steps
Import a simple flow to read tweets into HDFS/Solr and visualize them using the Banana dashboard.

The HDP sandbox comes with LW HDP Search. Follow the steps below to use it to set up Banana, start SolrCloud, and create a collection:
On the HDP 2.5 sandbox, HDPsearch can be installed via Ambari: just use the same 'Install Wizard' used above and select all defaults. To install HDP Search on a non-sandbox environment, you can either install it via Ambari (for this you will need to install its management pack in Ambari) OR install HDPsearch manually:

yum install -y lucidworks-hdpsearch
sudo -u hdfs hadoop fs -mkdir /user/solr
sudo -u hdfs hadoop fs -chown solr /user/solr
Ensure no log files are owned by root (the current sandbox version has files owned by root in the log dir, which causes problems when starting Solr):

chown -R solr:solr /opt/lucidworks-hdpsearch/solr
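To double-check that nothing under the Solr install is still owned by root (optional):

# Should print nothing once ownership is fixed
find /opt/lucidworks-hdpsearch/solr -user root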
Run the Solr setup steps as the solr user:

su solr
Set up the Banana dashboard by copying default.json to the dashboards dir:

cd /opt/lucidworks-hdpsearch/solr/server/solr-webapp/webapp/banana/app/dashboards/
mv default.json default.json.orig
wget https://raw.githubusercontent.com/abajwa-hw/ambari-nifi-service/master/demofiles/default.json
Edit solrconfig.xml by adding <str>EEE MMM d HH:mm:ss Z yyyy</str> under ParseDateFieldUpdateProcessorFactory so it looks like below. This is done to allow Solr to recognize the timestamp format of tweets.

vi /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml

  <processor class="solr.ParseDateFieldUpdateProcessorFactory">
    <arr name="format">
      <str>EEE MMM d HH:mm:ss Z yyyy</str>
Start/Restart Solr in cloud mode. If you installed Solr via Ambari, just use the 'Service Actions' dropdown to restart it. Otherwise, if you installed manually, start Solr as below after setting JAVA_HOME to the right location:

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64
/opt/lucidworks-hdpsearch/solr/bin/solr start -c -z localhost:2181

Create a collection called tweets:

/opt/lucidworks-hdpsearch/solr/bin/solr create -c tweets -d data_driven_schema_configs -s 1 -rf 1
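As an optional check (not part of the original steps), you can confirm the collection exists and is empty so far; this assumes Solr is listening on the default port 8983:

# List SolrCloud collections and run an empty query against 'tweets'
curl "http://localhost:8983/solr/admin/collections?action=LIST&wt=json"
curl "http://localhost:8983/solr/tweets/select?q=*:*&rows=0&wt=json"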
Solr setup is complete. Return to the root user:

exit

Ensure the time on your sandbox is accurate or you will get errors using the GetTwitter processor. If the time is not correct, run the below to fix it:

yum install -y ntp
service ntpd stop
ntpdate pool.ntp.org
service ntpd start

Now open the Nifi web UI (http://sandbox.hortonworks.com:9090/nifi) and run the remaining steps there:
Download the prebuilt Twitter_Dashboard.xml template to your laptop from here.

Import the flow template into Nifi:
- Click Templates (third icon from the right), which will launch the 'Nifi Flow templates' popup.
- Browse and navigate to wherever you downloaded Twitter_Dashboard.xml on your local machine.
- Click Import. The template should now appear.
- Close the popup.

Instantiate the Twitter dashboard template:
- Drag/drop the Template icon (7th icon from the left) onto the canvas so that a picklist popup appears.
- Select 'Twitter dashboard' and click Add.
- This should create a box (i.e. a processor group) named 'Twitter Dashboard'. Double-click it to drill into the actual flow.

Configure the GetTwitter processor:
- Right-click the 'GetTwitter' processor (near the top) and click Configure. Under Properties:
  - Enter your Twitter key/secrets.
  - Ensure the 'Twitter Endpoint' is set to 'Filter Endpoint'.
  - Enter the search terms (e.g. AAPL,GOOG,MSFT,ORCL) under 'Terms to Filter on'.

Configure the PutSolrContentStream processor, which writes the selected attributes to Solr (in this case, assuming Solr is running in cloud mode with a 'tweets' collection):
- Confirm the Solr Location property is updated to reflect your Zookeeper configuration (for SolrCloud) or Solr standalone instance.
- If you installed Solr via Ambari, you will need to append /solr to the ZK string in the 'Solr Location'.

Review the other processors and modify properties as needed:
- EvaluateJsonPath: pulls out attributes of tweets.
- RouteOnAttribute: ensures only tweets with non-empty messages are processed.
- ReplaceText: formats each tweet as a pipe (|) delimited line entry, e.g. tweet_id|unixtime|humantime|user_handle|message|full_tweet.
- MergeContent: merges tweets into a single file (either 20 tweets or 120s, whichever comes first) to avoid having a large number of small files in HDFS. These values can be configured.
- PutFile: writes tweets to local disk under /tmp/tweets/.
- PutHDFS: writes tweets to HDFS under /tmp/tweets_staging.

If set up correctly, the top left of each processor on the canvas will show a red square (indicating the flow is stopped). Click the Start button (green triangle near the top of the screen) to start the flow. After a few seconds you will see tweets flowing.

Create a Hive table to be able to run queries on the tweets in HDFS:

sudo -u hdfs hadoop fs -chmod -R 777 /tmp/tweets_staging
hive> create table if not exists tweets_text_partition(
  tweet_id bigint, 
  created_unixtime bigint, 
  created_time string, 
  displayname string, 
  msg string,
  fulltext string
)
row format delimited fields terminated by "|"
location "/tmp/tweets_staging"; 
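Once tweets are flowing, a quick way to eyeball the data is a small aggregate query; the column names below come from the DDL above:

# Top tweeters by message count (run from the shell once the table has data)
hive -e 'select displayname, count(*) as num_tweets
         from tweets_text_partition
         group by displayname
         order by num_tweets desc
         limit 10;'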
Viewing results

Verify that:
- Tweets appear under the /tmp/tweets_staging dir in HDFS. You can see this via the Files view in Ambari.
- Tweets appear in Solr:
  http://sandbox.hortonworks.com:8983/solr/tweets_shard1_replica1/select?q=*:*
  http://sandbox.hortonworks.com:8983/solr/#/tweets_shard1_replica1/query
- Tweets appear in Banana: http://sandbox.hortonworks.com:8983/solr/banana/index.html#/dashboard
  - To search for tweets by language (e.g. Italian), enter the below in the search text box: language_s:it
  - To search for tweets by a particular user (e.g. warrenbuffett): screenName_s:warrenbuffett
  - To search for tweets containing some text (e.g. tax): text_t:tax
- Tweets appear in Hive: http://sandbox.hortonworks.com:8080/#/main/views/HIVE/1.0.0/Hive
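The same field-based searches shown for Banana above can also be issued directly against Solr with curl (field names as used above; the sandbox hostname and default Solr port are assumed):

# Query the tweets collection directly for Italian tweets and tweets mentioning 'tax'
curl "http://sandbox.hortonworks.com:8983/solr/tweets/select?q=language_s:it&rows=5&wt=json"
curl "http://sandbox.hortonworks.com:8983/solr/tweets/select?q=text_t:tax&rows=5&wt=json"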
Other Nifi features

Flow statistics/graphs:
- Right-click one of the processors (e.g. PutHDFS) and click 'Stats' to see a number of charts/metrics.
- You should also see Nifi metrics in Ambari (assuming you started Ambari Metrics earlier).

Data provenance in Nifi:
- In the Nifi home screen, click the Provenance icon (5th icon from the top right corner) to open the Provenance page.
- Click the Show lineage icon (2nd icon from the right) on any row.
- Right-click Send > View details > Content.
- From here you can view the tweet itself by clicking Content > View > formatted.
- You can also replay the event via Replay > Submit.
- Close the provenance window using the x icon on the inner window and notice the event was replayed.
- Re-open the provenance window on the row you had originally selected. Notice that by viewing and replaying the tweet, you changed the provenance graph of this event: Send and Replay events were added to the lineage graph.
- Also notice the time slider on the bottom left of the page, which allows users to 'rewind' time and 'replay' the provenance events as they happened.
- Right-click the Send event near the bottom of the flow and select Details. Notice that the details of the request to view the tweet are captured here (who requested it, at what time, etc.).
- Exit the Provenance window by clicking the x icon on the outer window.

You have successfully created a basic Nifi flow that performs simple event processing to ingest tweets into HDP. Why was the processing 'simple'? There were no complex features like alerting users based on time windows (e.g. if a particular topic was tweeted about more than x times in 30s), which require a higher-fidelity form of transportation. For such functionality the recommendation would be to use Kafka/Storm. To see how you would use those technologies of the HDP stack to perform complex processing, take a look at the Twitter Storm demo in the Hortonworks Gallery under 'Sample Apps'.
 Other things to try:  Learn more about Nifi expression language and how to get started building a custom Nifi processor: http://community.hortonworks.com/articles/4356/getting-started-with-nifi-expression-language-and.html 
						
					