Member since: 06-20-2016

488 Posts · 433 Kudos Received · 118 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3604 | 08-25-2017 03:09 PM |
| | 2515 | 08-22-2017 06:52 PM |
| | 4197 | 08-09-2017 01:10 PM |
| | 8977 | 08-04-2017 02:34 PM |
| | 8949 | 08-01-2017 11:35 AM |

09-27-2016 01:17 PM

This produces the results you want:

RAW = LOAD 'filepath' USING PigStorage(';') AS
  (Employee:chararray, Stock:int, Furnisher:chararray, Date:chararray, Value:double);
RANKING = RANK RAW BY Employee, Date DENSE;
GRP = GROUP RANKING BY $0;
SUMMED = FOREACH GRP {
     summed = SUM(RANKING.Value);
     GENERATE $0, summed AS Ranksum;
}
JOINED = JOIN RANKING BY $0, SUMMED BY $0;
FINAL = FOREACH JOINED GENERATE $0, Employee, Stock, Furnisher, Date, Ranksum;
STORE FINAL INTO 'destinationpath' USING PigStorage(',');

Let me know if this is what you are looking for by accepting the answer. If I did not get the requirements correct, please clarify.
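The logic above (dense-rank each row by Employee and Date, sum Value per rank, then join the per-rank sum back onto every row) can be sketched in plain Python on toy data. This is only a model of the Pig script's behavior; the sample names and values are illustrative, not from the original question:

```python
from collections import defaultdict

# Toy rows mirroring the Pig schema: (Employee, Stock, Furnisher, Date, Value)
rows = [
    ("alice", 10, "acme", "2016-09-01", 1.5),
    ("alice", 12, "acme", "2016-09-01", 2.5),
    ("bob",    7, "init", "2016-09-02", 4.0),
]

# DENSE rank: identical (Employee, Date) pairs share a single rank.
keys = sorted(set((r[0], r[3]) for r in rows))
rank_of = {k: i + 1 for i, k in enumerate(keys)}

# GROUP BY rank, then SUM(Value) per group.
sums = defaultdict(float)
for r in rows:
    sums[rank_of[(r[0], r[3])]] += r[4]

# JOIN the per-rank sum back onto each original row.
final = [(rank_of[(r[0], r[3])], r[0], r[1], r[2], r[3],
          sums[rank_of[(r[0], r[3])]]) for r in rows]

for rec in final:
    print(rec)  # e.g. (1, 'alice', 10, 'acme', '2016-09-01', 4.0)
```

Both "alice" rows share rank 1 (same Employee and Date), so they both carry the summed value 4.0, which is exactly what the JOIN in the Pig script achieves.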

09-27-2016 12:27 AM

Hmm. Interesting. I downloaded the latest version of Sandbox 2.5 GA and now there simply are no contents:

[root@sandbox scripts]# ls -l /var/lib/ambari-agent/cache/custom_actions/scripts/
total 0

[root@sandbox scripts]# sandbox-version
Sandbox information:
Created on: 13_09_2016_11_17_36 for
Hadoop stack version:  Hadoop 2.7.3.2.5.0.0-1245
Ambari Version: 2.4.0.0-1225
Ambari Hash: 59175b7aa1ddb74b85551c632e3ce42fed8f0c85
Ambari build:  Release : 1225
Java version:  1.8.0_101
OS Version:  CentOS release 6.8 (Final)

I will contact the sandbox SMEs to communicate the issue.

09-26-2016 12:59 PM · 5 Kudos

You need to FLATTEN your nested data. Your grouped data set has (is a bag of) fields, tuples, and bags, and you need to extract the fields from the bags and tuples using the FLATTEN operator.

Each of your grouped records can be seen as follows:

1;                          -- field
(7287026502032012,18);      -- tuple
{(706)};                    -- bag
{(101200010)};              -- bag
{(17286)};                  -- bag
{(oz)};                     -- bag
2.5                         -- field

Using FLATTEN with a tuple is simple, but using it with a bag is more complicated.

Flattening tuples

To look at only tuples, let's assume your data looked like this:

1;                          -- field
(7287026502032012,18);      -- tuple

Then you would use:

data_flattened = FOREACH data GENERATE
   $0,
   FLATTEN($1);

which for the data above would produce: 1; 7287026502032012; 18

Flattening bags

Flattening bags is more complicated, because it flattens them to tuples but cross joins them with the other data in your GENERATE statement. From the Apache Pig docs:

"For bags, the situation becomes more complicated. When we un-nest a bag, we create new tuples. If we have a relation that is made up of tuples of the form ({(b,c),(d,e)}) and we apply GENERATE flatten($0), we end up with two tuples (b,c) and (d,e). When we remove a level of nesting in a bag, sometimes we cause a cross product to happen. For example, consider a relation that has a tuple of the form (a, {(b,c), (d,e)}), commonly produced by the GROUP operator. If we apply the expression GENERATE $0, flatten($1) to this tuple, we will create new tuples: (a, b, c) and (a, d, e)."

Using Pig's builtin function BagToTuple() to help you out

Pig has a builtin function BagToTuple() which, as the name says, converts a bag to a tuple. By converting your bags to tuples, you can then easily flatten them as above.

Final code

Your final code will look like this:

data_flattened = FOREACH data GENERATE
	$0,
	FLATTEN($1),
	FLATTEN(BagToTuple($2)),
	FLATTEN(BagToTuple($3)),
	FLATTEN(BagToTuple($4)),
	FLATTEN(BagToTuple($5)),
	$6;

to produce your desired data.

Useful links:
https://pig.apache.org/docs/r0.10.0/basic.html#flatten
http://chimera.labs.oreilly.com/books/1234000001811/ch06.html#more_on_foreach
https://pig.apache.org/docs/r0.11.0/api/org/apache/pig/builtin/BagToTuple.html

If this answers your question, let me know by accepting the answer. Otherwise, let me know the gaps or issues that remain.
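The tuple-vs-bag behavior described above can be mimicked in plain Python (a toy model of Pig's semantics, not Pig itself): flattening a tuple just splices its fields into the record, while flattening a bag emits one output record per bag element, and flattening two bags in the same GENERATE produces their cross product:

```python
from itertools import product

# Model the Pig docs example: a record (a, {(b,c),(d,e)}) — a field plus a bag.
record = ("a", [("b", "c"), ("d", "e")])

# FLATTEN on the bag: one output tuple per bag element, field carried along.
flattened = [(record[0], *elem) for elem in record[1]]
print(flattened)  # [('a', 'b', 'c'), ('a', 'd', 'e')]

# Flattening two bags in one GENERATE cross-joins their elements.
rec2 = ("x", [("1",), ("2",)], [("p",), ("q",)])
crossed = [(rec2[0], *b1, *b2) for b1, b2 in product(rec2[1], rec2[2])]
print(crossed)  # [('x','1','p'), ('x','1','q'), ('x','2','p'), ('x','2','q')]
```

This cross-product blowup is exactly why converting each bag to a tuple with BagToTuple() first, as in the final code above, keeps one output row per grouped record.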

09-23-2016 03:28 PM

Yes, I tried these; I should have put it in the description:

[root@sandbox ~]# ls -ld /var/lib/ambari-agent/cache/custom_actions/scripts/
drwxrwxrwx 1 root root 4096 Sep 22 18:55 /var/lib/ambari-agent/cache/custom_actions/scripts/

[root@sandbox ~]# chmod -R a+rX /var/lib/ambari-agent/cache/custom_actions/scripts/
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/check_host.py': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/check_host.pyo': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/clear_repocache.py': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/clear_repocache.pyo': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/install_packages.py': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/install_packages.pyo': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/remove_bits.py': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/remove_bits.pyo': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/ru_execute_tasks.py': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/ru_execute_tasks.pyo': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/ru_set_all.py': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/ru_set_all.pyo': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/update_repo.py': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/update_repo.pyo': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/validate_configs.py': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/validate_configs.pyo': No such file or directory 

09-23-2016 12:36 PM · 3 Kudos

This is a great guide to what gets installed where on HDP: https://community.hortonworks.com/articles/16763/cheat-sheet-and-tips-for-a-custom-install-of-horto.html

You will notice that Kafka should be installed within the cluster, and is best dedicated to its own nodes.

As a side note, Hortonworks Data Flow (HDF) is a separate distribution/product provided by Hortonworks. It packages Kafka along with NiFi, Storm, and Ambari, and excels at acquiring, inspecting, routing, transforming, and analyzing data in motion from a diverse set of sources (ranging from sensors to databases), with the output typically landing in Hadoop. Exciting technology and a lot to talk about ... check it out: http://hortonworks.com/products/data-center/hdf/

09-23-2016 12:10 PM

Not sure which instructions you are using, but make sure these were followed: https://community.hortonworks.com/articles/34424/apache-zeppelin-on-hdp-242.html

If they were not, I suggest uninstalling Zeppelin and reinstalling with the steps shown in the link. Also, consider upgrading to HDP 2.5: Zeppelin is GA in that version (not Technical Preview) and the install is done entirely from the Ambari UI. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_zeppelin-component-guide/content/ch_installation.html

09-23-2016 12:01 PM

This is a good discussion on setting reducers: https://community.hortonworks.com/questions/28073/how-do-you-force-the-number-of-reducers-in-a-map-r.html

As with all performance tuning, it is best to isolate a bottleneck and tune that, rather than trying a lot of things at the same time. So yes, among other tuning options, set this and see if it works. If not, move on to the next suspected bottleneck.

09-23-2016 01:44 AM · 2 Kudos

Note: This is in the sandbox.

Simple workflow:

- log into Ranger
- navigate to the Ranger service (Ranger Admin, Ranger Usersync, and Ranger Tagsync are all running)
- navigate to Configs
- click "Test Connection"
- Result: Connection failed

stderr says:

/usr/bin/python: can't open file '/var/lib/ambari-agent/cache/custom_actions/scripts/check_host.py': [Errno 2] No such file or directory

When I ssh into the Ranger host (sandbox) as root and run ls -l /var/lib/ambari-agent/cache/custom_actions/scripts/, I get the following result (be sure to scroll to the bottom):
 ls: cannot access scripts/check_host.py: No such file or directory
ls: cannot access scripts/check_host.pyo: No such file or directory
ls: cannot access scripts/clear_repocache.py: No such file or directory
ls: cannot access scripts/clear_repocache.pyo: No such file or directory
ls: cannot access scripts/install_packages.py: No such file or directory
ls: cannot access scripts/install_packages.pyo: No such file or directory
ls: cannot access scripts/remove_bits.py: No such file or directory
ls: cannot access scripts/remove_bits.pyo: No such file or directory
ls: cannot access scripts/ru_execute_tasks.py: No such file or directory
ls: cannot access scripts/ru_execute_tasks.pyo: No such file or directory
ls: cannot access scripts/ru_set_all.py: No such file or directory
ls: cannot access scripts/ru_set_all.pyo: No such file or directory
ls: cannot access scripts/update_repo.py: No such file or directory
ls: cannot access scripts/update_repo.pyo: No such file or directory
ls: cannot access scripts/validate_configs.py: No such file or directory
ls: cannot access scripts/validate_configs.pyo: No such file or directory
total 0
?????????? ? ? ? ?            ? check_host.py
?????????? ? ? ? ?            ? check_host.pyo
?????????? ? ? ? ?            ? clear_repocache.py
?????????? ? ? ? ?            ? clear_repocache.pyo
?????????? ? ? ? ?            ? install_packages.py
?????????? ? ? ? ?            ? install_packages.pyo
?????????? ? ? ? ?            ? remove_bits.py
?????????? ? ? ? ?            ? remove_bits.pyo
?????????? ? ? ? ?            ? ru_execute_tasks.py
?????????? ? ? ? ?            ? ru_execute_tasks.pyo
?????????? ? ? ? ?            ? ru_set_all.py
?????????? ? ? ? ?            ? ru_set_all.pyo
?????????? ? ? ? ?            ? update_repo.py
?????????? ? ? ? ?            ? update_repo.pyo
?????????? ? ? ? ?            ? validate_configs.py
?????????? ? ? ? ?            ? validate_configs.pyo
  
	Any idea what is going on? 

Labels: Apache Ranger


09-22-2016 10:27 AM

The release notes state that Hive 2.1 is available in HDP 2.5 as a tech preview, and LLAP is part of that tech preview: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_release-notes/content/tech_previews.html

The release email to customers explains the same in a more readable way:

Apache Hive

- Includes Apache Hive 1.2.1 for production and Hive 2.1 (Technical Preview) for cutting-edge performance
- Hive LLAP (Technical Preview): persistent query servers and optimized in-memory caching for blazing-fast SQL. Up to 25x faster for BI workloads. 100% compatible with existing Hive workloads
- Hive ACID and Streaming Ingest certified for production use with Hive 1.2.1
- Dynamic user-based security policies for data masking and filtering
- HPL/SQL: procedural programming within Hive
- Hive View v1.5.0, with improved robustness and security
- Parquet format fully certified with Hive 1.2.1 / 2.1