Member since 11-04-2015

44 Posts
18 Kudos Received
3 Solutions

        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1720 | 03-17-2017 06:17 AM |
|  | 73344 | 02-29-2016 12:25 PM |
|  | 16414 | 02-03-2016 01:25 PM |

03-17-2017 06:17 AM
1 Kudo
We can achieve this using JOIN as follows.

1. JOIN A and B by Id:

    B_joined = JOIN A BY Id, B BY Id;

2. JOIN A and C by Id:

    C_joined = JOIN A BY Id, C BY Id;

Now we can get the required fields of B and C from their respective joined data sets as follows:

    B_filtered = FOREACH B_joined GENERATE B::Id, B::t1;
    C_filtered = FOREACH C_joined GENERATE C::Id;
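
For completeness, a minimal end-to-end sketch of the same approach; the PigStorage delimiter and the output paths are my assumptions, not details from the thread:

    A = LOAD 'header.csv' USING PigStorage(',') AS (Id:chararray, f1:chararray, f2:chararray);
    B = LOAD 'file1.csv' USING PigStorage(',') AS (Id:chararray, t1:chararray);
    C = LOAD 'file2.csv' USING PigStorage(',') AS (Id:chararray);

    -- join the header relation against each detail relation on the shared key
    B_joined = JOIN A BY Id, B BY Id;
    C_joined = JOIN A BY Id, C BY Id;

    -- project the needed fields, using :: to disambiguate post-join names
    B_filtered = FOREACH B_joined GENERATE B::Id, B::t1;
    C_filtered = FOREACH C_joined GENERATE C::Id;

    STORE B_filtered INTO 'b_filtered_out' USING PigStorage(',');
    STORE C_filtered INTO 'c_filtered_out' USING PigStorage(',');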

03-16-2017 12:34 PM
I have many files. One of them, say header.csv, serves as a header file, i.e., it contains the primary key (in database analogy) that serves as a foreign key in the rest of the files. Now, I want to do FOREACH and FILTER as follows:

    A = LOAD 'header.csv' AS (Id:chararray, f1:chararray, f2:chararray);
    B = LOAD 'file1.csv' AS (Id:chararray, t1:chararray);
    C = LOAD 'file2.csv' AS (Id:chararray);
    ..........
    D = FOREACH A {
        file1_filtered = FILTER file1 BY Id == A.Id;
        file2_filtered = FILTER file2 BY Id == A.Id;
        GENERATE file1_filtered, file2_filtered;
    };

Finally, I need to access the relations file1_filtered and file2_filtered. When I follow this approach I get the following error:

"ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
<line 2651, column 28> expression is not a project expression: (Name: ScalarExpression) Type: null Uid: null)"

How can I achieve this in Pig?
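
A note of mine, not from the thread: the ERROR 1200 comes from referencing other relations inside the nested FOREACH. A nested block can only operate on the fields and bags of the current record of A, so FILTER file1 BY Id == A.Id is rejected (A.Id ends up parsed as a scalar expression). The standard way to express this cross-relation filter is the JOIN-based rewrite shown in the accepted answer above.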
Labels: Apache Pig

03-11-2017 12:42 PM
1 Kudo
I have independent HDF and HDP clusters. I wonder if I can have a single KDC admin server for both clusters. If that is possible, how do I achieve it?
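
For what it's worth, a note of mine rather than an answer from the thread: a KDC is not tied to a single cluster, so one KDC/admin server can serve both, with each host's /etc/krb5.conf pointing at the same realm. The realm and host names below are placeholders:

    [realms]
      EXAMPLE.COM = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
      }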

03-07-2017 06:09 AM
Thank you very much @Lester Martin! This is exactly what I was looking for. If you don't mind, I have another related question. This logic is applied to more than 36 different files. In database terms, one of the files uses the ID and CreateDate fields as a primary key, and these fields are used as foreign keys in the rest of the files.
 * The files are dropped daily into a local directory on the Hadoop host.
 * The files have the current date appended to their file names.

So, I need to read all the files from that local directory, apply the above logic to each of them, and then store the results into HDFS. Is Pig an optimal (or even feasible) solution for my use case? Currently, I am doing this with a C# program that reads the files, applies the logic, and inserts the results into a relational database. The reason I am looking at Pig is to improve the performance of the ETL process. Any recommendations on this, please? Thanks!
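
If Pig is adopted, the daily date-stamped drops do not need to be listed by hand: Pig's LOAD accepts Hadoop glob patterns, and pig -param substitutes run-time values into the script. A hedged sketch; the paths, file-name scheme, and schema are illustrative assumptions, and the per-file transformation itself is elided:

    -- run as: pig -param RUNDATE=2017-03-07 daily_etl.pig
    file1 = LOAD '/landing/file1_$RUNDATE*.csv' USING PigStorage(',')
            AS (Id:chararray, CreateDate:chararray, t1:chararray);

    -- ... apply the per-file logic here ...

    STORE file1 INTO '/warehouse/file1/$RUNDATE' USING PigStorage(',');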

03-03-2017 03:43 PM
I have two files on my HDFS. One file (the latest file) contains updates to the other (the previous file). Now, I want to check whether specific columns of each record in the latest file match records in the previous file, and replace such records of the previous file with the corresponding records from the latest file (i.e., delete those records from the previous file and replace them with the records from the latest file). That means I need to check each record of the previous file against each record of the latest file based on specific columns; if a match is found, the whole record of the previous file is deleted and replaced with the record from the latest file. How can I achieve this with Pig? Thanks!
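
One possible shape for this in Pig, offered as a hedged sketch rather than the thread's answer; the file names, single key column, and delimiter are my assumptions:

    prev   = LOAD 'previous.csv' USING PigStorage(',') AS (id:chararray, val:chararray);
    latest = LOAD 'latest.csv' USING PigStorage(',') AS (id:chararray, val:chararray);

    -- a left outer join marks previous records that have a replacement
    joined    = JOIN prev BY id LEFT OUTER, latest BY id;
    unchanged = FILTER joined BY latest::id IS NULL;
    kept      = FOREACH unchanged GENERATE prev::id AS id, prev::val AS val;

    -- merged result: untouched old records plus every record from the latest file
    merged = UNION kept, latest;
    STORE merged INTO 'merged_out' USING PigStorage(',');

For a multi-column match, join BY (col1, col2, ...) instead of a single id.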
Labels: Apache Pig

02-22-2017 05:20 AM
@Pierre Villard Thank you very much! You made my day!

02-21-2017 02:14 PM
1 Kudo
I have multiple files on my SFTP server with different file names (the file names have a date-time stamp appended). I am using the ListSftp, RouteOnAttribute, FetchSftp, and PutHdfs processors, but I am unsure how FetchSftp should be configured so that all the files on the remote SFTP server end up in HDFS. Is there any option to provide the list of file names to the "Remote File" property of the FetchSftp processor configuration? Thanks!
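
As a side note, stated as my assumption about the standard pairing rather than something from the thread: when FetchSftp is fed directly by ListSftp, each incoming flow file already carries path and filename attributes, so the "Remote File" property is typically set with an Expression Language reference instead of an explicit list of names:

    Remote File: ${path}/${filename}

That way FetchSftp fetches whichever file each flow file from ListSftp describes, covering all files without enumerating them.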
Labels: Apache NiFi

02-21-2017 12:03 PM
Thank you very much @Pierre Villard. Your answer was really helpful.

02-21-2017 09:19 AM
I have a 7-node Kerberized HDP cluster. I have installed Apache NiFi on one of my HDP cluster nodes just for testing purposes. When I try to configure the PutHdfs processor, the following warning pops up: (screenshot not preserved). I tried to set the Kerberos properties as follows: (screenshot not preserved). In addition to this, I set nifi.kerberos.krb5.file=/etc/krb5.conf in the nifi.properties file.
What is the correct configuration (on the NiFi host or the HDFS host) for the PutHdfs processor to work properly in this case? Do I need to create a Kerberos principal and a keytab file for NiFi? Which service's principal and keytab file am I required to provide for the "Kerberos Principal" and "Kerberos Keytab" fields in the PutHdfs processor configuration (NiFi's or HDFS's)? Thanks.
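
For reference, one common pattern is to create a dedicated principal and keytab for the NiFi service in the cluster's KDC, make the keytab readable by the OS user that runs NiFi, and point PutHdfs at them. The principal name, realm, and keytab path below are placeholders of mine, not values from the thread:

    Kerberos Principal: nifi/<nifi-host>@<REALM>
    Kerberos Keytab: /etc/security/keytabs/nifi.service.keytab

PutHdfs also needs the cluster's core-site.xml and hdfs-site.xml listed in its Hadoop Configuration Resources property so that it can reach the Kerberized HDFS.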
Labels: Apache NiFi, Cloudera DataFlow (CDF)

07-21-2016 01:38 PM
@Venkat ramanann In addition to @Jon Maestas's workaround, add the IP address of the host on which the PostgreSQL server is running to the pg_hba.conf file, and make sure that the method is set to "trust".

Note: replace <ip address> with the IP address of your host to allow connections.

    # TYPE  DATABASE  USER  CIDR-ADDRESS      METHOD
    # IPv4 local connections:
    host    all       all   127.0.0.1/32      md5
    host    all       all   <ip address>/24   trust
    # IPv6 local connections:
    host    all       all   ::1/128           md5

Now, restart the postgresql service. For more details, visit https://confluence.atlassian.com/confkb/confluence-unable-to-connect-to-postgresql-due-to-unconfigured-pg_hba-conf-file-300814422.html