Member since 06-03-2019
7 Posts, 0 Kudos Received, 0 Solutions
            
        
			
    
	
		
		
07-15-2020 01:38 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
@MattWho

Your guide was really helpful and it worked well. However, I think there is a problem when "Full Path" points to a deeper level of the hierarchy.

At first, I set Full Path as below.

aa/bb/cc/dd/

This created only one flow file, for "dd", and its type was of course directory. I then removed "dd" from the path and set a directory filter and a file filter to get only the files I wanted. After changing Full Path from "aa/bb/cc/dd" to "aa/bb/cc", it produced flow file information for everything under the cc directory.

Thanks for your advice.

Cheers.
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
07-08-2020 04:36 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
@MattWho

Hi,

First of all, I appreciate your detailed guide.

I changed the Group Result option as you suggested, but I still saw only one set of attributes, and it describes the directory.

These are my properties on GetHDFSFileInfo.

And this is what I got as the result of GetHDFSFileInfo.

hdfs.objectName : gfk <-- directory
hdfs.path : /paxatadata/export/prod <-- parent directory
hdfs.type : directory

Could you advise me further?
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
07-07-2020 08:57 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
Hi.

I'm trying to use GetHDFSFileInfo (not GetHDFS) so that it starts after a previous step. This is exactly what I want to do: https://community.cloudera.com/t5/Support-Questions/NiFi-fetchHDFS-without-ListHDFS/td-p/211708

Flow:
streamcommand -> GetHDFSFileInfo -> FetchHDFS -> PutSFTP

I set up GetHDFSFileInfo first, and then tried to pick up the flow file information from that step in FetchHDFS. However, GetHDFSFileInfo returned only one set of attributes, and I don't know how to fetch all the files in FetchHDFS.

These are the attributes from GetHDFSFileInfo. I want to fetch all of the files listed in them.

filename : faaa~~~~~
hdfs.count.dirs : 1
hdfs.count.files : 44
hdfs.full.tree : {"objectName":"gfk",""...., "content":[{"objectName":"Weekly_GfK_02_Merge_F_HP_AT.txt",".....]}
hdfs.objectName : gfk
....

Could you please tell me how to use it, or how to solve this problem?
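For anyone inspecting an attribute like hdfs.full.tree outside NiFi, here is a minimal sketch of walking the nested JSON and collecting the full path of every file entry. It assumes each entry carries "objectName", "path" (the parent directory), "type" and, for directories, a "content" list. Only "objectName" and "content" are visible in the truncated attribute above, so the other keys and the sample data are assumptions, and list_file_paths is a hypothetical helper name.

```python
import json


def list_file_paths(tree_json):
    """Return full paths of all file entries in an hdfs.full.tree-style JSON string."""
    root = json.loads(tree_json)
    paths = []
    stack = [root]
    while stack:
        node = stack.pop()
        # "path" is assumed to hold the parent directory of the entry.
        full_path = node["path"].rstrip("/") + "/" + node["objectName"]
        if node.get("type") == "file":
            paths.append(full_path)
        # Directories are assumed to carry their children in "content".
        stack.extend(node.get("content", []))
    return sorted(paths)


# Hypothetical sample shaped like the attribute shown in the post.
sample = json.dumps({
    "objectName": "gfk",
    "path": "/paxatadata/export/prod",
    "type": "directory",
    "content": [
        {
            "objectName": "Weekly_GfK_02_Merge_F_HP_AT.txt",
            "path": "/paxatadata/export/prod/gfk",
            "type": "file",
        },
    ],
})

print(list_file_paths(sample))
```

This only helps to check what the attribute contains. It does not replace having GetHDFSFileInfo emit one flow file per file for FetchHDFS.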
						
					
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels: Apache NiFi
			
    
	
		
		
06-25-2019 04:12 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
Hi all,

I'm trying to get log files from multiple directories whose names are not known in advance, as shown below.

Is it possible to transfer those files from HDFS to the host server using ListHDFS? It seems that I have to specify the path exactly, is that right? If there is a solution, please let me know.

Thanks in advance.
						
					
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels: Apache Hadoop, Apache NiFi
			
    
	
		
		
06-03-2019 07:06 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
Hi Ben,

Appreciate your quick and kind response.

I have one more question. After reading your answer, I started developing with the "time-series" API. While programming, I found a problem with data older than about 30 days. I'd like to obtain per-minute data for the whole period, but when I request older resource data, only 10-minute (or coarser) data seems to be available.

I used the API as shown below. Could you help me understand how to get per-minute data such as CPU/memory usage?

    import datetime
    import time

    import cm_client

    # api_client is assumed to be an already configured cm_client.ApiClient.
    api_instance = cm_client.TimeSeriesResourceApi(api_client)

    # Query window: the last 7776000 seconds (90 days) up to now.
    from_time = datetime.datetime.fromtimestamp(time.time() - 7776000)
    to_time = datetime.datetime.fromtimestamp(time.time())
    query = "select cpu_user_rate " \
            " where entityname = 'xx' "

    # Retrieve time-series data from the Cloudera Manager (CM) time-series data store using a tsquery.
    result = api_instance.query_time_series(_from=from_time, query=query, to=to_time)  # , desired_rollup='RAW', must_use_desired_rollup='true')
    ts_list = result.items[0]
    for ts in ts_list.time_series:
        print(ts.metadata.attributes['entityName'], ts.metadata.metric_name)
        for point in ts.data:
            print(point.timestamp, point.value)

Appreciate your response, Ben.
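As a side note on the snippet above: the hard-coded 7776000 is 90 days expressed in seconds. Below is a small standard-library sketch that builds the same window with timedelta and assembles the tsquery string from its parts. query_window and build_tsquery are hypothetical helper names, not part of cm_client, and the sketch does not call the Cloudera Manager API.

```python
import datetime


def query_window(days, now=None):
    """Return (from_time, to_time) covering the last `days` days."""
    to_time = now or datetime.datetime.now()
    from_time = to_time - datetime.timedelta(days=days)
    return from_time, to_time


def build_tsquery(metric, entity_name):
    """Build a tsquery string of the same shape as the one in the post."""
    return "select {} where entityname = '{}'".format(metric, entity_name)


from_time, to_time = query_window(90)
query = build_tsquery("cpu_user_rate", "xx")

# 90 days is the same span as the hard-coded 7776000 seconds.
assert (to_time - from_time).total_seconds() == 7776000
print(query)
```

Taking "now" once and deriving both ends from it keeps the window exactly 90 days wide, whereas two separate time.time() calls drift apart by a few microseconds.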
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
06-03-2019 02:19 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
Hello.

I'm new to Cloudera and have a very simple question about logs.

I need to check per-minute resource metrics such as CPU usage, memory usage, and disk I/O, but it's hard to find that kind of data in the log files. To be more specific, I want to use the resource logs in data-frame form to recognize whether the server is healthy or not.

So, can I obtain resource data?

Apologies for the basic question. Appreciate your response.
						
					
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels: Cloudera Manager
 
        


