Member since 06-01-2019
	
	
	
	
	
	
	
	
	
	
	
	
	
	
			
      
3 Posts
0 Kudos Received
0 Solutions
            
        
			
    
	
		
		
06-03-2019 04:19 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
Hello @lwang

Thank you for your reply!

I already tried these options.

The first one, using subprocesses to run some hdfs commands, could be an option, but I am not very familiar with how to obtain the metadata I need: file_extension, creation_time, etc.

The second link is more about how to read/write a specific file, for example .txt files.

I basically want to access a location (directory) in HDFS, iterate over all files inside, and extract metadata about the files.

If I find a working solution, I can forget about that "folderstats" module and do it another way.
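For the subprocess route mentioned above, here is a minimal sketch of how the metadata could be pulled out of the `hadoop fs -ls -R` listing. The function names `parse_ls_line` and `list_hdfs` are made up for illustration, and the parsing assumes the usual eight-column listing format (permissions, replication, owner, group, size, date, time, path). As far as I know, the listing reports a modification time rather than a creation time, so that is what the sketch records.

```python
import os
import subprocess


def parse_ls_line(line):
    """Parse one entry line of `hadoop fs -ls` output into a dict, or return None."""
    # Assumed column layout: perms, replication, owner, group, size, date, time, path
    parts = line.strip().split(None, 7)
    if len(parts) < 8 or parts[0][0] not in "-dl":
        return None  # header such as "Found 1 items", or a blank line
    perms, _replication, owner, group, size, date, time, path = parts
    return {
        "path": path,
        "is_dir": perms.startswith("d"),
        "permissions": perms,
        "owner": owner,
        "group": group,
        "size": int(size),
        "modified": f"{date} {time}",  # modification time, not creation time
        "extension": os.path.splitext(path)[1].lstrip("."),
    }


def list_hdfs(path):
    """Hypothetical helper: recursively list an HDFS directory via the hadoop CLI."""
    result = subprocess.run(
        ["hadoop", "fs", "-ls", "-R", path],
        check=True, capture_output=True, text=True,
    )
    entries = (parse_ls_line(line) for line in result.stdout.splitlines())
    return [entry for entry in entries if entry is not None]
```

On a machine where the `hadoop` command is on the PATH, calling `list_hdfs("/user/cloudera/files")` should return one dict per file or subdirectory, which can then be written out as CSV.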
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
06-03-2019 01:21 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
That is the location where I want the CSV file to be generated. It doesn't even get to that line. The script cannot access the directory located at hdfs://quickstart.cloudera:8020/user/cloudera/files
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
06-01-2019 05:51 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
Hello guys,

I hope I am posting in the right section.

I have the following Python script (I managed to run it locally):

#!/usr/bin/env python3
import folderstats
df = folderstats.folderstats('hdfs://quickstart.cloudera:8020/user/cloudera/files', hash_name='md5', ignore_hidden=True)
df.to_csv(r'hdfs://quickstart.cloudera:8020/user/cloudera/files.csv', sep=',', index=True)

I have the directory "files" in that location. I checked this through the command line and also with HUE, and it is there.

(myproject) [cloudera@quickstart ~]$ hadoop fs -ls /user/cloudera
Found 1 items
drwxrwxrwx   - cloudera cloudera          0 2019-06-01 13:30 /user/cloudera/files

The problem is that the directory can't be accessed. I tried to run it normally with python3 script.py and also as the HDFS superuser with sudo -u hdfs python3 script.py, and the output says:

Traceback (most recent call last):
  File "script.py", line 5, in <module>
    df = folderstats.folderstats('hdfs://quickstart.cloudera:8020/user/cloudera/files', hash_name='md5', ignore_hidden=True)
  File "/home/cloudera/miniconda3/envs/myproject/lib/python3.7/site-packages/folderstats/__init__.py", line 88, in folderstats
    verbose=verbose)
  File "/home/cloudera/miniconda3/envs/myproject/lib/python3.7/site-packages/folderstats/__init__.py", line 32, in _recursive_folderstats
    for f in os.listdir(folderpath):
FileNotFoundError: [Errno 2] No such file or directory: 'hdfs://quickstart.cloudera:8020/user/cloudera/files'

Can you please help me clarify this issue?

Thank you!
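The traceback above ends in `os.listdir(folderpath)`, which only looks at the local filesystem. A small sketch of what is probably happening: the `hdfs://` URI is treated as an ordinary relative local path, so the lookup fails no matter which user runs the script. The helper name `local_listing_error` is made up for illustration.

```python
import os

# The URI from the traceback; os.listdir has no notion of the hdfs:// scheme.
uri = "hdfs://quickstart.cloudera:8020/user/cloudera/files"


def local_listing_error(path):
    """Return the name of the exception os.listdir raises for path, or None."""
    try:
        os.listdir(path)
    except OSError as exc:
        return type(exc).__name__
    return None


# The string is not even an absolute local path: it resolves relative to the
# current working directory, starting with a directory literally named "hdfs:".
print(os.path.isabs(uri))
print(local_listing_error(uri))
```

If that is the cause, changing permissions or running as the hdfs user would not help; the listing has to go through an HDFS-aware client or the `hadoop fs` command line instead.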
						
					
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels: HDFS
 
        
