Member since
06-01-2019
06-03-2019
04:19 PM
Hello @lwang, thank you for your reply! I already tried these options. The first one, using subprocesses to run hdfs commands, could work, but I am not very familiar with how to obtain the metadata I need: file_extension, creation_time, etc. The second link is more about how to read/write a specific file, for example a .txt file. I basically want to access a location (directory) in HDFS, iterate over all files inside, and extract metadata about them. If I find a working solution I can forget about the "folderstats" module and do it another way.
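For the subprocess route, here is a minimal sketch of how the metadata could be collected by parsing the output of `hdfs dfs -ls -R`. It assumes the usual eight-column listing (permissions, replication, owner, group, size, date, time, path), the same layout as the `hadoop fs -ls` output shown in this thread. The names `parse_ls_line` and `hdfs_folderstats` are made up for illustration and are not part of folderstats or Hadoop. Note that the listing reports a modification time, not a creation time.

```python
import posixpath
import subprocess
from datetime import datetime


def parse_ls_line(line):
    """Parse one entry line of `hdfs dfs -ls` output into a dict, or return None."""
    # Assumed columns: permissions, replication, owner, group, size, date, time, path.
    # maxsplit=7 keeps paths that contain spaces in one piece.
    parts = line.split(None, 7)
    if len(parts) < 8:
        return None  # e.g. the "Found N items" header line
    perms, _replication, owner, group, size, date, time, path = parts
    name = posixpath.basename(path)
    return {
        "path": path,
        "name": name,
        "extension": posixpath.splitext(name)[1].lstrip("."),
        "is_dir": perms.startswith("d"),
        "owner": owner,
        "group": group,
        "size": int(size),
        "modified": datetime.strptime(date + " " + time, "%Y-%m-%d %H:%M"),
    }


def hdfs_folderstats(hdfs_dir):
    """Recursively list hdfs_dir through the hdfs CLI; return one dict per entry."""
    result = subprocess.run(
        ["hdfs", "dfs", "-ls", "-R", hdfs_dir],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.7)
        check=True,               # raise CalledProcessError if the command fails
    )
    rows = (parse_ls_line(line) for line in result.stdout.splitlines())
    return [row for row in rows if row is not None]
```

If pandas is available, something like `pandas.DataFrame(hdfs_folderstats('/user/cloudera/files')).to_csv('files.csv')` should then write the result to a local CSV file. Hashes such as md5 are not covered by this sketch.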
06-03-2019
01:21 PM
That is the location where I want the CSV file to be generated. It doesn't even get to that line. The script cannot access the directory located at hdfs://quickstart.cloudera:8020/user/cloudera/files
06-01-2019
05:51 PM
Hello guys, I hope I am posting in the right section. I have the following Python script (I managed to run it locally):
#!/usr/bin/env python3
import folderstats
df = folderstats.folderstats('hdfs://quickstart.cloudera:8020/user/cloudera/files', hash_name='md5', ignore_hidden=True)
df.to_csv(r'hdfs://quickstart.cloudera:8020/user/cloudera/files.csv', sep=',', index=True)
I have the directory "files" in that location. I checked this through the command line and also with HUE, and it's there.
(myproject) [cloudera@quickstart ~]$ hadoop fs -ls /user/cloudera
Found 1 items
drwxrwxrwx - cloudera cloudera 0 2019-06-01 13:30 /user/cloudera/files
The problem is that the directory can't be accessed. I tried to run it normally with python3 script.py, and also as the hdfs superuser with sudo -u hdfs python3 script.py, and the output says:
Traceback (most recent call last):
File "script.py", line 5, in <module>
df = folderstats.folderstats('hdfs://quickstart.cloudera:8020/user/cloudera/files', hash_name='md5', ignore_hidden=True)
File "/home/cloudera/miniconda3/envs/myproject/lib/python3.7/site-packages/folderstats/__init__.py", line 88, in folderstats
verbose=verbose)
File "/home/cloudera/miniconda3/envs/myproject/lib/python3.7/site-packages/folderstats/__init__.py", line 32, in _recursive_folderstats
for f in os.listdir(folderpath):
FileNotFoundError: [Errno 2] No such file or directory: 'hdfs://quickstart.cloudera:8020/user/cloudera/files'
Can you please help me clarify this issue? Thank you!
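A note on the traceback: it ends in os.listdir, which only reads the local filesystem. The small sketch below illustrates what probably happens: the hdfs:// URI is taken as an ordinary relative local path, so the lookup fails no matter which user runs the script. The variable name `hdfs_uri` is only for illustration.

```python
import os

hdfs_uri = "hdfs://quickstart.cloudera:8020/user/cloudera/files"

# The os module has no notion of the hdfs:// scheme. The string is read as a
# relative local path, i.e. a folder literally named "hdfs:" under the
# current working directory.
is_absolute = os.path.isabs(hdfs_uri)
exists_locally = os.path.exists(hdfs_uri)

print("absolute path:", is_absolute)
print("exists on local disk:", exists_locally)
```

So the directory being visible through hadoop fs -ls and HUE does not help here; the listing would have to go through an HDFS client or the hdfs command line instead of os.listdir.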
Labels: HDFS