Reading files from s3 bucket sub folders

Hi all,

I am trying to read files from an S3 bucket that contains many subdirectories. At the moment I am giving the full physical path to read the files. How can I read them without hard-coded values?

File path: S3 bucket name/Folder/1005/SoB/20180722_zpsx3Gcc7J2MlNnViVp61/JPR_DM2_ORG/*.gz

The prefix "S3 bucket name/Folder/" is fixed, and the client ID (1005) has to be passed as a parameter.

Under the SoB folder we have month-wise folders, and I need to read only the latest two months of data.

Please help me read the data without hard-coding the path.

Many thanks for your help.

1 ACCEPTED SOLUTION

@Lakshmi Prathyusha,

You can write a simple Python snippet like the one below to build the subfolder paths. I have put a print statement in the code, but you can replace it with a subprocess call to actually run the command.

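from datetime import date, timedelta
from dateutil.relativedelta import relativedelta  # third-party: python-dateutil

today = date.today()
two_months_back = today - relativedelta(months=2)

# Number of days between the start of the window and today
delta = today - two_months_back

# Build one path per day in the window, with the date formatted as YYYYMMDD
for i in range(delta.days + 1):
    dt = str(two_months_back + timedelta(i)).replace("-", "")
    # The trailing * matches the suffix after the date in the folder name
    print("hdfs dfs -ls s3a://bucket/Folder/1005/SoB/%s*" % dt)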
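
Since the folder names carry a suffix after the date (as in 20180722_zpsx3Gcc7J2MlNnViVp61), another option is to list the SoB folder once and keep only the folders from the two most recent months. Below is a minimal sketch of that idea, not something tested against your cluster. The function names are made up for illustration, "bucket" and "Folder" are placeholders, and it assumes that every dated folder name starts with YYYYMMDD and that "latest two months" means the two most recent months present in the listing.

```python
import re
import subprocess

# Placeholder base path: "bucket" and "Folder" stand in for the real names.
BASE = "s3a://bucket/Folder"


def latest_month_folders(names, months=2):
    """Keep the folders that belong to the most recent `months` months.

    Assumes each dated folder name starts with YYYYMMDD; other names are ignored.
    """
    dated = [n for n in names if re.match(r"\d{8}", n)]
    # Distinct YYYYMM prefixes, newest first, cut down to the wanted count
    wanted = sorted({n[:6] for n in dated}, reverse=True)[:months]
    return sorted(n for n in dated if n[:6] in wanted)


def ls_command(client_id, folder):
    """Build the argument list that lists the .gz files under one dated folder."""
    path = "%s/%s/SoB/%s/JPR_DM2_ORG/*.gz" % (BASE, client_id, folder)
    return ["hdfs", "dfs", "-ls", path]


def run_ls(client_id, folder):
    """Run the listing and return its output (needs the hdfs CLI and S3 access)."""
    return subprocess.check_output(ls_command(client_id, folder),
                                   universal_newlines=True)
```

For example, latest_month_folders(["20180722_a", "20180615_b", "20180501_c"]) keeps the July and June folders, and run_ls(1005, folder) can then be called for each of them, with the client ID passed in as a parameter instead of being hard-coded.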

-Aditya

3 REPLIES

Hi Aditya,

Thanks a lot for your help. Is it possible to do this in Scala? I don't have any knowledge of Python.

@Lakshmi Prathyusha,

I'm not sure how to do this in Scala. I would guess that Scala has similar date and time functions, so you should be able to apply the same logic there.