Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Reading files from s3 bucket sub folders

New Member

Hi all,

I am trying to read files from an S3 bucket that contains many subdirectories. At the moment I am giving the physical path to read the files. How can I read the files without hard-coded values?

File path: S3 bucket name/Folder/1005/SoB/20180722_zpsx3Gcc7J2MlNnViVp61/JPR_DM2_ORG/*.gz files

The "S3 bucket name/Folder/" part of the path is fixed, and the client ID (1005) has to be passed as a parameter.

Under the SoB folder we have month-wise folders, and I need to take only the latest two months of data.

Please help me read the data without hard-coding the path.

Many thanks for your help.

1 ACCEPTED SOLUTION

Super Guru

@Lakshmi Prathyusha,

You can write a simple Python snippet like the one below to list the subfolders. I have put a print statement in the code, but you can replace it with a subprocess call to actually run the command.

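from datetime import date, timedelta
from dateutil.relativedelta import relativedelta  # third-party package: python-dateutil

today = date.today()
two_months_back = today - relativedelta(months=2)

# Number of days between the two dates
delta = today - two_months_back

# Print one listing command per day in the window, with the date as YYYYMMDD.
# The folders in the question carry a suffix after the date, so a trailing
# wildcard (%s*) may be needed to match them.
for i in range(delta.days + 1):
    dt = str(two_months_back + timedelta(i)).replace("-", "")
    print("hdfs dfs -ls s3a://bucket/Folder/1005/SoB/%s" % dt)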
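
As a rough sketch of how the client ID could be passed in as a parameter and the dated subfolders narrowed down to the last two months, something like the following might work. It uses only the standard library. The helper names (months_back, recent_folders, build_paths) are hypothetical, "bucket" is a placeholder, and the code assumes every folder under SoB starts with a YYYYMMDD date, as in the example path from the question.

```python
import calendar
from datetime import date, datetime


def months_back(day, months):
    """Return the date `months` months before `day`, clamping the day of
    month to the end of the target month (e.g. Apr 30 -> Feb 28)."""
    index = day.year * 12 + (day.month - 1) - months
    year, month = index // 12, index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def recent_folders(folder_names, today, months=2):
    """Keep folder names whose leading YYYYMMDD lies between
    `today - months` and `today`, inclusive."""
    start = months_back(today, months)
    kept = []
    for name in folder_names:
        try:
            stamp = datetime.strptime(name[:8], "%Y%m%d").date()
        except ValueError:
            continue  # assumption: names without a leading date are ignored
        if start <= stamp <= today:
            kept.append(name)
    return sorted(kept)


def build_paths(bucket, client_id, folder_names, today):
    """Build one glob path per recent folder; the layout mirrors the
    example path in the question, with the client ID as a parameter."""
    base = "s3a://%s/Folder/%s/SoB" % (bucket, client_id)
    return ["%s/%s/JPR_DM2_ORG/*.gz" % (base, name)
            for name in recent_folders(folder_names, today)]


if __name__ == "__main__":
    # Hypothetical folder names, shaped like the one in the question.
    names = ["20180722_zpsx3Gcc7J2MlNnViVp61", "20180301_old"]
    for path in build_paths("bucket", 1005, names, date.today()):
        print(path)
```

The folder names themselves still have to come from somewhere, for example by parsing the output of an hdfs dfs -ls call made through subprocess; that step is left out here.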

-Aditya

3 REPLIES

New Member

Hi Aditya,

Thanks a lot for your help. Is it possible to do this in Scala? I don't have any knowledge of Python.

Super Guru

@Lakshmi Prathyusha,

I'm not sure how to do this in Scala. I expect Scala has similar date and time functions, so you should be able to apply the same logic there.