Sqoop job from python stdout=subprocess.pipe

I am trying to generate a Sqoop command using Python. I am able to pass and run the Sqoop query. I want to map column names with the Sqoop option --map-column-java, but the number of columns is different in each table. Only BLOB and CLOB columns need to be mapped.

Data:

Column   Datatype
C460     VARCHAR2
C459     CLOB
C456     BLOB
C60901   Varchar
C8       BLOB

Sample code I am using:

import re
import subprocess

# config (dict) and Tablename (str) are defined elsewhere in my script.
jdbc_url = "jdbc:oracle:thin:@" + config["Production_host"] + ":" + config["port"] + "/" + config["Production_SERVICE_NAME"]
proc = subprocess.Popen(["sqoop", "eval", "--connect", jdbc_url, "--username", config["Production_User"], "--password", config["Production_Password"], "--query", "SELECT column_name, data_type FROM all_tab_columns where table_name =" + "'" + Tablename + "'"], stdout=subprocess.PIPE)
# Raw string keeps \d intact; decode() turns the bytes from the pipe into text.
COl_Re = re.compile(r'(?m)(C\d+)(?=.+[CB]LOB)')
columns = COl_Re.findall(proc.stdout.read().decode())

With the code above I am able to get the required column names C459, C456 and C8. Output: ['C459', 'C456', 'C8']
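To illustrate what the regular expression picks up, here is a small sketch that runs the same pattern against a hard-coded sample. The pipe-delimited layout of sample_output is only an assumption about what sqoop eval prints; the real output may be laid out differently.

```python
import re

# Hypothetical sample of the text returned by `sqoop eval`;
# the actual layout on a real cluster may differ.
sample_output = """
| COLUMN_NAME | DATA_TYPE |
| C460 | VARCHAR2 |
| C459 | CLOB |
| C456 | BLOB |
| C60901 | VARCHAR |
| C8 | BLOB |
"""

# Capture a column name (C followed by digits) only when CLOB or BLOB
# appears later on the same line ('.' does not cross newlines).
lob_pattern = re.compile(r'(?m)(C\d+)(?=.+[CB]LOB)')
columns = lob_pattern.findall(sample_output)
print(columns)
```

For this sample the list contains only the three LOB columns, in the order they appear in the text.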

The final command should have the following format:

sqoop import --connect jdbc:oracle:thin:@<Production_host>:<port>/<Production_SERVICE_NAME> --username <Production_User> --password <Production_Password> --table table --fields-terminated-by '|' --map-column-java C456=String,C459=String,C8=String --hive-drop-import-delims --input-null-string '\\N' --input-null-non-string '\\N' --as-textfile --target-dir <Location> -m 1

I only need to generate the part --map-column-java C456=String,C459=String,C8=String dynamically, so that my next call to subprocess.call can use it.
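One possible way to build that argument from the list of column names is sketched below. The helper name build_map_column_java and the shortened import_cmd list are hypothetical and only for illustration; the connection arguments are left out, and the actual subprocess.call is commented out so the snippet runs without Sqoop installed.

```python
def build_map_column_java(columns, java_type="String"):
    """Return the --map-column-java argument pair, or [] if there is nothing to map."""
    if not columns:
        # No BLOB/CLOB columns found: omit the option entirely.
        return []
    mapping = ",".join("{}={}".format(col, java_type) for col in columns)
    return ["--map-column-java", mapping]


columns = ['C459', 'C456', 'C8']  # result of the regex step
map_args = build_map_column_java(columns)

# Shortened, hypothetical import command; add --connect, --username,
# --password, --target-dir and the remaining options as in the real job.
import_cmd = (
    ["sqoop", "import", "--table", "table", "--fields-terminated-by", "|"]
    + map_args
    + ["--hive-drop-import-delims", "--as-textfile", "-m", "1"]
)
print(" ".join(import_cmd))

# import subprocess
# subprocess.call(import_cmd)
```

Passing the command as a list means each mapping stays a single argument and no shell quoting is needed. When a table has no LOB columns, the helper returns an empty list, so the option is simply left out of the command.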