Created 10-14-2016 01:02 AM
I have pig output like below in hdfs
a,,b,,
I created target table as,
CREATE SET TABLE panda.test ,NO FALLBACK , NO BEFORE JOURNAL, NO AFTER JOURNAL, CHECKSUM = DEFAULT, DEFAULT MERGEBLOCKRATIO ( col VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC, col2 VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC, col3 VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC, col4 VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC) PRIMARY INDEX ( col );
And when I export it
sqoop export --connect jdbc:teradata://xxxxx/database=xx --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username xx --password xx --export-dir /test/xxx/xx/temp/teradata1 --table test --input-fields-terminated-by ',' -m 2 --input-null-string "\\N" --input-null-non-string "\\N"
or
sqoop export --connect jdbc:teradata://xxxxx/database=xx --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username xx --password xx --export-dir /test/xxx/xx/temp/teradata1 --table test --input-fields-terminated-by ',' -m 2
or
sqoop export --connect jdbc:teradata://xxxxx/database=xx --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username xx --password xx --export-dir /test/xxx/xx/temp/teradata1 --table test --input-fields-terminated-by ',' -m 2 --input-null-string "\\" --input-null-non-string "\\"
I get null values in place of blanks in teradata table.
Any one has suggestion so that I can export blank as blank in teradata.
Created 10-14-2016 02:13 AM
Try the following
sqoop export --connect jdbc:teradata://xxxxx/database=xx --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username xx --password xx --export-dir /test/xxx/xx/temp/teradata1 --table test --input-fields-terminated-by ',' -m 2 --input-null-string '\n' --input-null-non-string '\n'
Created 10-14-2016 03:35 AM
Thank you, I already tried it. But It converts blanks into nulls. I want blanks as blanks for pig output stored in hdfs like a,,b,,
Created 10-14-2016 04:30 PM
data: a||c |b|c a|b| a|\N\c a|N|c a|\\N|c a|NULL|c Table: CREATE SET TABLE panda.test_tera2 ,NO FALLBACK , NO BEFORE JOURNAL, NO AFTER JOURNAL, CHECKSUM = DEFAULT, DEFAULT MERGEBLOCKRATIO ( col VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC, col2 CHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC, col3 VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC) PRIMARY INDEX ( col ); --------------------------- Command: sqoop export --connect jdbc:teradata://xxx/database=xx --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username --password xx --export-dir /test/xxtemp/teradata2 --table test_tera2 --input-fields-terminated-by '|' -m 2 ------------------------- Result: col col2 col3 1 a ? c 2 ? b c 3 a \N\c ? 4 a N c 5 a b ? 6 a NULL c 7 a \\N c ------------------------------------------------ Command: sqoop export --connect jdbc:teradata://xx/database=xx --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username xx --password xx --export-dir /test/xx/teradata2 --table test_tera2 --input-fields-terminated-by '|' -m 2 --input-null-non-string "\\N" RESULT: col col2 col3 1 a ? c 2 ? b c 3 a \N\c ? 4 a N c 5 a b ? 6 a NULL c 7 a \\N c ------------------------------------------------ Command: sqoop export --connect jdbc:teradata://xx/database=xx --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username xx --password xx --export-dir /test/xx/teradata2 --table test_tera2 --input-fields-terminated-by '|' -m 2 --input-null-non-string "\\N" --input-null-string "\\N" RESULT: col col2 col3 1 a ? c 2 ? b c 3 a \N\c ? 4 a N c 5 a b ? 6 a NULL c 7 a \\N c -------------------------------------------------- I am still failed to export blanks || field between pipes as blanks in teradata with sqoop export. Could someone explain me the reason and way to solve it?
Created 10-14-2016 04:31 PM
@mqureshi I still failed to export blanks as blanks.
Created 10-17-2016 12:51 AM
Hi @Ashish Vishnoi.
In the sqoop-export doco, it says:
"The --input-null-string
and --input-null-non-string
arguments are optional. If --input-null-string
is not specified, then the string "null" will be interpreted as null for string-type columns. If --input-null-non-string
is not specified, then both the string "null" and the empty string will be interpreted as null for non-string columns. Note that, the empty string will be always interpreted as null for non-string columns, in addition to other string if specified by --input-null-non-string
"
There is another discussion here around a similar issue.
If it doesn't work even for string columns, it may be that a workaround of some kind is needed. E.g. conversion of blanks to another character (or set of characters that wouldn't normally be a part of your data set) prior to export which can be converted back to blanks once in Teradata.
Hope this helps.
Created 10-20-2016 02:41 PM
Thank you Emily for your reply.