Created 06-13-2017 09:39 AM
when importing data from sybase all (NULL) values from sybase are written as null strings in the hdfs files
I really want sqoop not to write anything when source contains (NULL) values is that tpossible ?
sqoop import "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" \
--driver net.sourceforge.jtds.jdbc.Driver \
--connect jdbc:jtds:sybase://127.0.0.1:4000/BAS_csrp \
--username xxxx \
--P \
--query "select cpr_nr ,
csrp_koersel_dto ,
haendelse_kod ,
nyt_cpr_nr ,
omplart_kod ,
status_kod ,
umynd_kod ,
umynd_ins_koersel_dto ,
umynd_del_koersel_dto ,
pers_nvn ,
flyt_dto ,
adr_aendr_aarsag_kod ,
adr_bskyt_kod ,
adr_art_kod ,
bopael_kom_kod ,
vej_kod ,
tsr_nr ,
co_nvn ,
adr1 ,
vej_nvn ,
hus_nr ,
hus_bogst ,
etage_nr ,
side_nr ,
doer_nr ,
by_nvn ,
post_nr ,
post_by ,
aegtf_fra_dto ,
aegtf_til_dto ,
aegtf_cpr_nr ,
civst_fra_dto ,
civst_til_dto ,
civst_kod ,
doed_kod ,
doed_dto ,
genopliv_kod ,
genopliv_dto ,
kirke_fra_dto ,
kirke_til_dto ,
kirke_skat_kod ,
far_cpr_nr ,
mor_cpr_nr ,
pbs1_fra_dto ,
pbs1_til_dto ,
pbs2_fra_dto ,
pbs2_til_dto ,
udl_adr1 ,
udl_adr2 ,
udl_adr3 ,
udl_adr4 ,
udl_adr5 ,
kontakt_adr1 ,
kontakt_adr2 ,
kontakt_adr3 ,
kontakt_adr4 ,
kontakt_adr5 ,
supp_adr1 ,
supp_adr2 ,
supp_adr3 ,
emailadr ,
mobiltlfnr ,
stabokod ,
motagdto_sag ,
efterbskdto ,
notattkst ,
ibertperskod ,
senr_1 ,
pernr_1 ,
fradto_1 ,
tildto_1 ,
foskod_1 ,
ibertperskod_1 ,
senr_2 ,
pernr_2 ,
fradto_2 ,
tildto_2 ,
foskod_2 ,
ibertperskod_2a ,
senr_3 ,
pernr_3 ,
fradto_3 ,
tildto_3 ,
foskod_3 ,
ibertperskod_3a ,
pers_pernr ,
pers_omvalgdto ,
pers_fradto ,
pers_tildto ,
pers_fosskapct ,
pers_ibertperskod ,
fri_pernr ,
fri_fradto ,
fri_tildto ,
fri_ambifriko ,
fri_ibertperskod ,
uden_ordnnr_1 ,
uden_pernr_1 ,
uden_fradto_1 ,
uden_til_dto_1 ,
uden_ibertperskod_1b ,
uden_notat_tekst_1 ,
uden_ordnnr_2 ,
uden_pernr_2 ,
uden_fradto_2 ,
uden_til_dto_2 ,
uden_ibertperskod_2b ,
uden_notat_tekst_2 ,
uden_ordnnr_3 ,
uden_pernr_3 ,
uden_fradto_3 ,
uden_til_dto_3 ,
uden_ibertperskod_3b ,
uden_notat_tekst_3 ,
pal_ordnnr ,
pal_pernr ,
pal_fradto ,
pal_tildto ,
ibertperskod_2 ,
notattkst_1 ,
pernr_fripens_1 ,
se_nr_fripens_1 ,
policektonr_1 ,
fri_fra_dto_1 ,
fri_til_dto_1 ,
skatfri_1 ,
ibertperkod_fri_1 ,
notat_fri_1 ,
pernr_fripens_2 ,
se_nr_fripens_2 ,
policektonr_2 ,
fri_fra_dto_2 ,
fri_til_dto_2 ,
skatfri_2 ,
ibertperkod_fri_2 ,
notat_fri_2 ,
notattkst_3 ,
pernr_fripens_3 ,
policektonr_3 ,
fri_fra_dto_3 ,
fri_til_dto_3 ,
skatfri_3 ,
ibertperkod_fri_3 ,
notat_fri_3 ,
vandre_fra_dto ,
vandre_til_dto ,
ibertperkod_vandre ,
notat_vandre ,
digital_fra_dto ,
digital_til_dto from csrp_total_bas where CAST(csrp_koersel_dto as DATE)<'2017-04-07' AND \$CONDITIONS" \
--fields-terminated-by '\t' \
--split-by cpr_nr \
--target-dir /tmp/test/sqoop/csrp/
Created 06-13-2017 12:11 PM
Hi Simon, you can try some of the import control options, you can find them here - https://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html - i.e - The --null-string and --null-non-string arguments are optional.\ If not specified, then the string "null" will be used.
Created 06-14-2017 01:01 PM
Hi Dennis, Thank you very much that worked
Created 07-10-2017 04:20 PM
Hi Dennis,
Even though I declared --null-string as --null-string '\\N' in the sqoop import synatax. still i am seeing "null" string value for all string fields where value is null in source schma. can you please provide any solution to it.
below is my sqoop import syntax.
import -Dhadoop.security.credential.provider.path=${jceks_loc} --password-alias ${passwordAlias} --username ${userName} --connect ${connection} --table ${sourceSchema}.${sourceTable} --hive-import --hive-table ${hiveTable} --target-dir ${hdfsDir} --fields-terminated-by "|" --delete-target-dir --null-string '\\N' -m 1
Thanks,
Brahma