Support Questions


How does --hive-import work?

Expert Contributor

Whenever I use the --hive-import argument, I also specify a --warehouse-dir in my Sqoop jobs.

Now, when I check my Hive tables, the data is indeed there. My question is: why do I not see any files in the warehouse dir using the hadoop fs -ls command?

I do see them, though, when I replace --hive-import and --warehouse-dir with --target-dir.

How does it work? What are the advantages of one over the other?

1 ACCEPTED SOLUTION

Rising Star

@Simran Kaur In a Sqoop --hive-import, --warehouse-dir is a temporary HDFS staging location that collects the imported data; Sqoop then moves the data (and the file metadata) into the Hive warehouse directory, hive.metastore.warehouse.dir (generally /apps/hive/warehouse, as specified in hive-site.xml).
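For illustration, a typical invocation might look like the sketch below. All connection details, table names, and paths are hypothetical placeholders, not values from this thread:

```shell
# Hypothetical sketch: Sqoop stages the imported files under --warehouse-dir,
# then moves them into the Hive warehouse (hive.metastore.warehouse.dir),
# which is why the staging directory appears empty afterwards.
sqoop import \
  --connect jdbc:mysql://dbhost/magentodb \
  --username sqoop_user -P \
  --table TABLENAME \
  --hive-import \
  --warehouse-dir /tmp/sqoop-staging
```

After the job finishes, the files live under the Hive warehouse directory, not under /tmp/sqoop-staging.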




Expert Contributor

Awesome. Thank you!

Expert Contributor

So if I replace --warehouse-dir with --target-dir, it would permanently store the files in the target-dir location, and then I could map my tables to this location as an external table? @Dileep Kumar Chiguruvada
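The pattern being asked about here would look roughly like the following sketch. Connection details, paths, and the column list are hypothetical, for illustration only:

```shell
# Hypothetical sketch: import to a fixed HDFS location (no --hive-import),
# then define an external Hive table over it so the files stay where Sqoop put them.
sqoop import \
  --connect jdbc:mysql://dbhost/magentodb \
  --username sqoop_user -P \
  --table TABLENAME \
  --target-dir /data/external/TABLENAME

# Map an external table over the imported files (columns are placeholders).
hive -e "CREATE EXTERNAL TABLE magentodb.tablename (id INT, name STRING)
         ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
         LOCATION '/data/external/TABLENAME'"
```

Because the table is EXTERNAL, dropping it later would leave the files in /data/external/TABLENAME untouched.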

@Simran Kaur

Can you please check the Hive table that was created, using describe formatted <hivetablename>, and verify the location of the Hive data?

It seems the data is being written to a different directory, with --warehouse-dir not taking effect.

Thanks and Regards,

Sindhu
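The check suggested above can be run from the command line like this (database and table names are placeholders):

```shell
# Print the table's full metadata, including its HDFS "Location" line.
hive -e "DESCRIBE FORMATTED magentodb.TABLENAME"

# Or narrow the output to just the location:
hive -e "DESCRIBE FORMATTED magentodb.TABLENAME" | grep -i location
```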

Expert Contributor
@Sindhu

You are right. It shows table location as

hdfs://FQDN:8020/user/hive/warehouse/magentodb.db/TABLENAME

Expert Contributor

Why would it ignore the argument? I tried it with --target-dir as well, and that did not work either. @Sindhu

Expert Contributor

I believe it is because of the --hive-import argument? I could remove it, but I have to use the --hive-overwrite argument, and I can't use that unless I use --hive-import. @Sindhu So, how do I use --hive-overwrite while using --warehouse-dir / --target-dir?

@Simran Kaur

--target-dir is used when importing table data into HDFS with the Sqoop import tool, and might not work with --hive-import.

As @Dileep Kumar Chiguruvada explained earlier, the value of the Hive warehouse directory will be picked up from hive-site.xml.

Thanks and Regards,

Sindhu
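Putting the answers above together, a --hive-overwrite import would be written without relying on --target-dir for the final location. The sketch below is hypothetical (connection details, table, and paths are placeholders):

```shell
# Hypothetical sketch: --hive-overwrite replaces the Hive table's existing data.
# The files ultimately land under hive.metastore.warehouse.dir regardless of
# --warehouse-dir, which serves only as a staging area here.
sqoop import \
  --connect jdbc:mysql://dbhost/magentodb \
  --username sqoop_user -P \
  --table TABLENAME \
  --hive-import \
  --hive-overwrite \
  --warehouse-dir /tmp/sqoop-staging
```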

Expert Contributor

@Sindhu: Got it. But I do not want the data moved out of the warehouse dir / target-dir. Is there a solution for that? Or do I need to do it separately, without the hive import option, to keep it in HDFS? Also, the link suggests using HCatalog: http://grokbase.com/t/sqoop/user/143waxddrr/jira-commented-sqoop-1293-hive-import-causes-target-dir-... Is it really a solution to the problem?

@Simran Kaur

If you have a Hive metastore associated with your HDFS cluster, --hive-import and --hive-overwrite always write to the Hive warehouse directory. Arguments like --warehouse-dir <dir>, --as-avrodatafile, --as-sequencefile, --target-dir, etc. are not honoured.

Thanks and Regards,

Sindhu
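For completeness, the HCatalog route mentioned in the linked thread would take roughly the shape below. This is a hedged sketch with hypothetical names; whether the data stays in a custom location depends on how the Hive table itself was defined (for example, as an external table):

```shell
# Hypothetical sketch: with the HCatalog arguments, Sqoop writes through
# HCatalog into whatever location the Hive table's metadata specifies,
# rather than staging through --warehouse-dir / --target-dir.
sqoop import \
  --connect jdbc:mysql://dbhost/magentodb \
  --username sqoop_user -P \
  --table TABLENAME \
  --hcatalog-database magentodb \
  --hcatalog-table tablename
```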
