I'm scheduling a sqoop incremental job using a shell script, but it behaves differently from a conventional job, it gives me the following error:
20/07/07 15:38:26 ERROR tool.JobTool: I/O error performing job operation: java.io.IOException: Exception creating SQL connection
Caused by: java.sql.SQLException: File input/output error /home/denap0p/.sqoop/metastore.db.properties java.io.FileNotFoundException: /home/denap0p/.sqoop/metastore.db.properties.new (No such file or directory)
the script has the following instructions:
sqoop job \
--create ch_job_inc \
-- import \
--connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
--username denap0p \
--password-alias sqlserver.password \
--query 'select * from dbo.Job where $CONDITIONS' \
--incremental lastmodified \
--check-column endtime \
--hcatalog-database hpchistperf \
--hcatalog-table ch_job \
--split-by ID \
--merge-key Id \
--hcatalog-storage-stanza 'stored as parquet'
sqoop job --exec ch_job_inc
Based on the information I've read, it seems that I need to use a metastore. We are using Cloudera CDH-5.14.2-1.cdh5.14.2.p0.3
Any advise is welcome. Thanks in advance.
You don't have to use a sqoop metastore, if you don't have a good reason to. If the same user will be executing the jobs for example, there is limited reason to use sqoop metastore. As I don't see you specifying --meta-connection option in your sqoop job, I suspect the sqoop-site.xml has a setting to try and connect to a sqoop metastore. Can you post settings here that start from sqoop.metastore in the XML file?
If there is any mention of sqoop.metastore settings in your sqoop-site.xml, I would suggest removing them and trying your script again to see if it works. However, this is assuming you are still at PoC stage and no production workloads are at risk of being messed up by XML change.
Thanks for the reply,
I originally didn't know i had to use a meta connection or a metastore until I found that error. So I really don't have a configuration considered yet.
What would you recommend me to do? Is it just matter of adding the --meta-connection parameter?
You don't have to use a sqoop metastore. What I recommend is you check your sqoop-site.xml file. Inside of that file look for any sqoop.metastore settings and remove them. That should stop sqoop jobs from trying to access the sqoop metastore.