
Getting Error in Sqoop Import from Oracle & MSSQL Database into Hive Table in Parquet Format

New Contributor

I am getting the same error when using Sqoop import to fetch data from either an MSSQL or an Oracle database.

Oracle ->

sqoop import --connect "jdbc:oracle:thin:@<ServerName>:<PortName>:<Database>" --hadoop-home "/usr/hdp/2.4.0.0-169/hadoop" --username <UserName> --password <Password> --table <TableName> --hive-overwrite --columns "COL_NAM" --hive-import --hive-database ia_db --hive-table par_str_cols --map-column-hive db_user_nam=string --as-parquetfile -m 1

MSSQL ->

sqoop import --connect "jdbc:sqlserver://<ServerName>:<PortNo>;database=<DatabaseName>" --username <UserName> --password <Password> --table <TableName> --hive-overwrite --columns "COLUMN1, COLUMN2" --where "COLUMN1 = 7390" --hive-import --target-dir /apps/hive/warehouse/ia_db.db/par_int_cols --as-parquetfile -m 1

I have tried with Integer, Decimal, and String columns and get the same error each time.

Note - I am able to import the same data into Hive tables as plain text files, but I get this error when importing in Parquet format.

Can someone give me some pointers on what this error means: "Error: parquet.Preconditions.checkArgument(ZLjava/lang/String;[Ljava/lang/Object;)V"?

**********************************************************************************************************

16/03/17 10:43:03 INFO hive.metastore: Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083

16/03/17 10:43:03 INFO hive.metastore: Connected to metastore.

16/03/17 10:43:05 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/

16/03/17 10:43:05 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050

16/03/17 10:43:05 INFO hive.metastore: Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083

16/03/17 10:43:05 INFO hive.metastore: Connected to metastore.

16/03/17 10:43:11 INFO db.DBInputFormat: Using read commited transaction isolation

16/03/17 10:43:11 INFO mapreduce.JobSubmitter: number of splits:1

16/03/17 10:43:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1458111880471_0054

16/03/17 10:43:13 INFO impl.YarnClientImpl: Submitted application application_1458111880471_0054

16/03/17 10:43:13 INFO mapreduce.Job: The url to track the job: http://sandbox.hortonworks.com:8088/proxy/application_1458111880471_0054/

16/03/17 10:43:13 INFO mapreduce.Job: Running job: job_1458111880471_0054

16/03/17 10:43:28 INFO mapreduce.Job: Job job_1458111880471_0054 running in uber mode : false

16/03/17 10:43:28 INFO mapreduce.Job: map 0% reduce 0%

16/03/17 10:43:42 INFO mapreduce.Job: map 100% reduce 0%

16/03/17 10:43:42 INFO mapreduce.Job: Task Id : attempt_1458111880471_0054_m_000000_0, Status : FAILED

Error: parquet.Preconditions.checkArgument(ZLjava/lang/String;[Ljava/lang/Object;)V

Container killed by the ApplicationMaster.

Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143

16/03/17 10:43:43 INFO mapreduce.Job: map 0% reduce 0%

16/03/17 10:43:56 INFO mapreduce.Job: map 100% reduce 0%

16/03/17 10:43:56 INFO mapreduce.Job: Task Id : attempt_1458111880471_0054_m_000000_1, Status : FAILED

Error: parquet.Preconditions.checkArgument(ZLjava/lang/String;[Ljava/lang/Object;)V

Container killed by the ApplicationMaster.

Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143

**********************************************************************************************************************

1 ACCEPTED SOLUTION

Guru

I was able to reproduce this issue, and it looks like a jar version mismatch for Parquet on the sandbox. An error message that consists only of a JVM method descriptor like this one is a NoSuchMethodError: the job was compiled against a parquet library whose Preconditions class has that checkArgument(boolean, String, Object...) overload, but an older parquet jar that lacks it is being picked up at runtime. Is there a reason for using Parquet instead of ORC here? While both are supported in Hive, ORC has advantages when used with Hive, since some of the Stinger initiative improvements to Hive take advantage of ORC.
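One quick way to check for the mismatch (a sketch assuming the standard HDP 2.4 sandbox layout; these paths are not from the original post) is to compare the parquet jar versions that Sqoop and Hive put on the classpath:

# Compare the parquet jars shipped with Sqoop and with Hive on the sandbox
ls /usr/hdp/2.4.0.0-169/sqoop/lib | grep -i parquet
ls /usr/hdp/2.4.0.0-169/hive/lib | grep -i parquet

If the version numbers differ, the task-side classpath can resolve the older jar first, producing exactly this kind of NoSuchMethodError.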

Here is an example of using ORC from Sqoop:

sqoop import --connect "jdbc:sqlserver://<ServerName>:<PortNo>;database=<DatabaseName>" --username <UserName> --password <Password> --table <TableName> --columns "COLUMN1, COLUMN2" --where "COLUMN1 = 7390" --hcatalog-database default --hcatalog-table my_table_orc --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile"
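For the Oracle source in the question, the analogous command would presumably look like this (an untested sketch; the connect string, credentials, and column are copied from the question, and par_str_cols_orc is a placeholder table name):

sqoop import --connect "jdbc:oracle:thin:@<ServerName>:<PortName>:<Database>" --username <UserName> --password <Password> --table <TableName> --columns "COL_NAM" --hcatalog-database ia_db --hcatalog-table par_str_cols_orc --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile" -m 1

Note that --hive-import, --hive-overwrite, and --as-parquetfile are dropped here: Sqoop does not allow the --hive-* options together with --hcatalog-table, so an existing table would need to be truncated separately before a full reload.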
