Created 03-17-2016 12:01 PM
I am getting the same error when using sqoop import to fetch data from either an MSSQL or an Oracle database.
Oracle ->
sqoop import --connect "jdbc:oracle:thin:@<ServerName>:<PortName>:<Database>" --hadoop-home "/usr/hdp/2.4.0.0-169/hadoop" --username <UserName> --password <Password> --table <TableName> --hive-overwrite --columns "COL_NAM" --hive-import -hive-database ia_db -hive-table par_str_cols --map-column-hive db_user_nam=string --as-parquetfile -m 1
Mssql ->
sqoop import --connect "jdbc:sqlserver://<ServerName>:<PortNo>;database=<DatabaseName>" --username <UserName> --password <Password> --table <TableName> --hive-overwrite --columns "COLUMN1, COLUMN2" --where "COLUMN1 = 7390" --hive-import --target-dir /apps/hive/warehouse/ia_db.db/par_int_cols --as-parquetfile -m 1
I have tried Integer, Decimal, and String columns and get the same error each time.
Note - I am able to import the same data into Hive tables as plain text files, but I get this error when importing in Parquet format.
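For reference, the plain-text import that succeeds is essentially the MSSQL command above with the --as-parquetfile option and the Parquet target dir dropped; roughly:
sqoop import --connect "jdbc:sqlserver://<ServerName>:<PortNo>;database=<DatabaseName>" --username <UserName> --password <Password> --table <TableName> --hive-overwrite --columns "COLUMN1, COLUMN2" --where "COLUMN1 = 7390" --hive-import -m 1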
Can someone give me some pointers on what this error means: "Error: parquet.Preconditions.checkArgument(ZLjava/lang/String;[Ljava/lang/Object;)V"?
**********************************************************************************************************
16/03/17 10:43:03 INFO hive.metastore: Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
16/03/17 10:43:03 INFO hive.metastore: Connected to metastore.
16/03/17 10:43:05 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
16/03/17 10:43:05 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
16/03/17 10:43:05 INFO hive.metastore: Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
16/03/17 10:43:05 INFO hive.metastore: Connected to metastore.
16/03/17 10:43:11 INFO db.DBInputFormat: Using read commited transaction isolation
16/03/17 10:43:11 INFO mapreduce.JobSubmitter: number of splits:1
16/03/17 10:43:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1458111880471_0054
16/03/17 10:43:13 INFO impl.YarnClientImpl: Submitted application application_1458111880471_0054
16/03/17 10:43:13 INFO mapreduce.Job: The url to track the job: http://sandbox.hortonworks.com:8088/proxy/application_1458111880471_0054/
16/03/17 10:43:13 INFO mapreduce.Job: Running job: job_1458111880471_0054
16/03/17 10:43:28 INFO mapreduce.Job: Job job_1458111880471_0054 running in uber mode : false
16/03/17 10:43:28 INFO mapreduce.Job: map 0% reduce 0%
16/03/17 10:43:42 INFO mapreduce.Job: map 100% reduce 0%
16/03/17 10:43:42 INFO mapreduce.Job: Task Id : attempt_1458111880471_0054_m_000000_0, Status : FAILED
Error: parquet.Preconditions.checkArgument(ZLjava/lang/String;[Ljava/lang/Object;)V
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
16/03/17 10:43:43 INFO mapreduce.Job: map 0% reduce 0%
16/03/17 10:43:56 INFO mapreduce.Job: map 100% reduce 0%
16/03/17 10:43:56 INFO mapreduce.Job: Task Id : attempt_1458111880471_0054_m_000000_1, Status : FAILED
Error: parquet.Preconditions.checkArgument(ZLjava/lang/String;[Ljava/lang/Object;)V
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
**********************************************************************************************************************
Created 05-03-2016 01:07 AM
I was able to reproduce this issue, and it looks like a jar version mismatch for Parquet on the sandbox: the error above is the JVM signature of a method the map task could not find at runtime, checkArgument(boolean, String, Object...) in parquet.Preconditions, which is the typical symptom of incompatible Parquet jar versions on the classpath. Is there a reason for using Parquet instead of ORC here? While both are supported in Hive, ORC has advantages when used with Hive, since some of the Stinger initiative improvements to Hive take advantage of ORC.
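As a quick check, listing the Parquet jars on the sandbox should reveal whether multiple versions are visible to Sqoop and Hive; the path below assumes the standard HDP 2.4 sandbox layout:
find /usr/hdp -name "parquet*.jar" 2>/dev/null | xargs -n1 basename | sort -u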
Here is an example of importing to ORC from Sqoop via HCatalog:
sqoop import --connect "jdbc:sqlserver://<ServerName>:<PortNo>;database=<DatabaseName>" --username <UserName> --password <Password> --table <TableName> --columns "COLUMN1, COLUMN2" --where "COLUMN1 = 7390" --hcatalog-database default --hcatalog-table my_table_orc --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile"