Tutorial exercise 1 (ingesting structured data using Sqoop) does not work


New Contributor

Hi, the following Sqoop command does not work in the QuickStart VM 5.7. Can anyone help?

[cloudera@quickstart ~]$ sqoop import-all-tables \
-m 1 \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username=retail_dba \
--password=cloudera \
--compression-codec=snappy \
--as-parquetfile \
--warehouse-dir=/user/hive/warehouse \
--hive-import

The import fails with the following output:

16/05/21 01:52:17 INFO mapreduce.ImportJobBase: Transferred 3.3652 KB in 57.8114 seconds (59.6077 bytes/sec)
16/05/21 01:52:17 INFO mapreduce.ImportJobBase: Retrieved 58 records.
16/05/21 01:52:17 INFO tool.CodeGenTool: Beginning code generation
16/05/21 01:52:17 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `customers` AS t LIMIT 1
16/05/21 01:52:17 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/2c4f23186b50638117ee1594fae3977f/codegen_categories.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/05/21 01:52:19 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/2c4f23186b50638117ee1594fae3977f/codegen_categories.jar
16/05/21 01:52:19 INFO mapreduce.ImportJobBase: Beginning import of customers
16/05/21 01:52:19 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
16/05/21 01:52:19 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `customers` AS t LIMIT 1
16/05/21 01:52:19 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `customers` AS t LIMIT 1
16/05/21 01:52:19 WARN mapreduce.DataDrivenImportJob: Target Hive table 'customers' exists! Sqoop will append data into the existing Hive table. Consider using --hive-overwrite, if you do NOT intend to do appending.
16/05/21 01:52:20 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/05/21 01:52:28 INFO db.DBInputFormat: Using read commited transaction isolation
16/05/21 01:52:28 INFO mapreduce.JobSubmitter: number of splits:1
16/05/21 01:52:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1463818069837_0006
16/05/21 01:52:29 INFO impl.YarnClientImpl: Submitted application application_1463818069837_0006
16/05/21 01:52:29 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1463818069837_0006/
16/05/21 01:52:29 INFO mapreduce.Job: Running job: job_1463818069837_0006
16/05/21 01:52:57 INFO mapreduce.Job: Job job_1463818069837_0006 running in uber mode : false
16/05/21 01:52:57 INFO mapreduce.Job: map 0% reduce 0%
16/05/21 01:53:13 INFO mapreduce.Job: Task Id : attempt_1463818069837_0006_m_000000_0, Status : FAILED
Error: org.kitesdk.data.DatasetOperationException: Failed to append {"customer_id": 1, "customer_fname": "Richard", "customer_lname": "Hernandez", "customer_email": "XXXXXXXXX", "customer_password": "XXXXXXXXX", "customer_street": "6303 Heather Plaza", "customer_city": "Brownsville", "customer_state": "TX", "customer_zipcode": "78521"} to ParquetAppender{path=hdfs://quickstart.cloudera:8020/tmp/default/.temp/job_1463818069837_0006/mr/attempt_1463818069837_0006_m_000000_0/.cccefc21-ec3f-4a22-94c1-c1dfc2cf088b.parquet.tmp, schema={"type":"record","name":"customers","fields":[{"name":"id","type":["null","int"],"doc":"Converted from 'int'","default":null},{"name":"name","type":["null","string"],"doc":"Converted from 'string'","default":null},{"name":"email_preferences","type":["null",{"type":"record","name":"email_preferences","fields":[{"name":"email_format","type":["null","string"],"doc":"Converted from 'string'","default":null},{"name":"frequency","type":["null","string"],"doc":"Converted from 'string'","default":null},{"name":"categories","type":["null",{"type":"record","name":"categories","fields":[{"name":"promos","type":["null","boolean"],"doc":"Converted from 'boolean'","default":null},{"name":"surveys","type":["null","boolean"],"doc":"Converted from 'boolean'","default":null}]}],"default":null}]}],"default":null},{"name":"addresses","type":["null",{"type":"map","values":["null",{"type":"record","name":"addresses","fields":[{"name":"street_1","type":["null","string"],"doc":"Converted from 'string'","default":null},{"name":"street_2","type":["null","string"],"doc":"Converted from 'string'","default":null},{"name":"city","type":["null","string"],"doc":"Converted from 'string'","default":null},{"name":"state","type":["null","string"],"doc":"Converted from 'string'","default":null},{"name":"zip_code","type":["null","string"],"doc":"Converted from 'string'","default":null}]}]}],"doc":"Converted from 'map<string,struct<street_1:string,street_2:string,city:string,state:string,zip_code:string>>'","default":null},{"name":"orders","type":["null",{"type":"array","items":["null",{"type":"record","name":"orders","fields":[{"name":"order_id","type":["null","string"],"doc":"Converted from 'string'","default":null},{"name":"order_date","type":["null","string"],"doc":"Converted from 'string'","default":null},{"name":"items","type":["null",{"type":"array","items":["null",{"type":"record","name":"items","fields":[{"name":"product_id","type":["null","int"],"doc":"Converted from 'int'","default":null},{"name":"sku","type":["null","string"],"doc":"Converted from 'string'","default":null},{"name":"name","type":["null","string"],"doc":"Converted from 'string'","default":null},{"name":"price","type":["null","double"],"doc":"Converted from 'double'","default":null},{"name":"qty","type":["null","int"],"doc":"Converted from 'int'","default":null}]}]}],"doc":"Converted from 'array<struct<product_id:int,sku:string,name:string,price:double,qty:int>>'","default":null}]}]}],"doc":"Converted from 'array<struct<order_id:string,order_date:string,items:array<struct<product_id:int,sku:string,name:string,price:double,qty:int>>>>'","default":null}]}, fileSystem=DFS[DFSClient[clientName=DFSClient_attempt_1463818069837_0006_m_000000_0_-1719628224_1, ugi=cloudera (auth:SIMPLE)]], avroParquetWriter=parquet.avro.AvroParquetWriter@98a0bfa}
at org.kitesdk.data.spi.filesystem.FileSystemWriter.write(FileSystemWriter.java:184)
at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$DatasetRecordWriter.write(DatasetKeyOutputFormat.java:325)
at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$DatasetRecordWriter.write(DatasetKeyOutputFormat.java:304)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:658)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.sqoop.mapreduce.ParquetImportMapper.map(ParquetImportMapper.java:70)
at org.apache.sqoop.mapreduce.ParquetImportMapper.map(ParquetImportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.avro.generic.IndexedRecord
at org.apache.avro.generic.GenericData.getField(GenericData.java:658)
at parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:164)
at parquet.avro.AvroWriteSupport.writeRecord(AvroWriteSupport.java:149)
at parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:262)
at parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:167)
at parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:116)
at parquet.hadoop.ParquetWriter.write(ParquetWriter.java:324)
at org.kitesdk.data.spi.filesystem.ParquetAppender.append(ParquetAppender.java:75)
at org.kitesdk.data.spi.filesystem.ParquetAppender.append(ParquetAppender.java:36)
at org.kitesdk.data.spi.filesystem.FileSystemWriter.write(FileSystemWriter.java:178)
... 16 more

1 REPLY

Re: Tutorial exercise 1 (ingesting structured data using Sqoop) does not work

Master Collaborator
Try adding the --hive-overwrite flag. The error you posted indicates it's failing because of a previous failed attempt that hasn't been cleaned up. There was likely some other failure that occurred the first time, but there's no telling what it was from this log alone.
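For reference, here is a sketch of the re-run with that flag added, assuming the same connection settings as the command above. The optional cleanup step is also an assumption: only remove the existing warehouse directory if you are sure it contains nothing but leftovers from the earlier failed attempt.

# Optional: clear out partially written data from the failed run first
# (the path below is an assumption based on the --warehouse-dir used above)
[cloudera@quickstart ~]$ hdfs dfs -rm -r -skipTrash /user/hive/warehouse/customers

[cloudera@quickstart ~]$ sqoop import-all-tables \
-m 1 \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username=retail_dba \
--password=cloudera \
--compression-codec=snappy \
--as-parquetfile \
--warehouse-dir=/user/hive/warehouse \
--hive-overwrite \
--hive-import

The --hive-overwrite flag tells Sqoop to replace the data in existing Hive tables rather than append to them, which is what the WARN line in the log above is hinting at.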