
Sqoop: Teradata to HDFS using Avro file format not working

Expert Contributor

I am getting the below error when trying to import from Teradata to HDFS.

Sqoop command:

sqoop import \
--connection-manager org.apache.sqoop.teradata.TeradataConnManager \
--connect jdbc:teradata://**.***.***.**/DATABASE=***** \
--username ****** --password ***** \
--table employee \
--target-dir /home/****/tera_to_hdfs125 \
--as-avrodatafile -m 1

16/09/14 11:56:22 ERROR teradata.TeradataSqoopImportHelper: Exception running Teradata import job
com.teradata.connector.common.exception.ConnectorException: no Avro schema is found for type mapping
    at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:142)
    at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:58)
    at org.apache.sqoop.teradata.TeradataSqoopImportHelper.runJob(TeradataSqoopImportHelper.java:370)
    at org.apache.sqoop.teradata.TeradataConnManager.importTable(TeradataConnManager.java:504)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
16/09/14 11:56:22 INFO teradata.TeradataSqoopImportHelper: Teradata import job completed with exit code 1
16/09/14 11:56:22 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Exception running Teradata import job
    at org.apache.sqoop.teradata.TeradataSqoopImportHelper.runJob(TeradataSqoopImportHelper.java:373)
    at org.apache.sqoop.teradata.TeradataConnManager.importTable(TeradataConnManager.java:504)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
Caused by: com.teradata.connector.common.exception.ConnectorException: no Avro schema is found for type mapping
    at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:142)
    at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:58)
    at org.apache.sqoop.teradata.TeradataSqoopImportHelper.runJob(TeradataSqoopImportHelper.java:370)
    ... 9 more

Please help.

Thanks,

Arkaprova



Pierre Villard

Hi,

Based on this documentation:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_HortonworksConnectorForTeradata/content/...

I think you need to:

- "Note: If you will run Avro jobs, download avro-mapred-1.7.4-hadoop2.jar and place it under $SQOOP_HOME/lib." (See the sketch after the command below.)

- Pass the Avro schema of the data you want to import via the 'avroschemafile' option. This is a connector-specific argument, so you would need something like:

sqoop import \
--connection-manager org.apache.sqoop.teradata.TeradataConnManager \
--connect jdbc:teradata://**.***.***.**/DATABASE=***** \
--username ****** --password ***** \
--table employee \
--target-dir /home/****/tera_to_hdfs125 \
--as-avrodatafile -m 1 \
-- --avroschemafile <schema>
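For the first step, something along these lines should work (a sketch only: the Maven Central URL and the $SQOOP_HOME location are assumptions to adapt to your install, and the schema written here is just an example of the kind of file you would pass to --avroschemafile):

# fetch the Avro MapReduce jar (hadoop2 build) and drop it into Sqoop's lib
wget https://repo1.maven.org/maven2/org/apache/avro/avro-mapred/1.7.4/avro-mapred-1.7.4-hadoop2.jar
cp avro-mapred-1.7.4-hadoop2.jar $SQOOP_HOME/lib/

# write an Avro schema matching the table's columns (example: two string columns)
cat > employee.avsc <<'EOF'
{
  "type": "record",
  "name": "Employee",
  "fields": [
    { "name": "Id",   "type": "string" },
    { "name": "Name", "type": "string" }
  ]
}
EOF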

Hope this helps.

Expert Contributor (accepted solution)

@Pierre Villard

I am getting the below error now:

Error: org.apache.avro.generic.GenericData.createDatumWriter(Lorg/apache/avro/Schema;)Lorg/apache/avro/io/DatumWriter

I have avro-mapred-1.7.5-hadoop2.jar and avro-1.7.5.jar in my $SQOOP_HOME/lib.

Please help.

Pierre Villard

Do you have a full stack trace that you could share? What is your schema? (Maybe some types are not yet supported by the Teradata connector, depending on the version.)

Expert Contributor

Below is the full stack trace.

16/09/14 15:49:10 INFO mapreduce.Job: Running job: job_1473774257007_0002
16/09/14 15:49:19 INFO mapreduce.Job: Job job_1473774257007_0002 running in uber mode : false
16/09/14 15:49:19 INFO mapreduce.Job: map 0% reduce 0%
16/09/14 15:49:22 INFO mapreduce.Job: Task Id : attempt_1473774257007_0002_m_000000_0, Status : FAILED
Error: org.apache.avro.generic.GenericData.createDatumWriter(Lorg/apache/avro/Schema;)Lorg/apache/avro/io/DatumWriter;
Container killed by the ApplicationMaster. Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
16/09/14 15:49:25 INFO mapreduce.Job: Task Id : attempt_1473774257007_0002_m_000000_1, Status : FAILED
Error: org.apache.avro.generic.GenericData.createDatumWriter(Lorg/apache/avro/Schema;)Lorg/apache/avro/io/DatumWriter;
16/09/14 15:49:29 INFO mapreduce.Job: Task Id : attempt_1473774257007_0002_m_000000_2, Status : FAILED
Error: org.apache.avro.generic.GenericData.createDatumWriter(Lorg/apache/avro/Schema;)Lorg/apache/avro/io/DatumWriter;
16/09/14 15:49:35 INFO mapreduce.Job: map 100% reduce 0%
16/09/14 15:49:36 INFO mapreduce.Job: Job job_1473774257007_0002 failed with state FAILED due to: Task failed task_1473774257007_0002_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
16/09/14 15:49:36 INFO mapreduce.Job: Counters: 12
    Job Counters
        Failed map tasks=4
        Launched map tasks=4
        Other local map tasks=3
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=8818
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=8818
        Total vcore-seconds taken by all map tasks=8818
        Total megabyte-seconds taken by all map tasks=18059264
    Map-Reduce Framework
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
16/09/14 15:49:36 INFO processor.TeradataInputProcessor: input postprocessor com.teradata.connector.teradata.processor.TeradataSplitByHashProcessor starts at: 1473848376584
16/09/14 15:49:37 INFO processor.TeradataInputProcessor: input postprocessor com.teradata.connector.teradata.processor.TeradataSplitByHashProcessor ends at: 1473848376584
16/09/14 15:49:37 INFO processor.TeradataInputProcessor: the total elapsed time of input postprocessor com.teradata.connector.teradata.processor.TeradataSplitByHashProcessor is: 0s
16/09/14 15:49:37 INFO teradata.TeradataSqoopImportHelper: Teradata import job completed with exit code 1
16/09/14 15:49:37 ERROR tool.ImportTool: Error during import: Import Job failed

Schema:

{
  "type": "record",
  "namespace": "avronamespace",
  "name": "Employee",
  "fields": [
    { "name": "Id", "type": "string" },
    { "name": "Name", "type": "string" }
  ]
}

Also, my concern is: why is an Avro schema file required here? I am just trying to import data from Teradata to HDFS using the Avro file format. Please help.

Pierre Villard
"When Avro data is stored in a file, its schema is stored with it, so that files may be processed later by any program."

I believe the schema is required so it is stored with the data you imported into HDFS.
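As a sketch of what that means in practice (the file name and avro-tools version below are assumptions), you can pull one imported file back out of HDFS and read the schema straight from the data file itself:

# copy an imported data file out of HDFS and dump its embedded schema
hdfs dfs -get /home/****/tera_to_hdfs125/part-m-00000.avro .
java -jar avro-tools-1.7.5.jar getschema part-m-00000.avro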

Could you run the following command to get more details about the error?

yarn logs -applicationId application_1473774257007_0002

Expert Contributor

Below is from the YARN log:

2016-09-14 15:49:29,345 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: org.apache.avro.generic.GenericData.createDatumWriter(Lorg/apache/avro/Schema;)Lorg/apache/avro/io/DatumWriter;
    at org.apache.avro.mapreduce.AvroKeyRecordWriter.<init>(AvroKeyRecordWriter.java:53)
    at org.apache.avro.mapreduce.AvroKeyOutputFormat$RecordWriterFactory.create(AvroKeyOutputFormat.java:78)
    at org.apache.avro.mapreduce.AvroKeyOutputFormat.getRecordWriter(AvroKeyOutputFormat.java:104)
    at com.teradata.connector.hdfs.HdfsAvroOutputFormat.getRecordWriter(HdfsAvroOutputFormat.java:49)
    at com.teradata.connector.common.ConnectorOutputFormat$ConnectorFileRecordWriter.<init>(ConnectorOutputFormat.java:89)
    at com.teradata.connector.common.ConnectorOutputFormat.getRecordWriter(ConnectorOutputFormat.java:38)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
2016-09-14 15:49:29,351 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
2016-09-14 15:49:29,351 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
2016-09-14 15:49:29,352 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.
End of LogType:syslog
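A NoSuchMethodError like this usually means an older Avro jar, one that predates GenericData.createDatumWriter, is shadowing the avro-1.7.5.jar you added on the MapReduce task classpath. A quick way to hunt for conflicting copies (paths are assumptions for an HDP-style layout):

# list every Avro jar that could end up on the task classpath
find /usr/hdp -name 'avro*.jar' 2>/dev/null
ls $SQOOP_HOME/lib/avro*.jar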

Expert Contributor
@Pierre Villard

This is working with the -Dmapreduce.job.user.classpath.first=true option. Thanks a lot.
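For anyone hitting the same thing: the generic Hadoop -D option has to come immediately after the tool name, before the Sqoop-specific arguments, so the full command would look roughly like this (a sketch; employee.avsc stands in for whatever schema file you pass):

sqoop import -Dmapreduce.job.user.classpath.first=true \
--connection-manager org.apache.sqoop.teradata.TeradataConnManager \
--connect jdbc:teradata://**.***.***.**/DATABASE=***** \
--username ****** --password ***** \
--table employee \
--target-dir /home/****/tera_to_hdfs125 \
--as-avrodatafile -m 1 \
-- --avroschemafile employee.avsc

This property tells MapReduce to let user-supplied jars (the Avro 1.7.5 jars placed in Sqoop's lib) take precedence over the cluster's bundled Avro.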

Contributor

@Arkaprova Saha

I'm not sure about the --connection-manager option, but I have successfully performed a Sqoop import from Teradata to Avro using Teradata's JDBC driver as follows:

sqoop import --driver com.teradata.jdbc.TeraDriver \
--connect 'jdbc:teradata://****/DATABASE=****' \
--username **** --password **** \
--table MyTable \
--target-dir /****/****/**** \
--as-avrodatafile \
--num-mappers 1

Just ensure that the JDBC driver, terajdbc4.jar, is in your $SQOOP_LIB folder. For me, on HDP 2.4, that is /usr/hdp/current/sqoop-client/lib.
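A minimal placement sketch (paths assume HDP 2.4 as above; older Teradata JDBC driver bundles also ship a companion tdgssconfig.jar, so copy it too if your download includes one):

# put the Teradata JDBC driver jar(s) where Sqoop can see them
cp terajdbc4.jar tdgssconfig.jar /usr/hdp/current/sqoop-client/lib/
# confirm they are in place
ls /usr/hdp/current/sqoop-client/lib/ | grep -iE 'terajdbc|tdgss'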

Expert Contributor

@Steven O'Neill Thanks a lot 🙂. This is working for me.