Sqoop: Teradata to HDFS using Avro file format not working
Labels: Apache Sqoop
Created ‎09-14-2016 06:50 AM
I am getting the error below when trying to import from Teradata to HDFS.
Sqoop command:
sqoop import --connection-manager org.apache.sqoop.teradata.TeradataConnManager --connect jdbc:teradata://**.***.***.**/DATABASE=***** --username ****** --password ***** --table employee --target-dir /home/****/tera_to_hdfs125 --as-avrodatafile -m 1
16/09/14 11:56:22 ERROR teradata.TeradataSqoopImportHelper: Exception running Teradata import job
com.teradata.connector.common.exception.ConnectorException: no Avro schema is found for type mapping
	at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:142)
	at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:58)
	at org.apache.sqoop.teradata.TeradataSqoopImportHelper.runJob(TeradataSqoopImportHelper.java:370)
	at org.apache.sqoop.teradata.TeradataConnManager.importTable(TeradataConnManager.java:504)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
16/09/14 11:56:22 INFO teradata.TeradataSqoopImportHelper: Teradata import job completed with exit code 1
16/09/14 11:56:22 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Exception running Teradata import job
	at org.apache.sqoop.teradata.TeradataSqoopImportHelper.runJob(TeradataSqoopImportHelper.java:373)
	at org.apache.sqoop.teradata.TeradataConnManager.importTable(TeradataConnManager.java:504)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
Caused by: com.teradata.connector.common.exception.ConnectorException: no Avro schema is found for type mapping
	at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:142)
	at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:58)
	at org.apache.sqoop.teradata.TeradataSqoopImportHelper.runJob(TeradataSqoopImportHelper.java:370)
	... 9 more
Please help.
Thanks,
Arkaprova
Created ‎09-14-2016 08:18 AM
Hi,
Based on this documentation:
I think you need to:
- "Note: If you will run Avro jobs, download avro-mapred-1.7.4-hadoop2.jar and place it under $SQOOP_HOME/lib."
- Pass the Avro schema of the data you want to import via the 'avroschemafile' option. This is a connector-specific argument, so it has to go after a standalone '--', like this (an example schema file is sketched below the command):
sqoop import --connection-manager org.apache.sqoop.teradata.TeradataConnManager --connect jdbc:teradata://**.***.***.**/DATABASE=***** --username ****** --password ***** --table employee --target-dir /home/****/tera_to_hdfs125 --as-avrodatafile -m 1 -- --avroschemafile <schema>
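For illustration, a minimal schema file for a two-column employee table might look like the following (the file name employee.avsc and the field list are assumptions; match them to your actual table and pass the file's location in place of <schema>):

{
  "type": "record",
  "namespace": "avronamespace",
  "name": "Employee",
  "fields": [
    { "name": "Id", "type": "string" },
    { "name": "Name", "type": "string" }
  ]
}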
Hope this helps.
Created ‎09-14-2016 10:31 AM
I am now getting the error below:
Error: org.apache.avro.generic.GenericData.createDatumWriter(Lorg/apache/avro/Schema;)Lorg/apache/avro/io/DatumWriter
I have avro-mapred-1.7.5-hadoop2.jar and avro-1.7.5.jar in my $SQOOP_HOME/lib.
Please help.
Created ‎09-14-2016 10:56 AM
Do you have a full stack trace that you could share? What is your schema (maybe some types are not yet supported by the Teradata connector, depending on the version)?
Created ‎09-14-2016 11:35 AM
Below is the full stack trace.
16/09/14 15:49:10 INFO mapreduce.Job: Running job: job_1473774257007_0002
16/09/14 15:49:19 INFO mapreduce.Job: Job job_1473774257007_0002 running in uber mode : false
16/09/14 15:49:19 INFO mapreduce.Job: map 0% reduce 0%
16/09/14 15:49:22 INFO mapreduce.Job: Task Id : attempt_1473774257007_0002_m_000000_0, Status : FAILED
Error: org.apache.avro.generic.GenericData.createDatumWriter(Lorg/apache/avro/Schema;)Lorg/apache/avro/io/DatumWriter;
Container killed by the ApplicationMaster. Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
16/09/14 15:49:25 INFO mapreduce.Job: Task Id : attempt_1473774257007_0002_m_000000_1, Status : FAILED
Error: org.apache.avro.generic.GenericData.createDatumWriter(Lorg/apache/avro/Schema;)Lorg/apache/avro/io/DatumWriter;
16/09/14 15:49:29 INFO mapreduce.Job: Task Id : attempt_1473774257007_0002_m_000000_2, Status : FAILED
Error: org.apache.avro.generic.GenericData.createDatumWriter(Lorg/apache/avro/Schema;)Lorg/apache/avro/io/DatumWriter;
16/09/14 15:49:35 INFO mapreduce.Job: map 100% reduce 0%
16/09/14 15:49:36 INFO mapreduce.Job: Job job_1473774257007_0002 failed with state FAILED due to: Task failed task_1473774257007_0002_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
16/09/14 15:49:36 INFO mapreduce.Job: Counters: 12
	Job Counters
		Failed map tasks=4
		Launched map tasks=4
		Other local map tasks=3
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=8818
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=8818
		Total vcore-seconds taken by all map tasks=8818
		Total megabyte-seconds taken by all map tasks=18059264
	Map-Reduce Framework
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
16/09/14 15:49:36 INFO processor.TeradataInputProcessor: input postprocessor com.teradata.connector.teradata.processor.TeradataSplitByHashProcessor starts at: 1473848376584
16/09/14 15:49:37 INFO processor.TeradataInputProcessor: input postprocessor com.teradata.connector.teradata.processor.TeradataSplitByHashProcessor ends at: 1473848376584
16/09/14 15:49:37 INFO processor.TeradataInputProcessor: the total elapsed time of input postprocessor com.teradata.connector.teradata.processor.TeradataSplitByHashProcessor is: 0s
16/09/14 15:49:37 INFO teradata.TeradataSqoopImportHelper: Teradata import job completed with exit code 1
16/09/14 15:49:37 ERROR tool.ImportTool: Error during import: Import Job failed
Schema:
{
  "type": "record",
  "namespace": "avronamespace",
  "name": "Employee",
  "fields": [
    { "name": "Id", "type": "string" },
    { "name": "Name", "type": "string" }
  ]
}
Also, my concern is: why is an Avro schema file required here? I am only trying to import data from Teradata to HDFS using the Avro file format. Please help.
Created ‎09-14-2016 11:42 AM
"When Avro data is stored in a file, its schema is stored with it, so that files may be processed later by any program."
I believe the schema is required so it is stored with the data you imported into HDFS.
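As a sanity check after a successful import (a sketch: the avro-tools jar version and the part-file name are placeholders, adjust them to your environment), you can print the schema embedded in an imported file:

hadoop jar avro-tools-1.7.5.jar getschema /home/****/tera_to_hdfs125/part-m-00000.avro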
Could you run the following command to get more details about the error?
yarn logs -applicationId application_1473774257007_0002
Created ‎09-14-2016 03:17 PM
Below is the relevant part of the YARN log:
2016-09-14 15:49:29,345 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: org.apache.avro.generic.GenericData.createDatumWriter(Lorg/apache/avro/Schema;)Lorg/apache/avro/io/DatumWriter;
	at org.apache.avro.mapreduce.AvroKeyRecordWriter.<init>(AvroKeyRecordWriter.java:53)
	at org.apache.avro.mapreduce.AvroKeyOutputFormat$RecordWriterFactory.create(AvroKeyOutputFormat.java:78)
	at org.apache.avro.mapreduce.AvroKeyOutputFormat.getRecordWriter(AvroKeyOutputFormat.java:104)
	at com.teradata.connector.hdfs.HdfsAvroOutputFormat.getRecordWriter(HdfsAvroOutputFormat.java:49)
	at com.teradata.connector.common.ConnectorOutputFormat$ConnectorFileRecordWriter.<init>(ConnectorOutputFormat.java:89)
	at com.teradata.connector.common.ConnectorOutputFormat.getRecordWriter(ConnectorOutputFormat.java:38)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
2016-09-14 15:49:29,351 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
2016-09-14 15:49:29,351 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
2016-09-14 15:49:29,352 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.
End of LogType:syslog
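Note: a NoSuchMethodError like this one usually points to an Avro version conflict, i.e. an older Avro jar earlier on the task classpath shadowing the 1.7.5 jars you added. One way to list the Avro jars present on a node (a sketch; the /usr/hdp search root assumes an HDP install, adjust it for your distribution):

find /usr/hdp -name 'avro*.jar' 2>/dev/null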
Created ‎09-21-2016 11:08 AM
This is working with the -Dmapreduce.job.user.classpath.first=true option. Thanks a lot.
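For reference, -D flags are generic Hadoop options, so the flag must come immediately after the tool name, before any Sqoop-specific arguments. A sketch based on the original command (masked values left as-is):

sqoop import -Dmapreduce.job.user.classpath.first=true --connection-manager org.apache.sqoop.teradata.TeradataConnManager --connect jdbc:teradata://**.***.***.**/DATABASE=***** --username ****** --password ***** --table employee --target-dir /home/****/tera_to_hdfs125 --as-avrodatafile -m 1 -- --avroschemafile <schema>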
Created ‎09-14-2016 11:04 PM
I'm not sure about the --connection-manager option, but I have successfully performed a Sqoop import from Teradata to Avro using Teradata's JDBC driver, as follows:
sqoop import --driver com.teradata.jdbc.TeraDriver \
  --connect 'jdbc:teradata://****/DATABASE=****' \
  --username **** --password **** \
  --table MyTable \
  --target-dir /****/****/**** \
  --as-avrodatafile \
  --num-mappers 1
Just ensure that the JDBC driver, terajdbc4.jar, is in your $SQOOP_LIB folder. For me, on HDP 2.4, that is /usr/hdp/current/sqoop-client/lib.
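To double-check that the driver is in place (the path below is the HDP 2.4 location mentioned above; adjust it for your environment):

ls /usr/hdp/current/sqoop-client/lib/terajdbc4.jar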
Created ‎09-15-2016 06:56 AM
@Steven O'Neill Thanks a lot 🙂. This is working for me.
