Support Questions
Find answers, ask questions, and share your expertise

How to import xml data from oracle to hive using scoop?

How to import xml data from oracle to hive using scoop?

New Contributor

I am stuck while importing xml data from oracle database. Please help me.


hduser@hadoop:~$ sqoop import --connect "jdbc:oracle:thin:@xxx:1521:dcpdb" --username "nblt24" --password "nblt24" --query 'SELECT RECID, XMLTYPE.getStringVal(XMLRECORD) FROM F_COMPANY where $CONDITIONS' --target-dir /home/hduser/hadoop/share/hadoop/hdfsa --delete-target-dir --num-mappers 1 --hive-import --hive-table hr.f_company_xml_tab --verbose

Warning: /home/hduser/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hduser/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hduser/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hduser/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/12/24 12:01:00 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
18/12/24 12:01:00 DEBUG tool.BaseSqoopTool: Enabled debug logging.
18/12/24 12:01:00 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/12/24 12:01:00 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
18/12/24 12:01:00 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
18/12/24 12:01:00 DEBUG sqoop.ConnFactory: Loaded manager factory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
18/12/24 12:01:00 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
18/12/24 12:01:00 DEBUG sqoop.ConnFactory: Trying ManagerFactory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
18/12/24 12:01:00 DEBUG oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop can be called by Sqoop!
18/12/24 12:01:00 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
18/12/24 12:01:00 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
18/12/24 12:01:00 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:oracle:thin:@xx:1521
18/12/24 12:01:00 DEBUG manager.OracleManager$ConnCache: Instantiated new connection cache.
18/12/24 12:01:00 INFO manager.SqlManager: Using default fetchSize of 1000
18/12/24 12:01:00 DEBUG sqoop.ConnFactory: Instantiated ConnManager org.apache.sqoop.manager.OracleManager@4df50bcc
18/12/24 12:01:00 INFO tool.CodeGenTool: Beginning code generation
18/12/24 12:01:00 DEBUG manager.SqlManager: Execute getColumnInfoRawQuery : SELECT RECID, XMLTYPE.getStringVal(XMLRECORD) FROM F_COMPANY where (1 = 0)
18/12/24 12:01:00 DEBUG manager.OracleManager: Creating a new connection for jdbc:oracle:thin:@xxx.45:1521:dcpdb, using username: nblt24
18/12/24 12:01:00 DEBUG manager.OracleManager: No connection paramenters specified. Using regular API for making connection.
18/12/24 12:01:00 INFO manager.OracleManager: Time zone has been set to GMT
18/12/24 12:01:00 DEBUG manager.SqlManager: Using fetchSize for next query: 1000
18/12/24 12:01:00 INFO manager.SqlManager: Executing SQL statement: SELECT RECID, XMLTYPE.getStringVal(XMLRECORD) FROM F_COMPANY where (1 = 0)
18/12/24 12:01:00 DEBUG manager.SqlManager: Found column RECID of type [12, 255, 0]
18/12/24 12:01:00 DEBUG manager.SqlManager: Found column XMLTYPE.GETSTRINGVAL(XMLRECORD) of type [12, 4000, 0]
18/12/24 12:01:00 DEBUG manager.SqlManager: Using fetchSize for next query: 1000
18/12/24 12:01:00 INFO manager.SqlManager: Executing SQL statement: SELECT RECID, XMLTYPE.getStringVal(XMLRECORD) FROM F_COMPANY where (1 = 0)
18/12/24 12:01:00 DEBUG manager.SqlManager: Found column RECID
18/12/24 12:01:00 DEBUG manager.SqlManager: Found column XMLTYPE.GETSTRINGVAL(XMLRECORD)
18/12/24 12:01:00 DEBUG orm.ClassWriter: selected columns:
18/12/24 12:01:00 DEBUG orm.ClassWriter: RECID
18/12/24 12:01:00 DEBUG orm.ClassWriter: XMLTYPE_GETSTRINGVAL_XMLRECORD_
18/12/24 12:01:00 DEBUG orm.ClassWriter: Writing source file: /tmp/sqoop-hduser/compile/a5b10acdb0159723f9a2d3fb7b9fc84c/QueryResult.java
18/12/24 12:01:00 DEBUG orm.ClassWriter: Table name: null
18/12/24 12:01:00 DEBUG orm.ClassWriter: Columns: RECID:12, XMLTYPE.GETSTRINGVAL(XMLRECORD):12,
18/12/24 12:01:00 DEBUG orm.ClassWriter: sourceFilename is QueryResult.java
18/12/24 12:01:00 DEBUG orm.CompilationManager: Found existing /tmp/sqoop-hduser/compile/a5b10acdb0159723f9a2d3fb7b9fc84c/
18/12/24 12:01:00 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hduser/hadoop
18/12/24 12:01:00 DEBUG orm.CompilationManager: Returning jar file path /home/hduser/hadoop/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/hadoop-common-2.8.0.jar:/home/hduser/hadoop/share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/hadoop-common-2.8.0.jar
18/12/24 12:01:00 DEBUG orm.CompilationManager: Current sqoop classpath = /home/hduser/hadoop/etc/hadoop:/home/hduser/hadoop/share/hadoop/common/lib/java-xmlbuilder-0.4.jar:/home/hduser/hadoop/share/hadoop/common/lib/xmlenc-0.52.jar:/home/hduser/hadoop/contrib/capacity-scheduler/*.jar
18/12/24 12:01:00 DEBUG orm.CompilationManager: Adding source file: /tmp/sqoop-hduser/compile/a5b10acdb0159723f9a2d3fb7b9fc84c/QueryResult.java
18/12/24 12:01:00 DEBUG orm.CompilationManager: Invoking javac with args:
18/12/24 12:01:00 DEBUG orm.CompilationManager: -sourcepath
18/12/24 12:01:00 DEBUG orm.CompilationManager: /tmp/sqoop-hduser/compile/a5b10acdb0159723f9a2d3fb7b9fc84c/
18/12/24 12:01:00 DEBUG orm.CompilationManager: -d
18/12/24 12:01:00 DEBUG orm.CompilationManager: /tmp/sqoop-hduser/compile/a5b10acdb0159723f9a2d3fb7b9fc84c/
18/12/24 12:01:00 DEBUG orm.CompilationManager: -classpath
18/12/24 12:01:00 DEBUG orm.CompilationManager: /home/hduser/hadoop/etc/hadoop:/home/hduser/hadoop/share/hadoop/common/lib/java-xmlbuilder-0.4.jar:/home/hduser/hadoop/share/hadoop/common/lib/xmlenc-0.52.jar:/home/hduser/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.8.0.jar:/home/hduser/sqoop/sqoop-1.4.6.jar
Note: /tmp/sqoop-hduser/compile/a5b10acdb0159723f9a2d3fb7b9fc84c/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/12/24 12:01:01 DEBUG orm.CompilationManager: Could not rename /tmp/sqoop-hduser/compile/a5b10acdb0159723f9a2d3fb7b9fc84c/QueryResult.java to /home/hduser/./QueryResult.java
18/12/24 12:01:01 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hduser/compile/a5b10acdb0159723f9a2d3fb7b9fc84c/QueryResult.jar
18/12/24 12:01:01 DEBUG orm.CompilationManager: Scanning for .class files in directory: /tmp/sqoop-hduser/compile/a5b10acdb0159723f9a2d3fb7b9fc84c
18/12/24 12:01:01 DEBUG orm.CompilationManager: Got classfile: /tmp/sqoop-hduser/compile/a5b10acdb0159723f9a2d3fb7b9fc84c/QueryResult.class -> QueryResult.class
18/12/24 12:01:01 DEBUG orm.CompilationManager: Finished writing jar file /tmp/sqoop-hduser/compile/a5b10acdb0159723f9a2d3fb7b9fc84c/QueryResult.jar
18/12/24 12:01:02 INFO tool.ImportTool: Destination directory /home/hduser/hadoop/share/hadoop/hdfsa deleted.
18/12/24 12:01:02 INFO mapreduce.ImportJobBase: Beginning query import.
18/12/24 12:01:02 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
18/12/24 12:01:02 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/12/24 12:01:02 DEBUG db.DBConfiguration: Securing password into job credentials store
18/12/24 12:01:02 DEBUG mapreduce.DataDrivenImportJob: Using table class: QueryResult
18/12/24 12:01:02 DEBUG mapreduce.DataDrivenImportJob: Using InputFormat: class com.cloudera.sqoop.mapreduce.db.DataDrivenDBInputFormat
18/12/24 12:01:02 DEBUG manager.OracleManager$ConnCache: Caching released connection for jdbc:oracle:thin:@xxxx:1521:dcpdb/nblt24
18/12/24 12:01:02 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/12/24 12:01:02 DEBUG mapreduce.JobBase: Adding to job classpath: file:/home/hduser/sqoop/sqoop-1.4.6.jar
18/12/24 12:01:02 DEBUG mapreduce.JobBase: Adding to job classpath: file:/home/hduser/sqoop/lib/ant-eclipse-1.0-jvm1.2.jar
18/12/24 12:01:02 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/12/24 12:01:06 DEBUG db.DBConfiguration: Fetching password from job credentials store
18/12/24 12:01:06 INFO db.DBInputFormat: Using read commited transaction isolation
18/12/24 12:01:06 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '1=1' and upper bound '1=1'
18/12/24 12:01:06 INFO mapreduce.JobSubmitter: number of splits:1
18/12/24 12:01:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1545568287772_0004
18/12/24 12:01:07 INFO impl.YarnClientImpl: Submitted application application_1545568287772_0004
18/12/24 12:01:07 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1545568287772_0004/
18/12/24 12:01:07 INFO mapreduce.Job: Running job: job_1545568287772_0004
18/12/24 12:01:13 INFO mapreduce.Job: Job job_1545568287772_0004 running in uber mode : false
18/12/24 12:01:13 INFO mapreduce.Job: map 0% reduce 0%
18/12/24 12:01:19 INFO mapreduce.Job: map 100% reduce 0%
18/12/24 12:01:20 INFO mapreduce.Job: Job job_1545568287772_0004 completed successfully
18/12/24 12:01:20 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0 FILE: Number of bytes written=154736
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87 HDFS: Number of bytes written=784988
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1 Total time spent by all maps in occupied slots (ms)=3847
Total time spent by all reduces in occupied slots (ms)=0 Total time spent by all map tasks (ms)=3847
Total vcore-milliseconds taken by all map tasks=3847
Total megabyte-milliseconds taken by all map tasks=3939328
Map-Reduce Framework
Map input records=203
Map output records=203
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0 GC time elapsed (ms)=56
CPU time spent (ms)=1570
Physical memory (bytes) snapshot=209362944
Virtual memory (bytes) snapshot=1988337664
Total committed heap usage (bytes)=114294784
File Input Format Counters
Bytes Read=0
File Output Format Counters Bytes Written=784988
18/12/24 12:01:20 INFO mapreduce.ImportJobBase: Transferred 766.5898 KB in 17.9024 seconds (42.8204 KB/sec)
18/12/24 12:01:20 INFO mapreduce.ImportJobBase: Retrieved 203 records.
18/12/24 12:01:20 DEBUG hive.HiveImport: Hive.inputTable: null
18/12/24 12:01:20 DEBUG hive.HiveImport: Hive.outputTable: hr.f_company_xml_tab
18/12/24 12:01:20 DEBUG manager.SqlManager: Execute getColumnInfoRawQuery : SELECT RECID, XMLTYPE.getStringVal(XMLRECORD) FROM F_COMPANY where (1 = 0)
18/12/24 12:01:20 DEBUG manager.OracleManager$ConnCache: Got cached connection for jdbc:oracle:thin:@xxx:1521:dcpdb/nblt24
18/12/24 12:01:20 INFO manager.OracleManager: Time zone has been set to GMT
18/12/24 12:01:20 DEBUG manager.SqlManager: Using fetchSize for next query: 1000
18/12/24 12:01:20 INFO manager.SqlManager: Executing SQL statement: SELECT RECID, XMLTYPE.getStringVal(XMLRECORD) FROM F_COMPANY where (1 = 0)
18/12/24 12:01:20 DEBUG manager.SqlManager: Found column RECID of type [12, 255, 0]
18/12/24 12:01:20 DEBUG manager.SqlManager: Found column XMLTYPE.GETSTRINGVAL(XMLRECORD) of type [12, 4000, 0]
18/12/24 12:01:20 DEBUG manager.SqlManager: Using fetchSize for next query: 1000
18/12/24 12:01:20 INFO manager.SqlManager: Executing SQL statement: SELECT RECID, XMLTYPE.getStringVal(XMLRECORD) FROM F_COMPANY where (1 = 0)
18/12/24 12:01:20 DEBUG manager.SqlManager: Found column RECID
18/12/24 12:01:20 DEBUG manager.SqlManager: Found column XMLTYPE.GETSTRINGVAL(XMLRECORD)
18/12/24 12:01:20 DEBUG hive.TableDefWriter: Create statement: CREATE TABLE IF NOT EXISTS `hr.f_company_xml_tab` ( `RECID` STRING, `XMLTYPE.GETSTRINGVAL(XMLRECORD)` STRING) COMMENT 'Imported by sqoop on 2018/12/24 12:01:20' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE
18/12/24 12:01:20 DEBUG hive.TableDefWriter: Load statement: LOAD DATA INPATH 'hdfs://localhost:9000/home/hduser/hadoop/share/hadoop/hdfsa' INTO TABLE `hr.f_company_xml_tab`
18/12/24 12:01:20 INFO hive.HiveImport: Loading uploaded data into Hive
18/12/24 12:01:20 DEBUG hive.HiveImport: Using external Hive process.
18/12/24 12:01:25 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.
18/12/24 12:01:25 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/home/hduser/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
18/12/24 12:01:25 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/home/hduser/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
18/12/24 12:01:25 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/home/hduser/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
18/12/24 12:01:25 INFO hive.HiveImport: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
18/12/24 12:01:25 INFO hive.HiveImport: SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
18/12/24 12:01:26 INFO hive.HiveImport:
18/12/24 12:01:26 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/home/hduser/hive/lib/hive-common-2.3.3.jar!/hive-log4j2.properties Async: true
18/12/24 12:01:28 INFO hive.HiveImport: FailedPredicateException([., :] can not be used in column name in create table statement.,{}?)
18/12/24 12:01:28 INFO hive.HiveImport: at org.apache.hadoop.hive.ql.parse.HiveParser.throwColumnNameException(HiveParser.java:1256)
18/12/24 12:01:28 INFO hive.HiveImport: at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameType(HiveParser.java:32991)
18/12/24 12:01:28 INFO hive.HiveImport: at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameTypeOrPKOrFK(HiveParser.java:33432)
18/12/24 12:01:28 INFO hive.HiveImport: at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameTypeOrPKOrFKList(HiveParser.java:29513)
18/12/24 12:01:28 INFO hive.HiveImport: at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:6175)
18/12/24 12:01:28 INFO hive.HiveImport: at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:3808)
18/12/24 12:01:28 INFO hive.HiveImport: at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
18/12/24 12:01:28 INFO hive.HiveImport: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
18/12/24 12:01:28 INFO hive.HiveImport: at java.lang.reflect.Method.invoke(Method.java:498)
18/12/24 12:01:28 INFO hive.HiveImport: at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
18/12/24 12:01:28 INFO hive.HiveImport: FAILED: ParseException line 1:108 Failed to recognize predicate ')'. Failed rule: '[., :] can not be used in column name in create table statement.' in column specification
18/12/24 12:02:26 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Hive exited with status 64

at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:389)
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:339)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:240)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:514)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

18/12/24 12:02:26 DEBUG manager.OracleManager$ConnCache: Caching released connection for jdbc:oracle:thin:@xxx:1521:dcpdb/nblt24

3 REPLIES 3

Re: How to import xml data from oracle to hive using scoop?

Super Collaborator

Hi @Armanur Rahman,

from the error message it seems to be a simple syntax error. From there I think this is the statement with the error:

CREATE TABLE IF NOT EXISTS `hr.f_company_xml_tab`
( `RECID` STRING, `XMLTYPE.GETSTRINGVAL(XMLRECORD)` STRING)

your second column is tried to be named 'XMLTYPE.GETSTRINGVAL(XMLRECORD)' which includes a '(' just as the error message claims. Can you rename the column to an easier name i.e. 'val', and try again?

Regards
Harald

Re: How to import xml data from oracle to hive using scoop?

New Contributor

Hello Sir,

Thank you very much for your reply.

Basically I am trying to import data from oracle to hive using sqoop. In oracle table has 2 field, 1st column is containing normal data and 2nd column containing xml data. For xml data I used getStringVal function. That's why while creating the table it's getting error.

sqoop import --connect "jdbc:oracle:thin:@xxx:1521:dcpdb" --username "nblt24" --password "nblt24" --query 'SELECT RECID, XMLTYPE.getStringVal(XMLRECORD) FROM F_COMPANY where $CONDITIONS' --target-dir /home/hduser/hadoop/share/hadoop/hdfsa --delete-target-dir --num-mappers 1 --hive-import --hive-table hr.f_company_xml_tab --verbose

Is there any other solution to import xml data?

Regards

Arman

Re: How to import xml data from oracle to hive using scoop?

New Contributor

Note: Using XMLTYPE.GETSTRINGVAL won't work if XMLRECORD size is greater than 4000. Use XMLTYPE.GETCLOBVAL


sqoop import --connect "jdbc:oracle:thin:@xxx:1521:dcpdb" --username "nblt24" --password "nblt24" --query 'SELECT RECID, XMLTYPE.GETCLOBVAL(XMLRECORD) FROM F_COMPANY where $CONDITIONS' --target-dir /home/hduser/hadoop/share/hadoop/hdfsa --delete-target-dir --num-mappers 1 --hive-import --hive-table hr.f_company_xml_tab --verbose