
CDH 5.3 and Sqoop2 startup and use error

Explorer

Hello.

 

I had a CDH 5.1.3 installation where I was able to use the Sqoop2 client to import data into Hive and HBase. After upgrading to CDH 5.3 I am unable to get the Sqoop2 service to start successfully, nor am I able to use the Sqoop2 client. What do I need to do to get this working as it did before the upgrade?

 

The errors reported in the Catalina log under /var/log/sqoop2/ are as follows:

... ... ...

Jan 20, 2015 4:10:54 PM org.apache.tomcat.util.digester.Digester endElement
WARNING:   No rules found matching 'HTML/BODY'.
Jan 20, 2015 4:10:54 PM org.apache.tomcat.util.digester.Digester endElement
WARNING:   No rules found matching 'HTML'.
Jan 20, 2015 4:10:54 PM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 59 ms
Jan 20, 2015 4:10:54 PM org.apache.catalina.startup.Catalina start
SEVERE: Cannot start server. Server instance is not configured.

 

 

The Sqoop2 client also throws an exception; the output it displays follows:

15/01/20 19:58:13 FATAL conf.Configuration: error parsing conf core-default.xml
javax.xml.parsers.ParserConfigurationException: Feature 'http://apache.org/xml/features/xinclude' is not recognized.
        at org.apache.xerces.jaxp.DocumentBuilderFactoryImpl.newDocumentBuilder(Unknown Source)
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2375)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2337)
        at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2254)
        at org.apache.hadoop.conf.Configuration.get(Configuration.java:861)
        at org.apache.sqoop.tool.SqoopTool.loadPluginsFromConfDir(SqoopTool.java:170)
        at org.apache.sqoop.tool.SqoopTool.loadPlugins(SqoopTool.java:140)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:208)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Exception in thread "main" java.lang.RuntimeException: javax.xml.parsers.ParserConfigurationException: Feature 'http://apache.org/xml/features/xinclude' is not recognized.
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2493)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2337)
        at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2254)
        at org.apache.hadoop.conf.Configuration.get(Configuration.java:861)
        at org.apache.sqoop.tool.SqoopTool.loadPluginsFromConfDir(SqoopTool.java:170)
        at org.apache.sqoop.tool.SqoopTool.loadPlugins(SqoopTool.java:140)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:208)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: javax.xml.parsers.ParserConfigurationException: Feature 'http://apache.org/xml/features/xinclude' is not recognized.
        at org.apache.xerces.jaxp.DocumentBuilderFactoryImpl.newDocumentBuilder(Unknown Source)
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2375)
        ... 8 more

 

1 ACCEPTED SOLUTION

Explorer

For anyone else who may experience this:

The cause was an HSQLDB JAR file present on the server outside the CDH installation; it had been installed as part of other software. Removing this JAR file allowed Sqoop to import from SQL Server (and from MySQL as well). Problem solved.
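A quick way to hunt for the offending JAR is to search the usual system locations for HSQLDB JARs that sit outside the CDH parcel tree. This is a sketch; the search roots are illustrative and should be adjusted for your server:

```shell
# Sketch: list hsqldb*.jar files found outside the CDH parcel directory.
# Search roots are illustrative; add any others used by third-party software.
find_stray_hsqldb() {
    find "$@" -name 'hsqldb*.jar' 2>/dev/null | grep -v '/opt/cloudera/parcels/' || true
}

find_stray_hsqldb /usr/share/java /usr/local
```

Any path this prints is a candidate for removal (or for exclusion from the classpath) before retrying the import.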


15 REPLIES

Explorer

Also I should add:

- I first tried upgrading CDH 5.1.3 to CDH 5.3. When I could not get Sqoop2 to work, I performed a full CDH install.

- All the other services I have configured (HDFS, Hive, Hue, HBase, Solr, Spark, Impala) seem to be up and functioning properly.

- I have tried removing the Sqoop2 service and adding it again.

Explorer

While attempting to analyze the problem I recreated a similar setup on other hardware, and in that setup Sqoop seems to start properly. The question then becomes: how do I analyze the problem? What other logs are relevant?

Expert Contributor
The error "error parsing conf core-default.xml" suggests that your Hadoop client configuration is malformed. Can you go through your Hadoop configuration and ensure that it is all well-formed XML, please?
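To act on this suggestion, a well-formedness sweep of the config directory can be scripted. A minimal sketch, assuming the client configs live under /etc/hadoop/conf (adjust CONF_DIR for your deployment) and that python3 is available:

```shell
# Sketch: verify every Hadoop client config file parses as well-formed XML.
# Uses python3's stdlib XML parser, so no extra packages are needed.
check_conf_dir() {
    for f in "$1"/*.xml; do
        [ -e "$f" ] || continue
        if python3 -c 'import sys, xml.etree.ElementTree as ET; ET.parse(sys.argv[1])' "$f" 2>/dev/null; then
            echo "OK: $f"
        else
            echo "MALFORMED: $f"
        fi
    done
}

check_conf_dir "${CONF_DIR:-/etc/hadoop/conf}"
```

Any file reported as MALFORMED is worth opening; a stray character or unclosed tag there would explain the parse failure.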

Explorer

Thanks for your response.

 

The Sqoop2 client errors are, I suspect, the consequence of a problem somewhere else. Comparing the two setups where I attempted the same thing, I do not find a core-default.xml file anywhere on either file system. Isn't a successful install of the Sqoop2 service a prerequisite for the client to function?
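One point worth noting: core-default.xml is packaged inside the hadoop-common JAR rather than shipped as a standalone file, so not finding it on disk is expected. A sketch for confirming it is inside the JAR, using python3's stdlib zip support (the parcel path in the example call is an assumption for a CDH 5.3 parcel install):

```shell
# Sketch: print entries of a jar (zip) file whose names contain a pattern.
list_jar_entry() {
    python3 -c 'import sys, zipfile
for n in zipfile.ZipFile(sys.argv[1]).namelist():
    if sys.argv[2] in n:
        print(n)' "$1" "$2"
}

# Example call -- the jar path is an assumption, adjust to your install:
# list_jar_entry /opt/cloudera/parcels/CDH/jars/hadoop-common-2.5.0-cdh5.3.0.jar core-default.xml
```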

 

At this point I am inclined to reinstall the base OS and install CDH on it. The following is provided only as information that may help analyze, and improve the debuggability of, this aspect of CDH installation, since the reported output does not seem to help identify the cause of the problem.

 

The following is reported by the installer when it installs the Sqoop2 service.

 

Service did not start successfully; not all of the required roles started: Service has only 0 Sqoop 2 Server roles running instead of minimum required 1.
Supervisor returned FATAL. Please check the role log file, stderr, or stdout.
Program: sqoop/sqoop.sh []
Recent log entries (links to full logs):
2015-01-22 08:08:25,483 INFO org.apache.sqoop.connector.ConnectorHandler: Connector [org.apache.sqoop.connector.hdfs.HdfsConnector] initialized.
2015-01-22 08:08:25,511 INFO org.apache.sqoop.connector.ConnectorHandler: Connector [org.apache.sqoop.connector.jdbc.GenericJdbcConnector] initialized.
2015-01-22 08:08:25,679 INFO org.apache.sqoop.repository.JdbcRepositoryTransaction: Attempting transaction commit
2015-01-22 08:08:25,680 INFO org.apache.sqoop.connector.ConnectorManager: Connectors loaded: {hdfs-connector={hdfs-connector:org.apache.sqoop.connector.hdfs.HdfsConnector:jar:file:/var/lib/sqoop2/tomcat-deployment/webapps/sqoop/WEB-INF/lib/sqoop-connector-hdfs-1.99.4-cdh5.3.0.jar!/sqoopconnector.properties}, generic-jdbc-connector={generic-jdbc-connector:org.apache.sqoop.connector.jdbc.GenericJdbcConnector:jar:file:/var/lib/sqoop2/tomcat-deployment/webapps/sqoop/WEB-INF/lib/sqoop-connector-generic-jdbc-1.99.4-cdh5.3.0.jar!/sqoopconnector.properties}}
2015-01-22 08:08:25,680 INFO org.apache.sqoop.tools.tool.UpgradeTool: Initializing the Driver with upgrade option turned on.
2015-01-22 08:08:25,694 INFO org.apache.sqoop.repository.JdbcRepositoryTransaction: Attempting transaction commit
2015-01-22 08:08:25,695 INFO org.apache.sqoop.driver.Driver: Driver initialized: OK
2015-01-22 08:08:25,695 INFO org.apache.sqoop.tools.tool.UpgradeTool: Upgrade completed successfully.
2015-01-22 08:08:25,695 INFO org.apache.sqoop.tools.tool.UpgradeTool: Tearing all managers down.
2015-01-22 08:08:25,750 INFO org.apache.sqoop.repository.derby.DerbyRepositoryHandler: Embedded Derby shutdown raised SQL STATE 45000 as expected.
2015-01-22 08:08:25,750 INFO org.apache.sqoop.repository.JdbcRepositoryProvider: Deregistering JDBC driver
2015-01-22 08:08:25,751 INFO org.apache.sqoop.core.PropertiesConfigurationProvider: Shutting down configuration poller thread

 

 

 

 

Explorer

I reinstalled CDH 5.3 on a fresh Ubuntu 14.04 installation without any errors. But the Sqoop scripts/commands that worked with CDH 5.1 now do not work. The logged output indicates the SQL Server JDBC driver is not chosen by Sqoop: Sqoop incorrectly chooses the HSQLDB driver even though the JDBC URL specifies sqlserver. I have placed a copy of the SQL Server JDBC driver in /var/lib/sqoop and /var/lib/sqoop2, as well as in /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/jars.

 

As an experiment I also tried Sqoop2, but creating a link fails. I have posted that output at the end of this post.

 

What do I need to do to make this work?

 

 

The command is as below:

sqoop import \
    --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
    --connect jdbc:sqlserver://sqlserver:1433\;databaseName=dbname \
    --username sa --password thepasswd \
    --direct \
    --num-mappers 1 \
    --hive-import \
    --table person \
    --map-column-hive row_timestamp=timestamp \
    --verbose

 

The output of the command is below:

Warning: /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/01/27 10:56:53 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.3.0
15/01/27 10:56:53 DEBUG tool.BaseSqoopTool: Enabled debug logging.
15/01/27 10:56:53 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/01/27 10:56:53 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
15/01/27 10:56:53 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
15/01/27 10:56:53 DEBUG sqoop.ConnFactory: Loaded manager factory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
15/01/27 10:56:53 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
15/01/27 10:56:53 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
15/01/27 10:56:53 INFO manager.SqlManager: Using default fetchSize of 1000
15/01/27 10:56:53 INFO tool.CodeGenTool: Beginning code generation
15/01/27 10:56:53 DEBUG manager.SqlManager: Execute getColumnInfoRawQuery : SELECT t.* FROM person AS t WHERE 1=0
15/01/27 10:56:53 DEBUG manager.SqlManager: No connection paramenters specified. Using regular API for making connection.
Exception in thread "main" java.lang.NoSuchMethodError: org.hsqldb.DatabaseURL.parseURL(Ljava/lang/String;ZZ)Lorg/hsqldb/persist/HsqlProperties;
        at org.hsqldb.jdbc.JDBCDriver.getConnection(Unknown Source)
        at org.hsqldb.jdbc.JDBCDriver.connect(Unknown Source)
        at java.sql.DriverManager.getConnection(DriverManager.java:571)
        at java.sql.DriverManager.getConnection(DriverManager.java:215)
        at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:877)
        at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
        at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:736)
        at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:759)
        at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:269)
        at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:240)
        at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:226)
        at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295)
        at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1833)
        at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
        at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
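The ConnFactory warning in the output above points at a likely fix: when --driver is given without --connection-manager, Sqoop falls back to GenericJdbcManager, which can end up loading the wrong JDBC driver. A hedged sketch of the same import naming the SQL Server connection manager explicitly (class name as shipped with Sqoop 1.4.x; --driver is dropped because the dedicated manager supplies the driver, --direct is dropped as a simplification, and -P replaces the insecure inline password; host, database, and table names are the placeholders from the original command):

```shell
sqoop import \
    --connection-manager org.apache.sqoop.manager.SQLServerManager \
    --connect "jdbc:sqlserver://sqlserver:1433;databaseName=dbname" \
    --username sa -P \
    --num-mappers 1 \
    --hive-import \
    --table person \
    --map-column-hive row_timestamp=timestamp \
    --verbose
```

This is untested against your cluster; it only shows the shape of an invocation that does not leave the manager choice to the fallback path.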

 

 

Output of the Sqoop2 add-link attempt:

sqoop:000> set server --host localhost --port 12000 --webapp sqoop
Server is set successfully
sqoop:000> create link --cid 2
Creating link for connector with id 2
Please fill following values to create new link object
Name: ngreportdb

Link configuration

JDBC Driver Class: com.microsoft.sqlserver.jdbc.SQLServerDriver
JDBC Connection String: jdbc:sqlserver://sqlserver:1433;databaseName=dbname
Username: sa
Password: *************
JDBC Connection Properties:
There are currently 0 values in the map:
entry#

 There are issues with entered data, please revise your input:

Explorer

For anyone else who may experience this:

The cause was an HSQLDB JAR file present on the server outside the CDH installation; it had been installed as part of other software. Removing this JAR file allowed Sqoop to import from SQL Server (and from MySQL as well). Problem solved.

Master Collaborator

Thank you for providing the solution, @arund 

New Contributor
Hey, I have encountered the same issue. I found one path with this hsqldb.jar file inside CDH and two outside:
/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/client-0.20/hsqldb.jar

/usr/share/java/hsqldb.jar

/usr/local/remote/packages/fortify_360_remote/2.6.0/PTA/Core/lib/hsqldb.jar

Should I remove both of the ones outside CDH? The last one is in a read-only directory; how did you do it?

Thanks

Explorer

In my case removing the non-CDH HSQLDB JAR files was acceptable to me. But in general I suppose this could be corrected so that CDH does not use JAR files located elsewhere. Perhaps you could raise this as a suggestion in the appropriate forum / ticket system at Cloudera.

New Contributor
Thanks for your reply.

I reinstalled CDH 5.3.0 via Cloudera Manager and found the log under:
/var/log/sqoop2/sqoop-cmf-sqoop-SQOOP_SERVER-slc05hwl.us.oracle.com.log.out

with an error saying:


2015-01-30 09:03:04,822 ERROR org.apache.sqoop.server.ServerInitializer: Sqoop server failed to start
java.lang.RuntimeException: Failure in server initialization

.....

Caused by: org.apache.sqoop.common.SqoopException: JDBCREPO_0007:Unable to lease link
at org.apache.sqoop.repository.JdbcRepositoryTransaction.begin(JdbcRepositoryTransaction.java:63)
......

Caused by: org.apache.sqoop.common.SqoopException: JDBCREPO_0007:Unable to lease link
at org.apache.sqoop.repository.JdbcRepositoryTransaction.begin(JdbcRepositoryTransaction.java:63)
.....
Caused by: java.sql.SQLException: No suitable driver found for jdbc:derby:/var/lib/sqoop2/repository/db;create=true
at java.sql.DriverManager.getConnection(DriverManager.java:596)
at java.sql.DriverManager.getConnection(DriverManager.java:187)
at org.apache.commons.dbcp.DriverManagerConnectionFactory.createConnection(DriverManagerConnectionFactory.java:78)
at org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:582)
at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1148)
at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
at org.apache.sqoop.repository.JdbcRepositoryTransaction.begin(JdbcRepositoryTransaction.java:61)
... 31 more



I added derby-10.11.1.1.jar to /var/lib/sqoop2/,

then re-ran the cluster setup, and the same error you mentioned came out.


Do you have any idea about this?

Thank you very much

Explorer

Not exactly the same, but I had a similar installation issue with Sqoop2. In my case I had to reinstall the OS to get past the issue. After spending a good amount of time trying to analyze the problem, that was the most time-saving solution for me.

New Contributor
Well, that is lucky for you.

I re-imaged the server and the issue still exists... no idea.

Anyway, thank you!

New Contributor

I've upgraded CDH from 5.2.1 to 5.3.2 and got the same errors as in the first message. The Sqoop2 service could not be started.

 

After several hours of investigation I found that the new Sqoop2 is not compatible with the xercesImpl.jar from libxerces2-java (Debian Wheezy). That JAR cannot be removed, because other packages depend on it.

 

The workaround is to remove /usr/share/java/*.jar from the common.loader=... line in /var/lib/sqoop2/tomcat-deployment/conf/catalina.properties.
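As a concrete sketch of that workaround (the sed expressions only touch the common.loader line and a backup is kept first; GNU sed's -i is assumed, and the real call is left commented so you can inspect the backup diff before applying):

```shell
# Sketch: strip the /usr/share/java/*.jar entry from common.loader, with backup.
strip_usr_share_java() {
    cp "$1" "$1.bak"
    sed -i -e '/^common.loader=/ s|,/usr/share/java/\*\.jar||g' \
           -e '/^common.loader=/ s|/usr/share/java/\*\.jar,||g' "$1"
}

# strip_usr_share_java /var/lib/sqoop2/tomcat-deployment/conf/catalina.properties
```

After editing, restart the Sqoop2 service so Tomcat re-reads catalina.properties.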

Explorer

Many Thanks, it worked for me!

I followed Yugune's reply.

 

The workaround is to remove /usr/share/java/*.jar from the common.loader=... line in /var/lib/sqoop2/tomcat-deployment/conf/catalina.properties.

New Contributor
Encountered this myself on Red Hat Enterprise Linux Server release 7.2 (Maipo). It seems to be a conflict with Red Hat's release of xerces-j2.