Exercise 1 / Sqoop fails: Could not read schema


I've got a problem. I'm trying to work through the exercises to get to know Hadoop and Cloudera, but my VM is already failing at the first task, when I try to execute:

$ sqoop import-all-tables \
    -m 1 \
    --connect jdbc:mysql://quickstart:3306/retail_db \
    --username=retail_dba \
    --password=cloudera \
    --compression-codec=snappy \
    --as-parquetfile \
    --warehouse-dir=/user/hive/warehouse \
    --hive-import

 

Then I get the following error message:

Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/01/19 08:54:16 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.5.0
16/01/19 08:54:16 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/01/19 08:54:17 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
16/01/19 08:54:17 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
16/01/19 08:54:17 WARN tool.BaseSqoopTool: It seems that you're doing hive import directly into default
16/01/19 08:54:17 WARN tool.BaseSqoopTool: hive warehouse directory which is not supported. Sqoop is
16/01/19 08:54:17 WARN tool.BaseSqoopTool: firstly importing data into separate directory and then
16/01/19 08:54:17 WARN tool.BaseSqoopTool: inserting data into hive. Please consider removing
16/01/19 08:54:17 WARN tool.BaseSqoopTool: --target-dir or --warehouse-dir into /user/hive/warehouse in
16/01/19 08:54:17 WARN tool.BaseSqoopTool: case that you will detect any issues.
16/01/19 08:54:17 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/01/19 08:54:18 INFO tool.CodeGenTool: Beginning code generation
16/01/19 08:54:18 INFO tool.CodeGenTool: Will generate java class as codegen_categories
16/01/19 08:54:18 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
16/01/19 08:54:18 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
16/01/19 08:54:18 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/6c71f3454000819b9873b7b398482ec4/codegen_categories.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/01/19 08:54:21 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/6c71f3454000819b9873b7b398482ec4/codegen_categories.jar
16/01/19 08:54:21 WARN manager.MySQLManager: It looks like you are importing from mysql.
16/01/19 08:54:21 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
16/01/19 08:54:21 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
16/01/19 08:54:21 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
16/01/19 08:54:21 INFO mapreduce.ImportJobBase: Beginning import of categories
16/01/19 08:54:22 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/01/19 08:54:23 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
16/01/19 08:54:23 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
16/01/19 08:54:26 INFO hive.metastore: Trying to connect to metastore with URI thrift://quickstart.cloudera:9083
16/01/19 08:54:26 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/01/19 08:54:26 INFO hive.metastore: Connected to metastore.
16/01/19 08:54:26 WARN mapreduce.DataDrivenImportJob: Target Hive table 'categories' exists! Sqoop will append data into the existing Hive table. Consider using --hive-overwrite, if you do NOT intend to do appending.
16/01/19 08:54:29 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetIOException: Could not read schema
org.kitesdk.data.DatasetIOException: Could not read schema
	at org.kitesdk.data.spi.hive.HiveUtils.descriptorForTable(HiveUtils.java:152)
	at org.kitesdk.data.spi.hive.HiveAbstractMetadataProvider.load(HiveAbstractMetadataProvider.java:104)
	at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:197)
	at org.kitesdk.data.Datasets.load(Datasets.java:108)
	at org.kitesdk.data.Datasets.load(Datasets.java:165)
	at org.kitesdk.data.Datasets.load(Datasets.java:187)
	at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:111)
	at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:130)
	at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
	at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
	at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
	at org.apache.sqoop.tool.ImportAllTablesTool.run(ImportAllTablesTool.java:111)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.io.FileNotFoundException: File does not exist: /user/hive/warehouse/categories/.metadata/schemas/1.avsc
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)

	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1260)
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1245)
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1233)
	at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:302)
	at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:268)
	at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:260)
	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1564)
	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:308)
	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:304)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:775)
	at org.kitesdk.data.spi.Schemas.open(Schemas.java:210)
	at org.kitesdk.data.spi.Schemas.fromAvsc(Schemas.java:71)
	at org.kitesdk.data.DatasetDescriptor$Builder.schemaUri(DatasetDescriptor.java:436)
	at org.kitesdk.data.spi.hive.HiveUtils.descriptorForTable(HiveUtils.java:150)
	... 18 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /user/hive/warehouse/categories/.metadata/schemas/1.avsc
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)

	at org.apache.hadoop.ipc.Client.call(Client.java:1472)
	at org.apache.hadoop.ipc.Client.call(Client.java:1403)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
	at com.sun.proxy.$Proxy15.getBlockLocations(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:254)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
	at com.sun.proxy.$Proxy16.getBlockLocations(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1258)
	... 33 more

I've searched the forums, but have not found anything.
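
From the trace it looks like the Hive table 'categories' already exists (see the WARN line above), but the Kite metadata file /user/hive/warehouse/categories/.metadata/schemas/1.avsc that Sqoop is looking for is missing, so maybe something is left over from an earlier run. This is roughly how I would check the state, assuming the default QuickStart paths; I haven't confirmed that cleaning this up actually fixes the error:

# Does the Kite schema file from the stack trace exist on HDFS?
$ hdfs dfs -ls /user/hive/warehouse/categories/.metadata/schemas/

# Is there already a 'categories' table in Hive, as the WARN line says?
$ hive -e 'SHOW TABLES;'

# If it's a leftover table without its metadata, I assume dropping it and
# its warehouse directory before re-running the import should clear the
# mismatch (just my guess, not a verified fix):
$ hive -e 'DROP TABLE IF EXISTS categories;'
$ hdfs dfs -rm -r /user/hive/warehouse/categories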

 

Any help would be very much appreciated.
