
Exercise 1 / Sqoop fails: Could not read schema

New Contributor

I've got a problem: I'm working through the exercises to get to know Hadoop and Cloudera, but my VM already fails at the first task, when I try to execute:

 

$ sqoop import-all-tables \
    -m 1 \
    --connect jdbc:mysql://quickstart:3306/retail_db \
    --username=retail_dba \
    --password=cloudera \
    --compression-codec=snappy \
    --as-parquetfile \
    --warehouse-dir=/user/hive/warehouse \
    --hive-import

 

Then I get the following error message:

 

Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/01/19 08:54:16 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.5.0
16/01/19 08:54:16 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/01/19 08:54:17 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
16/01/19 08:54:17 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
16/01/19 08:54:17 WARN tool.BaseSqoopTool: It seems that you're doing hive import directly into default
16/01/19 08:54:17 WARN tool.BaseSqoopTool: hive warehouse directory which is not supported. Sqoop is
16/01/19 08:54:17 WARN tool.BaseSqoopTool: firstly importing data into separate directory and then
16/01/19 08:54:17 WARN tool.BaseSqoopTool: inserting data into hive. Please consider removing
16/01/19 08:54:17 WARN tool.BaseSqoopTool: --target-dir or --warehouse-dir into /user/hive/warehouse in
16/01/19 08:54:17 WARN tool.BaseSqoopTool: case that you will detect any issues.
16/01/19 08:54:17 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/01/19 08:54:18 INFO tool.CodeGenTool: Beginning code generation
16/01/19 08:54:18 INFO tool.CodeGenTool: Will generate java class as codegen_categories
16/01/19 08:54:18 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
16/01/19 08:54:18 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
16/01/19 08:54:18 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/6c71f3454000819b9873b7b398482ec4/codegen_categories.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/01/19 08:54:21 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/6c71f3454000819b9873b7b398482ec4/codegen_categories.jar
16/01/19 08:54:21 WARN manager.MySQLManager: It looks like you are importing from mysql.
16/01/19 08:54:21 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
16/01/19 08:54:21 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
16/01/19 08:54:21 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
16/01/19 08:54:21 INFO mapreduce.ImportJobBase: Beginning import of categories
16/01/19 08:54:22 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/01/19 08:54:23 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
16/01/19 08:54:23 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
16/01/19 08:54:26 INFO hive.metastore: Trying to connect to metastore with URI thrift://quickstart.cloudera:9083
16/01/19 08:54:26 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/01/19 08:54:26 INFO hive.metastore: Connected to metastore.
16/01/19 08:54:26 WARN mapreduce.DataDrivenImportJob: Target Hive table 'categories' exists! Sqoop will append data into the existing Hive table. Consider using --hive-overwrite, if you do NOT intend to do appending.
16/01/19 08:54:29 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetIOException: Could not read schema
org.kitesdk.data.DatasetIOException: Could not read schema
	at org.kitesdk.data.spi.hive.HiveUtils.descriptorForTable(HiveUtils.java:152)
	at org.kitesdk.data.spi.hive.HiveAbstractMetadataProvider.load(HiveAbstractMetadataProvider.java:104)
	at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:197)
	at org.kitesdk.data.Datasets.load(Datasets.java:108)
	at org.kitesdk.data.Datasets.load(Datasets.java:165)
	at org.kitesdk.data.Datasets.load(Datasets.java:187)
	at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:111)
	at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:130)
	at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
	at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
	at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
	at org.apache.sqoop.tool.ImportAllTablesTool.run(ImportAllTablesTool.java:111)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.io.FileNotFoundException: File does not exist: /user/hive/warehouse/categories/.metadata/schemas/1.avsc
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)

	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1260)
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1245)
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1233)
	at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:302)
	at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:268)
	at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:260)
	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1564)
	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:308)
	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:304)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:775)
	at org.kitesdk.data.spi.Schemas.open(Schemas.java:210)
	at org.kitesdk.data.spi.Schemas.fromAvsc(Schemas.java:71)
	at org.kitesdk.data.DatasetDescriptor$Builder.schemaUri(DatasetDescriptor.java:436)
	at org.kitesdk.data.spi.hive.HiveUtils.descriptorForTable(HiveUtils.java:150)
	... 18 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /user/hive/warehouse/categories/.metadata/schemas/1.avsc
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)

	at org.apache.hadoop.ipc.Client.call(Client.java:1472)
	at org.apache.hadoop.ipc.Client.call(Client.java:1403)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
	at com.sun.proxy.$Proxy15.getBlockLocations(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:254)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
	at com.sun.proxy.$Proxy16.getBlockLocations(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1258)
	... 33 more
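
From the bottom of the trace, it looks like the Hive table 'categories' still exists (see the WARN line above about appending), but the Kite metadata it was originally created with is gone, so the schema cannot be read. You can check the missing file directly; the path below is copied from the FileNotFoundException:

$ hdfs dfs -ls /user/hive/warehouse/categories/.metadata/schemas/

In my case, consistent with the exception, there is no 1.avsc there.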

I've searched the forums, but have not found anything.

 

Help would be very appreciated.

3 REPLIES

Re: Exercise 1 / Sqoop fails: Could not read schema

New Contributor

I am experiencing this issue as well. Can you please advise whether this has been resolved, and how?

Re: Exercise 1 / Sqoop fails: Could not read schema

New Contributor

Not sure if anyone is still looking for a resolution, but below is what I did:

 

1. Recreated /user/hive/warehouse with full permissions (777)

2. Dropped all Hive tables in the query editor and refreshed the metadata (roughly the shell equivalent is sketched below)
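
For reference, roughly the shell equivalent of those two steps (a sketch assuming the default QuickStart VM layout and the six retail_db tables from the tutorial; adjust names and paths to your setup):

# recreate the warehouse directory with open permissions
# (this deletes anything already imported - sandbox use only)
$ sudo -u hdfs hdfs dfs -rm -r /user/hive/warehouse
$ sudo -u hdfs hdfs dfs -mkdir -p /user/hive/warehouse
$ sudo -u hdfs hdfs dfs -chmod -R 777 /user/hive/warehouse

# drop the half-imported Hive tables so Sqoop can recreate them cleanly
$ hive -e 'DROP TABLE IF EXISTS categories; DROP TABLE IF EXISTS customers;
           DROP TABLE IF EXISTS departments; DROP TABLE IF EXISTS order_items;
           DROP TABLE IF EXISTS orders; DROP TABLE IF EXISTS products;'

After that, re-run the sqoop import-all-tables command from the exercise.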

 

Happy learning!!

Re: Exercise 1 / Sqoop fails: Could not read schema

Explorer

You just need to restart the VM, or delete the old VM and start a new one.

 

I had the same problem and resolved it this way.