Created on 05-06-2016 10:19 AM - edited 08-19-2019 01:59 AM
Stack: Installed HDP-2.3.2.0-2950 using Ambari 2.1
I imported a table from MS SQL Server to HDFS using Sqoop. I faced permission issues there, but was able to proceed with the community's help. The corresponding files on HDFS are shown in the image:
My objectives are:
Accordingly, I tried to create the table:
CREATE EXTERNAL TABLE DimSampleDesc (
  SmapiName_ver varchar(30),
  SmapiColName varchar(35),
  ChartType int,
  X_Indexet int,
  Y_Indexet int,
  X_Tick varchar(20),
  Y_Tick varchar(20),
  X_TickRange varchar(40),
  X_TickRangeFrom varchar(20),
  X_TickRangeTom varchar(20),
  Y_TickRange varchar(40),
  Y_TickRangeFrom varchar(20),
  Y_TickRangeTom varchar(20),
  IndexCount int,
  X_IndexCount int,
  Y_IndexCount int,
  X_Symbol varchar(10),
  X_SymbolName varchar(40),
  X_SymbolDescr varchar(40),
  Y_Symbol varchar(10),
  Y_SymbolName varchar(40),
  Y_SymbolDescr varchar(40),
  SmapiName varchar(30),
  Incorrect_Ver_FL boolean
)
COMMENT 'EXTERNAL, ORC table'
STORED AS ORC
LOCATION '/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc';
The error made me question my basic understanding of 'LOCATION' - I assumed it is the 'read/source' location, but now Hive seems to be trying to write there. If that is the case, how do I specify the source location for the external table?
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.security.AccessControlException: Permission denied: user=hive, access=WRITE, inode="/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc":sqoop:hdfs:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:219)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1771)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1755)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPathAccess(FSDirectory.java:1729)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAccess(FSNamesystem.java:8285)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.checkAccess(NameNodeRpcServer.java:1914)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.checkAccess(ClientNamenodeProtocolServerSideTranslatorPB.java:1443)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2133)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2131)
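The stack trace itself points at the cause: the directory is owned by sqoop:hdfs with mode drwxr-xr-x, so user hive has no write access to it. A quick way to confirm this before changing anything (a minimal sketch using only the standard HDFS CLI; the path is the one from the error):

# Show owner, group and mode of the import directory itself (-d: don't list its contents)
hdfs dfs -ls -d /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc

# List the imported files; Sqoop typically writes part-m-* files here
hdfs dfs -ls /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc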
Created 05-06-2016 10:53 AM
Your location is okay, but the format of the files is not. It looks like you imported your table as Avro files, so in your declaration you have to say "STORED AS AVRO", not "STORED AS ORC". Once that succeeds, you can first test your table by selecting some rows (SELECT COUNT(*) etc.), and then create your ORC table by saying in Hive, for example: "CREATE TABLE DimSampleDesc_orc STORED AS ORC AS SELECT * FROM DimSampleDesc". You also have permission issues: the owner of the files is sqoop, but user hive needs write permission to create the external table. There are several ways to resolve this; the fastest is to give write permissions to everyone:
hdfs dfs -chmod -R a+w /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc
However, you should think about how to handle permissions in your system. For example, you can run Sqoop etc. as an end user (like user1), add user1 to the hadoop group, and give write permissions to the group (g+w). That works because user hive also belongs to the hadoop group. Or you can use Ranger to manage permissions.
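Putting the above together, the whole flow could look like the following (a minimal sketch: the column list is abbreviated to the first two columns, and DimSampleDesc_orc is just an illustrative name):

-- Declare the Sqoop output as an Avro-backed external table
CREATE EXTERNAL TABLE DimSampleDesc (
  SmapiName_ver varchar(30),
  SmapiColName varchar(35)
  -- ... remaining columns as in the original DDL ...
)
STORED AS AVRO
LOCATION '/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc';

-- Sanity-check the data before converting
SELECT COUNT(*) FROM DimSampleDesc;

-- Create a managed ORC copy via CTAS
CREATE TABLE DimSampleDesc_orc STORED AS ORC AS SELECT * FROM DimSampleDesc;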
Created on 05-06-2016 11:08 AM - edited 08-19-2019 01:59 AM
I partially understood the solution that you are suggesting:
Created 05-06-2016 11:25 AM
Another update: the empty table is getting created, but the data load is failing:

hive> CREATE EXTERNAL TABLE DimSampleDesc (
        SmapiName_ver varchar(30), SmapiColName varchar(35), ChartType int, X_Indexet int, Y_Indexet int,
        X_Tick varchar(20), Y_Tick varchar(20), X_TickRange varchar(40), X_TickRangeFrom varchar(20), X_TickRangeTom varchar(20),
        Y_TickRange varchar(40), Y_TickRangeFrom varchar(20), Y_TickRangeTom varchar(20), IndexCount int, X_IndexCount int, Y_IndexCount int,
        X_Symbol varchar(10), X_SymbolName varchar(40), X_SymbolDescr varchar(40), Y_Symbol varchar(10), Y_SymbolName varchar(40), Y_SymbolDescr varchar(40),
        SmapiName varchar(30), Incorrect_Ver_FL boolean )
      COMMENT 'EXTERNAL, ORC table'
      ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
      STORED AS ORC
      LOCATION '/datastore/hdfstohive/reio/odpdw/may2016';
OK
Time taken: 0.187 seconds

hive> select * from DimSampleDesc;
OK
Time taken: 0.367 seconds

hive> LOAD DATA INPATH '/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/*.avro' INTO TABLE DimSampleDesc;
Loading data to table odp_dw_may2016.dimsampledesc
Failed with exception Unable to move source hdfs://l1031lab.sss.se.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00000.avro to destination hdfs://l1031lab.sss.se.com:8020/datastore/hdfstohive/reio/odpdw/may2016/part-m-00000.avro
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
hive>
Created 05-06-2016 11:27 AM
Keeping the imported table as-is and transforming it (to ORC, Parquet, etc.) is the preferred way: you keep your data, there is no need to import again if something unplanned happens, and once you have decided how to handle it you can drop the imported table. (You can also import directly into Hive as ORC; there is a guide on how to do that on HCC.) Regarding the second question, my example was with a managed table (stored at /apps/hive/warehouse); if you wish to store tables elsewhere, you can create another external table, provide the location, and write into it.
Created 05-06-2016 12:49 PM
Try this:

-- Imported table
CREATE EXTERNAL TABLE DimSampleDesc_avro (SmapiName_ver varchar(30), ...)
STORED AS AVRO
LOCATION '/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc';

-- External ORC table
CREATE EXTERNAL TABLE DimSampleDesc (SmapiName_ver varchar(30), ...)
STORED AS ORC
LOCATION '/datastore/hdfstohive/reio/odpdw/may2016';

INSERT OVERWRITE TABLE DimSampleDesc SELECT * FROM DimSampleDesc_avro;
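Once the INSERT finishes, comparing row counts between the two tables is a quick sanity check (plain HiveQL, nothing assumed beyond the two tables above):

SELECT COUNT(*) FROM DimSampleDesc_avro;
SELECT COUNT(*) FROM DimSampleDesc;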
Created 05-06-2016 03:18 PM
@Predag Minovic
I couldn't comment on the post despite several attempts; hence this 'answer'.
Changed the permissions so that Hive has write access:
[sqoop@l1038lab root]$ hadoop fs -chmod -R a+w /dataload/
[sqoop@l1038lab root]$
The table gets created, but all fields are NULL:

hive> CREATE EXTERNAL TABLE DimSampleDesc (
        SmapiName_ver varchar(30), SmapiColName varchar(35), ChartType int, X_Indexet int, Y_Indexet int,
        X_Tick varchar(20), Y_Tick varchar(20), X_TickRange varchar(40), X_TickRangeFrom varchar(20), X_TickRangeTom varchar(20),
        Y_TickRange varchar(40), Y_TickRangeFrom varchar(20), Y_TickRangeTom varchar(20), IndexCount int, X_IndexCount int, Y_IndexCount int,
        X_Symbol varchar(10), X_SymbolName varchar(40), X_SymbolDescr varchar(40), Y_Symbol varchar(10), Y_SymbolName varchar(40), Y_SymbolDescr varchar(40),
        SmapiName varchar(30), Incorrect_Ver_FL boolean )
      COMMENT 'EXTERNAL, ORC table'
      STORED AS Avro
      LOCATION '/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/';
OK
Time taken: 0.206 seconds

hive> select * from DimSampleDesc limit 10;
OK
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
... (all 10 rows are NULL in every column)
Time taken: 0.099 seconds, Fetched: 10 row(s)
hive>
Even providing all the options doesn't help:

hive> drop table DimSampleDesc;
OK
Time taken: 0.422 seconds

hive> CREATE EXTERNAL TABLE DimSampleDesc (
        SmapiName_ver varchar(30), SmapiColName varchar(35), ChartType int, X_Indexet int, Y_Indexet int,
        X_Tick varchar(20), Y_Tick varchar(20), X_TickRange varchar(40), X_TickRangeFrom varchar(20), X_TickRangeTom varchar(20),
        Y_TickRange varchar(40), Y_TickRangeFrom varchar(20), Y_TickRangeTom varchar(20), IndexCount int, X_IndexCount int, Y_IndexCount int,
        X_Symbol varchar(10), X_SymbolName varchar(40), X_SymbolDescr varchar(40), Y_Symbol varchar(10), Y_SymbolName varchar(40), Y_SymbolDescr varchar(40),
        SmapiName varchar(30), Incorrect_Ver_FL boolean )
      COMMENT 'EXTERNAL, AVRO table'
      ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
      STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
      LOCATION '/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/';
OK
Time taken: 0.169 seconds

hive> select * from DimSampleDesc limit 10;
OK
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
... (again, all 10 rows are NULL in every column)
Time taken: 0.107 seconds, Fetched: 10 row(s)
hive>
These NULLs are bothering me - I didn't get any errors from Sqoop while importing the files onto HDFS. Is the issue in the data import in the first place, or in this Hive table creation?
Created 05-06-2016 03:34 PM
Yeah, that doesn't look good. Can you try to "get" one Avro file to your local file system and check whether it is correct? You can use Avro Tools. Or, if Avro is not a must, import the table as a text file and explore the Avro path later.
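For example (a minimal sketch; the avro-tools jar name/version is an assumption - use whichever build you have at hand):

# Copy one of the Sqoop-generated files to the local file system
hdfs dfs -get /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00000.avro .

# Print the schema embedded in the file
java -jar avro-tools-1.7.5.jar getschema part-m-00000.avro

# Dump the records as JSON to eyeball the values
java -jar avro-tools-1.7.5.jar tojson part-m-00000.avro | head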
Created 05-09-2016 08:21 AM
Yeah, I share your opinion.
I am uncomfortable installing any new tools on the cluster machines. I did find jars like /usr/hdp/2.3.2.0-2950/sqoop/lib/avro-mapred-1.7.5-hadoop2.jar and /usr/hdp/2.3.2.0-2950/pig/lib/avro-tools-1.7.5-nodeps.jar, but I am unable to find the classes/commands to use merely to view the Avro files already present on HDFS.
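One way to avoid installing anything might be to drive the jar already found above through hadoop jar, which supplies the Hadoop classpath that the -nodeps build lacks (an untested sketch; org.apache.avro.tool.Main is the stock avro-tools entry point, and '-' tells tojson to read from stdin):

# Stream one Avro file out of HDFS and decode it with the jar shipped under /usr/hdp
hdfs dfs -cat /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00000.avro \
  | hadoop jar /usr/hdp/2.3.2.0-2950/pig/lib/avro-tools-1.7.5-nodeps.jar \
      org.apache.avro.tool.Main tojson - | head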
Created 05-09-2016 09:46 AM
In your Avro table declaration in Hive, can you try replacing VARCHAR(n) with STRING? I just tried importing one of my tables from MySQL as Avro, with 3 columns: int, varchar(16), and timestamp. When I declared the varchar(16) as string in Hive it worked; when I used varchar(16) in Hive, a "SELECT *" failed with an error. Also, on import, Sqoop creates an .avsc file in your current directory; you can inspect that file to find out what types Sqoop used when creating the Avro files.
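Applied to the table in question, the declaration would look something like this (a minimal sketch; the varchar(n) -> string change is the only point, and the int and boolean columns stay as they were):

CREATE EXTERNAL TABLE DimSampleDesc (
  SmapiName_ver string,
  SmapiColName string,
  ChartType int,
  X_Indexet int,
  Y_Indexet int,
  X_Tick string,
  Y_Tick string,
  X_TickRange string,
  X_TickRangeFrom string,
  X_TickRangeTom string,
  Y_TickRange string,
  Y_TickRangeFrom string,
  Y_TickRangeTom string,
  IndexCount int,
  X_IndexCount int,
  Y_IndexCount int,
  X_Symbol string,
  X_SymbolName string,
  X_SymbolDescr string,
  Y_Symbol string,
  Y_SymbolName string,
  Y_SymbolDescr string,
  SmapiName string,
  Incorrect_Ver_FL boolean
)
COMMENT 'EXTERNAL, AVRO table'
STORED AS AVRO
LOCATION '/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/';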