Reply
Highlighted
Explorer
Posts: 12
Registered: ‎02-21-2019

Store data as Parquet file

[ Edited ]

I want to store data in hdfs as parquet file and then create an external table over it as parquet. 

 

i used sqoop command to pull data from Oracle into hdfs.Column4 is date field in Oracle table so i want to store it as timestamp in hive.When i ran select * from table in hue i got the error below.

 

Please help , looks like data type issue because of column 4 .If i import only first 3 fields (string) from oracle to hdfs  and create hive table using first 3 columns then it is working fine. Issue came once i added column4.

 

Step1 :

 

CREATE external TABLE IF NOT EXISTS `abc`
(
column1 STRING,
column2 STRING,
column3 STRING
column4 TIMESTAMP
)
STORED AS PARQUET
LOCATION '/data/encrypt/abc';


Step 2:

sqoop import --connect *** --username *** --password ****** --query 'select column1,column2,column3,column4 from abc where $CONDITIONS' --split-by column1 --delete-target-dir --target-dir /data/encrypt/abc --compression-codec org.apache.hadoop.io.compress.SnappyCodec --as-parquetfile

 

Hue :

 

select * from abc;

 

Error:

 

  • Bad status for request TFetchResultsReq(fetchType=0, operationHandle=TOperationHandle(hasResultSet=True, modifiedRowCount=None, operationType=0, operationId=THandleIdentifier(secret='\xc5/w\xfd|0NL\x89\xbf\xa4)8\xaf\xa89', guid='\xaf\xca\x1b\xf0\xcd\xe3D4\xa2M\x94\xb8a\xed\xd3\x18')), orientation=4, maxRows=100): TFetchResultsResp(status=TStatus(errorCode=0, errorMessage='java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.LongWritable', sqlState=None, infoMessages=['*org.apache.hive.service.cli.HiveSQLException:java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.LongWritable:14:13', 'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:463', 'org.apache.hive.service.cli.operation.OperationManager:getOperationNextRowSet:OperationManager.java:294', 'org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java:769', 'org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:462', 'org.apache.hive.service.cli.thrift.ThriftCLIService:FetchResults:ThriftCLIService.java:691', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1553', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1538', 'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', 'org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor:process:HadoopThriftAuthBridge.java:762', 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286', 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149', 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624', 'java.lang.Thread:run:Thread.java:748', '*java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.LongWritable:16:2', 'org.apache.hadoop.hive.ql.exec.FetchTask:fetch:FetchTask.java:154', 'org.apache.hadoop.hive.ql.Driver:getResults:Driver.java:2058', 'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:458', '*org.apache.hadoop.hive.ql.metadata.HiveException:java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.LongWritable:23:7', 'org.apache.hadoop.hive.ql.exec.ListSinkOperator:processOp:ListSinkOperator.java:90', 'org.apache.hadoop.hive.ql.exec.Operator:forward:Operator.java:815', 'org.apache.hadoop.hive.ql.exec.SelectOperator:processOp:SelectOperator.java:84', 'org.apache.hadoop.hive.ql.exec.Operator:forward:Operator.java:815', 'org.apache.hadoop.hive.ql.exec.TableScanOperator:processOp:TableScanOperator.java:98', 'org.apache.hadoop.hive.ql.exec.FetchOperator:pushRow:FetchOperator.java:425', 'org.apache.hadoop.hive.ql.exec.FetchOperator:pushRow:FetchOperator.java:417', 'org.apache.hadoop.hive.ql.exec.FetchTask:fetch:FetchTask.java:140', '*java.lang.UnsupportedOperationException:Cannot inspect org.apache.hadoop.io.LongWritable:28:5', 'org.apache.hadoop.hive.ql.io.parquet.serde.primitive.ParquetStringInspector:getPrimitiveJavaObject:ParquetStringInspector.java:77', 'org.apache.hadoop.hive.ql.io.parquet.serde.primitive.ParquetStringInspector:getPrimitiveJavaObject:ParquetStringInspector.java:28', 'org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils:copyToStandardObject:ObjectInspectorUtils.java:336', 'org.apache.hadoop.hive.serde2.SerDeUtils:toThriftPayload:SerDeUtils.java:167', 'org.apache.hadoop.hive.ql.exec.FetchFormatter$ThriftFormatter:convert:FetchFormatter.java:61', 'org.apache.hadoop.hive.ql.exec.ListSinkOperator:processOp:ListSinkOperator.java:87'], statusCode=3), results=None, hasMoreRows=None).
Cloudera Employee
Posts: 9
Registered: ‎09-05-2017

Re: Store data as Parquet file

Hello VJROCK,

 

When importing data from Oracle to HDFS using Sqoop, we are able to specify an option to import it as parquet file, using '--as-parquetfile', which in turn changes Timestamp/Date datatypes to Long during the import process. This explains the error [1] you are recieving on querying the table, as you are trying to store and query a Long datatype as Timestamp. 

 

To avoid this, you can import these values as String and create your table accordingly on top of it. This can be done by adding the parameter '--map-column-java' to your Sqoop command and specifying the column name along with its datatype in a comma separated manner. For example - 

 

--map-column-java id=String,value=Integer

 

This overrides mapping from SQL to Java type for configured columns. More details can be found in the Sqoop Documentation [2].

 

Regards,

Sweeta

 

[1] org.apache.hive.service.cli.HiveSQLException:java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.LongWritable:14:13

 

[2] https://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html#_controlling_type_mapping

Announcements
New solutions