Sqoop import into Hive as Parquet fails for decimal type

New Contributor

Hello,

I am trying to import a table from MS SQL Server into Hive as Parquet, and one of the columns is a decimal type. By default, Sqoop maps the decimal column to a double, but unfortunately that causes precision issues for some of our calculations.
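
To illustrate the precision problem (a standalone sketch I put together for this post, not our actual job), here is a small Java example showing how a DECIMAL(19,6) value can drift once it is held as a double, while BigDecimal stays exact:

import java.math.BigDecimal;

public class DecimalVsDouble {
    public static void main(String[] args) {
        // A sample DECIMAL(19,6) value, held both exactly and as a double.
        BigDecimal exact = new BigDecimal("19.123450");
        double approx = 19.123450d;

        // Sum the value many times, as an aggregation might.
        BigDecimal exactSum = BigDecimal.ZERO;
        double approxSum = 0.0d;
        for (int i = 0; i < 1_000_000; i++) {
            exactSum = exactSum.add(exact);
            approxSum += approx;
        }

        System.out.println("BigDecimal sum: " + exactSum);  // exactly 19123450.000000
        System.out.println("double sum:     " + approxSum); // typically slightly off due to binary rounding
    }
}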

Right now, I am getting the following error when running in an HDP 2.4 sandbox:

Import command:

[root@sandbox sqoop]# sqoop import -Dsqoop.avro.logical_types.decimal.enable=true --hive-import --num-mappers 1 --connect "jdbc:sqlserver://<conn_string>" --username uname --password pass --hive-overwrite --hive-database default --table SqoopDecimalTest --driver com.microsoft.sqlserver.jdbc.SQLServerDriver --null-string '\\N' --as-parquetfile

Error: org.kitesdk.data.DatasetOperationException: Failed to append {"id": 1, "price": 19.123450} to ParquetAppender{path=hdfs://sandbox.hortonworks.com:8020/tmp/default/.temp/job_1514513583437_0001/mr/attempt_1514513583437_0001_m_000000_0/.6b8d110f-6d1a-450c-93e4-c3db1a421476.parquet.tmp, schema={"type":"record","name":"SqoopDecimalTest","doc":"Sqoop import of SqoopDecimalTest","fields":[{"name":"id","type":["null","int"],"default":null,"columnName":"id","sqlType":"4"},{"name":"price","type":["null",{"type":"bytes","logicalType":"decimal","precision":19,"scale":6}],"default":null,"columnName":"price","sqlType":"3"}],"tableName":"SqoopDecimalTest"}, fileSystem=DFS[DFSClient[clientName=DFSClient_attempt_1514513583437_0001_m_000000_0_1859161154_1, ugi=root (auth:SIMPLE)]], avroParquetWriter=org.apache.parquet.avro.AvroParquetWriter@f60f96b}
    at org.kitesdk.data.spi.filesystem.FileSystemWriter.write(FileSystemWriter.java:194)
    at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$DatasetRecordWriter.write(DatasetKeyOutputFormat.java:326)
    at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$DatasetRecordWriter.write(DatasetKeyOutputFormat.java:305)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:658)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.sqoop.mapreduce.ParquetImportMapper.map(ParquetImportMapper.java:70)
    at org.apache.sqoop.mapreduce.ParquetImportMapper.map(ParquetImportMapper.java:39)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

Caused by: java.lang.ClassCastException: java.math.BigDecimal cannot be cast to java.nio.ByteBuffer
    at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:257)
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:167)
    at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:121)
    at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:288)
    at org.kitesdk.data.spi.filesystem.ParquetAppender.append(ParquetAppender.java:74)
    at org.kitesdk.data.spi.filesystem.ParquetAppender.append(ParquetAppender.java:35)
    at org.kitesdk.data.spi.filesystem.FileSystemWriter.write(FileSystemWriter.java:188)
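
For what it's worth, my reading of the ClassCastException is that the Avro record still holds a raw java.math.BigDecimal for the "price" field while the schema declares it as bytes with a decimal logical type, so parquet-avro tries to cast the value straight to a ByteBuffer. As far as I understand Avro 1.8, the BigDecimal is supposed to be encoded to bytes with a decimal conversion first, roughly like this standalone sketch (illustrative only, not the actual Sqoop/Kite code path; the class name is mine):

import java.math.BigDecimal;
import java.nio.ByteBuffer;

import org.apache.avro.Conversions;
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;

public class DecimalToBytesSketch {
    public static void main(String[] args) {
        // Schema fragment equivalent to the "price" field in the error:
        // {"type":"bytes","logicalType":"decimal","precision":19,"scale":6}
        Schema bytesSchema = Schema.create(Schema.Type.BYTES);
        LogicalTypes.Decimal decimalType = LogicalTypes.decimal(19, 6);
        decimalType.addToSchema(bytesSchema);

        // What the Parquet writer expects for this field: the BigDecimal
        // encoded as bytes via Avro's DecimalConversion, not the raw object.
        BigDecimal price = new BigDecimal("19.123450"); // scale must match the schema (6)
        ByteBuffer encoded =
                new Conversions.DecimalConversion().toBytes(price, bytesSchema, decimalType);

        System.out.println("encoded decimal as " + encoded.remaining() + " bytes");
    }
}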

I am running Sqoop 1.4.7 built against Kite 1.1.1-SNAPSHOT (the master branch), because I noticed that the current release (1.0.0) uses parquet-avro 1.6.0 and I thought that parquet-avro 1.8.1 might help. I get the same error with both versions.

Does anyone know what might be wrong? Or, is the answer that this is simply not supported in Sqoop? Any ideas would be greatly appreciated!

Thank you,

Subhash

2 Replies

Re: Sqoop import into Hive as Parquet fails for decimal type

New Contributor

Hello @subhash_sriram,

I am encountering the same issue. Did you find a solution?

Re: Sqoop import into Hive as Parquet fails for decimal type

Community Manager

@ou As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. 

 


Vidya Sargur, Community Manager

