Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

SaveasTable method with dataframe to write data onto Hive is not working

SaveasTable method with dataframe to write data onto Hive is not working

New Contributor

Hi,

I am using Spark (scala) to create a table in Hive based on an Avro file. The avro file is with a number of columns with data type String, with around 5 rows of data. When I try to save this data into Hive using a dataframe & JDBC connection (my hive environment is on a remote machine), I get a strange error. The CREATE table command generated by SaveAsTable method is appending "" to each column name and also converting STRING datatype as TEXT. This is causing errors in creating the table. Please find below the exact script that I am using along with the error message.

 

I have tried with Impala JDBC drivers and I get the same error. Not sure what is wrong with the code. Could you please guide me? 

 

Script:

object HiveDataLoad {
def main(args: Array[String]) {

val spark = SparkSession
.builder()
.appName("HiveDataLoad")
.config("spark.master", "local[*]")
.getOrCreate()

val avroDF = spark.read.format("avro").load("DataFolder/applications.avro")

avroDF.write
.format("jdbc")
.option("driver","com.cloudera.hive.jdbc41.HS2Driver")
.option("url", "jdbc:hive2://localhost:10000/sparktest")
.option("dbtable", "applications")
.option("user", "******")
.option("password", "******")
.mode("append")
.saveAsTable("application")

spark.stop()
}
}

Error:

Exception in thread "main" java.sql.SQLException: [Cloudera][HiveJDBCDriver](500051) ERROR processing query/statement. Error Code: 40000, SQL state: TStatus(statusCode:ERROR_STATUS, infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Error while compiling statement: FAILED: ParseException line 1:27 cannot recognize input near '"CODE"' 'TEXT' ',' in column specification:28:27, org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:400, org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:187, org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:271, org.apache.hive.service.cli.operation.Operation:run:Operation.java:337, org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:439, org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementAsync:HiveSessionImpl.java:416, sun.reflect.GeneratedMethodAccessor21:invoke::-1, sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43, java.lang.reflect.Method:invoke:Method.java:606, org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78, org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36, org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63, java.security.AccessController:doPrivileged:AccessController.java:-2, javax.security.auth.Subject:doAs:Subject.java:415, org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1917, org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59, com.sun.proxy.$Proxy26:executeStatementAsync::-1, org.apache.hive.service.cli.CLIService:executeStatementAsync:CLIService.java:282, org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:501, org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1313, org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1298, org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39, org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39, org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56, org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286, java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145, java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615, java.lang.Thread:run:Thread.java:745, *org.apache.hadoop.hive.ql.parse.ParseException:line 1:27 cannot recognize input near '"CODE"' 'TEXT' ',' in column specification:32:5, org.apache.hadoop.hive.ql.parse.ParseDriver:parse:ParseDriver.java:204, org.apache.hadoop.hive.ql.parse.ParseDriver:parse:ParseDriver.java:166, org.apache.hadoop.hive.ql.Driver:compile:Driver.java:522, org.apache.hadoop.hive.ql.Driver:compileInternal:Driver.java:1356, org.apache.hadoop.hive.ql.Driver:compileAndRespond:Driver.java:1343, org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:185], sqlState:42000, errorCode:40000, errorMessage:Error while compiling statement: FAILED: ParseException line 1:27 cannot recognize input near '"CODE"' 'TEXT' ',' in column specification), Query: CREATE TABLE applications ("CODE" TEXT , "NAME" TEXT , "DESCRIPTION" TEXT , "ICON_NAME" TEXT , "STATUS" TEXT , "CONNECT_STRING" TEXT , "CREATED_BY" TEXT , "DATE_CREATED" TEXT , "MODIFIED_BY" TEXT , "DATE_MODIFIED" TEXT ).