- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
SPARK Dataframe and IMPALA CREATE TABLE issue
- Labels:
-
Apache Impala
-
Apache Spark
Created on ‎06-06-2017 12:21 AM - edited ‎09-16-2022 04:42 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi All, using spakr 1.6.1 to store data into IMPALA (read works without issues),
getting exception with table creation..when executed as below.
joined.write().mode(SaveMode.Overwrite).jdbc(DB_CONNECTION, DB_TABLE3, props);
Could anyone help on data type converion from TEXT to String and DOUBLE PRECISION to Double .
Created ‎06-13-2017 07:59 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Msdhan,
What's the schema and fileformat of the Impala table? Why not write the data directly and avoid a jdbc connection to impala?
Jason
Created ‎06-13-2017 11:33 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for the reply, The peace of code is mentioned below.
DataFrame right = sqlContext.read().jdbc(DB_CONNECTION, "testDB.tab2", props);
DataFrame joined = sqlContext.read().jdbc(DB_CONNECTION, "testDB.tab1", props).join(right, "id");
joined.write().jdbc(DB_CONNECTION, DB_TABLE3, props);
Its default file comma delimited format. Please find the full exception is mentioned below.
Exception in thread "main" java.sql.SQLException: [Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:AnalysisException: Syntax error in line 1:
....tab3 (id INTEGER , col_1 TEXT , col_2 DOUBLE PRECISIO...
^
Encountered: IDENTIFIER
Expected: ARRAY, BIGINT, BINARY, BOOLEAN, CHAR, DATE, DATETIME, DECIMAL, REAL, FLOAT, INTEGER, MAP, SMALLINT, STRING, STRUCT, TIMESTAMP, TINYINT, VARCHAR
CAUSED BY: Exception: Syntax error
), Query: CREATE TABLE testDB.tab3 (id INTEGER , col_1 TEXT , col_2 DOUBLE PRECISION , col_3 TIMESTAMP , col_11 TEXT , col_22 DOUBLE PRECISION , col_33 TIMESTAMP ).
at com.cloudera.hivecommon.api.HS2Client.executeStatementInternal(Unknown Source)
at com.cloudera.hivecommon.api.HS2Client.executeStatement(Unknown Source)
at com.cloudera.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.executeHelper(Unknown Source)
at com.cloudera.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.execute(Unknown Source)
at com.cloudera.jdbc.common.SStatement.executeNoParams(Unknown Source)
at com.cloudera.jdbc.common.SStatement.executeUpdate(Unknown Source)
at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:302)
Caused by: com.cloudera.support.exceptions.GeneralException: [Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:AnalysisException: Syntax error in line 1:
....tab3 (id INTEGER , col_1 TEXT , col_2 DOUBLE PRECISIO...
^
Encountered: IDENTIFIER
Expected: ARRAY, BIGINT, BINARY, BOOLEAN, CHAR, DATE, DATETIME, DECIMAL, REAL, FLOAT, INTEGER, MAP, SMALLINT, STRING, STRUCT, TIMESTAMP, TINYINT, VARCHAR
CAUSED BY: Exception: Syntax error
), Query: CREATE TABLE testDB.tab3 (id INTEGER , col_1 TEXT , col_2 DOUBLE PRECISION , col_3 TIMESTAMP , col_11 TEXT , col_22 DOUBLE PRECISION , col_33 TIMESTAMP ).
... 7 more
Created ‎06-13-2017 11:44 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Created ‎06-14-2017 06:18 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Actually trying in java, as below
DataFrame updated = joined.selectExpr("id", "cast(col_1 as STRING) col_1", "cast(col_2 as DOUBLE) col_2", "cast(col_11 as STRING) col_11", "cast(col_22 as DOUBLE) col_22" );
updated.write().jdbc(DB_CONNECTION, DB_TABLE3, props);
Still shows the same error, any issue over here ?
Created ‎06-15-2017 06:37 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Why are you trying to connect to Impala via JDBC and write the data? You can write the data directly to the storage through Spark and still access through Impala after calling "refresh <table>" in impala. This will avoid the issues you are having and should be more performant.
Created ‎06-15-2017 08:59 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
thanks for the suggession, will try this.
Is there any way to avoid the above error?
Created ‎06-16-2017 12:24 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Created ‎02-13-2018 11:13 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
How to integrate impala and spark using scala?
