06-13-2020
07:11 AM
Hi Royles,

I am using HiveQL for creating the table and altering it to add new columns. All the operations like MSCK REPAIR TABLE and adding partitions are done from HiveQL; we only read the table from Spark SQL.

After reading your reply, I tried to create the external table, run MSCK REPAIR, and ALTER TABLE to add new columns, all from Spark SQL. I got the results below:
1. No results from Spark when reading data from the table.
2. No results from the Hive shell when reading the table.
3. Looking at the TBLPROPERTIES, the Parquet schema does not match, so there are no results from HiveQL or from Spark.

The only solution I am following so far (for adding new columns to external tables) is:
1. Drop and recreate the table using HiveQL from the Hive shell with all columns (old + new).
2. Manually add the latest partition, which has data for all the new columns added since the table was first created.
3. Query the table from Spark, then check that the TBLPROPERTIES and Parquet schema are reflected and mapped to the Hive columns.
4. If the schemas do not match, e.g. testData in Parquet shows up as testdata in the Hive TBLPROPERTIES, we get null values from Spark.
5. If both schemas match, we can see results from Spark.
6. Then run MSCK REPAIR, which gives me results in both Spark 2.2 and 2.3.

But I feel there must be some other way of adding new columns instead of dropping and recreating the table.
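For reference, a minimal pyspark sketch of one alternative, assuming Spark 2.2+ with Hive support; spark is the shell's SparkSession, and db.events / testData are hypothetical names. Hive's ALTER TABLE ... ADD COLUMNS ... CASCADE plus Spark's spark.sql.hive.caseSensitiveInferenceMode setting may avoid the drop-and-recreate cycle, though this is only a sketch, not a verified fix for this exact setup:

    # Run once from the Hive shell instead of drop-and-recreate
    # (CASCADE pushes the new column into the existing partition metadata):
    #   ALTER TABLE db.events ADD COLUMNS (testData string) CASCADE;

    # Then, from pyspark: have Spark infer the case-sensitive schema from the
    # Parquet files and save it back into TBLPROPERTIES, so that testData is
    # not read as testdata (and returns values instead of nulls).
    spark.conf.set("spark.sql.hive.caseSensitiveInferenceMode", "INFER_AND_SAVE")

    # Pick up any newly written partition directories.
    spark.sql("MSCK REPAIR TABLE db.events")

    # Drop cached metadata so the next read sees the new column.
    spark.catalog.refreshTable("db.events")
    spark.table("db.events").printSchema()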
06-12-2019
11:31 PM
@Yeseswini Hive's VIEW is read-only, please see the doc below: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/AlterView
>> Views are read-only and may not be used as the target of LOAD/INSERT/ALTER.
Insert or update is not supported.
Thanks
Eric
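As a minimal illustration from the pyspark shell (db.orders and db.open_orders are hypothetical names): writes must target the base table, and the view then reflects them.

    spark.sql("CREATE TABLE IF NOT EXISTS db.orders (id INT, status STRING)")
    spark.sql("CREATE VIEW IF NOT EXISTS db.open_orders AS SELECT * FROM db.orders WHERE status = 'OPEN'")

    # This would fail, because the view is read-only:
    # spark.sql("INSERT INTO db.open_orders VALUES (1, 'OPEN')")

    # Insert into the base table instead; the view picks up the new row.
    spark.sql("INSERT INTO db.orders VALUES (1, 'OPEN')")
    spark.sql("SELECT * FROM db.open_orders").show()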
06-03-2019
01:46 PM
From the spark-shell or pyspark shell, use the commands below to access Hive database objects (note there is no trailing semicolon inside spark.sql; some Spark versions reject it with a parse error):

spark.sql("show databases")
spark.sql("select * from databasename.tablename")

or

spark.read.table("databasename.tablename")

You can run any query inside spark.sql and it will return results.
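As a usage example (databasename, tablename, and some_column are placeholders), both forms return a DataFrame that can be transformed further:

    df = spark.read.table("databasename.tablename")
    df.filter(df["some_column"].isNotNull()).show(10)  # some_column is a placeholder
    spark.sql("select count(*) from databasename.tablename").show()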
02-07-2019
06:21 AM
Hi All,
Daily we follow the steps below for a full load (a sketch of steps 1 and 2 follows after this list):
Step 1: Read data from Parquet files into PySpark, apply transformations, and save the data as Parquet files with partitioning and repartitioning.
Step 2: The DataFrame is saved as a new external table in Hive (under a temporary name; it has complex datatypes).
Step 3: Create or alter a view in Hive.
Step 4: The view is used by Tableau through Impala, and by PySpark for some other jobs.
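A minimal pyspark sketch of steps 1 and 2; the path, table name, and columns (load_ts, event_date, id, attrs) are all hypothetical:

    from pyspark.sql import functions as F

    # Step 1: read, transform, and write partitioned Parquet.
    df = spark.read.parquet("/data/source/")
    out = (df.withColumn("event_date", F.to_date(F.col("load_ts")))
             .repartition("event_date"))
    out.write.partitionBy("event_date").mode("overwrite").parquet("/data/staging/daily_load_tmp")

    # Step 2: expose the files as an external Hive table under a temporary name,
    # including a complex column (attrs) of the kind Impala cannot read.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS db.daily_load_tmp (id INT, attrs MAP<STRING,STRING>)
        PARTITIONED BY (event_date DATE)
        STORED AS PARQUET
        LOCATION '/data/staging/daily_load_tmp'
    """)
    spark.sql("MSCK REPAIR TABLE db.daily_load_tmp")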
The problem is that Impala cannot read complex datatypes, so the view throws an error. If we create the view in Impala instead, it ignores the complex datatypes when creating the view, and then we cannot read those columns from Spark.
We have options like creating two views on the same table (one in Hive and one in Impala), or renaming the newly created external table to the original table name. Currently we are renaming the table. Can someone suggest a better approach? A sketch of the two-view option is below.
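For reference, a minimal sketch of the two-view option from the pyspark shell (db.daily_load_tmp and the view names are hypothetical): one view keeps the complex columns for Spark/Hive, and a second view projects only scalar columns so Impala can evaluate it.

    # View for Spark/Hive consumers: exposes all columns, complex types included.
    spark.sql("""
        CREATE OR REPLACE VIEW db.daily_load_full AS
        SELECT * FROM db.daily_load_tmp
    """)

    # View for Tableau/Impala: projects only the scalar columns, so the view
    # definition never touches the complex datatypes.
    spark.sql("""
        CREATE OR REPLACE VIEW db.daily_load_flat AS
        SELECT id, event_date FROM db.daily_load_tmp
    """)

    # Impala caches metadata, so refresh it from impala-shell afterwards:
    #   INVALIDATE METADATA db.daily_load_flat;

The trade-off is keeping two view definitions in sync on every schema change, but it avoids renaming tables on every load.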
Thanks in advance
02-22-2018
01:19 AM
Thank you. It worked.