Member since: 11-14-2017
Posts: 3
Kudos Received: 0
Solutions: 0
12-17-2017
04:26 AM
To expose data to Tableau, I have to create external tables from Spark SQL that point to a partitioned Parquet file directory, which looks like the one below.
data
/key=2017_12_15
/gender=male
/gender=female
*.parquet
/key=2017_12_16
/gender=male
/gender=female
*.parquet

Now I have to create three external tables: one with all the data, one with only the male data, and a third with only the female data. I tried the CREATE commands below, but they are not working. I think the issue is the *. Can someone please tell me how I can achieve this through spark-sql? For everyone's information, * works with the sqlContext object, but I cannot use it because it has no feature for creating external tables.

CREATE EXTERNAL TABLE IF NOT EXISTS all (name STRING, address STRING, date DATE)
STORED AS PARQUET
LOCATION 'data/key=*';

CREATE EXTERNAL TABLE IF NOT EXISTS male (name STRING, address STRING, date DATE)
STORED AS PARQUET
LOCATION 'data/key=*/gender=male';

CREATE EXTERNAL TABLE IF NOT EXISTS female (name STRING, address STRING, date DATE)
STORED AS PARQUET
LOCATION 'data/key=*/gender=female';
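
One possible direction, offered as a sketch rather than a confirmed fix: as far as I know, LOCATION takes a single directory and does not expand wildcards, so a common pattern for a key=.../gender=... layout is to declare one partitioned external table at the root and expose the male and female subsets as views. The names all_data, male, and female, the STRING partition column types, and the relative path 'data' (copied from the question; an absolute path may be required) are assumptions, and this presumes a Hive-enabled Spark SQL session.

```sql
-- Sketch only: all_data, male, and female are hypothetical names.
-- LOCATION is the table root; the key=/gender= subdirectories become partitions.
CREATE EXTERNAL TABLE IF NOT EXISTS all_data (
  name STRING,
  address STRING,
  `date` DATE
)
PARTITIONED BY (key STRING, gender STRING)
STORED AS PARQUET
LOCATION 'data';

-- Register the existing key=.../gender=... directories as partitions.
MSCK REPAIR TABLE all_data;

-- Views that filter on the partition column instead of a wildcard path.
CREATE VIEW IF NOT EXISTS male AS
SELECT * FROM all_data WHERE gender = 'male';

CREATE VIEW IF NOT EXISTS female AS
SELECT * FROM all_data WHERE gender = 'female';
```

Because gender is a partition column, a filter on it should only read the matching directories. When new key=... directories arrive, MSCK REPAIR TABLE would need to be run again (or the partitions added with ALTER TABLE ... ADD PARTITION).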
Labels: Apache Spark
11-19-2017
09:53 AM
Since I will later use the streaming data in a Spark batch job, would it be better to store it in Cassandra or Hive? Also, will Hive give me good performance if I have to upsert instead of just insert?
11-19-2017
07:55 AM
I have a requirement where we will be listening to Kafka with Spark Streaming, and this data has to be used by another Spark batch job. So we need to upsert this data (insert it if it does not exist, update it if it does) into some database that is easily queryable by the other Spark batch job. Please suggest whether I should use Cassandra or a Hadoop-based database like HBase.
Labels: Apache Spark