Member since 05-01-2016
6 Posts, 0 Kudos Received, 0 Solutions
11-06-2018
11:13 AM
We have a PySpark program that ingests source files into HDFS and then loads them into Hive tables.
The program works fine on the edge node because the source files are local to the edge node. Can we schedule this PySpark program through Oozie with spark2-submit --deploy-mode client?
10-04-2018
11:47 AM
I am executing a CREATE TABLE statement that uses the JSON SerDe "org.apache.hive.hcatalog.data.JsonSerDe" through spark.sql on Spark version 2.1.0.cloudera2, and I get the error below:
java.lang.ClassNotFoundException: Class org.apache.hive.hcatalog.data.JsonSerDe not found
When running spark2-submit and pyspark2, I pass the following --jars option:
--jars /opt/cloudera/parcels/CDH-5.13.3-1.cdh5.13.3.p0.2/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar
I also tried set("sparks.jars","/opt/cloudera/parcels/CDH-5.13.3-1.cdh5.13.3.p0.2/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar") inside my code. Nothing works.
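For reference, the submit command described above can be sketched as a small Python helper. This is only a sketch under assumptions: the script name `load_json_tables.py` and the helper `spark2_submit_args` are hypothetical, and adding `--driver-class-path` alongside `--jars` is something worth trying, not a confirmed fix for this error. Note also that the Spark property is named `spark.jars`, not `sparks.jars`.

```python
# Jar path copied from the post; adjust it to the parcel installed on your cluster.
HCATALOG_JAR = (
    "/opt/cloudera/parcels/CDH-5.13.3-1.cdh5.13.3.p0.2/lib/"
    "hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar"
)


def spark2_submit_args(script, jars, deploy_mode="client"):
    """Build a spark2-submit argument list that ships `jars` with the job
    and also places them on the driver classpath."""
    return [
        "spark2-submit",
        "--deploy-mode", deploy_mode,
        "--jars", ",".join(jars),               # --jars takes a comma-separated list
        "--driver-class-path", ":".join(jars),  # classpath entries are colon-separated
        script,
    ]


# "load_json_tables.py" is a hypothetical script name used for illustration.
args = spark2_submit_args("load_json_tables.py", [HCATALOG_JAR])
print(" ".join(args))
# To launch the job for real: import subprocess; subprocess.check_call(args)
```

Building the arguments as a list keeps the long jar path in one place and avoids shell quoting problems when the same command is later wrapped by a scheduler.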
Labels: Apache Hive, Apache Spark
09-19-2018
06:13 AM
I get the error below when I try to query Hive tables in Impala:
InvalidStorageDescriptorException: Impala does not support tables of this type. REASON: SerDe library 'org.apache.hive.hcatalog.data.JsonSerDe' is not supported.
I am able to query the table after converting it to Parquet format. Is there any way to read JSON files in Impala?
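The Parquet workaround mentioned above could look roughly like the following. This is a minimal sketch, assuming the conversion is done with a Hive CREATE TABLE AS SELECT statement; the table names `events_json` and `events_parquet` and the helper `parquet_copy_ddl` are hypothetical.

```python
def parquet_copy_ddl(source_table, target_table):
    """Return a Hive CTAS statement that copies `source_table`
    into a new Parquet-backed table named `target_table`."""
    return (
        "CREATE TABLE {target} STORED AS PARQUET "
        "AS SELECT * FROM {source}"
    ).format(target=target_table, source=source_table)


# Hypothetical table names, for illustration only.
ddl = parquet_copy_ddl("events_json", "events_parquet")
print(ddl)
```

The statement would be run in Hive (or through spark.sql with Hive support). Impala usually needs an INVALIDATE METADATA on the new table before it can see it.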
Labels: Apache Hive, Apache Impala
05-15-2017
06:14 PM
Does Falcon mirroring meet the requirements below?
Easily configurable to allow for multiple datasets (some datasets are larger than 1 TB)
Supports multiple streams of data processing
Includes alerting for job failures
Includes validation that data has been copied successfully (data should reconcile back to source)
Supports full and delta copies
Supports dynamic changes in the source structure without requiring manual intervention on the target
Target data should always be available, even during refresh/sync
Optimal data transfer speeds, given the volume of data
Ideally, near-real-time data sync (within 15 minutes)
Labels: Apache Falcon