Member since: 09-16-2021
Posts: 280
Kudos Received: 31
Solutions: 19
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 450 | 08-28-2024 12:40 AM |
| | 811 | 02-09-2024 04:31 AM |
| | 3509 | 11-06-2023 03:10 AM |
| | 620 | 10-30-2023 07:17 AM |
| | 1963 | 10-27-2023 12:07 AM |
09-27-2023
07:43 AM
If the partition data exists in a layout like below: <s3:bucket>/<some_location>/<part_column>=<part_value>/<filename> you can create an external table pointing at that location and run 'msck repair table <table_name> sync partitions' to sync the partitions. Validate the data by running some sample select statements. Once that is done, you can create a new external table on the other bucket and run an insert statement with dynamic partitioning, as in the sketch below. Ref - https://cwiki.apache.org/confluence/display/hive/dynamicpartitions
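A minimal PySpark sketch of the flow, assuming hypothetical table and column names (src_events, dst_events, dt) and that spark.sql is wired to the Hive metastore; bucket paths, schemas, and the exact MSCK syntax will depend on your environment:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partition-sync-sketch").enableHiveSupport().getOrCreate()

# 1. External table over the existing partitioned layout (names are illustrative).
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS src_events (id BIGINT, payload STRING)
    PARTITIONED BY (dt STRING)
    STORED AS PARQUET
    LOCATION 's3a://<s3_bucket>/<some_location>/'
""")

# 2. Register the partitions that already exist under that location.
spark.sql("MSCK REPAIR TABLE src_events SYNC PARTITIONS")

# 3. Spot-check the data with a sample query.
spark.sql("SELECT dt, COUNT(*) FROM src_events GROUP BY dt").show()

# 4. Copy into a new external table on the target bucket using dynamic partitioning.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS dst_events (id BIGINT, payload STRING)
    PARTITIONED BY (dt STRING)
    STORED AS PARQUET
    LOCATION 's3a://<target_bucket>/<some_location>/'
""")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("INSERT INTO TABLE dst_events PARTITION (dt) SELECT id, payload, dt FROM src_events")
```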
09-05-2023
01:50 PM
Hey @Shivakuk, circling back to see if my response was helpful. I am happy to help if you have follow-up questions. Thanks!
07-20-2023
03:10 AM
We verified the same in the CDP environment, as we are uncertain about the Databricks Spark environment. Since we have a mix of managed and external tables, we extracted the necessary information through HWC:
>>> database = spark.sql("show tables in default").collect()
23/07/20 10:04:45 INFO rule.HWCSwitchRule: Registering Listeners
23/07/20 10:04:47 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
Hive Session ID = e6f70006-0c2e-4237-9a9e-e1d19901af54
>>> desiredColumn="name"
>>> tablenames = []
>>> for row in database:
... cols = spark.table(row.tableName).columns
... listColumns= spark.table(row.tableName).columns
... if desiredColumn in listColumns:
... tablenames.append(row.tableName)
...
>>>
>>> print("\n".join(tablenames))
movies
tv_series_abc
cdp1
tv_series
spark_array_string_example
>>>
07-19-2023
08:21 PM
Hi ggangadharan, thanks for your feedback. Maybe I need to explain my case above in more detail. Both solutions you gave lean towards using Sqoop to import the data over JDBC. In my current situation, the source system does not grant us permission to access their DB. They export the data and save it as a .sql file. I need to download that file and load it into a Hive table.
07-18-2023
07:27 AM
@Choolake, Thank you for your participation in the Cloudera Community. I'm happy to see you resolved your issue. Please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
07-15-2023
12:42 AM
@Sunanna Validate the job status using the commands below:
hadoop job -status <hadoop_job_id>
yarn application -status <hadoop_application_id>
Depending on the status, validate the logs using the command below. If needed, take a jstack of the child tasks for a better understanding.
yarn logs -applicationId <applicationId>
07-14-2023
02:39 AM
If my understanding is correct, the schema is altered for different input files, which implies that the data itself lacks a fixed schema. Given the frequent schema changes, it is advisable to store the data in a column-oriented system such as HBase. The same HBase data can then be accessed through Spark using the HBase-Spark Connector, as in the sketch below. Ref - https://docs.cloudera.com/cdp-private-cloud-base/7.1.8/accessing-hbase/topics/hbase-example-using-hbase-spark-connector.html
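A minimal PySpark sketch of reading an HBase table through the HBase-Spark connector, assuming a hypothetical table named events with a column family cf; the column mapping, table name, and connector options are illustrative and will differ in your cluster (the connector jars must also be on the Spark classpath):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hbase-spark-sketch").getOrCreate()

# Map the HBase row key and columns to DataFrame columns (table/columns are illustrative).
df = (spark.read
      .format("org.apache.hadoop.hbase.spark")
      .option("hbase.columns.mapping",
              "id STRING :key, name STRING cf:name, payload STRING cf:payload")
      .option("hbase.table", "events")
      .option("hbase.spark.use.hbasecontext", False)
      .load())

# Query the HBase-backed DataFrame like any other Spark DataFrame.
df.filter(df.name.isNotNull()).show()
```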
06-16-2023
12:04 AM
Check the possibility of using a Hive managed table. With Hive managed tables, you won't require a separate merge job, as Hive compaction takes care of it by default if compaction is enabled. You can access managed tables through HWC (Hive Warehouse Connector) from Spark, as in the sketch below.
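A minimal PySpark sketch of reading a managed (ACID) table through HWC, assuming a hypothetical table default.orders and that the HWC jar and pyspark_llap zip are already configured for the Spark session:

```python
from pyspark.sql import SparkSession
from pyspark_llap import HiveWarehouseSession

spark = SparkSession.builder.appName("hwc-sketch").getOrCreate()

# Build an HWC session on top of the existing SparkSession.
hive = HiveWarehouseSession.session(spark).build()

# Read a managed table; Hive-side compaction keeps small files merged, so no separate merge job.
df = hive.sql("SELECT * FROM default.orders")  # or hive.table("default.orders")
df.show()
```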