Member since
09-16-2021
291
Posts
39
Kudos Received
21
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
609 | 09-10-2024 07:50 AM | |
377 | 09-04-2024 05:35 AM | |
1222 | 08-28-2024 12:40 AM | |
940 | 02-09-2024 04:31 AM | |
4154 | 11-06-2023 03:10 AM |
07-15-2023
12:42 AM
@Sunanna Validate the job status using below command. hadoop job -status <hadoop_job_id>
yarn application -status <hadoop_application_id> Depends upon the status validate the logs using below , If needed validate the Jstack of the child tasks for better understanding. yarn logs -applicationId <applicationId>
... View more
07-14-2023
02:39 AM
If my understanding is correct, the schema is altered for different input files, which implies that the data itself lacks a structured schema. Given the frequent changes in the schema, it is advisable to store the data in a column-oriented system such as HBASE. The Same HBASE data can be accessed through spark using HBase-Spark Connector. Ref - https://docs.cloudera.com/cdp-private-cloud-base/7.1.8/accessing-hbase/topics/hbase-example-using-hbase-spark-connector.html
... View more
06-16-2023
12:04 AM
Check the possibility of using hive managed table. As part of hive managed tables, you won't require separate merge job , as hive compaction takes care by default if compaction is enabled. You can access managed tables through HWC (Hive Warehouse Connector) from Spark.
... View more
06-15-2023
11:47 PM
@Abdul_ As of now hive won't support row delimiter other new line character . Attaching the corresponding Jira for reference HIVE-11996 As a workaround, Recommend to update the input file using external libraries like awk,...etc and upload the input file in the corresponding FileSystem location to read. Eg - Through AWK [root@c2757-node2 ~]# awk -F "\",\"" 'NF < 3 {getline nextline; $0 = $0 nextline} 1' sample_case.txt
"IM43163","SOUTH,OFC","10-Jan-23"
"IM41763","John:comment added","12-Jan-23"
[root@c2757-node2 ~]# awk -F "\",\"" 'NF < 3 {getline nextline; $0 = $0 nextline} 1' sample_case.txt > sample_text.csv Reading from Hive Table 0: jdbc:hive2://c2757-node2.coelab.cloudera.c> select * from table1;
.
.
.
INFO : Executing command(queryId=hive_20230616064136_333ff98d-636b-43b1-898d-fca66031fe7f): select * from table1
INFO : Completed executing command(queryId=hive_20230616064136_333ff98d-636b-43b1-898d-fca66031fe7f); Time taken: 0.023 seconds
INFO : OK
+---------------+---------------------+---------------+
| table1.col_1 | table1.col_2 | table1.col_3 |
+---------------+---------------------+---------------+
| IM43163 | SOUTH,OFC | 10-Jan-23 |
| IM41763 | John:comment added | 12-Jan-23 |
+---------------+---------------------+---------------+
2 rows selected (1.864 seconds)
... View more
06-15-2023
04:26 AM
Thank you for your advise. We will investigate proposed solution with spark-xml Best regards
... View more
06-13-2023
12:12 AM
Output of below to identify the exact ouptut records details, explain formatted <query> explain extended <query> explain analyze <query>
... View more
06-09-2023
11:59 AM
@Josh2023 Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
... View more