Member since: 05-29-2023
Posts: 8
Kudos Received: 1
Solutions: 0
10-22-2024
05:06 AM
Hi @krishna2023, according to the documentation (https://docs.cloudera.com/documentation/other/connectors/impala-jdbc/2-6-23/Cloudera-JDBC-Driver-for-Impala-Install-Guide.pdf), try creating a JAAS login configuration file and check whether the connection succeeds.
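For reference, a minimal sketch of what such a JAAS login configuration file could look like for keytab-based Kerberos login; the entry name "Client", the keytab path, and the principal below are placeholders to adapt to your environment, not values from the guide:

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/path/to/user.keytab"
  principal="user@EXAMPLE.COM"
  doNotPrompt=true;
};

You would then point the JVM that opens the JDBC connection at the file with -Djava.security.auth.login.config=/path/to/jaas.conf.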
03-04-2024
04:20 AM
1 Kudo
@krishna2023 It seems like you're encountering issues with loading data into partitions in Impala after executing the provided steps (a rough sketch of the corresponding statements follows below).

1. Create table as select * from db2.t1 where 1=2: This step creates an empty table db1.t1 based on the schema of db2.t1, without any data. Ensure that the table schema matches between db1.t1 and db2.t1.
2. Alter table set location: After creating the empty table, you're altering its location to a new path. Make sure that the specified path exists and has the necessary permissions for Impala to read and write data.
3. Add partition for every day: Adding partitions should involve specifying the loading date for each partition and its corresponding HDFS directory path. Double-check that the HDFS directory paths specified in each partition definition are correct and accessible by Impala.
4. Refresh table: The REFRESH command updates the table's metadata to reflect changes made in the underlying data directory. Running REFRESH after adding partitions is necessary to inform Impala about the new partitions.
5. Compute stats: The COMPUTE STATS command gathers statistics about the table, which helps Impala optimize query execution. While this command is not directly related to loading data into partitions, it's good practice to run it after making significant changes to the table.

To further troubleshoot the issue, consider the following additional steps:
- Check the Impala logs for any error messages or warnings that might indicate issues with loading data or adding partitions.
- Verify that the data files corresponding to the partitions are present in the specified HDFS directory paths.
- Ensure that the partitioning column (loading_date) values in the data files match the partition definitions specified in the ALTER TABLE statements.

Regards,
Chethan YM
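As an illustration only, here is a minimal Impala SQL sketch of that sequence, assuming loading_date is the partition column mentioned in the thread; the HDFS paths and dates are placeholders, not values from your environment:

-- 1. Empty copy of db2.t1 (WHERE 1=2 returns no rows); the PARTITIONED BY clause
--    is an assumption, needed so partitions can be added afterwards
CREATE TABLE db1.t1 PARTITIONED BY (loading_date) AS
SELECT * FROM db2.t1 WHERE 1=2;

-- 2. Point the table at its new base path (placeholder location)
ALTER TABLE db1.t1 SET LOCATION 'hdfs:///data/db1/t1';

-- 3. Add one partition per day, each mapped to its own directory (placeholder values)
ALTER TABLE db1.t1 ADD PARTITION (loading_date='2024-03-01')
  LOCATION 'hdfs:///data/db1/t1/loading_date=2024-03-01';

-- 4. Let Impala pick up the new partitions and the files underneath them
REFRESH db1.t1;

-- 5. Gather statistics so the planner can optimize queries against the table
COMPUTE STATS db1.t1;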
10-20-2023
11:53 AM
Hi Krishna, we have done this by pushing commands out to the shell after setting up a trusted SSH connection between CDSW and the Unix server. This is the Python function we use:

import os
import subprocess

user_name = "username"
unix_server = "my.unix.host"
unix_path = "/some/path"
file_to_transfer = "my_csv_file.csv"

def scp_file_to_sas(local_path, file_name, user_name, unix_server, unix_path):
    # Copy local_path/file_name to unix_path/file_name on the remote host via scp (-v = verbose)
    p = subprocess.Popen(
        [
            "scp",
            "-v",
            local_path + file_name,
            user_name + "@" + unix_server + ":/" + unix_path + "/" + file_name,
        ]
    )
    # Wait for the scp child process to finish and collect its exit status
    sts = os.waitpid(p.pid, 0)
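A call using the variables defined above might then look like this; "/home/cdsw/" is a hypothetical local directory, not something prescribed by CDSW:

# Hypothetical invocation: copy my_csv_file.csv from the local project directory to the Unix server
scp_file_to_sas("/home/cdsw/", file_to_transfer, user_name, unix_server, unix_path)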
07-12-2023
09:42 AM
Hi. I wanted to bump this thread because I have the same question.
07-12-2023
12:46 AM
Hello all, we are looking for a generic solution to change the owner of CDSW jobs to another user or a technical user. Please advise.