Member since: 04-25-2020
Posts: 43
Kudos Received: 5
Solutions: 4
My Accepted Solutions
Title | Views | Posted
--- | --- | ---
 | 574 | 11-19-2023 11:07 AM
 | 810 | 09-30-2023 09:10 AM
 | 1076 | 03-20-2022 03:00 AM
 | 1169 | 03-20-2022 02:47 AM
01-21-2022
09:09 AM
Hi Experts,

One of our Oozie workflows failed during a Sqoop import job. Please find the logs below:

org.apache.velocity.exception.VelocityException: Error initializing log: Failed to initialize an instance of org.apache.velocity.runtime.log.Log4JLogChute with the current runtime configuration.

Other logs show:

org.springframework.web.client.HttpServerErrorException: 500 null

Please tell us what the possible cause of this failure could be; we retried the job but got the same error.
Labels:
- Apache Sqoop
08-09-2021
10:48 AM
2 Kudos
Hi @ryu , I recently copied Hive tables from our production cluster to a non-production cluster by running DistCp on the Hive warehouse directory from prod to non-prod. After running DistCp, we recreated the table schemas on non-prod exactly as on prod using 'create table'. If a table is partitioned, run 'alter table' to add the partitions (a sketch of these steps follows). We also use Hive replication to copy tables from our prod cluster to the DR cluster. If this has helped you, please mark the answer as a solution.
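For reference, a minimal sketch of the steps above; the cluster hosts, paths, database, table schema, and partition values are placeholder assumptions, not the actual ones used:

hadoop distcp hdfs://prod-nn:8020/user/hive/warehouse/mydb.db/mytable hdfs://nonprod-nn:8020/user/hive/warehouse/mydb.db/mytable

-- on the non-prod cluster, recreate the schema exactly as on prod
CREATE EXTERNAL TABLE mydb.mytable (id INT, name STRING)
PARTITIONED BY (dt STRING)
STORED AS PARQUET
LOCATION '/user/hive/warehouse/mydb.db/mytable';

-- register each copied partition
ALTER TABLE mydb.mytable ADD PARTITION (dt='2021-08-01')
LOCATION '/user/hive/warehouse/mydb.db/mytable/dt=2021-08-01';

Alternatively, MSCK REPAIR TABLE mydb.mytable can register all copied partition directories in one step.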
07-12-2021
07:19 AM
Hello Experts,

We have identified that 2 records have been duplicated in our Hive tables. We have taken a backup of the tables in case we need to roll back. But now, when we run an insert overwrite command (e.g. insert overwrite table demo select distinct * from demo;) on the smallest table, with a raw volume of 570 GB, we get the following error:

INFO : 2021-07-11 15:33:47,756 Stage-0_0: 122/122 Finished Stage-1_0: 70(+380,-64)/978
INFO : state = STARTED
INFO : state = FAILED
ERROR : Status: Failed
ERROR : FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
DEBUG : Shutting down query insert overwrite table raw_switch.partitiontable select distinct * from raw_switch.partitiontable
INFO : Completed executing command(queryId=hive_19660743242525_d9c3a756-452f-472c-a92e-2b966c37d0ce); Time taken: 4078.407 seconds
DEBUG : Shutting down query insert overwrite table raw_switch.partitiontable select distinct * from raw_switch.partitiontable
Error: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask (state=08S01,code=3)

Please find the HiveServer2 logs below:

2021-07-11 15:33:49,834 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: [HiveServer2-Background-Pool: Thread-29919]: Call: delete took 30ms
2021-07-11 15:33:49,834 ERROR org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-29919]: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

The default Hive parameters are as follows:

hive.execution.engine=spark;
spark.executor.memory=12g;
spark.executor.cores=4;
hive.optimize.sort.dynamic.partition=true;
hive.exec.dynamic.partition.mode=strict;

Kindly suggest how to resolve this issue. Do we need to change any of the above default parameters, or some other parameter we may have missed? We hope we are running the correct insert overwrite query to remove the duplicate records.
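Before rewriting the whole table, a quick check like the sketch below can confirm which rows are duplicated; the key columns (col1, col2) are placeholder assumptions, since the real schema is not shown here:

SELECT col1, col2, COUNT(*) AS cnt
FROM raw_switch.partitiontable
GROUP BY col1, col2
HAVING COUNT(*) > 1;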
Labels:
- Apache Hive
- Apache Impala
12-14-2020
11:53 PM
@Tim Armstrong Thanks for your quick reply. I ran the compute stats command to help query performance, but I got the error below:

compute stats <table name>;
ERROR: AnalysisException: COMPUTE STATS not supported for view:

I also collected the details from the Query Details page after running the query. I had to cancel it because it consumes a lot of resources and stays in the executing state for a long time.

Query Info
Query ID: 130b304cc42b5010:19ef656c00000000
User: hadmin@COMPS-NVIRGINIA.LOCAL
Database: prod_cdb
Coordinator: usnprod4.n-virginia.dc
Query Type: QUERY
Query State: EXCEPTION
Start Time: Dec 15, 2020 7:17:34 AM
End Time: Dec 15, 2020 7:18:59 AM
Duration: 1m, 24s
Rows Produced: 0
Admission Result: Admitted immediately
Admission Wait Time: 0ms
Aggregate Peak Memory Usage: 13.2 GiB
Bytes Streamed: 15.6 GiB
Client Fetch Wait Time: 0ms
Client Fetch Wait Time Percentage: 0
Connected User: hadmin@COMPS-NVIRGINIA.LOCAL
Estimated per Node Peak Memory: 12.8 GiB
File Formats: PARQUET/NONE,PARQUET/SNAPPY
HDFS Average Scan Range: 90.5 KiB
HDFS Bytes Read: 24.7 GiB
HDFS Bytes Read From Cache: 0 B
HDFS Bytes Read From Cache Percentage: 0
HDFS Local Bytes Read: 23.7 GiB
HDFS Local Bytes Read Percentage: 96
HDFS Remote Bytes Read: 1.1 GiB
HDFS Remote Bytes Read Percentage: 4
HDFS Scanner Average Read Throughput: 155.9 MiB/s
HDFS Short Circuit Bytes Read: 23.7 GiB
HDFS Short Circuit Bytes Read Percentage: 96
Impala Version: impalad version 2.11.0-cdh5.14.4 RELEASE
Memory Accrual: 158,669,819,348 byte seconds
Memory Spilled: 1.0 GiB
Network Address: 10.206.100.226:42238
Node with Peak Memory Usage: usnprod3.n-virginia.dc:22000
Out of Memory: false
Per Node Peak Memory Usage: 5.2 GiB
Planning Wait Time: 6.69s
Planning Wait Time Percentage: 8
Pool: root.default
Query Status: Cancelled
Session ID: 9b32167b6eef775e:293bgd8a4350f200
Session Type: BEESWAX
Statistics Corrupt: false
Statistics Missing: true
Threads: CPU Time: 2.9m
Threads: CPU Time Percentage: 2
Threads: Network Receive Wait Time: 11.4m
Threads: Network Receive Wait Time Percentage: 6
Threads: Network Send Wait Time: 45.4m
Threads: Network Send Wait Time Percentage: 24
Threads: Storage Wait Time: 2.2h
Threads: Storage Wait Time Percentage: 69
Threads: Total Time: 3.2h

Please suggest whether the query is timing out because of a memory issue; there is no output from the impala-shell, yet Cloudera shows the query as still executing.

Please note:
The Default Query Memory Limit in Impala = 6 GB
Max Memory = 270 GB

A quick reply would be highly appreciated.
12-14-2020
03:02 AM
Hello,

One of our end users tried to run an Impala query, which is actually a view with a long query statement. We noticed that the query times out, and we found the following in the Query Details in Cloudera:

WARNING: The following tables are missing relevant table and/or column statistics.
prod_cdb.mig_pdg_common, cdb.mig.accepted_common, prod_cdb.mig_post_correlation_common, prod_cdb.output_pre_correlation_common, prod_cdb.retained_for_correlation_common

What could be the reason for this? Does the view need to be validated, or does Impala not support such a view? What other way is there to find the root cause? Any suggestions would be highly appreciated.
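For context, a minimal sketch of how the missing statistics could be gathered, assuming the names in the warning are regular base tables (COMPUTE STATS cannot be run on the view itself):

COMPUTE STATS prod_cdb.mig_pdg_common;
COMPUTE STATS prod_cdb.mig_post_correlation_common;
COMPUTE STATS prod_cdb.output_pre_correlation_common;
COMPUTE STATS prod_cdb.retained_for_correlation_common;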
Labels:
- Apache Impala
11-07-2020
12:41 AM
Hi Ateka, can you share the error you are getting while submitting the command, so we can better understand the issue?
11-01-2020
02:54 AM
Hi, I am trying to see a summary of a query's progress that updates in real time, so I ran 'set LIVE_PROGRESS=1;'. But I am getting the message below:

ERROR: User hadmin@HADOOP-GROUP.LOCAL is not authorized to access the runtime profile or execution summary.

However, the user hadmin is authorized to view that particular database and table. Please suggest what the reason could be and how to fix this, since I have also obtained a Kerberos ticket.
Labels:
- Apache Impala
10-24-2020
11:40 PM
Hi Tushar, thanks a lot for your quick reply. The resolution you provided worked, and I am accepting it as a solution. Thanks once again.
10-20-2020
01:41 AM
Hello, we are using Beeline or impala-shell to extract data from Hive tables as requested by end users. However, the extraction requests are often large: more than 1,000 and sometimes more than 3,000 records. It is very tedious to extract them with a select query and dump the results into an Excel sheet. Is there an alternative way to write the output to a CSV file, i.e., have the output of the select query go directly to a CSV file? Please suggest.
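For illustration, a minimal sketch of both approaches; the JDBC URL, host names, database, and query are placeholder assumptions:

beeline -u "jdbc:hive2://hiveserver2-host:10000/default" --silent=true --outputformat=csv2 -e "SELECT * FROM mydb.mytable" > output.csv

impala-shell -i impalad-host -B --output_delimiter=',' -q "SELECT * FROM mydb.mytable" -o output.csv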
Labels:
- Apache Hive
- Apache Impala
10-16-2020
11:26 AM
Hi, to work out the heap memory needed by the NameNode, please provide the storage capacity of each node and the replication factor; based on these we can calculate the heap memory. The default block size is recommended for both large and small clusters. A rough worked example is shown below.
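As a rough illustration of the calculation, with all cluster figures here being assumed example values and using the commonly cited rule of thumb of roughly 1 GB of NameNode heap per million HDFS blocks:

Total raw capacity          = 10 DataNodes x 48 TB = 480 TB
Usable data (replication 3) = 480 TB / 3 = 160 TB
Blocks at 128 MB default    = 160 TB / 128 MB ≈ 1.3 million blocks
NameNode heap               ≈ 1.3 GB by the rule of thumb, so a few GB once headroom is added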