Member since: 04-25-2020
Posts: 43
Kudos Received: 5
Solutions: 4
My Accepted Solutions
Title | Views | Posted
--- | --- | ---
 | 574 | 11-19-2023 11:07 AM
 | 810 | 09-30-2023 09:10 AM
 | 1076 | 03-20-2022 03:00 AM
 | 1169 | 03-20-2022 02:47 AM
01-21-2022
09:09 AM
Hi Experts,

One of our Oozie workflows failed during a Sqoop import job. Please find the logs below:

org.apache.velocity.exception.VelocityException: Error initializing log: Failed to initialize an instance of org.apache.velocity.runtime.log.Log4JLogChute with the current runtime configuration.

Other logs show:

org.springframework.web.client.HttpServerErrorException: 500 null

Please tell us what the possible cause of this failure could be; we retried the job but got the same error.
Labels:
- Apache Sqoop
08-09-2021
10:48 AM
2 Kudos
Hi @ryu , I recently copied Hive tables from our production cluster to a non-production cluster by running DistCp on the Hive warehouse directory from prod to non-prod. After running DistCp, we recreated the table schemas on non-prod exactly as on prod using 'create table'. If a table is partitioned, run 'alter table' to add the partitions (a sketch of these steps follows). We also use Hive replication to copy tables from our prod cluster to the DR cluster. If this has helped you, please mark the answer as a solution.
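For reference, a minimal sketch of the steps above; the cluster hosts, paths, database, table schema, and partition values are placeholder assumptions, not the actual ones used:

hadoop distcp hdfs://prod-nn:8020/user/hive/warehouse/mydb.db/mytable hdfs://nonprod-nn:8020/user/hive/warehouse/mydb.db/mytable

-- on the non-prod cluster, recreate the schema exactly as on prod
CREATE EXTERNAL TABLE mydb.mytable (id INT, name STRING)
PARTITIONED BY (dt STRING)
STORED AS PARQUET
LOCATION '/user/hive/warehouse/mydb.db/mytable';

-- register each copied partition
ALTER TABLE mydb.mytable ADD PARTITION (dt='2021-08-01')
LOCATION '/user/hive/warehouse/mydb.db/mytable/dt=2021-08-01';

Alternatively, MSCK REPAIR TABLE mydb.mytable can register all copied partition directories in one step.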
07-12-2021
07:19 AM
Hello Experts,

We have identified that 2 records have been duplicated in our Hive tables. We have taken a backup of the tables in case we need to roll back. But now, when we run an insert overwrite command (e.g. insert overwrite table demo select distinct * from demo;) on the smallest table, with a raw volume of 570 GB, we get the following error:

INFO : 2021-07-11 15:33:47,756 Stage-0_0: 122/122 Finished Stage-1_0: 70(+380,-64)/978
INFO : state = STARTED
INFO : state = FAILED
ERROR : Status: Failed
ERROR : FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
DEBUG : Shutting down query insert overwrite table raw_switch.partitiontable select distinct * from raw_switch.partitiontable
INFO : Completed executing command(queryId=hive_19660743242525_d9c3a756-452f-472c-a92e-2b966c37d0ce); Time taken: 4078.407 seconds
DEBUG : Shutting down query insert overwrite table raw_switch.partitiontable select distinct * from raw_switch.partitiontable
Error: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask (state=08S01,code=3)

Please find the HiveServer2 logs below:

2021-07-11 15:33:49,834 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: [HiveServer2-Background-Pool: Thread-29919]: Call: delete took 30ms
2021-07-11 15:33:49,834 ERROR org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-29919]: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

The default Hive parameters are as follows:

hive.execution.engine=spark;
spark.executor.memory=12g;
spark.executor.cores=4;
hive.optimize.sort.dynamic.partition=true;
hive.exec.dynamic.partition.mode=strict;

Kindly suggest how to resolve this issue. Do we need to change any of the above default parameters, or some other parameter we may have missed? We hope we are running the correct insert overwrite query to remove the duplicate records.
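Before rewriting the whole table, a quick check like the sketch below can confirm which rows are duplicated; the key columns (col1, col2) are placeholder assumptions, since the real schema is not shown here:

SELECT col1, col2, COUNT(*) AS cnt
FROM raw_switch.partitiontable
GROUP BY col1, col2
HAVING COUNT(*) > 1;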
Labels:
- Apache Hive
- Apache Impala
12-14-2020
11:53 PM
@Tim Armstrong Thanks for your quick reply. I ran the compute stats command to help query performance, but I got the error below:

compute stats <table name>;
ERROR: AnalysisException: COMPUTE STATS not supported for view:

I also collected the details from the Query Details page after running the query. I had to cancel it because it consumes a lot of resources and stays in the executing state for a long time.

Query Info
Query ID: 130b304cc42b5010:19ef656c00000000
User: hadmin@COMPS-NVIRGINIA.LOCAL
Database: prod_cdb
Coordinator: usnprod4.n-virginia.dc
Query Type: QUERY
Query State: EXCEPTION
Start Time: Dec 15, 2020 7:17:34 AM
End Time: Dec 15, 2020 7:18:59 AM
Duration: 1m, 24s
Rows Produced: 0
Admission Result: Admitted immediately
Admission Wait Time: 0ms
Aggregate Peak Memory Usage: 13.2 GiB
Bytes Streamed: 15.6 GiB
Client Fetch Wait Time: 0ms
Client Fetch Wait Time Percentage: 0
Connected User: hadmin@COMPS-NVIRGINIA.LOCAL
Estimated per Node Peak Memory: 12.8 GiB
File Formats: PARQUET/NONE,PARQUET/SNAPPY
HDFS Average Scan Range: 90.5 KiB
HDFS Bytes Read: 24.7 GiB
HDFS Bytes Read From Cache: 0 B
HDFS Bytes Read From Cache Percentage: 0
HDFS Local Bytes Read: 23.7 GiB
HDFS Local Bytes Read Percentage: 96
HDFS Remote Bytes Read: 1.1 GiB
HDFS Remote Bytes Read Percentage: 4
HDFS Scanner Average Read Throughput: 155.9 MiB/s
HDFS Short Circuit Bytes Read: 23.7 GiB
HDFS Short Circuit Bytes Read Percentage: 96
Impala Version: impalad version 2.11.0-cdh5.14.4 RELEASE
Memory Accrual: 158,669,819,348 byte seconds
Memory Spilled: 1.0 GiB
Network Address: 10.206.100.226:42238
Node with Peak Memory Usage: usnprod3.n-virginia.dc:22000
Out of Memory: false
Per Node Peak Memory Usage: 5.2 GiB
Planning Wait Time: 6.69s
Planning Wait Time Percentage: 8
Pool: root.default
Query Status: Cancelled
Session ID: 9b32167b6eef775e:293bgd8a4350f200
Session Type: BEESWAX
Statistics Corrupt: false
Statistics Missing: true
Threads: CPU Time: 2.9m
Threads: CPU Time Percentage: 2
Threads: Network Receive Wait Time: 11.4m
Threads: Network Receive Wait Time Percentage: 6
Threads: Network Send Wait Time: 45.4m
Threads: Network Send Wait Time Percentage: 24
Threads: Storage Wait Time: 2.2h
Threads: Storage Wait Time Percentage: 69
Threads: Total Time: 3.2h

Please suggest whether the query is timing out because of a memory issue; there is no output from the impala-shell, yet Cloudera shows the query as still executing.

Please note:
The Default Query Memory Limit in Impala = 6 GB
Max Memory = 270 GB

A quick reply would be highly appreciated.
12-14-2020
03:02 AM
Hello,

One of our end users tried to run an Impala query, which is actually a view with a long query statement. We noticed that the query times out, and we found the following in the Query Details in Cloudera:

WARNING: The following tables are missing relevant table and/or column statistics.
prod_cdb.mig_pdg_common, cdb.mig.accepted_common, prod_cdb.mig_post_correlation_common, prod_cdb.output_pre_correlation_common, prod_cdb.retained_for_correlation_common

What could be the reason for this? Does the view need to be validated, or does Impala not support such a view? What other way is there to find the root cause? Any suggestions would be highly appreciated.
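For context, a minimal sketch of how the missing statistics could be gathered, assuming the names in the warning are regular base tables (COMPUTE STATS cannot be run on the view itself):

COMPUTE STATS prod_cdb.mig_pdg_common;
COMPUTE STATS prod_cdb.mig_post_correlation_common;
COMPUTE STATS prod_cdb.output_pre_correlation_common;
COMPUTE STATS prod_cdb.retained_for_correlation_common;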
Labels:
- Apache Impala
11-07-2020
12:41 AM
Hi Ateka, can you share the error you are getting while submitting the command, so we can better understand the issue?
11-01-2020
02:54 AM
Hi, I am trying to see a summary of a query's progress that updates in real time, so I ran 'set LIVE_PROGRESS=1;'. But I am getting the message below:

ERROR: User hadmin@HADOOP-GROUP.LOCAL is not authorized to access the runtime profile or execution summary.

However, the user hadmin is authorized to view that particular database and table. Please suggest what the reason could be and how to fix this, since I have also obtained a Kerberos ticket.
Labels:
- Apache Impala
10-24-2020
11:40 PM
Hi Tushar, thanks a lot for your quick reply. The resolution you provided worked, and I am accepting it as a solution. Thanks once again.
10-20-2020
01:41 AM
Hello, we are using Beeline or impala-shell to extract data from Hive tables as requested by end users. However, the extraction requests are often large: more than 1,000 and sometimes more than 3,000 records. It is very tedious to extract them with a select query and dump the results into an Excel sheet. Is there an alternative way to write the output to a CSV file, i.e., have the output of the select query go directly to a CSV file? Please suggest.
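For illustration, a minimal sketch of both approaches; the JDBC URL, host names, database, and query are placeholder assumptions:

beeline -u "jdbc:hive2://hiveserver2-host:10000/default" --silent=true --outputformat=csv2 -e "SELECT * FROM mydb.mytable" > output.csv

impala-shell -i impalad-host -B --output_delimiter=',' -q "SELECT * FROM mydb.mytable" -o output.csv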
Labels:
- Apache Hive
- Apache Impala
10-16-2020
11:26 AM
Hi, to work out the heap memory needed by the NameNode, please provide the storage capacity of each node and the replication factor; based on these we can calculate the heap memory. The default block size is recommended for both large and small clusters. A rough worked example is shown below.
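As a rough illustration of the calculation, with all cluster figures here being assumed example values and using the commonly cited rule of thumb of roughly 1 GB of NameNode heap per million HDFS blocks:

Total raw capacity          = 10 DataNodes x 48 TB = 480 TB
Usable data (replication 3) = 480 TB / 3 = 160 TB
Blocks at 128 MB default    = 160 TB / 128 MB ≈ 1.3 million blocks
NameNode heap               ≈ 1.3 GB by the rule of thumb, so a few GB once headroom is added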