Member since: 04-25-2020
Posts: 27
Kudos Received: 1
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 153 | 03-20-2022 03:00 AM
 | 145 | 03-20-2022 02:47 AM
03-21-2022
03:46 AM
Hi @Katja, CAST is supported in both Impala and Hive. You said the query works; where have you tested it? Can you try running it in Beeline and see if it works?
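If it helps, a quick check from Beeline could look like the sketch below; the JDBC URL and the literals being cast are only illustrative assumptions, not values from this thread.

```bash
# Minimal sketch of testing CAST from Beeline; the host and literals are placeholders.
beeline -u "jdbc:hive2://hiveserver-host:10000/default" -e \
  "SELECT CAST('123' AS INT), CAST(123 AS STRING), CAST('2022-03-20 03:00:00' AS TIMESTAMP);"
```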
03-20-2022
03:00 AM
Yes, before that I tried running the Sqoop import manually. I restarted the Oozie service multiple times until I could see in the logs that the Oozie workflow was initialized. The workflows then started working as expected and the job completed.
03-20-2022
02:47 AM
We were able to bring the Impala daemon up by disabling Enable Lineage Collection and Enable Impala Lineage Generation in the Impala configuration, and after this the Impala queries resumed fine and the data loads as expected. But when I try to set these back to their previous state, the daemon goes down again, and I am not sure why. If anyone has any suggestions, please share them.
03-18-2022
11:26 PM
Hi Experts, One of the Impala daemons suddenly went down and we are unable to bring it back up. Some queries were executing on that daemon. I captured the following logs when it shut down:

Could not open log file: /var/log/impalad/lineage/impala_lineage_log_1.0-1647629222243
@ 0x95b479 impala::Status::Status()
@ 0xd00b95 impala::SimpleLogger::FlushInternal()
@ 0xd01fcf impala::SimpleLogger::Init()
@ 0xbb9c2a impala::ImpalaServer::InitLineageLogging()
@ 0xbcb25e impala::ImpalaServer::ImpalaServer()
@ 0xbb676d ImpaladMain()
@ 0x8e2593 main
@ 0x7f5cf7658495 __libc_start_main
@ 0x9299e1 (unknown)
ERROR cc:312 Aborting Impala Server startup due to failure initializing lineage logging. Impalad exiting.

Can someone please tell me what steps should be taken? Just to note, Enable Audit Collection and Enable Impala Audit Event Generation are enabled/checked.
Labels:
- Apache Impala
01-21-2022
09:09 AM
Hi Experts, One of our Oozie workflows failed during a Sqoop import job. Please find the logs below:

org.apache.velocity.exception.VelocityException: Error initializing log: Failed to initialize an instance of org.apache.velocity.runtime.log.Log4JLogChute with the current runtime configuration.

In other logs:

org.springframework.web.client.HttpServerErrorException: 500 null

Please tell me what the possible cause of this failure could be; we retried the job but got the same error.
Labels:
- Apache Sqoop
01-21-2022
09:00 AM
Sharon, welcome to the Cloudera community!
08-09-2021
10:48 AM
1 Kudo
Hi @ryu, I recently copied Hive tables from our production cluster to a non-production cluster by running distcp on the Hive warehouse directory from prod to non-prod. After running distcp, we created the table schema on non-prod identical to prod using CREATE TABLE. If a table is partitioned, apply ALTER TABLE ... ADD PARTITION to add the partitions (a rough sketch follows below). We also use Hive replication to copy tables from our prod cluster to the DR cluster. If this has helped you, please mark the answer as a solution.
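For anyone following the same approach, here is a minimal sketch of those steps. All host names, paths, and the database/table/column names are placeholders, not the actual values from our clusters, and the target database is assumed to already exist on non-prod.

```bash
# 1. Copy the table's warehouse directory from prod to non-prod (namenodes/paths are placeholders).
hadoop distcp \
  hdfs://prod-nn:8020/user/hive/warehouse/sales.db/orders \
  hdfs://nonprod-nn:8020/user/hive/warehouse/sales.db/orders

# 2. Recreate the same schema on non-prod, pointing at the copied location
#    (assumes the "sales" database already exists there).
beeline -u "jdbc:hive2://nonprod-hs2:10000" -e "
CREATE EXTERNAL TABLE sales.orders (id BIGINT, amount DOUBLE)
PARTITIONED BY (order_date STRING)
STORED AS PARQUET
LOCATION '/user/hive/warehouse/sales.db/orders';"

# 3. For partitioned tables, register the copied partitions; MSCK REPAIR discovers
#    them, or ALTER TABLE ... ADD PARTITION can be run per partition instead.
beeline -u "jdbc:hive2://nonprod-hs2:10000" -e "MSCK REPAIR TABLE sales.orders;"
```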
08-09-2021
05:09 AM
Hello Team, Previously we used to query a view to fetch the required output from Hive tables, but the performance was very poor, so we came up with the idea of a materialized view. Unfortunately, that is not supported in CDH 5.14.4 (Hive 1.1). We have a second approach to overcome the limitations of the view, and I need suggestions on whether the approach below is supported in Hive and which Hive properties need to be changed in order to implement it. We thought of creating a staging table and using it to generate the report rather than using a view; the required indexes can be added to the staging table so that report generation is quick. Below are the actions proposed by our developer (a minimal sketch follows the list):
1. Create a table which is a union of multiple base tables.
2. Create a non-unique index on one of the key columns in the table.
3. Generate the report via the front end, which should query the staging table instead of a view.
4. Data will be inserted into the base tables on a daily basis.
5. We need to identify those new records and insert them into the staging table once a day.
6. Plan to rebuild the index on the staging table once a month.
As per our developer this is the only option we have left, and based on our knowledge it should work, but we want to check whether this will help.
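To make steps 1, 2, and 6 concrete, here is a hedged sketch. The table and column names (base_a, base_b, report_staging, report_key) are placeholders, the base tables are assumed to share a schema, and the CREATE INDEX syntax should be verified against your exact Hive 1.1 release.

```bash
# Build the staging table and its index from a script file; names are placeholders.
cat > /tmp/build_staging.hql <<'HQL'
-- Step 1: staging table as a union of the base tables (wrapped in a subquery
-- for compatibility with older Hive versions).
CREATE TABLE report_staging AS
SELECT u.* FROM (
  SELECT * FROM base_a
  UNION ALL
  SELECT * FROM base_b
) u;

-- Step 2: non-unique compact index on a key column, built deferred.
CREATE INDEX idx_report_staging_key
ON TABLE report_staging (report_key)
AS 'COMPACT' WITH DEFERRED REBUILD;

-- Step 6: rebuild the index (intended to be scheduled monthly).
ALTER INDEX idx_report_staging_key ON report_staging REBUILD;
HQL

beeline -u "jdbc:hive2://hiveserver-host:10000" -f /tmp/build_staging.hql
```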
Labels:
- Apache Hive
- Apache Impala
08-03-2021
08:00 AM
Hi Experts, We were facing a lot of challenges when querying the view in Hive; it takes a lot of resources. We therefore thought of creating a materialized view, but unfortunately that is not supported in our current version (Hive 1.1.0-cdh5.14.4). So we have come up with a different solution: instead of the view, we keep an actual intermediate table built from the view's query, and every hour whatever delta information is left over is fetched into it (a rough sketch is below). Has anyone done this before? Considering that our existing view performs poorly and a materialized view is not supported, we see this as the only solution. Please suggest which Hive runtime settings need to be changed and how this can be implemented.
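For illustration only, a sketch of the hourly delta load: the table names (report_snapshot, source_view), the event_ts watermark column, the JDBC URL, and the scheduling mechanism are all assumptions, not our actual objects.

```bash
# Append only the rows that arrived since the last load; the watermark value
# would in practice come from the previous run (cron/Oozie assumed for scheduling).
cat > /tmp/load_delta.hql <<'HQL'
INSERT INTO TABLE report_snapshot
SELECT *
FROM source_view v
WHERE v.event_ts > '${hiveconf:last_loaded_ts}';
HQL

beeline -u "jdbc:hive2://hiveserver-host:10000" \
  --hiveconf last_loaded_ts='2021-08-03 07:00:00' \
  -f /tmp/load_delta.hql
```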
Labels:
- Apache Hive
07-12-2021
07:19 AM
Hello Experts, We have identified that 2 records have been duplicated in our Hive tables. We have taken a backup of the tables in case we need to roll back. But when we run an INSERT OVERWRITE command (e.g. insert overwrite table demo select distinct * from demo;) on the smallest table, with a raw volume of 570 GB, we get the following error:

INFO : 2021-07-11 15:33:47,756 Stage-0_0: 122/122 Finished Stage-1_0: 70(+380,-64)/978
INFO : state = STARTED
INFO : state = FAILED
ERROR : Status: Failed
ERROR : FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
DEBUG : Shutting down query insert overwrite table raw_switch.partitiontable select distinct * from raw_switch.partitiontable
INFO : Completed executing command(queryId=hive_19660743242525_d9c3a756-452f-472c-a92e-2b966c37d0ce); Time taken: 4078.407 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask (state=08S01,code=3)

Please find the HiveServer2 logs below:

2021-07-11 15:33:49,834 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: [HiveServer2-Background-Pool: Thread-29919]: Call: delete took 30ms
2021-07-11 15:33:49,834 ERROR org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-29919]: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

The Hive parameters are currently at their defaults:

hive.execution.engine=spark
spark.executor.memory=12g
spark.executor.cores=4
hive.optimize.sort.dynamic.partition=true
hive.exec.dynamic.partition.mode=strict

Kindly suggest how to resolve this issue. Do we need to change any of the above default parameters, or some other parameter we may have missed? We hope we are running the correct INSERT OVERWRITE query to remove the duplicate records.
Labels:
- Apache Hive
- Apache Impala
12-14-2020
11:53 PM
@Tim Armstrong Thanks for your quick reply. I ran the COMPUTE STATS command to help the query performance, however I got the error below:

compute stats <table name>;
ERROR: AnalysisException: COMPUTE STATS not supported for view:

I also pulled the details from the query details page after running the query. I had to cancel it because it takes a lot of resources and stays in the executing state for a long time.

Query Info
Query ID: 130b304cc42b5010:19ef656c00000000
User: hadmin@COMPS-NVIRGINIA.LOCAL
Database: prod_cdb
Coordinator: usnprod4.n-virginia.dc
Query Type: QUERY
Query State: EXCEPTION
Start Time: Dec 15, 2020 7:17:34 AM
End Time: Dec 15, 2020 7:18:59 AM
Duration: 1m, 24s
Rows Produced: 0
Admission Result: Admitted immediately
Admission Wait Time: 0ms
Aggregate Peak Memory Usage: 13.2 GiB
Bytes Streamed: 15.6 GiB
Client Fetch Wait Time: 0ms
Client Fetch Wait Time Percentage: 0
Connected User: hadmin@COMPS-NVIRGINIA.LOCAL
Estimated per Node Peak Memory: 12.8 GiB
File Formats: PARQUET/NONE,PARQUET/SNAPPY
HDFS Average Scan Range: 90.5 KiB
HDFS Bytes Read: 24.7 GiB
HDFS Bytes Read From Cache: 0 B
HDFS Bytes Read From Cache Percentage: 0
HDFS Local Bytes Read: 23.7 GiB
HDFS Local Bytes Read Percentage: 96
HDFS Remote Bytes Read: 1.1 GiB
HDFS Remote Bytes Read Percentage: 4
HDFS Scanner Average Read Throughput: 155.9 MiB/s
HDFS Short Circuit Bytes Read: 23.7 GiB
HDFS Short Circuit Bytes Read Percentage: 96
Impala Version: impalad version 2.11.0-cdh5.14.4 RELEASE
Memory Accrual: 158,669,819,348 byte seconds
Memory Spilled: 1.0 GiB
Network Address: 10.206.100.226:42238
Node with Peak Memory Usage: usnprod3.n-virginia.dc:22000
Out of Memory: false
Per Node Peak Memory Usage: 5.2 GiB
Planning Wait Time: 6.69s
Planning Wait Time Percentage: 8
Pool: root.default
Query Status: Cancelled
Session ID: 9b32167b6eef775e:293bgd8a4350f200
Session Type: BEESWAX
Statistics Corrupt: false
Statistics Missing: true
Threads: CPU Time: 2.9m
Threads: CPU Time Percentage: 2
Threads: Network Receive Wait Time: 11.4m
Threads: Network Receive Wait Time Percentage: 6
Threads: Network Send Wait Time: 45.4m
Threads: Network Send Wait Time Percentage: 24
Threads: Storage Wait Time: 2.2h
Threads: Storage Wait Time Percentage: 69
Threads: Total Time: 3.2h

Please suggest whether it is due to a memory issue that the query is timing out with no output in impala-shell, even though Cloudera still shows the query as executing.
Please note: the Default Query Memory Limit in Impala = 6 GB and Max Memory = 270 GB.
A quick reply would be highly appreciated.
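Presumably COMPUTE STATS has to be run against the view's underlying base tables rather than the view itself. A hedged sketch of what I believe is needed is below; the table list is taken from the missing-statistics warning in this thread and assumed to be regular tables, and the host is the query's coordinator from the details above.

```bash
# Sketch only: COMPUTE STATS targets tables, not views, so each base table behind
# the view is handled individually. Table list and host are assumptions from this thread.
for t in prod_cdb.mig_pdg_common \
         prod_cdb.mig_post_correlation_common \
         prod_cdb.output_pre_correlation_common \
         prod_cdb.retained_for_correlation_common; do
  impala-shell -i usnprod4.n-virginia.dc -k -q "COMPUTE STATS ${t};"
done
```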
12-14-2020
03:02 AM
Hello, One of our end users tried to run an Impala query, which is actually a view with a long query statement. We noticed that the query times out, and we found the logs below in the query details in Cloudera:

WARNING: The following tables are missing relevant table and/or column statistics.
prod_cdb.mig_pdg_common, cdb.mig.accepted_common, prod_cdb.mig_post_correlation_common, prod_cdb.output_pre_correlation_common, prod_cdb.retained_for_correlation_common

What could be the reason for this? Does the view need to be validated, or does Impala not support such a view? What is another way to find the root cause? Any suggestions would be highly appreciated.
Tags:
- impala
Labels:
- Apache Impala
11-07-2020
12:41 AM
Hi Ateka, can you share the error you are getting while submitting the command, so we can better understand the issue?
11-01-2020
02:54 AM
Hi, I am trying to see a summary of a query's progress that updates in real time, so I ran 'set LIVE_PROGRESS=1;'. But I am getting the message below:

ERROR: User hadmin@HADOOP-GROUP.LOCAL is not authorized to access the runtime profile or execution summary.

However, the user hadmin is authorized to view that particular database and table. Please suggest what the reason could be and how to fix this, since I have obtained a Kerberos ticket as well.
Labels:
- Apache Impala
10-24-2020
11:40 PM
Hi Tushar, thanks a lot for your quick reply. The resolution you provided worked, so I am accepting it as a solution. Thanks once again.
10-20-2020
01:41 AM
Hello, We use Beeline or impala-shell to extract data from Hive tables as per requests from end users. However, the extraction requests are for a large number of records, more than 1,000 and sometimes more than 3,000, and it is very tedious to extract them with a SELECT query and dump them into an Excel sheet. Is there an alternative way to capture the output in a CSV file, i.e. have the output of the SELECT query go straight to a CSV file? Please suggest.
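For illustration, a sketch of how this is commonly done directly from the two shells; the host names, the query, and the output paths below are placeholders, not our actual values.

```bash
# impala-shell: -B prints delimited rows instead of the pretty-printed table,
# --output_delimiter sets the separator, -o writes the result to a file.
impala-shell -i impalad-host -k -B --output_delimiter=',' \
  -q "SELECT * FROM sales.orders LIMIT 5000" \
  -o /tmp/orders.csv

# beeline: csv2 output format, redirected to a file.
beeline -u "jdbc:hive2://hiveserver-host:10000" --silent=true \
  --outputformat=csv2 \
  -e "SELECT * FROM sales.orders LIMIT 5000" > /tmp/orders.csv
```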
Labels:
- Apache Hive
- Apache Impala
10-16-2020
11:26 AM
Hi, To work out what the NameNode heap memory should be, please provide the storage capacity of each node and the replication factor; based on this we can calculate the heap memory. The default block size is recommended for both large and small clusters.
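To make the calculation concrete, a rough sketch with made-up numbers: the capacity, replication factor, and the roughly 1 GB of heap per million blocks rule of thumb are assumptions used to illustrate the arithmetic, not a sizing recommendation for any particular cluster.

```bash
# Illustrative arithmetic only; plug in your real capacity and replication factor.
raw_capacity_tb=1200      # total raw storage across all DataNodes (assumed)
replication=3             # dfs.replication (assumed)
block_size_mb=128         # default dfs.blocksize

usable_mb=$(( raw_capacity_tb * 1024 * 1024 / replication ))
blocks=$(( usable_mb / block_size_mb ))        # best case: every block is full
heap_gb=$(( (blocks + 999999) / 1000000 ))     # ~1 GB heap per 1M blocks (rule of thumb)

echo "~${blocks} blocks -> roughly ${heap_gb} GB NameNode heap as a starting point"
```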
10-14-2020
09:46 AM
Hi,
I want to enable the on-demand metadata feature for Impala, as it brings a lot of improvements.
One of those improvements is that cached metadata gets evicted automatically under memory pressure.
I want to know whether this feature can be enabled on Cloudera CDH 5.14.4.
Has anyone implemented this in production on CDH 5.14.4?
10-14-2020
07:10 AM
Hi Tim, your suggestion was very helpful and I have a good understanding now, so I am accepting it as a solution. I just have one more thing to ask: to stop the query from using up the resources, is it better to increase the Impala Daemon Memory Limit (mem_limit)? What do you suggest?
10-03-2020
07:21 AM
Hi Tim, can you explain in more detail how I can do this? E.g. "you could set up memory-based admission control with a min memory limit of 2GB and a max memory limit of 20GB to prevent any one query from taking up all the memory on a node."
10-03-2020
12:29 AM
Hi Tim, thanks for your reply. I can only see two parameters for the memory limit: one is Single Pool Mem Limit (default_pool_mem_limit) = -1 B, and the second is Impala Daemon Memory Limit (mem_limit) = 60 GB. So how do I now set the min memory limit and the max memory limit?
10-02-2020
11:18 AM
After running the Impala query select distinct(partition_date) parts from mddb_servt; I am getting the error below (execution time 10 seconds):

java.sql.SQLException: [Cloudera][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:ExecQueryFInstances rpc query_id=937d334667doe010:4967d4b900000111 failed: Failed to get minimum memory reservation of 68.00 MB on daemon ec2-3-128-13.us-east-2:22000 for query 937d334667doe010:4967d4b900000111 because it would exceed an applicable memory limit. Memory is likely oversubscribed. Reducing query concurrency or configuring admission control may help avoid this error.

Memory usage:
Process: Limit=60.00 GB Total=50.68 GB Peak=50.70 GB
Buffer Pool: Free Buffers: Total=0
Buffer Pool: Clean Pages: Total=2.31 GB
Buffer Pool: Unused Reservation: Total=-2.29 GB
Free Disk IO Buffers: Total=1.10 GB Peak=1.15 GB
RequestPool=root.default: Total=48.28 GB Peak=48.37 GB
Query(78befceb1eef47:d33db5f200030000): Reservation=47.49 GB ReservationLimit=48.00 GB OtherMemory=293.93 MB Total=47.78 GB Peak=47.81 GB
Query(e12345ed0a094d14:f4616fb90030000): Reservation=238.00 MB ReservationLimit=48.00 GB OtherMemory=4.27 MB Total=242.27 MB Peak=303.22 MB
Query(be7896564af6f2c:1e675bb00000000): Reservation=272.00 MB ReservationLimit=48.00 GB OtherMemory=4.23 MB Total=276.23 MB Peak=314.22 MB
Query(914d001522ce0e10:264bd4b900000000): Reservation=0 ReservationLimit=48.00 GB OtherMemory=0 Total=0 Peak=0
RequestPool=root.anp: Total=0 Peak=536.50 MB
Untracked Memory: Total=1.28 GB

Please note: Impala Daemon Memory Limit (mem_limit) = 60 GiB. Please let me know what the reason could be.
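For what it's worth, one mitigation the error message itself hints at is capping per-query memory so a single statement cannot reserve most of the 60 GB process limit. A hedged sketch is below; the 10g cap and the file path are illustrative assumptions, not tuned values for this cluster.

```bash
# Cap the query's memory reservation before running it; values are illustrative only.
cat > /tmp/distinct_dates.sql <<'SQL'
SET MEM_LIMIT=10g;
SELECT DISTINCT partition_date AS parts FROM mddb_servt;
SQL

impala-shell -i ec2-3-128-13.us-east-2 -f /tmp/distinct_dates.sql
```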
Labels:
- Apache Impala
08-15-2020
11:10 PM
@GangWar thanks for your quick reply. As per your suggestion I checked the logs in both locations, on the NameNode and the DataNodes. On one of the DataNodes I checked the logs in /var/run/cloudera-scm-agent/process/28-hdfs-DATANODE/logs, and searching by the keyword "Error" I found the following:

++ replace_pid -Xms521142272 -Xmx521142272 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:+HeapDumpOnOutOfMemoryError '-XX:HeapDumpPath=/tmp/hdfs_hdfs-DATANODE-111b6db5e742dbffe061f0c1d6bc8878_pid{{PID}}.hprof' -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh
++ sed 's#{{PID}}#5409#g'

However, I don't see any error message in /var/log/hadoop-hdfs; the files there are:

audit
hadoop-cmf-hdfs-NAMENODE-namenode1.us-east1-b.c.coherent-elf-271314.internal.log.out
hdfs-audit.log
SecurityAuth-hdfs.audit
stacks

Please also suggest which log file I should check to debug this.
08-15-2020
02:56 AM
I am stuck in the middle of cluster setup after installing Cloudera Manager. The first 3 steps completed, however when it comes to starting HDFS it successfully formatted the name directories of the current NameNode but then got stuck on Start HDFS. Please find the error below:

Cluster Setup - First Run Command
Status: Running, Aug 15, 9:16:45 AM
There was an error when communicating with the server. See the log file for more information.
Completed 3 of 8 step(s).
- Ensuring that the expected software releases are installed on hosts. Aug 15, 9:16:45 AM, 90ms
- Deploying Client Configuration (Cluster 1). Aug 15, 9:16:45 AM, 16.13s
- Start Cloudera Management Service, ZooKeeper. Aug 15, 9:17:01 AM, 27.87s
- Start HDFS: 0/1 steps completed. Aug 15, 9:17:29 AM
  - Execute 3 steps in sequence: Waiting for command (Start (77)) to finish. Aug 15, 9:17:29 AM
  - Formatting the name directories of the current NameNode. If the name directories are not empty, this is expected to fail. NameNode (namenode1). Aug 15, 9:17:29 AM, 14.86s
  - Start HDFS: There was an error when communicating with the server. See the log file for more information.

I am unable to check the logs as the cluster is not fully set up. Please suggest what the reason could be and how to fix this. I am installing version 5.16.2.
Labels:
- Cloudera Manager
- HDFS
08-11-2020
04:30 AM
Thanks a lot, @GangWar. You are absolutely correct, I was using OpenJDK instead of Oracle JDK. I thought there was a bug and had no clue how to fix it, but after making the changes you suggested it worked. Thanks a lot; I am accepting it as a solution.
08-06-2020
03:25 AM
I am getting the error below after enabling Kerberos in the CDH cluster; HDFS and YARN are not able to start:

I can't open /run/cloudera-scm-agent/process/256-yarn-NODEMANAGER/container-executor.cfg: Permission denied.
+ perl -pi -e 's#{{CGROUP_GROUP_CPU}}##g' /run/cloudera-scm-agent/process/256-yarn-NODEMANAGER/yarn-site.xml

After checking the YARN NodeManager logs I see the error below:

org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.security.authorize.AuthorizationException: User: cloudera@CLUSTERIE.LOCAL is not allowed to impersonate yarn/ip-10-0-xxxxx@xyz.com

Any suggestion as to why I am getting this error? When I disable Kerberos everything works well. Please assist, as the severity of this is very high.
07-30-2020
09:33 AM
Thanks a lot for your quick reply and for providing a detailed explanation, really appreciated. I was using Cloudera version 5.14.1. Yes, there is a bug in the older version, which I realized soon after, but your explanation has made me understand the reason behind the error.
07-27-2020
03:33 AM
I am setting up a pre-prod cluster using Path B. I ran the command "/usr/share/cmf/schema/scm_prepare_database.sh mysql -h mysqldatabase scm temp password" to check the database status. I am getting a log4j error, although the SCM database is configured correctly. Please find the error below and suggest any workaround:

[root@clouderamanage cloudera-scm-server]# /usr/share/cmf/schema/scm_prepare_database.sh mysql -h webserver scmmysqldatabase password
JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
Verifying that we can write to /etc/cloudera-scm-server
Creating SCM configuration file in /etc/cloudera-scm-server
Executing: /usr/java/jdk1.7.0_67-cloudera/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/java/postgresql-connector-java.jar:/usr/share/cmf/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.
log4j: ERROR Could not find value for key log4j.appender.A
log4j: ERROR Could not instantiate appender named "A".
[2020-07-27 10:20:19,893] INFO 0[main] - com.cloudera.enterprise.dbutil.DbCommandExecutor.testDbConnection(DbCommandExecutor.java) - Successfully connected to database.
All done, your SCM database is configured correctly!
Labels:
- Cloudera Manager
05-03-2020
05:04 AM
Hi everyone, my name is Hanzala Shaikh. I have 5+ years of experience in core IT, including projects on the Big Data Hadoop platform with a specialization in Cloudera EDH on the AWS cloud, and I have worked in multi-cloud environments on AWS and GCP. My expertise is in Hadoop administration and managing production-grade clusters on the AWS cloud. I have worked on multiple projects in the healthcare domain on secured clusters, and I have good exposure to both the Cloudera and Hortonworks distributions, as I have worked on both platforms. Now I am here to contribute my knowledge to the community in the best possible way, and also to gain more knowledge and get my queries answered.