Member since
01-31-2019
26
Posts
7
Kudos Received
4
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 4503 | 01-30-2020 08:10 AM |
 | 1976 | 08-02-2019 02:35 AM |
 | 770 | 04-24-2019 10:07 AM |
 | 3701 | 04-24-2019 02:27 AM |
12-16-2021
05:53 AM
2 Kudos
Hello Becky, you can either reduce the "split max size" to get more mappers:
SET mapreduce.input.fileinputformat.split.maxsize;
or you can try:
SET mapreduce.job.maps=XX;
For the second option, you may need to disable the merging of map files: hive.merge.mapfiles=false
Let me know if either solution works for you. Good luck!
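A quick sketch of both options in a Hive session; the split size, map count, table name, and query are placeholder values to adapt:

```sql
-- Option 1: a smaller max split size means more input splits, hence more mappers
-- (64 MB here, purely as an example value)
SET mapreduce.input.fileinputformat.split.maxsize=67108864;

-- Option 2: request a map count directly; merging of map files may need to be off
SET mapreduce.job.maps=20;
SET hive.merge.mapfiles=false;

-- then run your query as usual
SELECT count(*) FROM your_table;   -- your_table is a placeholder
```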
01-18-2021
02:50 AM
Hello, try to find your log4j.properties file (in my case /etc/hadoop/conf.cloudera.hdfs/log4j.properties) and add these two lines: log4j.appender.RFA=org.apache.log4j.ConsoleAppender and log4j.appender.RFA.layout=org.apache.log4j.PatternLayout. Good luck!
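A minimal sketch of that change from the shell, assuming the file path mentioned above; back the file up first and adjust the path to your own deployment:

```bash
# Keep a backup of the original configuration
cp /etc/hadoop/conf.cloudera.hdfs/log4j.properties /etc/hadoop/conf.cloudera.hdfs/log4j.properties.bak

# Append the two appender lines from the post
cat >> /etc/hadoop/conf.cloudera.hdfs/log4j.properties <<'EOF'
log4j.appender.RFA=org.apache.log4j.ConsoleAppender
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
EOF
```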
01-18-2021
02:33 AM
1 Kudo
Hello Michael, I had a similar issue with my CDH-based cluster and solved it with a surprisingly simple workaround. First I changed the replication factor from 3 to 2 (an "under-replicated blocks" notice should appear), then ran the rebalance (by Blockpool, then by DataNode, to shuffle blocks between DataNodes), and finally set the replication factor back to 3; after that I noticed some major changes. Not sure whether that will work for you, but I wanted to share my experience in case you want to try it. Good luck!
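A hedged CLI alternative to the Cloudera Manager steps above; the path and balancer threshold are placeholders, and the commands run as the hdfs superuser:

```bash
sudo -u hdfs hdfs dfs -setrep 2 /        # temporarily lower replication to 2
sudo -u hdfs hdfs balancer -threshold 5  # shuffle blocks between DataNodes
sudo -u hdfs hdfs dfs -setrep 3 /        # restore the replication factor to 3
```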
01-18-2021
02:16 AM
Hello, to get a clearer picture of what you're after: do you want to connect your host to an already installed cluster, or do you just want to install Hadoop on a single machine (standalone)? On another note, your server needs repositories to install the components from (you have to configure a local repository on your server, or on any other server it can reach, for example over a private IP).
09-21-2020
08:44 AM
I have tested the backup/restore solution and it seems to work like a charm with Spark (see the consolidated sketch after this list):
- First, check and record the table names as listed by the Kudu master (or by the elected leader master in a multi-master setup): http://Master1:8051/tables
- Download the kudu-backupX.X.jar if you can't find it in /opt/cloudera/parcels/CDH-X.Xcdh.XX/lib/, and put it there.
- In kuduMasterAddresses, put the name of your Kudu master, or the names of your three masters separated by ','.
- Backup: sudo -u hdfs spark2-submit --class org.apache.kudu.backup.KuduBackup /opt/cloudera/parcels/CDH-X.Xcdh.XX/lib/kudu-backup2_2.11-1.13.0.jar --kuduMasterAddresses MASTER1(,MASTER2,..) --rootPath hdfs:///PATH_HDFS impala::DB.TABLE
- Copy: sudo -u hdfs hadoop distcp -i hdfs:///PATH_HDFS/DB.TABLE hdfs://XXX:8020/kudu_backups/
- Restore: sudo -u hdfs spark2-submit --class org.apache.kudu.backup.KuduRestore /opt/cloudera/parcels/CDH-X.Xcdh.XX/lib/kudu-backup2_2.11-1.13.0.jar --kuduMasterAddresses MASTER1(,MASTER2,..) --rootPath hdfs:///PATH_HDFS impala::DB.TABLE
- Finally, run INVALIDATE METADATA.
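The same steps, collected into one hedged shell sketch; the master names, HDFS paths, parcel/jar versions, destination NameNode, and DB.TABLE are placeholders you need to adapt:

```bash
# 1. Back up the Kudu table to HDFS (run on the source cluster)
sudo -u hdfs spark2-submit --class org.apache.kudu.backup.KuduBackup \
  /opt/cloudera/parcels/CDH-X.Xcdh.XX/lib/kudu-backup2_2.11-1.13.0.jar \
  --kuduMasterAddresses MASTER1,MASTER2,MASTER3 \
  --rootPath hdfs:///PATH_HDFS \
  impala::DB.TABLE

# 2. Copy the backup to the destination cluster with distcp
sudo -u hdfs hadoop distcp -i hdfs:///PATH_HDFS/DB.TABLE hdfs://XXX:8020/kudu_backups/

# 3. Restore the table on the destination cluster
sudo -u hdfs spark2-submit --class org.apache.kudu.backup.KuduRestore \
  /opt/cloudera/parcels/CDH-X.Xcdh.XX/lib/kudu-backup2_2.11-1.13.0.jar \
  --kuduMasterAddresses MASTER1,MASTER2,MASTER3 \
  --rootPath hdfs:///PATH_HDFS \
  impala::DB.TABLE

# 4. Refresh Impala's view of the restored table
impala-shell -q "INVALIDATE METADATA DB.TABLE;"
```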
09-19-2020
12:42 AM
Hi @Harish19, there is a solution I'm going to test, described in https://kudu.apache.org/docs/administration.html and https://docs.cloudera.com/cdp/latest/data-migration/topics/cdp-data-migration-restoring-kudu-data.html. The main idea is to create a backup with Spark, move it with distcp, then restore the backup. Good luck!
03-24-2020
07:17 AM
1 Kudo
Hi Rosa, sorry, I gave up at the time because it was an urgent matter, so I took the short way and used Hive to hash my data and write it to a table, which I can then query with Impala. I'll come back to it later for sure, since Hive is a bit slow with Java-based functions. I'd recommend you try it in C; that is suitable for Impala and will run faster. So if you come up with anything, please share it with us; otherwise I'll post my solution once it's done. Best of luck, Bilal
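A minimal sketch of that Hive-side workaround; the database, table, and column names are hypothetical:

```sql
-- Hash the sensitive column with Hive's built-in sha2(), write the result to a new
-- table, then query that table from Impala (after an INVALIDATE METADATA).
CREATE TABLE db.customers_hashed STORED AS PARQUET AS
SELECT
  sha2(customer_id, 256) AS customer_id_sha2,  -- SHA-256 hash
  purchase_date,
  amount
FROM db.customers;
```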
03-11-2020
02:50 AM
Hello, how about replacing "http://ip-10-189-107-50.eu-west-1.compute.internal" with the IP address/domain of the host that runs the ResourceManager role?
01-30-2020
08:10 AM
1 Kudo
After some research, it seems that Impala does not support GenericUDFs yet: https://issues.apache.org/jira/browse/IMPALA-7877 https://issues.apache.org/jira/browse/IMPALA-8369 So I'll just create my own function for Impala.
01-29-2020
08:35 AM
Hi all, I'm trying to create a function to use in Impala. My function simply reuses Hive's sha2() function. The creation of the function goes smoothly: create function to_sha2(string,int) returns string location 'user/hive/hive.jar' symbol='org.apache.hadoop.hive.ql.udf.generic.GenericUDFSha2';
But when I try to use it, it doesn't work and raises this warning: select to_sha2('test',256);
Query State: EXCEPTION
Query Status: ClassCastException: class org.apache.hadoop.hive.ql.udf.generic.GenericUDFSha2
I have tried to search Hive's jar for a UDFSha2 class that doesn't contain the Generic prefix, but I couldn't find one. The original built-in function in Hive: sha2(string/binary, len) - Calculates the SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512). The other functions work normally in Impala (for example, I created a UDF from Hive's MD5 function and it worked). So my question is: do I have to create my own SHA-2 UDF, or is there a way out of this situation? Any help would be appreciated. Impala version: 2.9, Hive: 1.1.0, CDH: 5.12
Labels:
- Apache Impala
01-23-2020
10:34 AM
Works perfectly, thanks!
08-06-2019
03:22 AM
Hi @eMazarakis, as far as I know, Impala does not support HDFS impersonation (for security reasons, I guess), which means you can't delete HDFS files as the Hue user. Cheers!
08-05-2019
04:34 AM
Hi @Harish19, check here: https://community.cloudera.com/t5/forums/forumtopicprintpage/board-id/Questions/message-id/13133/print-single-message/true/page/1
08-05-2019
03:17 AM
@Amritha, from what I've checked in the Kudu manuals, Chrony is not fully tested with Kudu for network time synchronization: "Kudu releases are only tested with NTP. Other time synchronization providers like Chrony may or may not work." So you can either install NTP, or try this tip I've found: "In order to use chrony for synchronization (for KUDU), chrony.conf must be configured with the rtcsync option". Good luck!
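A minimal sketch of that tip; the config path and service name follow CentOS defaults and may differ on your system:

```bash
# Add the rtcsync option to chrony.conf if it isn't already there, then restart chrony
grep -q '^rtcsync' /etc/chrony.conf || echo 'rtcsync' >> /etc/chrony.conf
service chronyd restart
```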
08-02-2019
02:35 AM
Hello @GopiG, have you tried setting the executor's and the driver's params in spark-defaults.conf? spark.driver.extraJavaOptions -Duser.timezone=UTC spark.executor.extraJavaOptions -Duser.timezone=UTC You can set the default time zone to UTC or to anything else you want, like GMT+8, etc. Cheers.
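If you'd rather not touch spark-defaults.conf, the same options can be passed per job; a hedged sketch, where the class name and application jar are placeholders:

```bash
spark2-submit \
  --conf "spark.driver.extraJavaOptions=-Duser.timezone=UTC" \
  --conf "spark.executor.extraJavaOptions=-Duser.timezone=UTC" \
  --class com.example.MyApp my-app.jar   # class and jar are placeholders
```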
08-02-2019
02:20 AM
Hi @Amritha, when you run the command ntpq -np (NTP should already be installed, I guess), what result does it give? Greetings.
05-06-2019
06:06 AM
Hi @MartinP, one way to do it is to build a new cluster on your CentOS platform, then copy all your data to the new cluster as described here: https://www.cloudera.com/documentation/enterprise/5-5-x/topics/cdh_admin_distcp_data_cluster_migrate.html Just pay attention to the source and destination CDH distributions, i.e. whether they are supported for the migration or not. Cheers,
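A hedged sketch of the copy step from that guide; the NameNode hostnames and path are placeholders, and the command is typically run from the destination cluster:

```bash
sudo -u hdfs hadoop distcp hdfs://source-nn:8020/user/data hdfs://dest-nn:8020/user/data
```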
04-25-2019
01:40 AM
Hi @Harish19, can you tell us which versions of Java, Spark, and CDH you're running? Otherwise, try copying spark-assembly-XXX.jar to HDFS (copy from local to HDFS), then add this parameter to spark-defaults.conf: spark.yarn.jars hdfs://IP/spark/spark-assembly-XXX.jar and don't forget to restart YARN. I hope this will work for you,
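A hedged sketch of the upload step; the local jar path below is an assumption, so adjust it to your CDH parcel layout and Spark version:

```bash
# Put the Spark assembly on HDFS so YARN containers can fetch it
sudo -u hdfs hdfs dfs -mkdir -p /spark
sudo -u hdfs hdfs dfs -put /opt/cloudera/parcels/CDH/lib/spark/lib/spark-assembly.jar /spark/
# Then add the spark-defaults.conf line from the post above and restart YARN.
```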
04-24-2019
10:07 AM
1 Kudo
Hi @DataMike, the Balancer role is normally added by default when the HDFS service is installed, so the Balancer usually resides on your NameNode. To make sure where it's assigned, go to HDFS -> Instances and check the Role Type column; you'll find the 'Balancer' role assigned to a host (usually the NameNode). For your second question, I think it's better to use the NameNode, just to keep the architecture simple, since we're talking about checking all the other DataNodes, moving blocks, and so on.
04-24-2019
02:27 AM
@bgooley, thank you for your feedback and your clear explanation. In fact, the problem was resolved by removing the contents of the /var/lib/hadoop-yarn/yarn-nm-recovery/ directory, after which the NodeManager role started successfully. The solution I found came from: https://community.cloudera.com/t5/Batch-Processing-and-Workflow/Yarn-NodeManager-fails-to-start-and-crashing-with-SIGBUS/m-p/66590#M3611
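A minimal sketch of that fix for the affected host; the directory ownership shown is an assumption, so match whatever the original directory used:

```bash
# Move the NodeManager recovery state aside instead of deleting it outright
mv /var/lib/hadoop-yarn/yarn-nm-recovery /var/lib/hadoop-yarn/yarn-nm-recovery.bak
mkdir -p /var/lib/hadoop-yarn/yarn-nm-recovery
chown yarn:hadoop /var/lib/hadoop-yarn/yarn-nm-recovery
# Then start the NodeManager role again from Cloudera Manager.
```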
04-23-2019
10:57 AM
Hello folks, the NodeManager has suddenly stopped on one instance (while it is still running on the other nodes/instances). When I try to start/restart it via Cloudera Manager, an error is shown at the first step: "Failed to start role." I'm using CentOS release 6.10 (Final). What do you suggest I look at or check in order to resolve this problem? Here's my stdout log:
Tue Apr 23 10:18:56 PDT 2019
JAVA_HOME=/usr/java/jdk.1.8.0_144
using /usr/java/jdk.1.8.0_144 as JAVA_HOME
using 5 as CDH_VERSION
using /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/lib/hadoop-yarn as CDH_YARN_HOME
using /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/lib/hadoop-mapreduce as CDH_MR2_HOME
using /var/run/cloudera-scm-agent/process/23960-yarn-NODEMANAGER as CONF_DIR
CONF_DIR=/var/run/cloudera-scm-agent/process/23960-yarn-NODEMANAGER
CMF_CONF_DIR=/etc/cloudera-scm-agent
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0x7) at pc=0x00007f8c1fde51a1, pid=3004, tid=0x00007f8c4f44c700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_144-b01) (build 1.8.0_144-b01)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.144-b01 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [libleveldbjni-64-1-8170950501904951615.8+0x491a1] leveldb::ReadBlock(leveldb::RandomAccessFile*, leveldb::ReadOptions const&, leveldb::BlockHandle const&, leveldb::BlockContents*)+0x191
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
And this is my log.out error:
NodeManager Node Manager health check script is not available or doesn't have execute permission, so not starting the node health script runner.
AsyncDispatcher Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher
AsyncDispatcher Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher
AsyncDispatcher Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.LocalizationEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService
AsyncDispatcher Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServicesEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices
AsyncDispatcher Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
AsyncDispatcher Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher
Labels:
- Apache YARN
02-07-2019
03:55 AM
In my case, I was just working on the Cloudera VM and had to configure the node IP: ifconfig eth1:2 LOCAL_NODE_IP netmask XXXX After that, pinging that IP works fine. Thanks a lot.