Member since: 11-30-2020 · Posts: 18 · Kudos Received: 0 · Solutions: 0
02-26-2021
11:30 AM
Hello, I am trying to install Cloudera Manager for a CDP cluster and I am getting an SSL certificate error while importing the signing GPG key for the CDP installation. This is the command I am running:

sudo rpm --import https://[username]:[password]@archive.cloudera.com/p/cm7/7.2.4/redhat7/yum/RPM-GPG-KEY-cloudera

I tried adding sslverify=false to the repo file under /etc/yum.repos.d, but still no luck. Is this something to be fixed on the server side, or should I fix it on my end?

Error:

curl: (60) Peer's certificate issuer has been marked as not trusted by the user. More details here: http://curl.haxx.se/docs/sslcerts.html
curl performs SSL certificate verification by default, using a "bundle" of Certificate Authority (CA) public keys (CA certs). If the default bundle file isn't adequate, you can specify an alternate file using the --cacert option. If this HTTPS server uses a certificate signed by a CA represented in the bundle, the certificate verification probably failed due to a problem with the certificate (it might be expired, or the name might not match the domain name in the URL). If you'd like to turn off curl's verification of the certificate, use the -k (or --insecure) option.
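For reference, a hedged workaround sketch, assuming the host's CA bundle is simply missing the issuing CA (for example because a corporate proxy intercepts TLS): fetch the key with curl while pointing it at an explicit CA bundle, then import the downloaded file. The bundle path below is the CentOS 7 default and may differ on other systems.

# fetch the key while telling curl which CA bundle to trust (path is an assumption)
curl --cacert /etc/pki/tls/certs/ca-bundle.crt \
  -o /tmp/RPM-GPG-KEY-cloudera \
  "https://[username]:[password]@archive.cloudera.com/p/cm7/7.2.4/redhat7/yum/RPM-GPG-KEY-cloudera"

# import the locally downloaded key instead of letting rpm fetch it over HTTPS
sudo rpm --import /tmp/RPM-GPG-KEY-cloudera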
02-22-2021
11:21 AM
Hello All, I have read a lot about metadata and how useful it can be for managing data, but everywhere the articles only explain why it is good; nowhere could I find how to actually collect, store, format, or manage metadata. We have several databases and we are planning a data governance effort, for which metadata is really important. We need to create a standard structure or format for the data in our databases and ingest it into the Hadoop cluster along with the appropriate metadata. Any videos, references, articles, or open-source tools that would help with metadata collection would be really helpful. Please share ways to collect metadata and manage data using it. Thank you so much!
02-21-2021
10:06 PM
Thanks for answering it @GangWar. So we should not remove these packages, since removing them would affect the Cloudera agent:

mysql-community-client        x86_64   5.7.25-1.el7   @mysql57-community   107 M
mysql-community-common        x86_64   5.7.25-1.el7   @mysql57-community   2.6 M
mysql-community-libs          x86_64   5.7.25-1.el7   @mysql57-community   9.5 M
mysql-community-libs-compat   x86_64   5.7.25-1.el7   @mysql57-community   9.2 M

To what version should I upgrade this MySQL, and if I upgrade it, will the agent pick up the newly installed packages without any issues, or does some config file need to be edited? Our server team does not want us to keep the MySQL version below on the server, as it is old:

mysql -V
mysql  Ver 14.14 Distrib 5.7.21, for Linux (x86_64) using  EditLine wrapper

And these are the package details from another server in the cluster. Which of these packages are used by the cloudera-agent now?

mysql -V
mysql  Ver 15.1 Distrib 5.5.68-MariaDB, for Linux (x86_64) using readline 5.1

rpm -qa | grep -i mysql
akonadi-mysql-1.9.2-4.el7.x86_64
perl-DBD-MySQL-4.023-6.el7.x86_64
MySQL-python-1.2.5-1.el7.x86_64
qt-mysql-4.8.7-9.el7_9.x86_64
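For anyone checking the same thing, a minimal hedged sketch of how to see which installed MySQL packages the agent declares or pulls in, using plain RPM queries (nothing Cloudera-specific is assumed beyond the package names already shown above):

# list the declared dependencies of the agent package and filter for MySQL-related entries
rpm -qR cloudera-manager-agent | grep -i mysql

# for a specific library package, list which installed packages require it
rpm -q --whatrequires mysql-community-libs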
02-19-2021
11:29 AM
Hello All, we received an alert from our server team to upgrade the current version of PostgreSQL to the latest one. When I checked, the server in question is not the cluster's DB server running PostgreSQL but another server in the cluster, and I found this package: postgresql-libs-9.2.24-4.el7_8.x86_64. Just curious why this package exists on only one server, apart from the main DB server, out of the five other cluster servers. Can I go ahead and remove this package? Will it affect the cluster in any way? If I can't remove it, how do I upgrade this version of PostgreSQL and update it in the cluster so that everything keeps working without issues? Thank you!
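A hedged way to check whether anything on that host actually needs postgresql-libs before removing or upgrading it, using plain RPM queries (the library capability name below is the one postgresql-libs normally provides on x86_64 and is an assumption):

# see which installed packages declare a dependency on postgresql-libs
rpm -q --whatrequires postgresql-libs

# see which installed package owns the client library it ships
rpm -q --whatprovides 'libpq.so.5()(64bit)'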
02-19-2021
04:57 AM
Hello! One of my cluster servers has this MySQL installed, and it is an old version that needs to be upgraded, per the server team, because of vulnerability issues. The server does not have any MySQL service or process running, our cluster uses Postgres, and I cannot see any MySQL status on the server:

root 17280 7440 0 18:23 pts/0 00:00:00 grep --color=auto -i mysql

So when I try to remove the package, yum says cloudera-manager-agent depends on this MySQL, but we never use MySQL, only Postgres, as our Hadoop DB.

Question: can I remove this or not? Does the CM agent need this MySQL or not? If it is needed and I upgrade MySQL to the latest version, how do I apply the upgraded configuration to the cluster settings?

Dependencies Resolved
=============================================================================================================================================================================================
 Package                        Arch     Version                     Repository                              Size
=============================================================================================================================================================================================
Removing:
 mysql-community-client         x86_64   5.7.25-1.el7                @mysql57-community                     107 M
 mysql-community-common         x86_64   5.7.25-1.el7                @mysql57-community                     2.6 M
 mysql-community-libs           x86_64   5.7.25-1.el7                @mysql57-community                     9.5 M
 mysql-community-libs-compat    x86_64   5.7.25-1.el7                @mysql57-community                     9.2 M
Removing for dependencies:
 MySQL-python                   x86_64   1.2.5-1.el7                 @base                                  284 k
 cloudera-manager-agent         x86_64   5.14.0-1.cm5140.p0.25.el7   @Cloudera-manager                       76 M
 net-snmp                       x86_64   1:5.7.2-49.el7_9.1          @centos7_optional_local                850 k
 net-snmp-agent-libs            x86_64   1:5.7.2-49.el7_9.1          @centos7_optional_local                2.1 M
 perl-DBD-MySQL                 x86_64   4.023-6.el7                 @centos7_online                        323 k
 postfix                        x86_64   2:2.10.1-9.el7              @32F960A7-7DB1-D008-64D9-307AB478D4B1   12 M
 qt-mysql                       x86_64   1:4.8.7-9.el7_9             @B8A1CC78-DAEF-672D-98A2-2EB8FE14490E   74 k

Transaction Summary
=============================================================================================================================================================================================
Remove  4 Packages (+7 Dependent packages)

Installed size: 220 M
Is this ok [y/N]: n
Exiting on user command
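To see exactly which link in the chain drags the agent into the removal, one hedged sketch is to query the RPM database for the shared-library capability that the MySQL 5.7 libs provide (the soname below is the usual one for mysql-community-libs 5.7 and is an assumption; the chain through MySQL-python/perl-DBD-MySQL is only inferred from the transaction above):

# which installed packages require the MySQL client library?
rpm -q --whatrequires 'libmysqlclient.so.20()(64bit)'

# does the agent package itself declare any MySQL-related requirement?
rpm -qR cloudera-manager-agent | grep -i -E 'mysql|libmysqlclient'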
02-09-2021
03:05 AM
Hi All, I am looking for a free tool or feature on CDP (Cloudera Data Platform) to manage the metadata of the data we have stored across our many databases. I am looking for a way to manage, reference, and recreate any data in our external databases or the Hadoop system together with its metadata. For example, if we add data into one of the databases, we should know exactly where we are placing it based on the stored metadata. Likewise, if we pull some data from the DBs into HDFS, we should be able to tell where it came from by looking at the metadata. Hope I am clear on my needs. Please suggest some tools, frameworks, or CDP components to achieve this. Thank you!
01-19-2021
05:57 AM
Hey Guys! I am trying to find the memory used over the last 6 months, and I need data points at least every 6 hours per day so that I can compare the maximum, average, etc. But with the YARN charts I can only see 6-hour data points for the last 2 months; if I go further back, the X axis switches to daily granularity, and I am not sure why. Any thoughts? How can I view all the historic data at 6-hour granularity? Is there a query to view the data at 6-hour intervals, or is it just an adjustment of the data granularity option? Thanks
01-19-2021
05:25 AM
@tjangid I am checking the metric for memory available to YARN. For the last 2 months I can see data points at a granularity of every 6 hours, but the earlier months show only weekly points in the chart, even if I set the data granularity to 6 hours. I am confused why that is. Any clue?
01-13-2021
04:08 AM
Hey Hadoopers, We are using CDH and we are planning to save some space for future incoming data on HDFS. So we were thinking we could compress/zip old HDFS data to reclaim some space.
1. Is compression or zipping of HDFS files possible?
2. Can we compress the data and store it on the local file system or on HDFS?
3. If yes, how do we uncompress the data later and put it back into the same HDFS location, with the exact metadata or path where it existed before?
4. Will it compress along with the original file path or metadata?
Thanks a ton!
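One hedged sketch of the archive route, assuming Hadoop Archives (HAR) fit the use case: a HAR bundles many files into fewer HDFS files/blocks while preserving the original directory layout inside the archive. Note that HAR does not compress the bytes themselves; it mainly reduces file and block count. The paths below are illustrative only.

# pack /data/old/2019 into an archive stored under /archives, keeping paths relative to /data/old
hadoop archive -archiveName 2019.har -p /data/old 2019 /archives

# the original layout is still browsable inside the archive
hdfs dfs -ls har:///archives/2019.har/2019

# restoring is a plain copy back out of the archive
hdfs dfs -cp har:///archives/2019.har/2019 /data/old/2019_restored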
01-10-2021
12:56 AM
Hey guys! For the YARN cluster metrics URL, I would like to know the duration over which the report is generated: http://IP:8088/ws/v1/cluster/metrics

In particular, the report below says:
<appsSubmitted> 379 </appsSubmitted>
<appsCompleted> 379 </appsCompleted>

I would like to know over what time window these jobs/apps were submitted to YARN. Also, why are apps submitted and apps completed the same number? Do failed apps also count as completed, given that the output reports them as failed? And how is the number of containers determined on YARN? I think it is based on the size of the job or application.

<clusterMetrics>
<appsSubmitted> 379 </appsSubmitted>
<appsCompleted> 379 </appsCompleted>
<appsPending> 0 </appsPending>
<appsRunning> 0 </appsRunning>
<appsFailed> 0 </appsFailed>
<appsKilled> 0 </appsKilled>
<reservedMB> 0 </reservedMB>
<availableMB> 52224 </availableMB>
<allocatedMB> 0 </allocatedMB>
<reservedVirtualCores> 0 </reservedVirtualCores>
<availableVirtualCores> 48 </availableVirtualCores>
<allocatedVirtualCores> 0 </allocatedVirtualCores>
<containersAllocated> 0 </containersAllocated>
<containersReserved> 0 </containersReserved>
<containersPending> 0 </containersPending>
<totalMB> 52224 </totalMB>
<totalVirtualCores> 48 </totalVirtualCores>
<totalNodes> 3 </totalNodes>
<lostNodes> 0 </lostNodes>
<unhealthyNodes> 0 </unhealthyNodes>
<decommissioningNodes> 0 </decommissioningNodes>
<decommissionedNodes> 0 </decommissionedNodes>
<rebootedNodes> 0 </rebootedNodes>
<activeNodes> 3 </activeNodes>
</clusterMetrics>

Thank you!
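For per-application timing, which /cluster/metrics does not expose (its counters simply accumulate from the last ResourceManager start), the RM REST API's apps endpoint can be filtered by time range; a minimal sketch, with the host and the epoch-millisecond timestamps as placeholders:

# list apps started within a given window (times are milliseconds since epoch)
curl "http://IP:8088/ws/v1/cluster/apps?startedTimeBegin=1609459200000&startedTimeEnd=1610064000000"

# restrict to failed apps only, to compare against appsCompleted
curl "http://IP:8088/ws/v1/cluster/apps?states=FAILED"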
01-09-2021
11:25 PM
@akriti Thank you! Well, then we will continue using the Express edition, as it is not going to be affected, and we are not going for the subscription; we have already got a license for CDP. Hopefully we can continue our testing on the Express edition without a subscription and keep using it without disruption.
01-07-2021
04:24 AM
You could do: hdfs dfs -ls /pathname | grep -i (yesterday's date, in the format shown by the ls output)
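A concrete hedged variant of the same idea, assuming GNU date and the default hdfs dfs -ls timestamp format of YYYY-MM-DD:

# list files whose modification date matches yesterday
hdfs dfs -ls /pathname | grep "$(date -d yesterday +%Y-%m-%d)"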
01-07-2021
04:04 AM
Thanks for the information. Since the Express edition is already downloaded and in use, how would it be affected, unless I try to add a new service or add a node to the cluster by downloading the parcels from Cloudera? But as such, will it affect the existing cluster too, or stop it from working?
01-05-2021
03:17 AM
Hi All, We are using the CDH Express edition in all our clusters and are worried about the news of paywall subscriptions starting from February 2021. With the paywall taking effect on January 31, will it affect only the Trial and Enterprise editions, or will it affect the Express edition too? Please help on this, as we have very few days left to make a plan and move the data. Thank you!
01-05-2021
01:34 AM
I am in the same boat, using the 6.3.2 Express edition. With this paywall subscription taking effect on January 31, will it affect only the Trial and Enterprise editions, or will it affect the Express edition too? Please help on this, as there are very few days left to make a plan.
12-17-2020
11:13 AM
The same thing happened to me. I had 2 data nodes; the installation worked on one of the servers and failed on the other with the same error in the wizard and in the logs:

[17/Dec/2020 23:32:42 +0000] 8744 MainThread heartbeat_tracker INFO HB stats (seconds): num:1 LIFE_MIN:0.02 min:0.02 mean:0.02 max:0.02 LIFE_MAX:0.02
[17/Dec/2020 23:33:49 +0000] 8744 Monitor-HostMonitor throttling_logger ERROR Timed out waiting for worker process collecting filesystem usage to complete. This may occur if the host has an NFS or other remote filesystem that is not responding to requests in a timely fashion. Current nodev filesystems: /dev/shm,/run,/sys/fs/cgroup,/run/user/0,/run/cloudera-scm-agent/process,/run/cloudera-scm-agent/process,/run/user/1032

I checked my /etc/hosts; by mistake it had uppercase hostnames. I changed everything to lowercase and restarted the agent. Voila! It got installed and moved to the next stage.
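For anyone hitting the same thing, a minimal hedged sketch of the fix described above (the hostname and IP are placeholders): keep /etc/hosts entries lowercase and consistent with the host's FQDN, then restart the agent.

# /etc/hosts — hostnames in lowercase, FQDN first
10.0.0.21   datanode01.example.com   datanode01

# restart the Cloudera Manager agent so it re-reads the host information
sudo systemctl restart cloudera-scm-agent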
12-07-2020
10:46 AM
Hi Iwang, I am facing an issue with my CDH 6.3.2 installation. I am installing it using the proxy settings in Cloudera Manager. I tried the suggestion of adding the proxy details to /etc/bashrc, but it still gives the same error. This is where it fails: "Failed to install oracle-j2sdk1.8 package." I am on the "Install Agents" step of the cluster installation; the complete output of the installation screen is below. I have requested access for ports 80 and 443 in the meantime, but the proxy works on the server, and only because the proxy settings are configured in CM does the wizard get me to the next installation steps at all. I am not sure why it fails only at this step when using the proxy. I just want to know whether the cluster installation does not support installing the rest of the cluster components through a proxy. Please help; I am stuck. Is there any other way to install the cluster apart from the proof-of-concept installer? If yes, please share the installation steps. Since our environment is a secure one, it is taking time to get access for ports 80 and 443. Using the proxy I can download things on the server, but the automatic installation from Cloudera Manager does not work.

**************************
/tmp/scm_prepare_node.G4KaTEvn
using SSH_CLIENT to get the SCM hostname: 10.127.116.29 33338 22
opening logging file descriptor
Starting installation script...
Acquiring installation lock...
BEGIN flock 4
END (0)
Detecting root privileges...
effective UID is 1033
BEGIN which pbrun
which: no pbrun in (/usr/local/bin:/usr/bin)
END (1)
BEGIN sudo -S id
uid=0(root) gid=0(root) groups=0(root)
END (0)
Using 'sudo ' to acquire root privileges
Detecting distribution...
BEGIN grep Tikanga /etc/redhat-release
END (1)
BEGIN grep 'Scientific Linux release 5' /etc/redhat-release
END (1)
BEGIN grep Santiago /etc/redhat-release
END (1)
BEGIN grep 'CentOS Linux release 6' /etc/redhat-release
END (1)
BEGIN grep 'CentOS release 6' /etc/redhat-release
END (1)
BEGIN grep 'Scientific Linux release 6' /etc/redhat-release
END (1)
BEGIN grep Maipo /etc/redhat-release
END (1)
BEGIN grep 'CentOS Linux release 7' /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
END (0)
/etc/redhat-release ==> CentOS 7
Detecting Cloudera Manager Server...
BEGIN host -t PTR 10.127.116.29
Host 29.116.127.10.in-addr.arpa. not found: 3(NXDOMAIN)
END (1)
BEGIN which python
/usr/bin/python
END (0)
BEGIN python -c 'import socket; import sys; s = socket.socket(socket.AF_INET); s.settimeout(5.0); s.connect((sys.argv[1], int(sys.argv[2]))); s.close();' 10.127.116.29 7182
END (0)
BEGIN which wget
END (0)
/usr/bin/wget
BEGIN wget -qO- -T 1 -t 1 http://169.254.169.254/latest/meta-data/public-hostname && /bin/echo
END (4)
Installing package repositories...
Checking https://archive.cloudera.com/cm6/6.3.1/redhat7/yum/repodata/repomd.xml
Checking https://archive.cloudera.com/cm6/6.3.1/repodata/repomd.xml
Using installing repository file /tmp/scm_prepare_node.G4KaTEvn/repos/rhel7/cloudera-manager.repo
repository file /tmp/scm_prepare_node.G4KaTEvn/repos/rhel7/cloudera-manager.repo installed
installing rpm keys
BEGIN gpg --import /tmp/scm_prepare_node.G4KaTEvn/customGPG
gpg: keyring `/tmp/scm_prepare_node.G4KaTEvn/gnupg.KcI2jBadtW/secring.gpg' created
gpg: keyring `/tmp/scm_prepare_node.G4KaTEvn/gnupg.KcI2jBadtW/pubring.gpg' created
gpg: /tmp/scm_prepare_node.G4KaTEvn/gnupg.KcI2jBadtW/trustdb.gpg: trustdb created
gpg: key 02A818DD: public key "Cloudera Apt Repository" imported
gpg: key E8F86ACD: public key "Yum Maintainer <webmaster@cloudera.com>" imported
gpg: key B0B19C9F: public key "Parameterized Build <security@cloudera.com>" imported
gpg: key 84415700: public key "Cloudera <security@cloudera.com>" imported
gpg: key 36F57F35: public key "Cloudera <security@cloudera.com>" imported
gpg: Total number processed: 5
gpg: imported: 5 (RSA: 3)
END (0)
BEGIN sudo rpm --import /tmp/scm_prepare_node.G4KaTEvn/F36A89E33CC1BD0F71079007327574EE02A818DD.pub
END (0)
BEGIN sudo rpm --import /tmp/scm_prepare_node.G4KaTEvn/5F14D39EF0681ACA6F044A43F90C0D8FE8F86ACD.pub
END (0)
BEGIN sudo rpm --import /tmp/scm_prepare_node.G4KaTEvn/9543951160C284C0E7CA254573985D43B0B19C9F.pub
END (0)
BEGIN sudo rpm --import /tmp/scm_prepare_node.G4KaTEvn/CECDB80C4E9004B0CFE852962279662784415700.pub
END (0)
BEGIN sudo rpm --import /tmp/scm_prepare_node.G4KaTEvn/DF2C4DD7629B1AC08A0966E00F65552736F57F35.pub
END (0)
Refreshing package metadata...
BEGIN sudo yum --disablerepo=* --enablerepo=cloudera* clean all
Loaded plugins: fastestmirror
Cleaning repos: cloudera-manager
END (0)
BEGIN sudo rm -Rf /var/cache/yum/x86_64
END (0)
BEGIN sudo yum --disablerepo=* --enablerepo=cloudera* makecache
Loaded plugins: fastestmirror
Determining fastest mirrors
One of the configured repositories failed (Unknown), and yum doesn't have enough cached data to continue. At this point the only safe thing yum can do is fail. There are a few ways to work "fix" this:
1. Contact the upstream for the repository and get them to fix the problem.
2. Reconfigure the baseurl/etc. for the repository, to point to a working upstream. This is most often useful if you are using a newer distribution release than is supported by the repository (and the packages for the previous distribution release still work).
3. Run the command with the repository temporarily disabled
   yum --disablerepo=<repoid> ...
4. Disable the repository permanently, so yum won't use it by default. Yum will then just ignore the repository until you permanently enable it again or use --enablerepo for temporary usage:
   yum-config-manager --disable <repoid>
   or
   subscription-manager repos --disable=<repoid>
5. Configure the failing repository to be skipped, if it is unavailable. Note that yum will try to contact the repo. when it runs most commands, so will have to try and fail each time (and thus. yum will be be much slower). If it is a very temporary problem though, this is often a nice compromise:
   yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true
Cannot find a valid baseurl for repo: cloudera-manager
END (1)
Installing oracle-j2sdk1.8 package...
BEGIN sudo yum --disablerepo=* --enablerepo=cloudera* list installed oracle-j2sdk1.8
Loaded plugins: fastestmirror
Installed Packages
oracle-j2sdk1.8.x86_64   1.8.0+update181-1   @Cloudera-manager
END (0)
BEGIN echo oracle-j2sdk1.8 cloudera-manager-agent cloudera-manager-daemons | grep oracle-j2sdk1.8
oracle-j2sdk1.8 cloudera-manager-agent cloudera-manager-daemons
END (0)
BEGIN sudo yum --disablerepo=* --enablerepo=cloudera* info oracle-j2sdk1.8
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
One of the configured repositories failed (Unknown), and yum doesn't have enough cached data to continue. At this point the only safe thing yum can do is fail. There are a few ways to work "fix" this:
1. Contact the upstream for the repository and get them to fix the problem.
2. Reconfigure the baseurl/etc. for the repository, to point to a working upstream. This is most often useful if you are using a newer distribution release than is supported by the repository (and the packages for the previous distribution release still work).
3. Run the command with the repository temporarily disabled
   yum --disablerepo=<repoid> ...
4. Disable the repository permanently, so yum won't use it by default. Yum will then just ignore the repository until you permanently enable it again or use --enablerepo for temporary usage:
   yum-config-manager --disable <repoid>
   or
   subscription-manager repos --disable=<repoid>
5. Configure the failing repository to be skipped, if it is unavailable. Note that yum will try to contact the repo. when it runs most commands, so will have to try and fail each time (and thus. yum will be be much slower). If it is a very temporary problem though, this is often a nice compromise:
   yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true
Cannot find a valid baseurl for repo: cloudera-manager
END (1)
remote package oracle-j2sdk1.8 is not available, giving up
waiting for rollback request
detected rollback request
rolling back installation
Reverting changes...
rollback started
Removing package repositories...
ls: cannot access /etc/yum.repos.d/cloudera-manager.repo.~*~: No such file or directory
repository file /etc/yum.repos.d/cloudera-manager.repo removed
Cleaning the package manager cache...
BEGIN sudo yum --disablerepo=* --enablerepo=cloudera* clean all
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
There are no enabled repos.
Run "yum repolist all" to see the repos you have.
To enable Red Hat Subscription Management repositories:
   subscription-manager repos --enable <repo>
To enable custom repositories:
   yum-config-manager --enable <repo>
END (1)
BEGIN sudo rm -Rf /var/cache/yum/x86_64
END (0)
Uninstalled.
rollback completed
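A hedged sketch of the usual workaround when Cloudera Manager's proxy setting does not reach yum on the hosts: yum has its own proxy configuration, so the failing makecache/info calls above would only go through the proxy if it is also set in /etc/yum.conf on each host (the proxy host, port, and credentials below are placeholders):

# /etc/yum.conf — add to the [main] section on every cluster host
proxy=http://proxy.example.com:3128
proxy_username=proxyuser
proxy_password=proxypass

# then re-test the repository from the host itself
sudo yum --disablerepo='*' --enablerepo='cloudera*' makecache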
11-30-2020
01:24 AM
Hello All! I have a question after enabling Kerberos on my CDH 5.10 cluster. I am using the Cloudera Express edition. Our applications used to access HDFS through the NameNode URL http://IP:50070, but after Kerberos the NameNode URL requires authentication. Even with a valid TGT for a proper Kerberos principal, I am not able to access the NameNode URL, not even with a curl command; I get an authentication error. After a kinit, Kerberos itself works fine: it lists HDFS files only for users or principals that exist in the KDC and denies access to all other users.

My question: how do I let external programs, applications, or users access HDFS files in a Kerberized cluster, with a user defined in the KDC, through the NameNode URL? Please help me understand whether HDFS files can still be accessed through the NameNode URL after Kerberization, or whether there is another way for applications to access HDFS files externally. The basic idea is to work with HDFS files externally on a Kerberized cluster. I have tried everything I could find and am not sure what is missing. Awaiting solutions. Thank you!
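For reference, a minimal hedged sketch of how WebHDFS access typically looks against a Kerberized NameNode, assuming SPNEGO is enabled on the HTTP endpoint and curl was built with GSSAPI support (the principal, hostname, and path are placeholders):

# obtain a ticket for a principal known to the KDC
kinit myuser@EXAMPLE.COM

# let curl negotiate with the ticket instead of sending a username/password
curl --negotiate -u : "http://namenode.example.com:50070/webhdfs/v1/tmp?op=LISTSTATUS"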