03-29-2019
07:29 AM
Thanks, I'll look at it. Unfortunately we have a mandate to use Ansible and full automation to the largest extent possible, because we need to be able to set up a large variety of configurations to match what our customers use.

A good model is my HDFS playbook. It:
1. installs the required YUM packages
2. formats the HDFS filesystem
3. adds the standard test users
4. prepares the Kerberos keytab files (tbd)
5. prepares the SSL keystores (tbd)
and sets the flags for standard mode. We can then easily turn on Kerberos and/or RPC privacy via plays that modify just a few properties and restart the services (see the sketch below).

There's an HBase playbook that sets up the HBase servers. It can use HDFS, but from the conf files it looks like we could also use a traditional file and do many of our tests without also setting up a full HDFS node. That means it will require fewer resources and can run on a smaller instance or even the dev's laptop. Since it's all yum and Ansible, anyone can modify the image without needing to learn new tools.

TPTB are fine with creating an AMI that only requires updating the crypto material, but they want to be able to rebuild the AMI image from the most basic resources. Hmm, I might be able to sell this particular story as an exception. The two use cases are 1) creating new configurations that we don't have a playbook for yet and 2) verifying the configuration files for an arbitrary configuration. This won't be used in the automated tests.

(tbd - I know how to do it. The blocker is reaching a consensus on the best way to manage the resources so our applications don't require tweaking the configuration every time. Do we use a standalone KDC, an integrated solution like FreeIPA, etc.?)
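For illustration, one of those "flip a few properties and restart" plays could look roughly like this. It's a minimal sketch, not the real playbook: the host group, config path, and service names are assumptions based on a package-based CDH install.

- name: Enable Kerberos authentication for HDFS and restart (sketch; names are assumptions)
  hosts: hadoop_nodes
  become: true
  tasks:
    - name: Set hadoop.security.authentication to kerberos in core-site.xml
      # community.general.xml edits a single node in place; this xpath assumes
      # the property element already exists in the file.
      community.general.xml:
        path: /etc/hadoop/conf/core-site.xml
        xpath: "/configuration/property[name='hadoop.security.authentication']/value"
        value: kerberos

    - name: Restart the HDFS services so the new setting takes effect
      ansible.builtin.service:
        name: "{{ item }}"
        state: restarted
      loop:
        - hadoop-hdfs-namenode
        - hadoop-hdfs-datanode

The matching "standard mode" play would set the same property back to simple, so switching configurations is just a matter of which play runs.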
03-29-2019
05:56 AM
Summary

Improvement, but Oozie still fails to initialize and it's possibly OOM. Investigating that possibility.

Details

I compared 'rpm -qa' and the contents of the yum repo and explicitly added a few missed packages. I don't think they're related - most were related to Impala, for example - but I wanted to eliminate all variables.

HDFS format worked. No Lucene errors. cloudera-scm-server did not crash - I'm able to log into the CM dashboard. The warnings seem to be mostly related to the size of the EC2 instance.

However, there are still a few problems.

OOZIE

Oozie initialization is still failing. The stack traces are:

Latest:

2019-03-29 12:29:17,583 ERROR WebServerImpl:com.cloudera.server.web.cmf.TsqueryAutoCompleter: Error getting predicates
org.apache.avro.AvroRemoteException: java.net.ConnectException: Connection refused (Connection refused)
    at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:104)
    at com.sun.proxy.$Proxy179.getImpalaFilterMetadata(Unknown Source)
    at com.cloudera.cmf.protocol.firehose.nozzle.TimeoutNozzleIPC.getImpalaFilterMetadata(TimeoutNozzleIPC.java:370)
    at com.cloudera.server.web.cmf.impala.components.ImpalaDao.fetchFilterMetadata(ImpalaDao.java:837)

Before that:

2019-03-29 12:29:07,748 WARN ProcessStalenessDetector-0:com.cloudera.cmf.service.config.components.ProcessStalenessDetector: Encountered exception while performing staleness check
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: Unable to find commissioned ResourceManager in good health
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)

2019-03-29 12:28:28,011 INFO main:org.quartz.core.QuartzScheduler: Scheduler meta-data: Quartz Scheduler (v2.0.2) 'com.cloudera.cmf.scheduler-1' with instanceId 'NON_CLUSTERED'
    Scheduler class: 'org.quartz.core.QuartzScheduler' - running locally.
    NOT STARTED.
    Currently in standby mode.
    Number of jobs executed: 0
    Using thread pool 'org.quartz.simpl.SimpleThreadPool' - with 1 threads.
    Using job-store 'org.quartz.simpl.RAMJobStore' - which does not support persistence. and is not clustered.

The /var/log/hadoop-yarn directory is empty. Perhaps Oozie is failing because YARN isn't coming up?

HDFS

When I start HDFS via CM I get this error:

There was an error when communicating with the server. See the server log file, typically /var/log/cloudera-scm-server/cloudera-scm-server.log, for more information.

CRASH / RESTART

It looks like the server crashed at this point. The logs show that it's trying to restart but failing. I don't see an explanation. I'm going to attribute that to OOM until I can rule that out.
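To rule OOM in or out before digging further, a quick check along these lines should show whether the kernel OOM killer fired and how much memory and swap the instance actually has. This is only a sketch; the host group name is a placeholder, not something from my playbooks.

- name: Check for OOM-killer activity on the CM host (sketch; host group is a placeholder)
  hosts: cm_server
  become: true
  tasks:
    - name: Look for oom-killer entries in the kernel log
      ansible.builtin.shell: "dmesg | grep -iE 'oom|killed process' || true"
      register: oom_evidence
      changed_when: false

    - name: Show current memory and swap
      ansible.builtin.command: free -m
      register: mem_info
      changed_when: false

    - name: Report findings
      ansible.builtin.debug:
        msg:
          - "{{ oom_evidence.stdout_lines }}"
          - "{{ mem_info.stdout_lines }}"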
03-29-2019
05:30 AM
AWS EC2
03-28-2019
11:09 AM
Short version
cloudera_scm_server is crashing during final configuration. I reach Add Cluster - Configuration - Command Details and fail on Formatting the name directories of the current Namenode and Creating Oozie Database Tables.
It's a hard failure - the cloudera_scm_server crashes.
I'm probably missing some little detail in my setup but haven't found anything yet in either this community or the usual online resources.
cloudera_scm_server.log shows:
2019-03-28 17:04:09,030 INFO SearchRepositoryManager-0:com.cloudera.server.web.cmf.search.components.SearchRepositoryManager: Finished constructing repo:2019-03-28T17:04:09.030Z
2019-03-28 17:04:09,734 WARN scm-web-92:com.cloudera.server.cmf.descriptor.components.DescriptorFactory: Could not generate client configs for service: YARN (MR2 Included)
Caused by: com.cloudera.cmf.service.config.ConfigGenException: Unable to generate config of 'mapreduce.application.framework.path'
and
2019-03-28 17:04:00,173 INFO WebServerImpl:com.cloudera.server.web.cmf.search.LuceneSearchRepository: Directory /var/lib/cloudera-scm-server/search/lucene.en..1553792502675 does not seem to be a Lucene index (no segments.gen).
2019-03-28 17:04:00,173 WARN WebServerImpl:com.cloudera.server.web.cmf.search.components.SearchRepositoryManager: Failed to initialize search dir, deleting it: /var/lib/cloudera-scm-server/search/lucene.en..1553792575457
org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.MMapDirectory@/var/lib/cloudera-scm-server/search/lucene.en..1553792575457 lockFactory=org.apache.lucene.store.NativeFSLockFactory@7d930261: files: [write.lock, _0.fdt, _0.fdx]
System Check Failures
There are a few system check failures. I think I included all required RPMs but I'll double-check. For maintainability I've tried to focus on the top-level RPMs and rely on package dependencies to pull in everything they need; that avoids problems between releases if the names of those dependencies change. (A sketch of that approach follows the list below.)
Missing resources:
hue plugins
keytrustee_kp and keytrustee_server
mr1
sqoop2
The missing mr1 is especially suspicious since one of the failure messages refers to YARN.
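For reference, the dependency-driven install looks roughly like this. It's a minimal sketch of the idea; the package list here is illustrative, not my exact list, and the host group is a placeholder.

- name: Install only the top-level CM and CDH packages (sketch; package list is illustrative)
  hosts: cluster_nodes
  become: true
  tasks:
    - name: Install top-level packages and let yum resolve their dependencies
      ansible.builtin.yum:
        name:
          - cloudera-manager-daemons
          - cloudera-manager-server
          - cloudera-manager-agent
          - hadoop-hdfs-namenode
          - hadoop-yarn-resourcemanager
          - oozie
        state: present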
Background / Ansible-based Installation [CentOS 7]
We need ansible scripts that can quickly bring up and tear down specific configurations for testing purposes - the idea is that our "big" tests can spin up a dedicated instance, run integration tests against it, and then spin down that instance. We want a fresh instance every time so our tests will have good isolation, and our bean counters will be happy since we're not paying for clusters that we aren't using. Ansible fits into our framework nicely and there's been a lot of pressure to automate the process 100% instead of relying on manual installation/configuration of an AMI image that contains a pre-configured system.
I'm most of the way there - I have the Ansible plays + roles to do the following (a trimmed-down sketch of the playbook structure follows the list):
create an EC2 instance
add the Cloudera YUM repos
install postgresql, java, CM packages, and CDH packages
create the required databases and accounts
tweak the system as required (/etc/hosts, etc.)
launch cloudera_scm_server and cloudera_scm_agent
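As a rough illustration of the shape of those plays - the role names, variables, and instance settings here are hypothetical stand-ins, not the real repo layout:

- name: Provision an EC2 instance for the test cluster (sketch; names and vars are placeholders)
  hosts: localhost
  tasks:
    - name: Launch the instance
      amazon.aws.ec2_instance:
        name: cdh-test-node
        instance_type: m5.2xlarge
        image_id: "{{ centos7_ami_id }}"
        key_name: "{{ ssh_key_name }}"
        state: running

- name: Prepare the node and start Cloudera Manager
  hosts: cdh_test_nodes
  become: true
  roles:
    - cloudera_yum_repos     # add the Cloudera CM/CDH repo files
    - postgresql             # install and initialize the metadata database
    - java                   # install the JDK
    - cm_packages            # install cloudera-manager-* and CDH packages
    - scm_databases          # create the required databases and accounts
    - system_tweaks          # /etc/hosts, swappiness, etc.
  tasks:
    - name: Start the CM server and agent
      ansible.builtin.service:
        name: "{{ item }}"
        state: started
        enabled: true
      loop:
        - cloudera-scm-server
        - cloudera-scm-agent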
I can log into the manager, select 'managed' node (my new EC2 instance), select 'packages', and work my way through the installation process to Add Cluster - Configuration - Command Details. Of the 7 steps I successfully complete 5. The ones that fail are Formatting the name directories of the current Namenode and Creating Oozie Database Tables.
My standalone Ansible playbook to create a standard single-node HDFS cluster works fine, so I don't think I'm missing anything required to format the name directory, although it is possible that I commented out a critical step when converting from the standalone playbook to this one. The relevant step is sketched below.
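The format step in the standalone playbook is essentially the following. This is a sketch: the dfs name directory path is an assumption based on a default package-based CDH layout, and the host group is a placeholder.

- name: Format the HDFS name directory (sketch; path and host group are assumptions)
  hosts: namenode
  become: true
  tasks:
    - name: Format the NameNode metadata directory, but only if it is still empty
      ansible.builtin.command: hdfs namenode -format -nonInteractive
      become_user: hdfs
      args:
        creates: /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current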