Cannot start mgmt services after upgrading from 5.10.1 to 5.13.0

New Contributor

 

The management services will not start, and all of them throw this exception in their role logs:

 

3:37:16.695 PM	WARN	BasicScmProxy	
Exception while getting fetch scmDescriptor hash: none
com.cloudera.enterprise.JsonUtil$JsonRuntimeException: org.codehaus.jackson.JsonParseException: Unexpected character ('<' (code 60)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: java.io.StringReader@2462cb01; line: 1, column: 2]
	at com.cloudera.enterprise.JsonUtil.valueFromString(JsonUtil.java:161)
	at com.cloudera.cmf.BasicScmProxy.fetchFragmentAndHash(BasicScmProxy.java:701)
	at com.cloudera.cmf.BasicScmProxy.access$1000(BasicScmProxy.java:53)
	at com.cloudera.cmf.BasicScmProxy$9.call(BasicScmProxy.java:664)
	at com.cloudera.cmf.BasicScmProxy$9.call(BasicScmProxy.java:650)
	at com.cloudera.cmf.BasicScmProxy.fetch(BasicScmProxy.java:555)
	at com.cloudera.cmf.BasicScmProxy.getFragmentAndHash(BasicScmProxy.java:650)
	at com.cloudera.cmf.DescriptorAndFragments.newDescriptorAndFragments(DescriptorAndFragments.java:64)
	at com.cloudera.cmon.firehose.Main.main(Main.java:379)
Caused by: org.codehaus.jackson.JsonParseException: Unexpected character ('<' (code 60)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: java.io.StringReader@2462cb01; line: 1, column: 2]
	at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1292)
	at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
	at org.codehaus.jackson.impl.JsonParserMinimalBase._reportUnexpectedChar(JsonParserMinimalBase.java:306)
	at org.codehaus.jackson.impl.ReaderBasedParser._handleUnexpectedValue(ReaderBasedParser.java:630)
	at org.codehaus.jackson.impl.ReaderBasedParser.nextToken(ReaderBasedParser.java:364)
	at org.codehaus.jackson.map.ObjectMapper._initForReading(ObjectMapper.java:2439)
	at org.codehaus.jackson.map.ObjectMapper._readMapAndClose(ObjectMapper.java:2396)
	at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1602)
	at com.cloudera.enterprise.JsonUtil.valueFromStringUnsafe(JsonUtil.java:185)
	at com.cloudera.enterprise.JsonUtil.valueFromString(JsonUtil.java:159)
	... 8 more
3:37:16.705 PM	WARN	Main	
No descriptor fetched from http://scm-server:7180 on after 1 tries, sleeping for 2 secs

Looking at the Cloudera SCM server logs, I see this exception, which might be related, but I cannot tell.

 

 

2017-10-17 15:37:15,934 INFO 2059105901@scm-web-7:com.cloudera.server.web.common.ExceptionReport: Exception report generated accessing http://scm-server:7180/cmf/descriptor/fragment.json
java.lang.IllegalArgumentException
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:76)
        at com.cloudera.cmf.descriptors.ScmDescriptorFragment.addConfig(ScmDescriptorFragment.java:287)
        at com.cloudera.cmf.descriptors.ScmDescriptorFragment.addConfig(ScmDescriptorFragment.java:277)
        at com.cloudera.server.cmf.descriptor.components.DescriptorFactory.addConfigToDescriptor(DescriptorFactory.java:491)
        at com.cloudera.server.cmf.descriptor.components.DescriptorFactory.generateScmFragment(DescriptorFactory.java:623)
        at com.cloudera.server.cmf.descriptor.components.DescriptorFactory.getDescriptorFragment(DescriptorFactory.java:529)
        at com.cloudera.server.cmf.descriptor.DescriptorFragmentsCache$FragmentCache.generateFragment(DescriptorFragmentsCache.java:207)
        at com.cloudera.server.cmf.descriptor.DescriptorFragmentsCache$FragmentCache.getFragmentJson(DescriptorFragmentsCache.java:180)
        at com.cloudera.server.cmf.descriptor.DescriptorFragmentsCache.getFragmentJson(DescriptorFragmentsCache.java:314)
        at com.cloudera.server.web.cmf.DescriptorController.getFragmentJsonFromCache(DescriptorController.java:218)
        at com.cloudera.server.web.cmf.DescriptorController.getFragmentJson(DescriptorController.java:198)

 

5 REPLIES

New Contributor

I could use some help from someone at Cloudera who has access to the source code here. I'm definitely getting an error with this URL (as seen in the stack trace above):

http://scm-server:7180/cmf/descriptor/fragment.json?fragmentName=scmDescriptor

It throws a server error, which causes the response to be invalid JSON. By contrast,

 

http://scm-server:7180/cmf/descriptor/fragment.json?fragmentName=configDefaults

works fine. 
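
The `Unexpected character ('<' (code 60))` in the role log is consistent with the server answering with a markup error page where JSON was expected. A minimal sketch for checking each fragment URL yourself follows. It assumes the endpoint is reachable from where you run it without extra authentication, which may not hold on your deployment; `classify_body` and `check_fragment` are hypothetical helper names, and `http://scm-server:7180` is the placeholder host from this thread.

```python
import json
import urllib.error
import urllib.request


def classify_body(body):
    """Return 'json' if body parses as JSON, 'markup' if it starts with '<', else 'other'."""
    try:
        json.loads(body)
        return "json"
    except ValueError:
        pass
    # An HTML/XML error page begins with '<' once leading whitespace is skipped.
    if body.lstrip().startswith("<"):
        return "markup"
    return "other"


def check_fragment(base_url, fragment_name, timeout=10):
    """Fetch one descriptor fragment and return (HTTP status, body classification)."""
    url = "%s/cmf/descriptor/fragment.json?fragmentName=%s" % (base_url, fragment_name)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a body worth inspecting.
        status = err.code
        body = err.read().decode("utf-8", "replace")
    return status, classify_body(body)
```

Calling `check_fragment("http://scm-server:7180", "scmDescriptor")` and then the same with `"configDefaults"` should show whether only the first fragment comes back as markup rather than JSON.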

 

 

Explorer

Did you have any luck with this? We're hitting this as well, but only on one of ~10 managers we have.

New Contributor

We are running into this issue on one of our production clusters.  Any pointers from the OP or Cloudera would be greatly appreciated!

Super Collaborator

Would you mind using the following code to check whether you're hitting a known issue?

 

Copy the contents below into a file named OPSAPS-36374.py and run: $ python OPSAPS-36374.py -j deployment.json

 

 
# deployment.json can be generated by fetching the CM API deployment endpoint
# see https://cloudera.github.io/cm_api/apidocs/v12/path__cm_deployment.html
# Example command:
# curl -s -X GET -u replace_with_cm_admin_user_here:cm_password http://CM-SERVER-HOST:PORT/api/v12/cm/deployment -o deployment.json


 

#!/usr/bin/env python
"""
Purpose: Find gateway roles with configuration,
to validate if user is affected by OPSAPS-36374
Author: Michalis
"""
import getopt
import json
import os
import sys

def main(argv):
  json_file = ''
  # deployment.json can be generated by fetching the CM API deployment endpoint
  # see https://cloudera.github.io/cm_api/apidocs/v12/path__cm_deployment.html
  # Example command:
  # curl -s -X GET -u replace_with_cm_admin_user_here:cm_password http://CM-SERVER-HOST:PORT/api/v12/cm/deployment -o deployment.json
  HELP = '%s -j <deployment.json>' % os.path.basename(__file__)
  try:
    opts, args = getopt.getopt(argv, "hj:", ["json="])
  except getopt.GetoptError:
    print(HELP)
    sys.exit(2)
  for opt, arg in opts:
    if opt == '-h':
      print(HELP)
      sys.exit()
    elif opt in ("-j", "--json"):
      json_file = arg

  if json_file:
    with open(json_file) as f:
      data = json.load(f)
    # Only the first cluster in the deployment is inspected.
    cluster = data['clusters'][0]
    services = cluster['services']
    hosts = data['hosts']

    for service in services:
      for role in service['roles']:
        # Gateway roles that carry any role-level configuration are the ones of interest.
        if 'GATEWAY' in role['type'] and role['config']['items']:
          host = [host for host in hosts if host['hostId'] == role['hostRef']['hostId']][0]
          # Only the first configuration item of the role is reported.
          role_kv = role['config']['items'][0]
          print("Service: [%s] contains role type [%s] - configuration [key: %s - value: %s]" % (service['type'], role['type'], role_kv['name'], role_kv['value']))
          print("Affected instance hostname: %s // ipAddress: %s" % (host['hostname'], host['ipAddress']))
  else:
    print(HELP)

if __name__ == "__main__":
  main(sys.argv[1:])
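
The script above looks only at the first cluster and reports only the first configuration item of each gateway role. A sketch of a variant that walks every cluster and every item is below. It assumes the same `deployment.json` layout the script relies on (`clusters`, then `services`, then `roles`, each with `config.items`, plus a top-level `hosts` list); `find_gateway_overrides` is a hypothetical name and not part of any Cloudera tooling.

```python
def find_gateway_overrides(deployment):
    """Yield (cluster, service type, role type, hostname, key, value)
    for every configuration item set on a GATEWAY role."""
    # Map hostId to hostname once instead of scanning the host list per role.
    hostnames = {h["hostId"]: h.get("hostname") for h in deployment.get("hosts", [])}
    for cluster in deployment.get("clusters", []):
        for service in cluster.get("services", []):
            for role in service.get("roles", []):
                if "GATEWAY" not in role.get("type", ""):
                    continue
                host_id = role.get("hostRef", {}).get("hostId")
                for item in role.get("config", {}).get("items", []):
                    yield (cluster.get("name"), service.get("type"), role["type"],
                           hostnames.get(host_id), item.get("name"), item.get("value"))
```

To use it, load the file with `json.load` and iterate over `find_gateway_overrides(data)`; an empty result would correspond to the original script printing nothing.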

 

 

 

Explorer

Hi, Michalis

I hit the same situation with a non-residential gateway host.

In my case, the script reported:

Service: [HDFS] contains role type [GATEWAY] - configuration [key: role_config_suppression_cdh_version_validator - value: true] 
Affected instance hostname: cdh-host-x // ipAddress: 10.x.x.x 
Service: [YARN] contains role type [GATEWAY] - configuration [key: role_config_suppression_cdh_version_validator - value: true] 
Affected instance hostname: cdh-host-x // ipAddress: 10.x.x.x 
Service: [HBASE] contains role type [GATEWAY] - configuration [key: role_config_suppression_cdh_version_validator - value: true] 
Affected instance hostname: cdh-host-x // ipAddress: 10.x.x.x 
Service: [HIVE] contains role type [GATEWAY] - configuration [key: role_config_suppression_cdh_version_validator - value: true] 
Affected instance hostname: cdh-host-x // ipAddress: 10.x.x.x 
Service: [SPARK2_ON_YARN] contains role type [GATEWAY] - configuration [key: role_config_suppression_cdh_version_validator - value: true] 
Affected instance hostname: cdh-host-x // ipAddress: 10.x.x.x 


I removed the host mentioned above from the cluster and from Cloudera Manager.
At that point, the mgmt service began working normally.

Then I re-added the host and applied a host template to deploy the CDH components again.


After fixing the package deployment, I ran the script above again and it listed nothing 🙂
Thanks a lot.