Failure while deploying cluster


I am testing deploying an HDF cluster with Ambari. For my test I have two nodes, master and ingest-1. Both boxes are running Ubuntu 16.04. I went through the Ambari wizard for creating a new cluster and got to the step labeled Install, Start and Test. The first step in the install is DRPC Server Install where it failed.

 

stderr has the following.

2019-08-26 16:12:49,573 - The 'storm-client' component did not advertise a version. This may indicate a problem with the component packaging.
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDF/3.3/services/STORM/package/scripts/drpc_server.py", line 86, in <module>
    DrpcServer().execute()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 352, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/stacks/HDF/3.3/services/STORM/package/scripts/drpc_server.py", line 39, in install
    self.install_packages(env)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 839, in install_packages
    name = self.format_package_name(package['name'])
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 562, in format_package_name
    return self.get_package_from_available(name)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 529, in get_package_from_available
    raise Fail("No package found for {0}(expected name: {1})".format(name, name_with_version))
resource_management.core.exceptions.Fail: No package found for storm-${stack_version}(expected name: storm-3-4-1-1)

 

 The last relevant lines of the logs are as follows.

2019-08-26 16:12:38,484 - Writing File['/etc/apt/sources.list.d/ambari-hdf-3.list'] because it doesn't exist
2019-08-26 16:12:38,484 - call[['apt-get', 'update', '-qq', '-o', u'Dir::Etc::sourcelist=sources.list.d/ambari-hdf-3.list', '-o', 'Dir::Etc::sourceparts=-', '-o', 'APT::Get::List-Cleanup=0']] {'sudo': True, 'quiet': False}
2019-08-26 16:12:38,812 - call returned (0, '')
2019-08-26 16:12:38,813 - Package['unzip'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2019-08-26 16:12:38,853 - Installing package unzip ('/usr/bin/apt-get -o Dpkg::Options::=--force-confdef --allow-unauthenticated --assume-yes install unzip')
2019-08-26 16:12:42,762 - Package['curl'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2019-08-26 16:12:42,796 - Skipping installation of existing package curl
2019-08-26 16:12:42,796 - Package['hdf-select'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2019-08-26 16:12:42,863 - Installing package hdf-select ('/usr/bin/apt-get -o Dpkg::Options::=--force-confdef --allow-unauthenticated --assume-yes install hdf-select')
2019-08-26 16:12:45,195 - call[('ambari-python-wrap', u'/usr/bin/hdf-select', 'versions')] {}
2019-08-26 16:12:45,213 - call returned (1, 'Traceback (most recent call last):\n  File "/usr/bin/hdf-select", line 406, in <module>\n    printVersions()\n  File "/usr/bin/hdf-select", line 251, in printVersions\n    for f in os.listdir(root):\nOSError: [Errno 2] No such file or directory: \'/usr/hdf\'')
2019-08-26 16:12:45,404 - Command repositories: HDF-3.4-repo-3, HDP-UTILS-1.1.0.22-repo-3
2019-08-26 16:12:45,404 - Applicable repositories: HDF-3.4-repo-3, HDP-UTILS-1.1.0.22-repo-3
2019-08-26 16:12:49,429 - Looking for matching packages in the following repositories: https:__public-repo-1.hortonworks.com_HDF_ubuntu16_3.x_updates_3.4.1.1, https:__public-repo-1.hortonworks.com_HDP-UTILS-1.1.0.22_repos_ubuntu16
2019-08-26 16:12:49,555 - call[('ambari-python-wrap', u'/usr/bin/hdf-select', 'versions')] {}
2019-08-26 16:12:49,572 - call returned (1, 'Traceback (most recent call last):\n  File "/usr/bin/hdf-select", line 406, in <module>\n    printVersions()\n  File "/usr/bin/hdf-select", line 251, in printVersions\n    for f in os.listdir(root):\nOSError: [Errno 2] No such file or directory: \'/usr/hdf\'')
2019-08-26 16:12:49,573 - The 'storm-client' component did not advertise a version. This may indicate a problem with the component packaging.

Command failed after 1 tries

As the message indicates, the directory /usr/hdf does not exist on ingest-1.

⟫ ls /usr/
bin games include lib local sbin share src

The error can easily be reproduced on the command line for ingest-1.

⟫ hdf-select versions
Traceback (most recent call last):
  File "/usr/bin/hdf-select", line 406, in <module>
    printVersions()
  File "/usr/bin/hdf-select", line 251, in printVersions
    for f in os.listdir(root):
OSError: [Errno 2] No such file or directory: '/usr/hdf'

 

Both of these machines are brand new standard installations of Ubuntu 16.04. I have only installed the packages and made the recommended changes as outlined in the documentation.


Re: Failure while deploying cluster

Cloudera Employee

@maxolasersquad - I can see you are getting the error below during installation:

 

resource_management.core.exceptions.Fail: No package found for storm-${stack_version}(expected name: storm-3-4-1-1)

 

Can you please edit the script.py file at /usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py on all the hosts and comment out line 541 as follows:

541 # package_version = None
542 if (package_version is None or '-' not in package_version) and default('/repositoryFile', None):
543   self.load_available_packages()
544   package_name = self.get_package_from_available(name, self.available_packages_in_repos)
545   if package_name is None:
546     raise Fail("Cannot match package for regexp name {0}. Available packages: {1}".format(name, self.available_packages_in_repos))
547   return package_name

 

After this, restart the ambari-agent.

Re: Failure while deploying cluster

@ngarg The script at /usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py is different from your example. Starting at line 533, this is what the script looks like.

  def format_package_name(self, name):
    from resource_management.libraries.functions.default import default
    """
    This function replaces ${stack_version} placeholder with actual version.  If the package
    version is passed from the server, use that as an absolute truth.

    :param name name of the package
    :param repo_version actual version of the repo currently installing
    """
    if not STACK_VERSION_PLACEHOLDER in name:
      return name

    stack_version_package_formatted = ""

    package_delimiter = '-' if OSCheck.is_ubuntu_family() else '_'

    # repositoryFile is the truth
    # package_version should be made to the form W_X_Y_Z_nnnn
    package_version = default("repositoryFile/repoVersion", None)

    # TODO remove legacy checks
    if package_version is None:
      package_version = default("roleParams/package_version", None)

    # TODO remove legacy checks
    if package_version is None:
      package_version = default("hostLevelParams/package_version", None)

    if (package_version is None or '-' not in package_version) and default('/repositoryFile', None):
      return self.get_package_from_available(name)

    if package_version is not None:
      package_version = package_version.replace('.', package_delimiter).replace('-', package_delimiter)
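To make the excerpt above easier to follow, here is a minimal standalone sketch of its two key steps: the condition that sends execution into get_package_from_available, and the replace() chain that turns a version into a package-name suffix. The helper names and the boolean has_repository_file (standing in for default('/repositoryFile', None)) are hypothetical, and the version string "3.4.1.1" is only inferred from the expected name in the error; the thread does not show what value Ambari actually passed.

```python
def format_version(package_version, delimiter="-"):
    """Mirror the replace() chain from the excerpt ('-' is the Ubuntu delimiter)."""
    return package_version.replace(".", delimiter).replace("-", delimiter)


def takes_lookup_branch(package_version, has_repository_file):
    """Mirror the condition that guards get_package_from_available().

    has_repository_file is a hypothetical stand-in for
    default('/repositoryFile', None) being set.
    """
    return (package_version is None or "-" not in package_version) and has_repository_file


for version in ("3.4.1.1", "3.4.1.1-4"):
    print(version, "->", "storm-" + format_version(version),
          "| lookup branch:", takes_lookup_branch(version, True))
```

Under these assumptions, a version without a build suffix takes the lookup branch, which is consistent with the traceback passing through get_package_from_available.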

@jsensharma The /usr/hdf directory is definitely not present, which looks to be the source of the error I am encountering. I'm not sure at what step in the process that directory is supposed to be created and populated with the correct files. Either I am following the steps of the installation guide correctly, or I am accidentally overlooking a critical step; I have not been able to find where I am making an error.

 

The ambari agent does run as the root user.

 

Package installation with apt works fine.

Re: Failure while deploying cluster

Super Mentor

@maxolasersquad 

As we can see in the error:

 

2019-08-26 16:12:45,213 - call returned (1, 'Traceback (most recent call last):\n File "/usr/bin/hdf-select", line 406, in <module>\n printVersions()\n File "/usr/bin/hdf-select", line 251, in printVersions\n for f in os.listdir(root):\nOSError: [Errno 2] No such file or directory: \'/usr/hdf\'')

 

This indicates that the "hdf-select" command is not able to determine the HDF version. Usually this command line utility looks inside the "/usr/hdf" directory to find the installed versions of HDF.

 

The most likely cause is that the "/usr/hdf/3.4.1.1-4" directory is either not present or not readable, so the installation of the storm client package on that host fails to determine the package version.

Example:

 

# ls -l /usr/hdf/
drwxr-xr-x. 5 root root 49 Aug 26 23:30 3.4.1.1-4
drwxr-xr-x. 2 root root 83 Aug 26 23:30 current

# ls -l /usr/hdf/current/
lrwxrwxrwx. 1 root root 31 Aug 2 23:30 nifi -> /usr/hdf/3.4.1.1-4/nifi
lrwxrwxrwx. 1 root root 31 Aug 2 23:30 nifi-registry -> /usr/hdf/3.4.1.1-4/nifi-registry
lrwxrwxrwx. 1 root root 31 Aug 26 23:30 nifi-toolkit -> /usr/hdf/3.4.1.1-4/nifi-toolkit
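As a rough illustration of the lookup described above, the following sketch lists version directories under a stack root and returns an empty list when the root is missing, instead of raising the OSError seen in the traceback. The function name and the rule of skipping the "current" entry are assumptions made for illustration; the real hdf-select logic is not shown in this thread.

```python
import os


def list_stack_versions(root="/usr/hdf"):
    """Return version directories under root, or [] if root cannot be listed.

    Hypothetical helper: skipping 'current' is an assumption, not
    necessarily what hdf-select itself does.
    """
    try:
        entries = os.listdir(root)
    except OSError:
        # Covers the "No such file or directory" case from the traceback.
        return []
    return sorted(
        entry for entry in entries
        if entry != "current" and os.path.isdir(os.path.join(root, entry))
    )


print(list_stack_versions())
```

Running this on a host before and after a package install gives a quick check of whether any HDF version directory has been laid down.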

 

Also, can you please confirm whether the ambari agent is running as the root user or as a non-root user?

So it looks like, for some reason, no package is installed on the problematic host. Are you able to install any package manually on that host?


Can you please try the following command to isolate any package installation issue? Maybe the repo is not accessible.

 

# apt-get install nifi

 

 

Then verify whether the "/usr/hdf/3.4.1.1-4" directory has been created.

Re: Failure while deploying cluster

After manually installing nifi with apt-get install nifi the /usr/hdf directory now exists.

⟫ ls /usr/hdf
3.4.1.1-4  current

The installation is still failing.

stderr:

2019-08-27 12:31:45,740 - The 'storm-client' component did not advertise a version. This may indicate a problem with the component packaging. However, the stack-select tool was able to report a single version installed (3.4.1.1-4). This is the version that will be reported.
2019-08-27 12:31:50,687 - The 'storm-client' component did not advertise a version. This may indicate a problem with the component packaging. However, the stack-select tool was able to report a single version installed (3.4.1.1-4). This is the version that will be reported.
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDF/3.3/services/STORM/package/scripts/drpc_server.py", line 86, in <module>
    DrpcServer().execute()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 352, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/stacks/HDF/3.3/services/STORM/package/scripts/drpc_server.py", line 39, in install
    self.install_packages(env)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 839, in install_packages
    name = self.format_package_name(package['name'])
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 562, in format_package_name
    return self.get_package_from_available(name)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 529, in get_package_from_available
    raise Fail("No package found for {0}(expected name: {1})".format(name, name_with_version))
resource_management.core.exceptions.Fail: No package found for storm-${stack_version}(expected name: storm-3-4-1-1)

And the logs

2019-08-27 12:31:45,635 - Package['hdf-select'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2019-08-27 12:31:45,669 - Skipping installation of existing package hdf-select
2019-08-27 12:31:45,723 - call[('ambari-python-wrap', u'/usr/bin/hdf-select', 'versions')] {}
2019-08-27 12:31:45,740 - call returned (0, '3.4.1.1-4')
2019-08-27 12:31:45,740 - The 'storm-client' component did not advertise a version. This may indicate a problem with the component packaging. However, the stack-select tool was able to report a single version installed (3.4.1.1-4). This is the version that will be reported.
2019-08-27 12:31:45,909 - Command repositories: HDF-3.4-repo-3, HDP-UTILS-1.1.0.22-repo-3
2019-08-27 12:31:45,909 - Applicable repositories: HDF-3.4-repo-3, HDP-UTILS-1.1.0.22-repo-3
2019-08-27 12:31:50,543 - Looking for matching packages in the following repositories: https:__public-repo-1.hortonworks.com_HDF_ubuntu16_3.x_updates_3.4.1.1, https:__public-repo-1.hortonworks.com_HDP-UTILS-1.1.0.22_repos_ubuntu16
2019-08-27 12:31:50,670 - call[('ambari-python-wrap', u'/usr/bin/hdf-select', 'versions')] {}
2019-08-27 12:31:50,687 - call returned (0, '3.4.1.1-4')
2019-08-27 12:31:50,687 - The 'storm-client' component did not advertise a version. This may indicate a problem with the component packaging. However, the stack-select tool was able to report a single version installed (3.4.1.1-4). This is the version that will be reported.

Command failed after 1 tries

I can see the packages are available in apt.

⟫ sudo apt-cache search storm
storm - storm is a virtual package that brings storm-3-4-1-1-4 as a dependency.
storm-3-4-1-1-4 - Storm is a distributed, fault-tolerant, and

I think it is trying to install storm-3-4-1-1 instead of storm-3-4-1-1-4, but I'm not too sure about that.
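One way to check that suspicion is to compare the expected name from the error against the names apt reports. The sketch below parses "name - description" lines in the style of the apt-cache search output above and lists names that equal the expected name or extend it with a further "-suffix". Both helper functions are hypothetical diagnostic aids, not Ambari's own matching logic.

```python
def parse_apt_cache_search(output):
    """Return package names from 'name - description' lines."""
    names = []
    for line in output.splitlines():
        name, sep, _description = line.partition(" - ")
        if sep and name.strip():
            names.append(name.strip())
    return names


def candidates_for(expected, names):
    """Names equal to expected, or extending it with a '-<suffix>'."""
    return [n for n in names if n == expected or n.startswith(expected + "-")]


sample = """storm - storm is a virtual package that brings storm-3-4-1-1-4 as a dependency.
storm-3-4-1-1-4 - Storm is a distributed, fault-tolerant, and"""

names = parse_apt_cache_search(sample)
print(names)                                   # ['storm', 'storm-3-4-1-1-4']
print(candidates_for("storm-3-4-1-1", names))  # ['storm-3-4-1-1-4']
```

With the output pasted above, no package is named exactly storm-3-4-1-1, but storm-3-4-1-1-4 extends it with the build number.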

Re: Failure while deploying cluster

I restarted the Ambari wizard and the installation worked. So my question is: from a fresh installation, after the Ambari wizard initializes the host, why do I need to manually install the nifi package in order for the rest of the process to work properly? Am I missing a step?

Re: Failure while deploying cluster

Super Mentor

@maxolasersquad 
It looks like the package installation failed earlier for some reason, and hence the "/usr/hdf" directory was missing in your case. Once you installed the NiFi package manually on this host, re-initiating the wizard might have progressed well.

 

In order to find out why the initial package installation failed, we might need to check the package manager log ("/var/log/yum.log" on RHEL/CentOS; on Ubuntu, "/var/log/apt/history.log" and "/var/log/dpkg.log") and the ambari agent log from the time of the issue.

Re: Failure while deploying cluster

I don't have a good explanation about what I changed, but now when I reset the box to a fresh install of Ubuntu 16.04 and rerun my Ansible script it all works. Thank you for your assistance with this.