Member since: 07-03-2017
Posts: 8
Kudos Received: 1
Solutions: 0
02-27-2018
09:13 AM
I do need security; we have Kerberos implemented for HBase. Are you suggesting we use the HBase REST API if we need a lot of tables?
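Just so I follow: would that mean going through the REST gateway for each write, along the lines of the rough sketch below? This assumes the gateway runs on its default port 8080; the host, table, and column names are placeholders, and the SPNEGO negotiation a Kerberized gateway would need is omitted for brevity.

// Rough sketch: writing one cell through the HBase REST gateway using
// the client classes bundled with HBase. Host/table/column names are
// placeholders; Kerberos/SPNEGO handling is omitted.
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.rest.client.Client;
import org.apache.hadoop.hbase.rest.client.Cluster;
import org.apache.hadoop.hbase.rest.client.RemoteHTable;
import org.apache.hadoop.hbase.util.Bytes;

public class RestWrite {
    public static void main(String[] args) throws Exception {
        Cluster cluster = new Cluster();
        cluster.add("resthost.mycompany.com", 8080); // REST gateway host:port
        Client client = new Client(cluster);
        RemoteHTable table = new RemoteHTable(client, "my_table");
        Put put = new Put(Bytes.toBytes("row1"));
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
        table.put(put); // issues an HTTP PUT against the gateway
        table.close();
    }
}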
02-23-2018
11:32 AM
Hi guys, Looking for some advice/guidance on designing an architecture solution for storing data in HBase.

Our current flow is: NiFi -> Kafka -> Storm -> HBase. This is working as expected, but as we receive more requirements we need to become more flexible. Our HBase store is now going to hold a lot more information from different parts of the company, requiring more HBase tables as new requirements come in.

I was looking into designing a generic Storm topology that would take the table name and other data from Kafka at run time, allowing us to dynamically pass in any data/table/column family. The topology's main responsibility would then simply be to parse the input and write to the table name it received as part of the Tuple message. However, I believe this is not advised, as the HBase Bolt requires you to pass in the table name in the prepare() method, which rules out the flexible solution I am after (a custom bolt might get around this; see the sketch at the end of this post). Does anyone have other tools/ideas for this?

Currently we would need one HBase topology and, any time we added a new table to HBase, we would update that topology with a new HBase Bolt. This is not the end of the world, and it is probably what we will go with if we don't find another way, but I am just seeing what else is out there.

Some requirements we are hoping to achieve:
1. A single point of entry for writing to HBase. This means only one component needs maintenance/updating when versions change, and it brings other benefits (easier authorization of writes, audits, etc.).
2. Separating data into two streams:
a. Raw data that simply needs to be archived in HBase, with no processing required.
b. Data that needs to go through some form of processing. We will be using Spark for a lot of this. The processed data would then be stored in HBase by the same archival solution.

I have looked into using NiFi, but I would prefer to keep NiFi purely as our data ingestion/routing/model transformation tool and leave writing to HBase to another component; NiFi could become unmanageable as we added more and more tables and process groups. Spark might do it, but it seems like overkill. Any other guidance?
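For what it's worth, the kind of generic bolt I have in mind would look roughly like the sketch below. It is only a sketch: it assumes each tuple carries the target table alongside the data (the field names "table", "rowKey", "columnFamily", "qualifier" and "value" are hypothetical), and instead of the storm-hbase HBaseBolt it talks to the HBase client API directly, lazily caching one BufferedMutator per table name seen at run time.

import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

// Sketch of a table-agnostic HBase bolt: the table name travels with the
// data instead of being fixed in prepare().
public class DynamicTableHBaseBolt extends BaseRichBolt {
    private transient Connection connection;
    private transient Map<String, BufferedMutator> mutators;
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.mutators = new HashMap<>();
        try {
            Configuration hbaseConf = HBaseConfiguration.create();
            this.connection = ConnectionFactory.createConnection(hbaseConf);
        } catch (Exception e) {
            throw new RuntimeException("Could not connect to HBase", e);
        }
    }

    // Lazily create and cache one BufferedMutator per table name seen.
    private BufferedMutator mutatorFor(String table) throws Exception {
        BufferedMutator m = mutators.get(table);
        if (m == null) {
            m = connection.getBufferedMutator(TableName.valueOf(table));
            mutators.put(table, m);
        }
        return m;
    }

    @Override
    public void execute(Tuple tuple) {
        try {
            String table = tuple.getStringByField("table"); // hypothetical field names
            Put put = new Put(Bytes.toBytes(tuple.getStringByField("rowKey")));
            put.addColumn(Bytes.toBytes(tuple.getStringByField("columnFamily")),
                          Bytes.toBytes(tuple.getStringByField("qualifier")),
                          Bytes.toBytes(tuple.getStringByField("value")));
            mutatorFor(table).mutate(put);
            collector.ack(tuple);
        } catch (Exception e) {
            collector.reportError(e);
            collector.fail(tuple);
        }
    }

    @Override
    public void cleanup() {
        try {
            for (BufferedMutator m : mutators.values()) { m.close(); } // flush pending writes
            connection.close();
        } catch (Exception ignored) { }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Terminal bolt: nothing emitted downstream.
    }
}

With Kerberos enabled, the workers would also need a keytab login (e.g. UserGroupInformation.loginUserFromKeytab) before the connection is created, and the principal would need write access to every target table.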
08-08-2017
03:00 PM
1 Kudo
Thanks Juan, I have made the changes you suggested and the Registry service now starts without issue. For clarity: I edited the file /usr/hdf/current/registry/bootstrap/bootstrap-storage.sh and, as Juan suggested, added the proxy information to the following lines by inserting -Dhttps.proxyHost=<YOUR_PROXY_HOST> -Dhttps.proxyPort=<YOUR_PROXY_PORT> into each java invocation. Example:

# Each function gets the same two JVM proxy flags so the table initializer
# can reach the MySQL driver download URL through the corporate proxy.
function dropTables {
  ${JAVA} -Dbootstrap.dir=$BOOTSTRAP_DIR -Dhttps.proxyHost=<YOUR_PROXY_HOST> -Dhttps.proxyPort=<YOUR_PROXY_PORT> -cp ${CLASSPATH} ${TABLE_INITIALIZER_MAIN_CLASS} -m ${MYSQL_JAR_URL_PATH} -c ${CONFIG_FILE_PATH} -s ${SCRIPT_ROOT_DIR} --drop
}

function createTables {
  ${JAVA} -Dbootstrap.dir=$BOOTSTRAP_DIR -Dhttps.proxyHost=<YOUR_PROXY_HOST> -Dhttps.proxyPort=<YOUR_PROXY_PORT> -cp ${CLASSPATH} ${TABLE_INITIALIZER_MAIN_CLASS} -m ${MYSQL_JAR_URL_PATH} -c ${CONFIG_FILE_PATH} -s ${SCRIPT_ROOT_DIR} --create
}

function checkStorageConnection {
  ${JAVA} -Dbootstrap.dir=$BOOTSTRAP_DIR -Dhttps.proxyHost=<YOUR_PROXY_HOST> -Dhttps.proxyPort=<YOUR_PROXY_PORT> -cp ${CLASSPATH} ${TABLE_INITIALIZER_MAIN_CLASS} -m ${MYSQL_JAR_URL_PATH} -c ${CONFIG_FILE_PATH} -s ${SCRIPT_ROOT_DIR} --check-connection
}
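As far as I can tell, the reason the -D flags work is that the bootstrap's table initializer downloads the driver through java.net's standard URL handling, which consults the https.proxyHost and https.proxyPort system properties. A rough stand-alone illustration (proxy host and port are placeholders):

// Setting these properties has the same effect as passing
// -Dhttps.proxyHost/-Dhttps.proxyPort on the java command line:
// java.net routes the HTTPS request through the configured proxy.
import java.net.HttpURLConnection;
import java.net.URL;

public class ProxyCheck {
    public static void main(String[] args) throws Exception {
        System.setProperty("https.proxyHost", "proxy.mycompany.com"); // placeholder
        System.setProperty("https.proxyPort", "8080");                // placeholder
        URL url = new URL("https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.40.zip");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        System.out.println("HTTP " + conn.getResponseCode()); // 200 means the proxy let the request through
    }
}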
Hope this helps someone in the future.
08-08-2017
10:56 AM
Hi Juan, Thanks for sharing. Where exactly did you add this proxy information?
07-05-2017
04:59 PM
Sorry, I should have mentioned that. I added that in as well, but no luck.
07-05-2017
04:15 PM
Unfortunately, no luck with that either. Still the same error as above.
07-05-2017
02:12 PM
Hi Jay, the full error log from /var/lib/ambari-agent/data/errors-156.txt is below:

Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/REGISTRY/0.3.0/package/scripts/registry_server.py", line 120, in <module>
RegistryServer().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/REGISTRY/0.3.0/package/scripts/registry_server.py", line 66, in start
user="root")
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'source /usr/hdf/current/registry/conf/registry-env.sh ; /usr/hdf/current/registry/bootstrap/bootstrap-storage.sh' returned 1. Using Configuration file: /usr/hdf/current/registry/bootstrap/../conf/registry.yaml
Downloading mysql jar from url: https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.40.zip
Downloading file mysql-connector-java-5.1.40.zip into /tmp
Failed to download the mysql driver from https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.40.zip
Error occurred while downloading MySQL jar. bootstrap dir: /usr/hdf/current/registry/bootstrap

I also tried the wget request and it failed. I then updated the wget config file to use an http proxy and an https proxy, and the wget download then succeeded. However, the issue above still persists when starting the Registry service through Ambari. For completeness, here is the log from /var/lib/ambari-agent/data/output-202.txt:

2017-07-05 10:05:03,512 - Group['hadoop'] {}
2017-07-05 10:05:03,515 - Group['nifi'] {}
2017-07-05 10:05:03,515 - User['streamline'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-07-05 10:05:03,516 - User['registry'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-07-05 10:05:03,517 - User['storm'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-07-05 10:05:03,517 - User['infra-solr'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-07-05 10:05:03,518 - User['zookeeper'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-07-05 10:05:03,519 - User['ams'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-07-05 10:05:03,519 - User['ambari-qa'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['users']}
2017-07-05 10:05:03,520 - User['kafka'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-07-05 10:05:03,520 - User['nifi'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'nifi']}
2017-07-05 10:05:03,521 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2017-07-05 10:05:03,525 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2017-07-05 10:05:03,532 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] due to not_if
2017-07-05 10:05:03,549 - Execute[('setenforce', '0')] {'not_if': '(! which getenforce ) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': True, 'only_if': 'test -f /selinux/enforce'}
2017-07-05 10:05:03,561 - Skipping Execute[('setenforce', '0')] due to only_if
2017-07-05 10:05:03,764 - Stack Feature Version Info: stack_version=3.0, version=3.0.0.0-453, current_cluster_version=3.0.0.0-453 -> 3.0.0.0-453
2017-07-05 10:05:03,766 - Directory['/var/log/registry'] {'group': 'hadoop', 'cd_access': 'a', 'create_parents': True, 'recursive_ownership': True, 'owner': 'registry', 'mode': 0755}
2017-07-05 10:05:03,768 - Directory['/var/run/registry'] {'group': 'hadoop', 'cd_access': 'a', 'create_parents': True, 'recursive_ownership': True, 'owner': 'registry', 'mode': 0755}
2017-07-05 10:05:03,769 - Directory['/usr/hdf/current/registry/conf'] {'group': 'hadoop', 'cd_access': 'a', 'create_parents': True, 'recursive_ownership': True, 'owner': 'registry', 'mode': 0755}
2017-07-05 10:05:03,770 - Directory['/var/lib/ambari-agent/data/registry'] {'group': 'hadoop', 'cd_access': 'a', 'create_parents': True, 'mode': 0755, 'owner': 'registry', 'recursive_ownership': True}
2017-07-05 10:05:03,775 - File['/usr/hdf/current/registry/conf/registry-env.sh'] {'content': InlineTemplate(...), 'owner': 'registry'}
2017-07-05 10:05:03,776 - Directory['/etc/security/limits.d'] {'owner': 'root', 'create_parents': True, 'group': 'root'}
2017-07-05 10:05:03,777 - Directory['/hdf/registry'] {'owner': 'registry', 'group': 'hadoop', 'create_parents': True, 'mode': 0755, 'cd_access': 'a'}
2017-07-05 10:05:03,781 - File['/etc/security/limits.d/registry.conf'] {'content': Template('registry.conf.j2'), 'owner': 'root', 'group': 'root', 'mode': 0644}
2017-07-05 10:05:03,788 - File['/usr/hdf/current/registry/conf/registry.yaml'] {'owner': 'registry', 'content': Template('registry.yaml.j2'), 'group': 'hadoop', 'mode': 0644}
2017-07-05 10:05:03,790 - File['/usr/lib/ambari-agent/DBConnectionVerification.jar'] {'content': DownloadSource('http://ourhostname.com:8080/resources/DBConnectionVerification.jar')}
2017-07-05 10:05:03,790 - Not downloading the file from http://ourhostname.com:8080/resources/DBConnectionVerification.jar, because /var/lib/ambari-agent/tmp/DBConnectionVerification.jar already exists
2017-07-05 10:05:03,792 - Execute['source /usr/hdf/current/registry/conf/registry-env.sh ; /usr/hdf/current/registry/bootstrap/bootstrap-storage.sh'] {'user': 'root'}
2017-07-05 10:07:12,077 - Execute['find /var/log/registry -maxdepth 1 -type f -name '*' -exec echo '==> {} <==' \; -exec tail -n 40 {} \;'] {'logoutput': True, 'ignore_failures': True, 'user': 'registry'}
07-05-2017
12:21 PM
Hi, after installing the HDF (3.0.0.0) platform with NiFi, I am trying to start the Registry service (0.3.0), but it fails with the following error:

Downloading mysql jar from url: https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.40.zip
Downloading file mysql-connector-java-5.1.40.zip into /tmp
Failed to download the mysql driver from https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.40.zip
Error occurred while downloading MySQL jar. bootstrap dir: /usr/hdf/current/registry/bootstrap
I have modified my ambari-server/ambari-env.sh file with the correct proxy host name and port, but it does not appear to help. I have also tried manually installing this version of the mysql-connector, again with no luck. Any ideas?