Member since: 09-14-2017
Posts: 120
Kudos Received: 11
Solutions: 5
My Accepted Solutions
Title | Views | Posted
---|---|---
| 3147 | 06-17-2021 06:55 AM
| 1925 | 01-13-2021 01:56 PM
| 17198 | 11-02-2017 06:35 AM
| 19013 | 10-04-2017 02:43 PM
| 34418 | 09-14-2017 06:40 PM
11-18-2020
04:06 PM
Thanks for the solution! I hit the same issue: after enabling MIT Kerberos on a CDH 5.16.2 cluster, ZooKeeper would not start, failing with:

javax.security.auth.login.LoginException: Message stream modified (41)

I was using openjdk version "1.8.0_272". As per your solution, I commented out this line in /etc/krb5.conf on all servers:

#renew_lifetime = 604800

After that, a restart of the cluster brought up all services except the Hue Kerberos Ticket Renewer, which failed with:

Couldn't renew kerberos ticket in order to work around Kerberos 1.8.1 issue. Please check that the ticket for 'hue/fqdn@KRBREALM' is still renewable

The Kerberos Ticket Renewer is a separate issue; to fix it, run the following on the MIT KDC server (repeating the hue principal command for each Hue server's fqdn):

kadmin.local: modprinc -maxrenewlife 90day krbtgt/KRBREALM
kadmin.local: modprinc -maxrenewlife 90day +allow_renewable hue/fqdn@KRBREALM

After that, the Hue Kerberos Ticket Renewer restarted successfully.
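The krb5.conf edit above has to be repeated on every server. As a small illustrative sketch (scripting the edit is my own assumption; the post simply edited the file by hand on each host), the following comments out any active renew_lifetime setting while leaving the rest of the file untouched:

```python
import re

def comment_out_renew_lifetime(conf_text):
    """Return krb5.conf text with any active renew_lifetime line commented out."""
    out = []
    for line in conf_text.splitlines():
        # Match an uncommented "renew_lifetime = ..." setting (leading spaces allowed).
        if re.match(r"\s*renew_lifetime\s*=", line):
            out.append("#" + line)
        else:
            out.append(line)
    return "\n".join(out)

sample = "[libdefaults]\n  renew_lifetime = 604800\n  forwardable = true"
print(comment_out_renew_lifetime(sample))
```

Already-commented lines are left alone, so running it twice is safe.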
08-14-2020
08:04 AM
1 Kudo
Hello Experts,
I can no longer find the CDP Data Center upgrade docs on the website; I only see CDP Private Cloud.
For example, the OS requirements link https://docs.cloudera.com/cdp/latest/release-guide/topics/cdpdc-os-requirements.html has cdpdc-os in the URL, but the content is for Private Cloud. I thought I saw the Data Center docs a few days ago but cannot find them now. Any ideas?
Thanks!
08-05-2020
08:16 AM
Thanks for the info. This is not well documented on the CM upgrade page; it would be good to have these credential-generation steps there: https://docs.cloudera.com/cdp/latest/upgrade-cdh/topics/ug_cm_upgrade_server.html

However, after clicking the Download Now option from your link, I now get a different error, which means we need to work with Cloudera to get entitled for the CDP Data Center license. Thanks!

Access Restricted
You must be a CDP Data Center customer to access these downloads. If you believe you should have this entitlement then please reach out to support or your customer service representative.
08-04-2020
10:13 AM
Hello Experts, I am doing an upgrade from CM 5.16 to CM 7.1.2. The instructions say to enter your username and password in the repository URL in the .repo file:

Using the Cloudera public repository: substitute your USERNAME and PASSWORD in the Package Repository URL where indicated in /etc/yum.repos.d/cloudera-manager.repo:

baseurl=https://USERID:PASSWORD@archive.cloudera.com/p/cm7/7.1.2/redhat6/yum/

After setting the above and running the command:

yum deplist cloudera-manager-agent

it gives the error below:

........@archive.cloudera.com/p/cm7/7.1.2/redhat6/yum/repodata/repomd.xml: [Errno 14] PYCURL ERROR 6 - "Couldn't resolve host 'xyz.com:<password>@archive.cloudera.com'"

Has anyone seen this error, and how can it be resolved? Thanks!
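The "Couldn't resolve host" message usually means curl split the URL at the wrong place, which happens when the username or password contains reserved characters such as @ or : (the username here looks like an email-style ID). A hedged sketch of the usual fix, assuming that is the cause: percent-encode the credentials before embedding them in the baseurl. The sample username and password are hypothetical.

```python
from urllib.parse import quote

def repo_baseurl(user, password):
    """Build the yum baseurl with percent-encoded credentials so reserved
    characters (@, :, /) in the username or password do not break URL parsing."""
    return ("https://{0}:{1}@archive.cloudera.com/p/cm7/7.1.2/redhat6/yum/"
            .format(quote(user, safe=""), quote(password, safe="")))

# Hypothetical credentials for illustration only.
print(repo_baseurl("user@xyz.com", "p@ss:word"))
# https://user%40xyz.com:p%40ss%3Aword@archive.cloudera.com/p/cm7/7.1.2/redhat6/yum/
```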
04-24-2020
02:08 PM
Thanks, you are a genius 🙂 . Installing thrift-sasl 0.4.2 and impyla 0.16.2 did allow the script to run successfully. However, I now have a different issue: the call cursor.fetchmany(size=3) hangs indefinitely in a Jupyter notebook, while a similar pyhive script on the same small table executes immediately.

from impala.dbapi import connect
conn = connect(host='myhost', port=21050, auth_mechanism='GSSAPI', kerberos_service_name='impala')
cursor = conn.cursor()
cursor.execute('SELECT * FROM default.mytable LIMIT 100')
cursor.fetchmany(size=3)
cursor.close()
conn.close()

The query shows status Executing in the Cloudera Manager -> Impala Queries monitor, but the query details say Query State: FINISHED. The hang appears to be in the statement buff = self.sock.recv(sz):

/data/opt/anaconda3/lib/python3.7/site-packages/thriftpy2/transport/socket.py in read(self, sz)
107 while True:
108 try:
--> 109 buff = self.sock.recv(sz)
110 except socket.error as e:
111 if e.errno == errno.EINTR:
KeyboardInterrupt

After trying various options and setting timeout=100 in the connect statement, the script now queries the Impala table successfully, but every second or third run it fails with the error below:

/data/opt/anaconda3/lib/python3.7/site-packages/impala/hiveserver2.py in _rpc(self, func_name, request)
992 response = self._execute(func_name, request)
993 self._log_response(func_name, response)
--> 994 err_if_rpc_not_ok(response)
995 return response
996
/data/opt/anaconda3/lib/python3.7/site-packages/impala/hiveserver2.py in err_if_rpc_not_ok(resp)
746 resp.status.statusCode != TStatusCode.SUCCESS_WITH_INFO_STATUS and
747 resp.status.statusCode != TStatusCode.STILL_EXECUTING_STATUS):
--> 748 raise HiveServer2Error(resp.status.errorMessage)
749
750
HiveServer2Error: Invalid query handle: b14cce8e19xxxx:5b51463xxxx
Any more thoughts?
04-24-2020
10:01 AM
Has anyone found an answer for this? I am also getting the same error when I run the code below. This is a Kerberos cluster, and Impala works fine through Hue and ODBC:

from impala.dbapi import connect
conn = connect(host='myhost', port=21050)
cursor = conn.cursor()
cursor.execute('SELECT * FROM default.testtable')
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()

---------------------------------------------------------------------------
HiveServer2Error Traceback (most recent call last)
<ipython-input-13-82112a6ffca2> in <module>()
2 conn = connect(host='myhost', port=21050)
3
----> 4 cursor = conn.cursor()
5 cursor.execute('SELECT * FROM default.testtable')
6 print (cursor.description) # prints the result set's schema
/data/opt/anaconda3/lib/python3.7/site-packages/impala/hiveserver2.py in cursor(self, user, configuration, convert_types, dictify, fetch_error)
122 log.debug('.cursor(): getting new session_handle')
123
--> 124 session = self.service.open_session(user, configuration)
125
126 log.debug('HiveServer2Cursor(service=%s, session_handle=%s, '
/data/opt/anaconda3/lib/python3.7/site-packages/impala/hiveserver2.py in open_session(self, user, configuration)
1062 username=user,
1063 configuration=configuration)
-> 1064 resp = self._rpc('OpenSession', req)
1065 return HS2Session(self, resp.sessionHandle,
1066 resp.configuration,
/data/opt/anaconda3/lib/python3.7/site-packages/impala/hiveserver2.py in _rpc(self, func_name, request)
990 def _rpc(self, func_name, request):
991 self._log_request(func_name, request)
--> 992 response = self._execute(func_name, request)
993 self._log_response(func_name, response)
994 err_if_rpc_not_ok(response)
/data/opt/anaconda3/lib/python3.7/site-packages/impala/hiveserver2.py in _execute(self, func_name, request)
1021
1022 raise HiveServer2Error('Failed after retrying {0} times'
-> 1023                    .format(self.retries))
   1024
   1025     def _operation(self, kind, request):

HiveServer2Error: Failed after retrying 3 times

Next I tried adding the Kerberos parameters:

conn = connect(host='myhost', port=21050, auth_mechanism='GSSAPI', kerberos_service_name='impala')

Now I get a different error:

/data/opt/anaconda3/lib/python3.7/site-packages/thrift_sasl/__init__.py in open(self)
65
66 def open(self):
---> 67 if not self._trans.isOpen():
68 self._trans.open()
69
AttributeError: 'TSocket' object has no attribute 'isOpen'
04-17-2020
06:18 AM
Hello, is the upgrade from on-premise CDH 5.x to CDP-DC 7.x available yet, or is only a fresh install available now? Is copying the old data from CDH 5.x to CDP 7.x possible using distcp or other means?
01-28-2020
08:41 AM
You are absolutely right. After struggling with various syntax, I realized the CDH 5.16 impalad version (2.12.0-cdh5.16.1) doesn't support get_json_object(). So, using json.dumps(), I removed all the unicode u' prefixes and also converted the strange characters in the JSON field names to normal ones like CH4_NO2_WE_AUX. After that, I ended up using Hive instead of Impala with a query like the one below to extract the values as columns. The json_column1 field is a string datatype.

select b.b1, c.c1, c.c2, d.d1, d.d2
from json_table1 a
lateral view json_tuple(a.json_column1, 'CH4_NO2_WE_AUX', 'CH7_CO_CONCENTRATION_WE') b as b1, b2
lateral view json_tuple(b.b1, 'unit', 'value') c as c1, c2
lateral view json_tuple(b.b2, 'unit', 'value') d as d1, d2;
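For reference, the u'...' rows in the source column are Python dict reprs rather than valid JSON, so json_tuple (or get_json_object) cannot parse them directly. A minimal sketch of the cleanup step described above, assuming the rows parse with ast.literal_eval and that the key normalization (e.g. 'CH4: NO2 (we-aux)' to CH4_NO2_WE_AUX) follows the pattern shown; the helper name is my own:

```python
import ast
import json
import re

def to_clean_json(py_repr):
    """Convert a Python-dict repr (with u'...' strings) into valid JSON,
    normalizing top-level keys like "CH4: NO2 (we-aux)" to "CH4_NO2_WE_AUX"."""
    data = ast.literal_eval(py_repr)  # safely parse the Python literal

    def clean_key(key):
        # Collapse runs of non-alphanumeric characters into "_", then upper-case.
        return re.sub(r"[^0-9A-Za-z]+", "_", key).strip("_").upper()

    return json.dumps({clean_key(k): v for k, v in data.items()})

row = "{u'CH4: NO2 (we-aux)': {u'unit': u'mV', u'value': 4.852294921875}}"
print(to_clean_json(row))
# {"CH4_NO2_WE_AUX": {"unit": "mV", "value": 4.852294921875}}
```

The nested 'unit' and 'value' keys are left as-is, matching the json_tuple calls in the Hive query.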
01-23-2020
07:47 AM
Hello, I have an Impala table with a JSON column containing values like the ones below. I am trying to get the JSON values into columns using get_json_object(col1, ...). Can you help me with the SQL syntax to extract all the values as columns? Note the spaces, unicode prefixes, and nesting in the JSON keys.

col1
--------------------
{u'CH4: NO2 (we-aux)': {u'unit': u'mV', u'value': 4.852294921875}, u'CH6: Ox concentration (we-aux)': {u'unit': u'ppb', u'value': -84.73094995471016}}
{u'CH4: NO2 (we-aux)': {u'unit': u'mV', u'value': 5.852294921875}, u'CH6: Ox concentration (we-aux)': {u'unit': u'ppb', u'value': -94.73094995471016}}
....
Labels:
- Apache Hive
- Apache Impala
12-10-2019
10:48 AM
Cloudera Data Platform Data Center Edition 7 is now generally available.

Any idea when the upgrade documentation from CDH 5.x to CDP 7.x will be available for the Data Center Edition?