Member since: 04-22-2014
Posts: 1218
Kudos Received: 341
Solutions: 157

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 26261 | 03-03-2020 08:12 AM |
| | 16421 | 02-28-2020 10:43 AM |
| | 4725 | 12-16-2019 12:59 PM |
| | 4475 | 11-12-2019 03:28 PM |
| | 6679 | 11-01-2019 09:01 AM |
12-14-2018
02:58 PM
1 Kudo
@orak, One thing that would help us provide more suggestions is to understand the following:

- How did you come to know that your Hadoop cluster "ran out of space"? What did you see, exactly, that told you there was a problem?
- What did you run to see that a database was using 77 TB, and what was the output?
- What command did you run to see that only 5 TB of table data was used, and what was the output?
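For reference, these are the sorts of commands that are commonly used to gather that information (a sketch only; the warehouse path and database name below are assumptions, so substitute your own):

```
# Overall HDFS capacity and per-DataNode usage
hdfs dfsadmin -report

# Space consumed under one database's warehouse directory
# (-s summarizes, -h prints human-readable sizes)
hdfs dfs -du -s -h /user/hive/warehouse/mydb.db
```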
11-29-2018
10:05 AM
@VijayM, Oh, and the same rules apply to Hive as well. Forgot to add that.
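For Hive, a minimal sketch of an equivalent HAProxy setup for Hue connections (host names and the 10001 port are placeholders) would use source persistence and long timeouts in the same way:

```
frontend hiveserver2_hue_front
    bind *:10001
    mode tcp
    option tcplog
    timeout client 720m
    timeout server 720m
    default_backend hiveserver2-hue

backend hiveserver2-hue
    balance source
    mode tcp
    server hs2_1 host1.example.com:10000 check
    server hs2_2 host2.example.com:10000 check
```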
11-29-2018
10:02 AM
@VijayM, The way Hue is designed, it needs to know that an Impala connection it has open (where it executed a query on a coordinator) will keep connecting to the same coordinator, because Hue needs to pull information about the query for display. This means that the balancer between Hue and Impala needs to use IP (source) persistence. Also, to avoid intermittent session errors with Impala, it is recommended that the timeouts on the HAProxy side be increased to a long value so that connections are not timed out.

No configuration change in Hue is required beyond making sure that Hue connects to the right server/port (the Impala Load Balancer (HAProxy)) in its config.

Here is an example of a configuration that has 3 ports:

- one for impala-shell
- one for JDBC-based applications
- one for Hue

Since Hue has some specific needs that may not be required for other applications, this separation makes sense. The example config below does pass-through TLS.

NOTE: I don't think the ssl options are necessary for pass-through, since the packets should be passed to the backend servers without TLS negotiation, so you can probably ignore those.

NOTE 2: Currently it is not possible to have true load balancing for Hue connections to Impala, but we are working on it and have some code that could change that. For now, you can achieve failover for the Hue connections, but not real balancing of connections.

```
# For impala-shell users on port 21000.
#---------------------------------------------------------------------
# main frontend which proxies to the backends
#---------------------------------------------------------------------
frontend impala_front
    bind *:21000 ssl crt /opt/cloudera/security/x509/certkeynopw.pem
    mode tcp
    option tcplog
    default_backend impala-shell

#---------------------------------------------------------------------
# round robin balancing between the various backends
#---------------------------------------------------------------------
backend impala-shell
    balance leastconn
    mode tcp
    server impalad1 impalad-1.example.com:21000 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
    server impalad2 impalad-2.example.com:21000 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
    server impalad3 impalad-3.example.com:21000 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem

# For JDBC or ODBC version 2.x driver, use port 21050 instead of 21000.
#---------------------------------------------------------------------
# main frontend which proxies to the backends
#---------------------------------------------------------------------
frontend impala_jdbc_front
    bind *:21050 ssl crt /opt/cloudera/security/x509/certkeynopw.pem
    mode tcp
    option tcplog
    default_backend impala-jdbc

#---------------------------------------------------------------------
# round robin balancing between the various backends
#---------------------------------------------------------------------
backend impala-jdbc
    balance leastconn
    mode tcp
    server impalad1 impalad-1.example.com:21050 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
    server impalad2 impalad-2.example.com:21050 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
    server impalad3 impalad-3.example.com:21050 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem

# Setup for Hue or other JDBC-enabled applications.
# In particular, Hue requires SOURCE IP PERSISTENCE.
# The application connects to load_balancer_host:21051, and HAProxy balances
# connections to the associated hosts, where Impala listens for JDBC
# requests on port 21050.
# Notice the timeouts below that do not exist in the other configs;
# these are to stop the connections from being killed even though
# Hue is using them.
#---------------------------------------------------------------------
# main frontend which proxies to the backends
#---------------------------------------------------------------------
frontend impalajdbc_front
    bind *:21051 ssl crt /opt/cloudera/security/x509/certkeynopw.pem
    mode tcp
    option tcplog
    timeout client 720m
    timeout server 720m
    default_backend impala-hue

#---------------------------------------------------------------------
# source balancing between the various backends
#---------------------------------------------------------------------
backend impala-hue
    balance source
    mode tcp
    server impalad1 impalad-1.example.com:21050 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
    server impalad2 impalad-2.example.com:21050 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
    server impalad3 impalad-3.example.com:21050 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
```
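For reference, a minimal sketch of the Hue side of this, assuming it is set through the Hue safety valve or hue.ini and using a placeholder host name for the HAProxy machine:

```
[impala]
  # Point Hue at the load balancer's Hue-specific port, not at an individual impalad
  server_host=haproxy.example.com
  server_port=21051
```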
11-21-2018
11:56 AM
@Tomas79, As for your "duplicate entry" issue, it makes sense if your email address is longer than the username column width (which is 30 characters, I think). You log in the first time and the username obtained from the SAML response NameID is truncated when it is added to the database. The next time you log in, a search for the full email address is done (and nothing is found, due to the username truncation); since no rows are returned, Hue considers this a new user and attempts to add it. That fails because the truncated name already exists. To get around that problem, I suppose you could expand the "username" column to 40 or 50 characters, but I think you wanted to use attribute-based usernames instead.

Getting back to the attributes, I do see you have:

Name="uid" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri"

This seems good, but there is some back-end computation that results in Hue not being able to see your attribute value. To understand this better, note this flow of mapping to get from the SAML response to a Hue username:

- SAML response attribute/value ====> pysaml attribute/value
- pysaml attribute/value ====> djangosaml "username" attribute
- djangosaml user ====> Hue user

For the SAML response to pysaml attribute mapping, there is a built-in mapping in hue/desktop/libs/libsaml/attribute-maps/SAML2.py for urn:oasis:names:tc:SAML:2.0:attrname-format:uri. It maps response attributes to pysaml attributes and stores the value in memory. There is no "uid" mapping by default; rather, the SAML2.py mapping file looks for the OID for "uid", which is urn:oid:0.9.2342.19200300.100.1.1. Since urn:oid:0.9.2342.19200300.100.1.1 is not found in the response, your attribute and value are not seen.

That is a long explanation to arrive at a couple of options you have:

(1) Configure your IDP to emit the uid attribute with the format urn:oasis:names:tc:SAML:2.0:attrname-format:unspecified instead of urn:oasis:names:tc:SAML:2.0:attrname-format:uri. This tells the client that the attribute does not conform to standard OID formatting, so the attribute name should be taken literally (as is) and the value retrieved. (A sketch of what the resulting assertion attribute could look like is at the end of this post.)

(2) Create a custom mapping for the "uid" attribute in a mapping file for urn:oasis:names:tc:SAML:2.0:attrname-format:uri. To do so, you can follow these instructions:

[A] Create a directory that will house your attribute mapping file. For example:

```
# mkdir /opt/cloudera/saml/attribute_mapping
# chown hue:hue /opt/cloudera/saml/attribute_mapping
```

[B] Place the custom attribute mapping file in the directory created in step A. "saml_uri.py" file contents:

```
MAP = {
    "identifier": "urn:oasis:names:tc:SAML:2.0:attrname-format:uri",
    "fro": {
        'uid': 'uid',
    },
    "to": {
        'uid': 'uid',
    },
}
```

NOTE: I believe the above should work, as the "fro" section maps the assertion attribute name to the pysaml name and the "to" section does the reverse. I think it should be OK to have both sides the same.

NOTE 2: Make sure hue can read the saml_uri.py file. For example:

```
# chown hue:hue saml_uri.py
```

There should now be an attribute mapping file named saml_uri.py at the following location: /opt/cloudera/saml/attribute_mapping/saml_uri.py

[C] Configure Hue's Service-Wide safety valve with the following addition in the [libsaml] section:

```
attribute_map_dir=/opt/cloudera/saml/attribute_mapping
user_attribute_mapping='{"uid":"username"}'
```

Make sure to restart Hue after the change to the safety valve.

NOTE: The only attribute that Hue really needs or cares about in this case is whatever maps to "username".

Let me know if you have any questions.
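For completeness, option (1) would result in an assertion attribute along these lines (the value "jdoe" is just a hypothetical placeholder):

```
<saml:AttributeStatement>
  <saml:Attribute Name="uid"
      NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:unspecified">
    <saml:AttributeValue>jdoe</saml:AttributeValue>
  </saml:Attribute>
</saml:AttributeStatement>
```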
11-20-2018
10:31 AM
2 Kudos
@Tomas79, Sorry about that; I started writing a response and got pulled away, so I didn't see that you had already moved on! Great job, as we are now at a point where Hue can parse the response and is happy with it. The problem now is that it cannot map a user from the response data to a Hue user. We see:

```
[20/Nov/2018 09:24:35 -0800] response ERROR Missing Attribute Statement
[20/Nov/2018 09:24:35 -0800] response DEBUG --- AVA: {}
[20/Nov/2018 09:24:35 -0800] client_base INFO --- ADDED person info ----
[20/Nov/2018 09:24:35 -0800] backends ERROR The attributes dictionary is empty
[20/Nov/2018 09:24:35 -0800] backends ERROR Could not find saml_user value
[20/Nov/2018 09:24:35 -0800] views WARNING Could not authenticate user received in SAML Assertion. Session info: {'authn_info': [('urn:oasis:names:tc:SAML:2.0:ac:classes:Password', [], '2018-11-07T07:43:32.657Z')], 'name_id': <saml2.saml.NameID object at 0x7f6f1a93c690>, 'not_on_or_after': 1542737974, 'session_index': '_c1e08c03-ccf6-4f32-9a96-5e00cc4233e4', 'came_from': None, 'ava': {}, 'issuer': 'https://sts.windows.net/f0ba4e-redacted-client-id-4kha/'}
```

By default, Hue will use the following configuration for [libsaml]:

```
username_source=attributes
user_attribute_mapping={'uid': ('username', )}
```

The problem is that you do not have any SAML attribute "uid" in your SAML response, so nothing is found to map to the Hue "username". Another problem is that if you were trying to use attributes (rather than the SAML response NameID) for the Hue username, the default user_attribute_mapping format is not correct. The following is the format I use with success:

```
user_attribute_mapping='{"uid":"username"}'
```

Since you do not use "uid" in your response, what attribute do you want to use, or do you want to use NameID? If you use NameID, that will map to an email-address-format username in Hue, which may not be compatible with other Hadoop configuration. The best thing to do is decide which attribute will carry the "logon" user name in Azure SSO and make sure that it is included in the response.

For testing, though, you can check whether this works with NameID by adding this to the [libsaml] section in your Hue safety valve:

```
username_source=nameid
name_id_format="urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress"
```

Restart Hue after that. I am going based on this in your response XML:

```
<saml:NameID Format="urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress">REDACTED-MY-USER-NAME</saml:NameID>
```
11-20-2018
10:01 AM
@Tomas79, That sounds a bit sketchy. IDPs need to provide their metadata in order for the Service Provider to be configured properly; having to create your own metadata is error-prone and too much to ask. This page seems to imply that there is a link you can use to download the metadata for your entity: https://developers.exlibrisgroup.com/blog/SAML-with-Azure I think you should see it in your Azure Portal Single Sign-on settings for your app. I think it might be called SAML XML Metadata.
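Once you have the file downloaded, Hue is pointed at it via the [libsaml] section of the safety valve; a minimal sketch (the path is a placeholder for wherever you store the file):

```
[libsaml]
metadata_file=/opt/cloudera/security/saml/idp_metadata.xml
```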
11-20-2018
08:51 AM
@Tomas79, It appears that you manually truncated the XML response, so we cannot see the key information that would help us understand why the failure is occurring. Based on the code path, the SAML response is returned and there is an attempt to parse it so that the response can be validated and the needed information extracted. Along those steps, there is a check to see whether the assertion is signed. If it is signed (which it should be with Azure SSO, based on the documentation), then a check is done to verify the signature. That check is failing, as it appears either your IDP metadata or the response does not include a valid signature. I suggest looking at the IDP metadata and the response assertion to help understand what might be causing the problem.
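If it helps, one way to inspect the full, untruncated response is to capture the base64-encoded SAMLResponse form parameter from the browser POST and decode it locally; a rough sketch (the placeholder must be replaced with the captured value):

```
# Decode the captured SAMLResponse and pretty-print the XML,
# then check whether the assertion actually contains a <Signature> element
echo 'PASTE_BASE64_SAMLRESPONSE_HERE' | base64 -d | xmllint --format -
```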
11-19-2018
09:32 AM
@Tomas79, Can you clarify what steps you took and what your Hue configuration looks like when you receive the exception? What documentation steps did you follow that mentioned you should upgrade your pysaml2 package? What brand of IDP are you using? At a quick glance, it appears that your SAML response is using the HTTP_REDIRECT binding and there is perhaps some problem parsing it; we see the request uses HTTP_POST for ProtocolBinding. Let's see if we can view the SAML response in the Hue log by enabling DEBUG. This may give us some clues.
11-16-2018
02:56 PM
1 Kudo
@desind, It appears something went wrong while indexing the fsimage file, and now the index is "corrupt". The most common cause of such a problem is insufficient disk space on the volume where the indexes are stored. Make sure there is free space of at least 3 times the size of the fsimage (more is better, though). To recover, you can remove the current index files and restart Reports Manager:

(1) Copy all child directories and their contents in /xxx/hadoop/cloudera-scm-headlamp to another location (you shouldn't need anything in that directory, but I prefer not to delete anything until everything is back up).

(2) Delete all files and directories under /xxx/hadoop/cloudera-scm-headlamp.

(3) Restart Reports Manager.

Reports Manager will see there are no indexes and then download the fsimage from your NameNode; the fsimage will then be indexed.
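A rough sketch of steps (1) and (2) as shell commands, keeping the same /xxx/hadoop/cloudera-scm-headlamp path from above (the backup destination is just a placeholder):

```
# (1) back up the existing index directories before removing anything
mkdir -p /root/headlamp_backup
cp -rp /xxx/hadoop/cloudera-scm-headlamp/. /root/headlamp_backup/

# (2) clear the index directory (only after confirming the backup copied cleanly)
rm -rf /xxx/hadoop/cloudera-scm-headlamp/*
```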
11-16-2018
10:03 AM
```
global
    log 127.0.0.1 local2
    pidfile /var/run/haproxy.pid
    maxconn 4000
    user haproxy
    group haproxy
    daemon
    stats socket /tmp/haproxy
    tune.ssl.default-dh-param 2048

defaults
    mode http
    log global
    option httplog
    option dontlognull
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 10m
    timeout server 10m
    timeout check 10s
    maxconn 3000

listen admin
    bind *:8000
    stats enable

frontend hiveserver2_front
    bind *:10001
    option tcplog
    mode tcp
    default_backend hiveserver2

backend hiveserver2
    mode tcp
    balance source
    server hs2_1 host1.example.com:10000
    server hs2_2 host2.example.com:10000
```
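As a usage sketch, clients such as Beeline would then connect through the proxy's 10001 port rather than to an individual HiveServer2 (the host name below is a placeholder for wherever this HAProxy runs):

```
beeline -u "jdbc:hive2://haproxy.example.com:10001/default"
```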