Member since: 07-30-2018
Posts: 23
Kudos Received: 2
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1113 | 07-14-2020 08:08 PM
 | 1641 | 07-09-2020 09:04 PM
 | 646 | 05-03-2020 11:28 PM
 | 599 | 08-14-2019 08:50 PM
03-09-2021
08:43 PM
Tested. Works. Awesome.
07-21-2020
01:43 AM
Hi,

WebHDFS is apparently not going to work. The stakeholders are trying to use mlflow and pyarrow. We're currently dealing with AD realm issues that are delaying progress.

I was able to test communication between separate clusters with:

hdfs dfs -ls hdfs://core15.fqdn:8022/

Those clusters are signed by the same certificate authority, however, and share the same AD realm. The new hosts possibly are not - I have not been able to get the stakeholders to finalise that.

With RPC calls, what does the remote host require? The same domain? Does it actually require the same signed certificates? Does it need the destination cluster's core-site and hdfs-site files?

Currently, the errors I am being sent are (which I think are related to the realm issues):

(base) bash-4.2$ hdfs dfs -ls hdfs://core15.fqdn:8022/
20/07/20 17:26:23 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
ls: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "host1073.fqdn/xx.x.xxxx.xx"; destination host is: "core15.fqdn":8022;
07-19-2020
09:40 PM
Hi,

I am trying to set up RPC HDFS remote access, but I can find nothing online that explains the steps required. Can anyone provide any links/resources etc. to assist?

Evan
07-14-2020
08:08 PM
Hi,

This isn't meant to be a blog post. Here's the answer: the csv file has lots of newline characters. Probably (I'm guessing) from where the devs were writing their queries? Who knows. Either way, it looked like this:

Timestamp,Username,"IP Address","Service Name",Operation,Resource,Allowed,Impersonator,sub_operation,entity_id,stored_object_name,additional_info,collection_name,solr_version,operation_params,service,operation_text,url,operation_text,table_name,resource_path,database_name,object_type,Source,Destination,Permissions,"Delegation Token ID","Table Name",Family,Qualifier,"Operation Text","Database Name","Table Name","Object Type","Resource Path","Usage Type","Operation Text","Database Name","Table Name","Object Type","Resource Path","Usage Type","Operation Text","Query ID","Session ID",Status,"Database Name","Table Name","Object Type",Privilege$
2020-07-01T22:49:13.000Z,user1,::ffff:xx.xx.xx.xx,IMPALA,QUERY,env_db:table_ $
name,true,"hue/host19.fqdn@DOMAIN",,,,,,,,,,,"select * fro$
m table_name",table_name2 etc etc",,,

Crazy. Newline characters in the middle of words.

Found this online:

awk 'NR == 1{ printf $0; next }; { printf "%s%s", (/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]+/? ORS : ""), $0 } END{ print "" }' inputfile.csv > outputfile.csv

I am terrible at regex. Can't tell you why it's doing what it's doing. But it works. It checks against the timestamp, which each line starts with:

2020-07-08T23:49:13.000Z

Strip off the header line first, then run the newline stripper code:

sed -i 1d inputfile.csv

Testing looks good. Time for lunch.
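The same rejoining logic can be sketched in Python (a hedged equivalent of the awk one-liner above, not a drop-in replacement; the sample lines are made up): any line that does not start with an ISO-style timestamp is treated as a continuation of the previous record and glued back on.

```python
import re

# A record starts with a timestamp like 2020-07-08T23:49:13.000Z;
# anything else is a continuation of the previous (broken) record.
TS = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}")

def rejoin_records(lines):
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if TS.match(line) or not records:
            records.append(line)      # new record (or the very first line)
        else:
            records[-1] += line       # glue continuation back onto the record
    return records

broken = [
    "2020-07-01T22:49:13.000Z,user1,IMPALA,env_db:table_\n",
    "name,true\n",
    "2020-07-08T23:49:13.000Z,user2,HIVE,db2:t2,true\n",
]
print(rejoin_records(broken))
```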
07-14-2020
04:58 PM
So, this:

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar" = '"',
  "escapeChar" = "\\"
)
STORED AS TEXTFILE

does not handle commas in csv records, even if they are enclosed with " ", ie:

"CREATE TABLE db.table AS ( SELECT db.col1, db.col2, db.col3, db.col3,... etc"

There has to be a way to do this. Does anyone know?
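For reference, this is the behaviour being asked for: a quoted field containing commas should come back as one column. A quick sketch of that expectation with Python's csv module (the sample record is made up):

```python
import csv
import io

# One log record: the quoted query contains commas that are NOT delimiters.
record = 'ts1,user1,"select col1, col2, col3 from t",true\n'

# An RFC 4180-style parser keeps the quoted query in a single field.
row = next(csv.reader(io.StringIO(record)))
print(row)
```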
07-14-2020
12:38 AM
Hi everyone,

I am uploading Navigator logs into Hive for analysis. I used this:

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar" = '"',
  "escapeChar" = "\\"
)
STORED AS TEXTFILE

based off https://community.cloudera.com/t5/Support-Questions/Hive-escaping-field-delimiter-in-column-value/m-p/233346#M195173 to get an external table to handle the commas in queries. I think it's working fine (but will test tomorrow to make sure).

However, I then do a move to a managed partitioned table, and it is not handling the commas in queries correctly. I created it as:

PARTITIONED BY (event_day INT, event_month INT, event_year INT);

How do I handle the commas in queries as part of this move? Does the managed table need the same delimiter & escape info as the external table? If that's the case, what's the syntax for it?

Schwifty
07-09-2020
09:04 PM
So the answer is that 'org.apache.hadoop.hive.serde2.OpenCSVSerde' isn't supported in Impala, but it is in Hive.
07-09-2020
06:29 PM
Hi,

I'm trying to upload logs into a Hive external table. Some records contain queries which have multiple columns in them, but commas are also the field delimiters. Other questions here recommend using org.apache.hadoop.hive.serde2.OpenCSVSerde, so I used:

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION 'hdfs://hdfs/path/to/directory/external/2020Jun30'
TBLPROPERTIES ('skip.header.line.count'='1');

This didn't deal with the additional non-escaped commas in the log file, though. So I tried:

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar" = '"',
  "escapeChar" = "\\"
)
STORED AS TEXTFILE

However, in Hue I get this error when using org.apache.hadoop.hive.serde2.OpenCSVSerde:

AnalysisException: Failed to load metadata for table: 'db.ext_table'
CAUSED BY: TableLoadingException: Failed to load metadata for table: db.ext_table
CAUSED BY: InvalidStorageDescriptorException: Impala does not support tables of this type. REASON: SerDe library 'org.apache.hadoop.hive.serde2.OpenCSVSerde' is not supported.

I can read from the table via beeline, but not via Hue. Is this something (a library?) that I need to add to Hue? Or do I need to create the table in some other way? How do I deal with SQL query commas in a comma-delimited log file?
Labels:
- Apache Hive
- Apache Impala
- Cloudera Hue
05-13-2020
10:15 PM
Thanks @kramalingam !
05-12-2020
05:11 PM
Hi @kramalingam,

Thanks, but that's not what I'm asking. I already have a kerberos user with access to HBase. I'm asking about the specific code I listed above, which uses the HBase keytab. I would have thought that was a major security risk in a multi-tenant environment. That code also seems to include user config as well, which is doubly confusing:

UserGroupInformation.setConfiguration(configuration);
UserGroupInformation.loginUserFromKeytab(principal, keytabLocation);

Why do you need the HBase keytab and a user keytab?
05-11-2020
05:12 PM
// this is needed even if you connect over rpc/zookeeper
configuration.set("hbase.master.kerberos.principal", "hbase/_HOST@FIELD.HORTONWORKS.COM");
configuration.set("hbase.master.keytab.file", "src/hbase.service.keytab");

This seems like a security risk: handing over the HBase keytab to users. How would you set this code up to run in a secured environment, ie where you have multiple tenants accessing HBase? I wouldn't think this would be good security practice.
05-03-2020
11:28 PM
1 Kudo
Solved it. This works:

(|(memberOf:1.2.840.113556.1.4.1941:=CN=hue-users,OU=x,DC=x,DC=x,DC=x)(memberOf:1.2.840.113556.1.4.1941:=CN=service-users,OU=x,DC=x,DC=x,DC=x))
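The structure generalises: each clause uses the LDAP_MATCHING_RULE_IN_CHAIN OID (1.2.840.113556.1.4.1941) to resolve nested AD group membership, and LDAP OR is written prefix-style, as (|(clause1)(clause2)). A small sketch that builds such a filter (the helper function name is mine):

```python
def nested_group_filter(group_dns):
    # Each clause matches users whose (possibly nested) group chain includes
    # the given DN, via the LDAP_MATCHING_RULE_IN_CHAIN matching rule.
    clauses = "".join(
        f"(memberOf:1.2.840.113556.1.4.1941:={dn})" for dn in group_dns
    )
    # LDAP boolean operators are prefix: (|(a)(b)) means "a OR b".
    return f"(|{clauses})" if len(group_dns) > 1 else clauses

print(nested_group_filter(
    ["CN=hue-users,OU=x,DC=x,DC=x,DC=x",
     "CN=service-users,OU=x,DC=x,DC=x,DC=x"]
))
```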
05-03-2020
05:23 PM
EDIT: correcting my earlier "which is the base_dn in the CDH Hue config": it's the LDAP User Filter (user_filter), not the base_dn.
... View more
05-01-2020
01:02 AM
Hi,

We are running Hue as part of CDH 6.2 (and are currently upgrading to 6.3.3). We have an AD group, hue-users, which is the base_dn in the CDH Hue config. This group controls basic access to Hue. I would like to add another AD group. How do I do this?

For context, this is so that we can split permissions between groups. One group (hue-users) is for user access (all modules). The other is for unattended service accounts, so we only want them to be able to access/schedule/run Oozie jobs.

Edit: they're all in the same AD realm, although not necessarily the same OU path.

My current config is (the numbers are for reading nested AD groups):

(memberOf:1.2.840.113556.1.4.1941:=CN=hue-users,OU=x,DC=x,DC=x,DC=x)

I've tried (with various brackets etc):

((memberOf:1.2.840.113556.1.4.1941:=CN=hue-users,OU=x,DC=x,DC=x,DC=x) || (memberOf:1.2.840.113556.1.4.1941:=CN=service-users,OU=x,DC=x,DC=x,DC=x))

(|(memberOf:1.2.840.113556.1.4.1941:=CN=hue-users,OU=x,DC=x,DC=x,DC=x)(memberOf:1.2.840.113556.1.4.1941:=CN=service-users,OU=x,DC=x,DC=x,DC=x))

None have worked. Any suggestions?
Labels:
- Cloudera Hue
- Cloudera Manager
01-07-2020
02:31 PM
I think the answer is no, based on some of the info I've just been reading: https://community.cloudera.com/t5/Support-Questions/How-to-run-Oozie-workfllow-or-action-as-another-user/td-p/26794

Unless someone has a better idea?
01-07-2020
02:26 PM
Hi,
Currently the owner of an Oozie workflow is the only one who can resume/suspend/rerun/kill it. All workflows are deployed using the service account. Is it possible for a user who is a member of the same Active Directory group as the workflow owner to administer the Oozie workflow?
That is: user ABC123 is part of ABC-group1; can they resume/suspend/rerun/kill an Oozie workflow owned by XYZ123 (also a member of ABC-group1)?
E
Labels:
- Apache Oozie
11-18-2019
02:53 PM
Hi, I have not found a solution. I don't think that you can get that quota info out of CDH (even though it displays it).
08-14-2019
08:50 PM
1 Kudo
Hi,

Yes, they are. The hbase.quota.enabled property is not displayed in CDH. It must be added via a safety valve snippet under "HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml" on the HBase Configuration tab:

Name: hbase.quota.enabled
Value: true
Description: Enable HBase quotas

Also note that deploying the change requires a restart of multiple services, such as Impala, CDSW etc.

EDIT: I'm using CDH 6.2

Evan
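For reference, the safety valve snippet ends up as this property in hbase-site.xml (a sketch; the description text is free-form):

```xml
<property>
  <name>hbase.quota.enabled</name>
  <value>true</value>
  <description>Enable HBase quotas</description>
</property>
```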
08-29-2018
10:51 PM
Hi,

Does anyone know if it's possible to get HDFS quota info via a Cloudera API call? I can get the usernames, sizes and raw sizes etc from:

https://host.domain:7183/api/v19/clusters/my_cluster/services/hdfs/reports/hdfsUsageReport

but I can't find any reference to the quotas set in HDFS in the API documentation (unless I'm completely missing it).

EDIT: Trying to get this data in a JSON format.
EDIT: Also looking at the timeseries/tsquery statements, but really just randomly trying things...

E
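To make the gap concrete, here is a sketch of parsing a usage report response with only the stdlib. The field names below are an assumption about the report's JSON shape (check your CM version's API docs); the point is that the per-user items carry sizes but no quota field:

```python
import json

# Canned response in the shape the usage report appears to return
# (an assumption; verify against your CM version's API documentation).
sample = json.loads("""
{"reportItems": [
  {"user": "alice", "size": 1048576, "rawSize": 3145728, "numFiles": 12},
  {"user": "bob",   "size": 2048,    "rawSize": 6144,    "numFiles": 3}
]}
""")

# Per-user raw usage; note there is no quota field anywhere in the items.
usage = {item["user"]: item["rawSize"] for item in sample["reportItems"]}
print(usage)
```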
Labels:
- HDFS
08-28-2018
06:01 PM
Brilliant! Thanks so much
08-27-2018
12:20 AM
Hi, I'm also looking into this, but have not yet discovered how to get it. I'm not even sure it's possible. E