Member since: 07-30-2018
Posts: 23
Kudos Received: 2
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3438 | 07-14-2020 08:08 PM
 | 3963 | 07-09-2020 09:04 PM
 | 1643 | 05-03-2020 11:28 PM
 | 1231 | 08-14-2019 08:50 PM
03-09-2021
08:43 PM
Tested. Works. Awesome.
07-21-2020
01:43 AM
Hi,

WebHDFS is apparently not going to work. The stakeholders are trying to use mlflow and pyarrow. We're currently dealing with AD realm issues that are delaying progress.

I was able to test communicating between separate clusters with:

hdfs dfs -ls hdfs://core15.fqdn:8022/

Those clusters are signed by the same certificate authority, however, and share the same AD realm. The new hosts possibly are not - I have not been able to get the stakeholders to finalise that.

With RPC calls, what does the remote host require? The same domain? Does it actually require certificates signed by the same CA? Does it need the destination cluster's core-site and hdfs-site files?

Currently, the errors I am being sent are the following (which I think are related to the realm issues):

(base) bash-4.2$ hdfs dfs -ls hdfs://core15.fqdn:8022/
20/07/20 17:26:23 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
ls: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "host1073.fqdn/xx.x.xxxx.xx"; destination host is: "core15.fqdn":8022;
07-19-2020
09:40 PM
Hi,

I am trying to set up remote HDFS access over RPC, but I can find nothing online that explains the steps required. Can anyone provide links, resources, etc. to assist?

Evan
07-14-2020
08:08 PM
Hi,

This isn't meant to be a blog post. Here's the answer: the csv file has lots of newline characters in it. Probably (I'm guessing) from where the devs were writing their queries? Who knows. Either way, it looked like this (the $ marks the end of each physical line):

Timestamp,Username,"IP Address","Service Name",Operation,Resource,Allowed,Impersonator,sub_operation,entity_id,stored_object_name,additional_info,collection_name,solr_version,operation_params,service,operation_text,url,operation_text,table_name,resource_path,database_name,object_type,Source,Destination,Permissions,"Delegation Token ID","Table Name",Family,Qualifier,"Operation Text","Database Name","Table Name","Object Type","Resource Path","Usage Type","Operation Text","Database Name","Table Name","Object Type","Resource Path","Usage Type","Operation Text","Query ID","Session ID",Status,"Database Name","Table Name","Object Type",Privilege$
2020-07-01T22:49:13.000Z,user1,::ffff:xx.xx.xx.xx,IMPALA,QUERY,env_db:table_ $
name,true,"hue/host19.fqdn@DOMAIN",,,,,,,,,,,"select * fro$
m table_name",table_name2 etc etc",,,

Crazy. Newline characters in the middle of words.

Found this online:

awk 'NR == 1{ printf $0; next }; { printf "%s%s", (/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]+/? ORS : ""), $0 } END{ print "" }' inputfile.csv > outputfile.csv

I am terrible at regex, so I can't tell you exactly why it's doing what it's doing, but it works: it checks whether each line starts with the timestamp that every real record begins with, like:

2020-07-08T23:49:13.000Z

Strip off the header line first, then run the newline-stripper code above:

sed -i 1d inputfile.csv

Testing looks good. Time for lunch.
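For anyone who (like me) finds the awk opaque, here is a rough Python sketch of the same idea - my own illustration, not from the thread: any line that doesn't start with an ISO-style timestamp is glued back onto the previous line, with no separator, just as the awk does.

```python
import re

# A "real" record starts with a timestamp like 2020-07-08T23:49:13.000Z;
# any line that doesn't is a continuation created by a stray newline.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}")

def join_broken_lines(text):
    fixed = []
    for line in text.splitlines():
        if TIMESTAMP.match(line) or not fixed:
            fixed.append(line)       # start of a new record (or the very first line)
        else:
            fixed[-1] += line        # glue the continuation onto the previous record
    return "\n".join(fixed) + "\n"
```

Run on a record broken mid-word, it rejoins it exactly as the awk would; unlike `sed -i`, it doesn't modify anything in place.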
07-14-2020
04:58 PM
So, this:

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar" = '"',
  "escapeChar" = "\\"
)
STORED AS TEXTFILE

does not handle commas in csv records, even when they are enclosed in double quotes, ie:

"CREATE TABLE db.table AS ( SELECT db.col1, db.col2, db.col3, db.col3,... etc"

There has to be a way to do this. Does anyone know?
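For what it's worth, RFC 4180-style quoting is exactly what should make this work. Here is a small Python illustration (just the stdlib csv module, not Hive or Impala, and the sample record is invented) showing a quoted query field with embedded commas surviving as a single field:

```python
import csv
import io

# One record; the third field is a SQL query containing commas,
# protected by double quotes (RFC 4180 style).
raw = 'user1,IMPALA,"SELECT db.col1, db.col2, db.col3 FROM db.table",true\n'

row = next(csv.reader(io.StringIO(raw)))
print(row)  # four fields; the commas inside the quoted query do not split it
```

If a reader with this quoting behaviour still splits the query, the quotes in the file itself are usually the problem, not the parser.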
07-14-2020
12:38 AM
Hi everyone,

I am uploading Navigator logs into Hive for analysis. To get an external table to handle the commas in queries, I used this:

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar" = '"',
  "escapeChar" = "\\"
)
STORED AS TEXTFILE

based on: https://community.cloudera.com/t5/Support-Questions/Hive-escaping-field-delimiter-in-column-value/m-p/233346#M195173

I think it's working fine (but will test tomorrow to make sure). However, I then move the data into a managed, partitioned table, and that table is not handling the commas in queries correctly. I created it with:

PARTITIONED BY(event_day INT, event_month INT, event_year INT);

How do I handle the commas in queries as part of this move? Does the managed table need the same delimiter and escape info as the external table? If that's the case, what's the syntax for it?

Schwifty
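As an aside, deriving those partition values from the log timestamps is the easy part. A hedged Python sketch (the function name is mine, and I'm assuming the timestamps look like the 2020-07-01T22:49:13.000Z samples in these logs):

```python
from datetime import datetime

def partition_values(ts):
    """Derive (event_day, event_month, event_year) partition values
    from a Navigator-style timestamp such as 2020-07-01T22:49:13.000Z."""
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
    return dt.day, dt.month, dt.year

print(partition_values("2020-07-01T22:49:13.000Z"))  # (1, 7, 2020)
```

The actual move (INSERT ... PARTITION from the external table into the managed one) is the part the question is about; this only shows where the three INT values would come from.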
07-09-2020
09:04 PM
So the answer is that 'org.apache.hadoop.hive.serde2.OpenCSVSerde' isn't supported in Impala, but it is in Hive.
07-09-2020
06:29 PM
Hi,

I'm trying to upload logs into a Hive external table. Some records contain queries which have multiple commas in them, but commas are also the field delimiters. Other questions here recommend using org.apache.hadoop.hive.serde2.OpenCSVSerde, so I used:

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION 'hdfs://hdfs/path/to/directory/external/2020Jun30'
TBLPROPERTIES ('skip.header.line.count'='1');

This didn't deal with the additional non-escaped commas in the log file, though, so I tried:

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar" = '"',
  "escapeChar" = "\\"
)
STORED AS TEXTFILE

However, in Hue I get this error when using org.apache.hadoop.hive.serde2.OpenCSVSerde:

AnalysisException: Failed to load metadata for table: 'db.ext_table'
CAUSED BY: TableLoadingException: Failed to load metadata for table: db.ext_table
CAUSED BY: InvalidStorageDescriptorException: Impala does not support tables of this type. REASON: SerDe library 'org.apache.hadoop.hive.serde2.OpenCSVSerde' is not supported.

I can read from the table via beeline, but not via Hue. Is this something (a library?) that I need to add to Hue? Or do I need to create the table in some other way? How do I deal with SQL query commas in a comma-delimited log file?
Labels:
- Apache Hive
- Apache Impala
- Cloudera Hue
05-13-2020
10:15 PM
Thanks @kramalingam!
05-12-2020
05:11 PM
Hi @kramalingam,

Thanks, but that's not what I'm asking. I already have a Kerberos user with access to HBase. I'm asking about the specific code I listed above, which uses the HBase keytab. I would have thought that was a major security risk in a multi-tenant environment. That code also seems to include user config as well, which is doubly confusing:

UserGroupInformation.setConfiguration(configuration);
UserGroupInformation.loginUserFromKeytab(principal, keytabLocation);

Why do you need the HBase keytab and a user keytab?