Member since: 07-30-2018
Posts: 23
Kudos Received: 2
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1113 | 07-14-2020 08:08 PM
 | 1641 | 07-09-2020 09:04 PM
 | 646 | 05-03-2020 11:28 PM
 | 599 | 08-14-2019 08:50 PM
03-09-2021
08:43 PM
Tested. Works. Awesome.
07-21-2020
01:43 AM
Hi,

WebHDFS is apparently not going to work. The stakeholders are trying to use mlflow and pyarrow. We're currently dealing with AD realm issues that are delaying progress.

I was able to test communication between separate clusters with:

hdfs dfs -ls hdfs://core15.fqdn:8022/

Those clusters are signed by the same certificate authority, however, and share the same AD realm. The new hosts possibly are not - I have not been able to get the stakeholders to finalise that.

With RPC calls, what does the remote host require? The same domain? Does it actually require the same signed certificates? Does it need the destination cluster's core-site and hdfs-site files?

Currently, the errors I am being sent are (which I think are related to the realm issues):

(base) bash-4.2$ hdfs dfs -ls hdfs://core15.fqdn:8022/
20/07/20 17:26:23 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
ls: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "host1073.fqdn/xx.x.xxxx.xx"; destination host is: "core15.fqdn":8022;
07-19-2020
09:40 PM
Hi,

I am trying to set up RPC HDFS remote access, but I can find nothing online that explains the steps required. Can anyone provide any links/resources etc. to assist?

Evan
07-14-2020
08:08 PM
Hi,

This isn't meant to be a blog post. Here's the answer: the csv file has lots of newline characters. Probably (I'm guessing) from where the devs were writing their queries? Who knows. Either way, it looked like this:

Timestamp,Username,"IP Address","Service Name",Operation,Resource,Allowed,Impersonator,sub_operation,entity_id,stored_object_name,additional_info,collection_name,solr_version,operation_params,service,operation_text,url,operation_text,table_name,resource_path,database_name,object_type,Source,Destination,Permissions,"Delegation Token ID","Table Name",Family,Qualifier,"Operation Text","Database Name","Table Name","Object Type","Resource Path","Usage Type","Operation Text","Database Name","Table Name","Object Type","Resource Path","Usage Type","Operation Text","Query ID","Session ID",Status,"Database Name","Table Name","Object Type",Privilege$
2020-07-01T22:49:13.000Z,user1,::ffff:xx.xx.xx.xx,IMPALA,QUERY,env_db:table_ $
name,true,"hue/host19.fqdn@DOMAIN",,,,,,,,,,,"select * fro$
m table_name",table_name2 etc etc",,,

Crazy. Newline characters in the middle of words.

Found this online:

awk 'NR == 1{ printf $0; next }; { printf "%s%s", (/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]+/? ORS : ""), $0 } END{ print "" }' inputfile.csv > outputfile.csv

I am terrible at regex. Can't tell you why it's doing what it's doing. But it works. It checks against the timestamp, which each line starts with:

2020-07-08T23:49:13.000Z

Strip off the header line first, then run the newline stripper code:

sed -i 1d inputfile.csv

Testing looks good. Time for lunch.
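The same rejoining logic can be sketched in Python (a hedged equivalent of the awk one-liner above, not a drop-in replacement; the sample lines are made up): any line that does not start with an ISO-style timestamp is treated as a continuation of the previous record and glued back on.

```python
import re

# A record starts with a timestamp like 2020-07-08T23:49:13.000Z;
# anything else is a continuation of the previous (broken) record.
TS = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}")

def rejoin_records(lines):
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if TS.match(line) or not records:
            records.append(line)      # new record (or the very first line)
        else:
            records[-1] += line       # glue continuation back onto the record
    return records

broken = [
    "2020-07-01T22:49:13.000Z,user1,IMPALA,env_db:table_\n",
    "name,true\n",
    "2020-07-08T23:49:13.000Z,user2,HIVE,db2:t2,true\n",
]
print(rejoin_records(broken))
```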
07-14-2020
04:58 PM
So, this:

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar" = '"',
  "escapeChar" = "\\"
)
STORED AS TEXTFILE

does not handle commas in csv records, even if they are enclosed with " ", ie:

"CREATE TABLE db.table AS ( SELECT db.col1, db.col2, db.col3, db.col3,... etc"

There has to be a way to do this. Does anyone know?
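For reference, this is the behaviour being asked for: a quoted field containing commas should come back as one column. A quick sketch of that expectation with Python's csv module (the sample record is made up):

```python
import csv
import io

# One log record: the quoted query contains commas that are NOT delimiters.
record = 'ts1,user1,"select col1, col2, col3 from t",true\n'

# An RFC 4180-style parser keeps the quoted query in a single field.
row = next(csv.reader(io.StringIO(record)))
print(row)
```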
07-14-2020
12:38 AM
Hi everyone,

I am uploading Navigator logs into Hive for analysis. I used this:

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar" = '"',
  "escapeChar" = "\\"
)
STORED AS TEXTFILE

based off https://community.cloudera.com/t5/Support-Questions/Hive-escaping-field-delimiter-in-column-value/m-p/233346#M195173 to get an external table to handle the commas in queries. I think it's working fine (but will test tomorrow to make sure).

However, I then do a move to a managed partitioned table, and it is not handling the commas in queries correctly. I created it as:

PARTITIONED BY (event_day INT, event_month INT, event_year INT);

How do I handle the commas in queries as part of this move? Does the managed table need the same delimiter & escape info as the external table? If that's the case, what's the syntax for it?

Schwifty
07-09-2020
09:04 PM
So the answer is that 'org.apache.hadoop.hive.serde2.OpenCSVSerde' isn't supported in Impala, but it is in Hive.
07-09-2020
06:29 PM
Hi,

I'm trying to upload logs into a Hive external table. Some records contain queries which have multiple columns in them, but commas are also the field delimiters. Other questions here recommend using org.apache.hadoop.hive.serde2.OpenCSVSerde, so I used:

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION 'hdfs://hdfs/path/to/directory/external/2020Jun30'
TBLPROPERTIES ('skip.header.line.count'='1');

This didn't deal with the additional non-escaped commas in the log file, though. So I tried:

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar" = '"',
  "escapeChar" = "\\"
)
STORED AS TEXTFILE

However, in Hue I get this error when using org.apache.hadoop.hive.serde2.OpenCSVSerde:

AnalysisException: Failed to load metadata for table: 'db.ext_table'
CAUSED BY: TableLoadingException: Failed to load metadata for table: db.ext_table
CAUSED BY: InvalidStorageDescriptorException: Impala does not support tables of this type. REASON: SerDe library 'org.apache.hadoop.hive.serde2.OpenCSVSerde' is not supported.

I can read from the table via beeline, but not via Hue. Is this something (a library?) that I need to add to Hue? Or do I need to create the table in some other way? How do I deal with SQL query commas in a comma-delimited log file?
Labels:
- Apache Hive
- Apache Impala
- Cloudera Hue
05-13-2020
10:15 PM
Thanks @kramalingam !
05-12-2020
05:11 PM
Hi @kramalingam,

Thanks, but that's not what I'm asking. I already have a kerberos user with access to HBase. I'm asking about the specific code I listed above, which uses the HBase keytab. I would have thought that was a major security risk in a multi-tenant environment. That code also seems to include user config as well, which is doubly confusing:

UserGroupInformation.setConfiguration(configuration);
UserGroupInformation.loginUserFromKeytab(principal, keytabLocation);

Why do you need the HBase keytab and a user keytab?
05-11-2020
05:12 PM
// this is needed even if you connect over rpc/zookeeper
configuration.set("hbase.master.kerberos.principal", "hbase/_HOST@FIELD.HORTONWORKS.COM");
configuration.set("hbase.master.keytab.file", "src/hbase.service.keytab");

This seems like a security risk: handing over the HBase keytab to users. How would you set this code up to run in a secured environment, ie where you have multiple tenants accessing HBase? I wouldn't think this would be good security practice.
05-03-2020
11:28 PM
1 Kudo
Solved it. This works:

(|(memberOf:1.2.840.113556.1.4.1941:=CN=hue-users,OU=x,DC=x,DC=x,DC=x)(memberOf:1.2.840.113556.1.4.1941:=CN=service-users,OU=x,DC=x,DC=x,DC=x))
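The structure generalises: each clause uses the LDAP_MATCHING_RULE_IN_CHAIN OID (1.2.840.113556.1.4.1941) to resolve nested AD group membership, and LDAP OR is written prefix-style, as (|(clause1)(clause2)). A small sketch that builds such a filter (the helper function name is mine):

```python
def nested_group_filter(group_dns):
    # Each clause matches users whose (possibly nested) group chain includes
    # the given DN, via the LDAP_MATCHING_RULE_IN_CHAIN matching rule.
    clauses = "".join(
        f"(memberOf:1.2.840.113556.1.4.1941:={dn})" for dn in group_dns
    )
    # LDAP boolean operators are prefix: (|(a)(b)) means "a OR b".
    return f"(|{clauses})" if len(group_dns) > 1 else clauses

print(nested_group_filter(
    ["CN=hue-users,OU=x,DC=x,DC=x,DC=x",
     "CN=service-users,OU=x,DC=x,DC=x,DC=x"]
))
```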
05-03-2020
05:23 PM
EDIT: correcting my earlier "which is the base_dn in the CDH Hue config": it's the LDAP User Filter (user_filter), not the base_dn.
... View more
05-01-2020
01:02 AM
Hi,

We are running Hue as part of CDH 6.2 (and are currently upgrading to 6.3.3). We have an AD group, hue-users, which is the base_dn in the CDH Hue config. This group controls basic access to Hue. I would like to add another AD group. How do I do this?

For context, this is so that we can split permissions between groups. One group (hue-users) is for user access (all modules). The other is for unattended service accounts, so we only want them to be able to access/schedule/run Oozie jobs.

Edit: they're all in the same AD realm, although not necessarily the same OU path.

My current config is (the numbers are for reading nested AD groups):

(memberOf:1.2.840.113556.1.4.1941:=CN=hue-users,OU=x,DC=x,DC=x,DC=x)

I've tried (with various brackets etc):

((memberOf:1.2.840.113556.1.4.1941:=CN=hue-users,OU=x,DC=x,DC=x,DC=x) || (memberOf:1.2.840.113556.1.4.1941:=CN=service-users,OU=x,DC=x,DC=x,DC=x))

(|(memberOf:1.2.840.113556.1.4.1941:=CN=hue-users,OU=x,DC=x,DC=x,DC=x)(memberOf:1.2.840.113556.1.4.1941:=CN=service-users,OU=x,DC=x,DC=x,DC=x))

None have worked. Any suggestions?
Labels:
- Cloudera Hue
- Cloudera Manager
01-07-2020
02:31 PM
I think the answer is no, based on some of the info I've just been reading: https://community.cloudera.com/t5/Support-Questions/How-to-run-Oozie-workfllow-or-action-as-another-user/td-p/26794

Unless someone has a better idea?
01-07-2020
02:26 PM
Hi,
Currently the owner of an Oozie workflow is the only one who can resume/suspend/rerun/kill it. All workflows are deployed using the service account. Is it possible for a user who is a member of the same Active Directory group as the workflow owner to administer the Oozie workflow?
That is: user ABC123 is part of ABC-group1; can they resume/suspend/rerun/kill an Oozie workflow owned by XYZ123 (also a member of ABC-group1)?
E
Labels:
- Apache Oozie
11-18-2019
02:53 PM
Hi, I have not found a solution. I don't think that you can get that quota info out of CDH (even though it displays it).
08-14-2019
08:50 PM
1 Kudo
Hi,

Yes, they are. The hbase.quota.enabled property is not displayed in CDH. It must be added via a safety valve snippet under "HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml" on the HBase Configuration tab:

Name: hbase.quota.enabled
Value: true
Description: Enable HBase quotas

Also note that deploying the change requires a restart of multiple services, such as Impala, CDSW etc.

EDIT: I'm using CDH 6.2

Evan
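For reference, the safety valve snippet ends up as this property in hbase-site.xml (a sketch; the description text is free-form):

```xml
<property>
  <name>hbase.quota.enabled</name>
  <value>true</value>
  <description>Enable HBase quotas</description>
</property>
```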
08-29-2018
10:51 PM
Hi,

Does anyone know if it's possible to get HDFS quota info via a Cloudera API call? I can get the usernames, sizes and raw sizes etc from:

https://host.domain:7183/api/v19/clusters/my_cluster/services/hdfs/reports/hdfsUsageReport

but I can't find any reference to the quotas set in HDFS in the API documentation (unless I'm completely missing it).

EDIT: Trying to get this data in a JSON format.
EDIT: Also looking at the timeseries/tsquery statements, but really just randomly trying things...

E
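To make the gap concrete, here is a sketch of parsing a usage report response with only the stdlib. The field names below are an assumption about the report's JSON shape (check your CM version's API docs); the point is that the per-user items carry sizes but no quota field:

```python
import json

# Canned response in the shape the usage report appears to return
# (an assumption; verify against your CM version's API documentation).
sample = json.loads("""
{"reportItems": [
  {"user": "alice", "size": 1048576, "rawSize": 3145728, "numFiles": 12},
  {"user": "bob",   "size": 2048,    "rawSize": 6144,    "numFiles": 3}
]}
""")

# Per-user raw usage; note there is no quota field anywhere in the items.
usage = {item["user"]: item["rawSize"] for item in sample["reportItems"]}
print(usage)
```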
Labels:
- HDFS
08-28-2018
06:01 PM
Brilliant! Thanks so much
08-27-2018
12:20 AM
Hi, I'm also looking into this, but have not yet discovered how to get it. I'm not even sure it's possible. E