Member since: 07-30-2018
Posts: 23 · Kudos Received: 2 · Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 5591 | 07-14-2020 08:08 PM |
|  | 6347 | 07-09-2020 09:04 PM |
|  | 2471 | 05-03-2020 11:28 PM |
|  | 1919 | 08-14-2019 08:50 PM |
03-09-2021
08:43 PM
Tested. Works. Awesome.
07-14-2020
08:08 PM
Hi, this isn't meant to be a blog post. Here's the answer: the CSV file has lots of embedded newline characters, probably (I'm guessing) from where the devs were writing their queries. Who knows. Either way, it looked like this (the trailing $ marks the end of each physical line):

```
Timestamp,Username,"IP Address","Service Name",Operation,Resource,Allowed,Impersonator,sub_operation,entity_id,stored_object_name,additional_info,collection_name,solr_version,operation_params,service,operation_text,url,operation_text,table_name,resource_path,database_name,object_type,Source,Destination,Permissions,"Delegation Token ID","Table Name",Family,Qualifier,"Operation Text","Database Name","Table Name","Object Type","Resource Path","Usage Type","Operation Text","Database Name","Table Name","Object Type","Resource Path","Usage Type","Operation Text","Query ID","Session ID",Status,"Database Name","Table Name","Object Type",Privilege$
2020-07-01T22:49:13.000Z,user1,::ffff:xx.xx.xx.xx,IMPALA,QUERY,env_db:table_$
name,true,"hue/host19.fqdn@DOMAIN",,,,,,,,,,,"select * fro$
m table_name",table_name2 etc etc",,,
```

Crazy. Newline characters in the middle of words. Found this online:

```
awk 'NR == 1 { printf $0; next }
     { printf "%s%s", (/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]+/ ? ORS : ""), $0 }
     END { print "" }' inputfile.csv > outputfile.csv
```

I am terrible at regex, but as far as I can tell it checks whether a line starts with a timestamp (every real record starts with one, e.g. 2020-07-08T23:49:13.000Z): if it does, the line is a new record and gets its own output line; if it doesn't, it's a continuation of the previous record and gets glued back on. Either way, it works.

Strip off the header line first, then run the newline stripper:

```
sed -i 1d inputfile.csv
```

Testing looks good. Time for lunch.
07-14-2020
04:58 PM
So, this:

```
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar" = '"',
  "escapeChar" = "\\"
)
STORED AS TEXTFILE
```

does not handle commas in CSV records, even when they are enclosed in double quotes, e.g.:

```
"CREATE TABLE db.table AS ( SELECT db.col1, db.col2, db.col3, db.col3,... etc"
```

There has to be a way to do this. Does anyone know?
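For context, here's a minimal sketch of the full DDL that fragment sits in — the column names are hypothetical placeholders, not my real Navigator schema (the table name and location are the ones from my other post):

```sql
-- Sketch only: hypothetical column names standing in for the real log schema.
-- Note OpenCSVSerde treats every column as STRING regardless of declared type.
CREATE EXTERNAL TABLE db.ext_table (
  event_time     STRING,
  username       STRING,
  operation_text STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = '"',
  "escapeChar"    = "\\"
)
STORED AS TEXTFILE
LOCATION 'hdfs://hdfs/path/to/directory/external/2020Jun30'
TBLPROPERTIES ('skip.header.line.count'='1');
```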
07-14-2020
12:38 AM
Hi everyone, I am uploading Navigator logs into Hive for analysis. I used this:

```
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar" = '"',
  "escapeChar" = "\\"
)
STORED AS TEXTFILE
```

based on https://community.cloudera.com/t5/Support-Questions/Hive-escaping-field-delimiter-in-column-value/m-p/233346#M195173 to get an external table that handles the commas inside queries. I think it's working fine (but will test tomorrow to make sure).

However, I then move the data into a managed, partitioned table, and that table is not handling the commas in queries correctly. I created it with:

```
PARTITIONED BY (event_day INT, event_month INT, event_year INT);
```

How do I handle the commas in queries as part of this move? Does the managed table need the same delimiter & escape info as the external table? If so, what's the syntax for it?

Schwifty
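To make the question concrete, here's a sketch of what I'm imagining — column names are hypothetical, and I don't know whether repeating the SerDe clause on the managed table is even the right approach (that's exactly what I'm asking):

```sql
-- Hypothetical sketch: managed table repeating the external table's SerDe
-- settings, then a dynamic-partition insert to move the data across.
CREATE TABLE db.managed_table (
  event_time     STRING,
  username       STRING,
  operation_text STRING
)
PARTITIONED BY (event_day INT, event_month INT, event_year INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = '"',
  "escapeChar"    = "\\"
)
STORED AS TEXTFILE;

SET hive.exec.dynamic.partition.mode=nonstrict;

-- Partition columns must come last in the SELECT for dynamic partitioning.
INSERT INTO TABLE db.managed_table
PARTITION (event_day, event_month, event_year)
SELECT event_time, username, operation_text,
       day(to_date(event_time)), month(to_date(event_time)), year(to_date(event_time))
FROM db.ext_table;
```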
07-09-2020
09:04 PM
So the answer is that 'org.apache.hadoop.hive.serde2.OpenCSVSerde' isn't supported in Impala, but it is in Hive.
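If anyone lands here with the same error, one workaround sketch (the _parquet table name is hypothetical; I haven't verified this is the only option): have Hive copy the OpenCSVSerde-backed table into a format Impala does support, then query the copy from Impala.

```sql
-- Run from Hive/beeline, not Impala: CTAS into Parquet so Impala can read it.
CREATE TABLE db.ext_table_parquet STORED AS PARQUET AS
SELECT * FROM db.ext_table;

-- Afterwards, in impala-shell or the Hue Impala editor, pick up the new table:
-- INVALIDATE METADATA db.ext_table_parquet;
```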
07-09-2020
06:29 PM
Hi, I'm trying to upload logs into a Hive external table. Some records contain SQL queries which list multiple columns, but commas are also the field delimiters. Other questions here recommend using org.apache.hadoop.hive.serde2.OpenCSVSerde, so I used:

```
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION 'hdfs://hdfs/path/to/directory/external/2020Jun30'
TBLPROPERTIES ('skip.header.line.count'='1');
```

This didn't deal with the additional non-escaped commas in the log file, though. So I tried:

```
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar" = '"',
  "escapeChar" = "\\"
)
STORED AS TEXTFILE
```

However, in Hue I get this error when using org.apache.hadoop.hive.serde2.OpenCSVSerde:

```
AnalysisException: Failed to load metadata for table: 'db.ext_table'
CAUSED BY: TableLoadingException: Failed to load metadata for table: db.ext_table
CAUSED BY: InvalidStorageDescriptorException: Impala does not support tables of this type.
REASON: SerDe library 'org.apache.hadoop.hive.serde2.OpenCSVSerde' is not supported.
```

I can read from the table via beeline, but not via Hue. Is this something (a library?) that I need to add to Hue? Or do I need to create the table in some other way? How do I deal with SQL-query commas in a comma-delimited log file?
05-13-2020
10:15 PM
Thanks @kramalingam!
05-12-2020
05:11 PM
Hi @kramalingam, thanks, but that's not what I'm asking. I already have a Kerberos user with access to HBase. I'm asking about the specific code I listed above, which says to use the HBase keytab. I would have thought that was a major security risk in a multi-tenant environment. That code also seems to include user config as well, which is doubly confusing:

```
UserGroupInformation.setConfiguration(configuration);
UserGroupInformation.loginUserFromKeytab(principal, keytabLocation);
```

Why do you need both the HBase keytab and a user keytab?
05-11-2020
05:12 PM
```
// this is needed even if you connect over rpc/zookeeper
configuration.set("hbase.master.kerberos.principal", "hbase/_HOST@FIELD.HORTONWORKS.COM");
configuration.set("hbase.master.keytab.file", "src/hbase.service.keytab");
```

Handing the HBase service keytab over to users seems like a security risk. How would you set this code up to run in a secured environment, i.e. one where multiple tenants access HBase? I wouldn't think this is good security practice.
05-03-2020
11:28 PM
Solved it. This works:

```
(|(memberOf:1.2.840.113556.1.4.1941:=CN=hue-users,OU=x,DC=x,DC=x,DC=x)(memberOf:1.2.840.113556.1.4.1941:=CN=service-users,OU=x,DC=x,DC=x,DC=x))
```

(1.2.840.113556.1.4.1941 is Active Directory's LDAP_MATCHING_RULE_IN_CHAIN matching rule, so the filter matches users who are members of either group, including via nested groups.)