Member since: 05-24-2016
Posts: 45
Kudos Received: 4
Solutions: 0
06-07-2018
08:07 AM
Hi @Vinicius Higa Murakami, thanks. You're right: I didn't see the "ARMING" records with "select ... limit 2;" because there are a lot of "client" records and only a few "ARMING" records. But now, how do I exclude the other records so that the final table contains only "ARMING" rows and no NULL rows?
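The RegexSerDe emits all-NULL rows for input lines that don't match 'input.regex', so one way to keep only the ARMING rows is to filter on one of the captured columns. A sketch using the column names from the question below (the materialized table name is a placeholder):

```
-- Non-matching lines surface as all-NULL rows; filtering on a captured
-- column drops them.
SELECT *
FROM my_arming_table
WHERE dc_logtype = 'ARMING';

-- Or materialize only the matching rows (table name is hypothetical):
CREATE TABLE my_arming_only AS
SELECT * FROM my_arming_table
WHERE dc_logtype = 'ARMING';
```

Alternatively, the raw files could be pre-filtered (e.g. with grep) before landing in the external table's directory, so the table only ever sees matching lines.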
06-06-2018
10:26 AM
Hi, I would like to create an external table with a Hive regex expression that selects the lines containing "ARMING" (in uppercase). The HDFS records look like this:

2018-06-06T11:28:54+02:00 sazqye named[980]: ARMING trigger on (action:LOG) (T6-Recursive-attacks recursive-time: 1283)
2018-06-06T11:20:27+02:00 sazqyd named[92960]: client (1.debian)
...

My request:

CREATE EXTERNAL TABLE my_arming_table (
  dc_syslog_date STRING,
  dc_syslog_hostname STRING,
  dc_syslog_process STRING,
  dc_logtype STRING,
  dc_message STRING)
PARTITIONED BY (yearbrut INT, monthbrut INT, daybrut INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES(
  'input.regex'='^(\\S+)\\s(\\S+)\\s(\\S+)\\s(ARMING)\\s(.*)')
STORED AS TEXTFILE;

The result is KO:

> select * from my_arming_table limit 2;
OK
NULL NULL NULL NULL NULL 0 0 0
NULL NULL NULL NULL NULL 0 0 0

And if I try this request (with "client" in lowercase):

CREATE EXTERNAL TABLE my_client_table (
  dc_syslog_date STRING,
  dc_syslog_hostname STRING,
  dc_syslog_process STRING,
  dc_logtype STRING,
  dc_message STRING)
PARTITIONED BY (yearbrut INT, monthbrut INT, daybrut INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES(
  'input.regex'='^(\\S+)\\s(\\S+)\\s(\\S+)\\s(client)\\s(.*)')
STORED AS TEXTFILE;

The result is OK:

> select * from my_client_table limit 2;
OK
2018-06-06T11:12:55+02:00 sazqyd named[92960]: client (swza6z) 0 0 0
2018-06-06T11:13:10+02:00 sazqyd named[92960]: client (osce01) 0 0 0

Does anybody know why it doesn't work with uppercase in the regex expression? Thanks
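A quick way to sanity-check the pattern outside Hive is to run an equivalent grep over the sample lines (shell sketch; the two lines are taken from the question, and `[^ ]+` stands in for the Java `\S+`):

```shell
# One ARMING line and one client line, as in the question.
matched=$(printf '%s\n' \
  '2018-06-06T11:28:54+02:00 sazqye named[980]: ARMING trigger on (action:LOG)' \
  '2018-06-06T11:20:27+02:00 sazqyd named[92960]: client (1.debian)' \
  | grep -E '^[^ ]+ [^ ]+ [^ ]+ ARMING ')
# Only the ARMING line survives the filter.
echo "$matched"
```

Since the pattern does match the uppercase lines, the regex itself is fine; in Hive the many non-matching "client" lines simply surface as all-NULL rows, which is what "limit 2" happened to return.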
Labels:
Apache Hive
10-25-2017
01:36 PM
@Aditya Sirna, Of course... I'm going to try this. Thanks
10-25-2017
01:04 PM
Thanks, but it doesn't work, for the same reason: when you run "mv /mydirectory /targetdirectory", the result is always /targetdirectory/mydirectory.
10-25-2017
12:22 PM
Thanks, but that's not possible, because the result is /targetdirectory/mydirectory, whereas I expect all the files to be moved to /targetdirectory/*
10-25-2017
12:07 PM
Hello, I've got 30,000 files to move to another HDFS directory. Do you know a faster way than "hdfs dfs -mv /mydirectory/* /targetdirectory"? The average file size is 10 KB, and I can't merge the files into a bigger one beforehand. Thanks for your feedback
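With tens of thousands of tiny files, much of the time usually goes into per-invocation overhead rather than the moves themselves, so batching many paths per command and running a few batches in parallel can help. A sketch of that pattern, demonstrated with the local `mv` so it can run anywhere; against HDFS you would replace `mv --` with `hdfs dfs -mv` (which also accepts multiple sources before the destination) and use HDFS paths. The directory names below are placeholders:

```shell
# Set up a throwaway source and destination with 200 dummy files.
src=$(mktemp -d)
dst=$(mktemp -d)
for i in $(seq 1 200); do touch "$src/file$i"; done

# Move 50 files per 'mv' call, with up to 4 calls in flight at once.
# xargs passes each batch as arguments; the sh -c trick makes "$0" the
# destination so the batch lands as: mv -- file... "$dst"
printf '%s\n' "$src"/* | xargs -n 50 -P 4 sh -c 'mv -- "$@" "$0"' "$dst"
```

The batch size and parallelism (-n, -P) are assumptions to tune; paths containing whitespace would need `-print0`-style handling instead of a newline list.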
Labels:
Apache Hadoop
10-23-2017
11:35 AM
1 Kudo
Thanks a lot Pavan. I've just changed ... $5 == "0" ... to ... $5 != "0" ... because I don't want to move files with size "0":
for f in $(hdfs dfs -ls /tmp/files | awk '$1 !~ /^d/ && $5 != "0" { print $8 }'); do hdfs dfs -mv "$f" /tmp/files/exclude-files; done
10-23-2017
06:40 AM
Except for the "." character and the timestamp, all the files have the same name, so it's impossible to use a name pattern.
10-19-2017
04:14 PM
Hello, I would like to move a lot of files out of an HDFS directory, but not the files with size 0 and names like ".*". For example, move only the files "file3", "file4" and "file5", but not "file1" and "file2": those haven't yet been entirely written to the HDFS directory when I execute the "hdfs dfs -mv" command.

hdfs@host:~> hadoop dfs -ls /mydirectory
Found 1942 items
-rw-r----- 3 xagcla02 hdfs    0 2017-10-19 18:07 /mydirectory/.file1
-rw-r----- 3 xagcla02 hdfs    0 2017-10-19 18:07 /mydirectory/.file2
-rw-r----- 3 xagcla02 hdfs 2540 2017-10-19 18:07 /mydirectory/file3
-rw-r----- 3 xagcla02 hdfs 2540 2017-10-19 18:07 /mydirectory/file4
-rw-r----- 3 xagcla02 hdfs 5252 2017-10-19 18:07 /mydirectory/file5
…

Thanks for your feedback
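One way to express both exclusions is an awk filter over the `hdfs dfs -ls` output, as suggested later in this thread. A sanity-check sketch that runs on sample listing text (the lines mirror the example above; in practice the sample would come from `hdfs dfs -ls /mydirectory`):

```shell
# Sample 'hdfs dfs -ls' lines: field 1 is the mode, 5 the size, 8 the path.
sample='-rw-r----- 3 xagcla02 hdfs    0 2017-10-19 18:07 /mydirectory/.file1
-rw-r----- 3 xagcla02 hdfs    0 2017-10-19 18:07 /mydirectory/.file2
-rw-r----- 3 xagcla02 hdfs 2540 2017-10-19 18:07 /mydirectory/file3
-rw-r----- 3 xagcla02 hdfs 2540 2017-10-19 18:07 /mydirectory/file4
-rw-r----- 3 xagcla02 hdfs 5252 2017-10-19 18:07 /mydirectory/file5'

# Keep regular files ($1 not starting with 'd') with a non-zero size
# ($5 != "0") whose last path component does not start with a dot.
kept=$(echo "$sample" | awk '$1 !~ /^d/ && $5 != "0" && $8 !~ /\/\.[^\/]*$/ { print $8 }')
echo "$kept"
```

Only /mydirectory/file3, file4 and file5 are printed; the dot-name test is belt-and-braces here, since the in-flight files are also zero-byte and already excluded by the size test.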
Labels:
Apache Hadoop