Member since: 05-24-2016
Posts: 45
Kudos Received: 4
Solutions: 0
06-07-2018
08:07 AM
Hi @Vinicius Higa Murakami, thanks. You're right: I didn't see the "ARMING" records with "select ... limit 2;" because there are a lot of "client" records and only a few "ARMING" records. But now, how do I exclude the other records so that the final table contains only "ARMING" rows and no NULL rows?
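The RegexSerDe emits all-NULL rows for input lines that don't match 'input.regex', so one way to keep only the ARMING rows is to filter on one of the captured columns. A sketch using the column names from the question below (the materialized table name is a placeholder):

```
-- Non-matching lines surface as all-NULL rows; filtering on a captured
-- column drops them.
SELECT *
FROM my_arming_table
WHERE dc_logtype = 'ARMING';

-- Or materialize only the matching rows (table name is hypothetical):
CREATE TABLE my_arming_only AS
SELECT * FROM my_arming_table
WHERE dc_logtype = 'ARMING';
```

Alternatively, the raw files could be pre-filtered (e.g. with grep) before landing in the external table's directory, so the table only ever sees matching lines.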
06-06-2018
10:26 AM
Hi, I would like to create an external table with a Hive regex expression that selects the lines containing "ARMING" (in uppercase). The HDFS records look like this:

2018-06-06T11:28:54+02:00 sazqye named[980]: ARMING trigger on (action:LOG) (T6-Recursive-attacks recursive-time: 1283)
2018-06-06T11:20:27+02:00 sazqyd named[92960]: client (1.debian)
...

My request:

CREATE EXTERNAL TABLE my_arming_table (
  dc_syslog_date STRING,
  dc_syslog_hostname STRING,
  dc_syslog_process STRING,
  dc_logtype STRING,
  dc_message STRING)
PARTITIONED BY (yearbrut INT, monthbrut INT, daybrut INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES(
  'input.regex'='^(\\S+)\\s(\\S+)\\s(\\S+)\\s(ARMING)\\s(.*)')
STORED AS TEXTFILE;

The result is KO:

> select * from my_arming_table limit 2;
OK
NULL NULL NULL NULL NULL 0 0 0
NULL NULL NULL NULL NULL 0 0 0

And if I try this request (with "client" in lowercase):

CREATE EXTERNAL TABLE my_client_table (
  dc_syslog_date STRING,
  dc_syslog_hostname STRING,
  dc_syslog_process STRING,
  dc_logtype STRING,
  dc_message STRING)
PARTITIONED BY (yearbrut INT, monthbrut INT, daybrut INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES(
  'input.regex'='^(\\S+)\\s(\\S+)\\s(\\S+)\\s(client)\\s(.*)')
STORED AS TEXTFILE;

The result is OK:

> select * from my_client_table limit 2;
OK
2018-06-06T11:12:55+02:00 sazqyd named[92960]: client (swza6z) 0 0 0
2018-06-06T11:13:10+02:00 sazqyd named[92960]: client (osce01) 0 0 0

Does anybody know why it doesn't work with uppercase in the regex expression? Thanks
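A quick way to sanity-check the pattern outside Hive is to run an equivalent grep over the sample lines (shell sketch; the two lines are taken from the question, and `[^ ]+` stands in for the Java `\S+`):

```shell
# One ARMING line and one client line, as in the question.
matched=$(printf '%s\n' \
  '2018-06-06T11:28:54+02:00 sazqye named[980]: ARMING trigger on (action:LOG)' \
  '2018-06-06T11:20:27+02:00 sazqyd named[92960]: client (1.debian)' \
  | grep -E '^[^ ]+ [^ ]+ [^ ]+ ARMING ')
# Only the ARMING line survives the filter.
echo "$matched"
```

Since the pattern does match the uppercase lines, the regex itself is fine; in Hive the many non-matching "client" lines simply surface as all-NULL rows, which is what "limit 2" happened to return.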
Labels:
Apache Hive
10-25-2017
01:36 PM
@Aditya Sirna, Of course... I'm going to try this. Thanks
10-25-2017
01:04 PM
Thanks, but it doesn't work, for the same reason: when you run "mv /mydirectory /targetdirectory", the result is always /targetdirectory/mydirectory.
10-25-2017
12:22 PM
Thanks, but that's not possible, because the result is /targetdirectory/mydirectory, whereas I expect all the files to be moved to /targetdirectory/*
10-25-2017
12:07 PM
Hello, I've got 30,000 files to move to another HDFS directory. Do you know a faster way than "hdfs dfs -mv /mydirectory/* /targetdirectory"? The average file size is 10 KB, and I can't merge the files into a bigger one beforehand. Thanks for your feedback
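With tens of thousands of tiny files, much of the time usually goes into per-invocation overhead rather than the moves themselves, so batching many paths per command and running a few batches in parallel can help. A sketch of that pattern, demonstrated with the local `mv` so it can run anywhere; against HDFS you would replace `mv --` with `hdfs dfs -mv` (which also accepts multiple sources before the destination) and use HDFS paths. The directory names below are placeholders:

```shell
# Set up a throwaway source and destination with 200 dummy files.
src=$(mktemp -d)
dst=$(mktemp -d)
for i in $(seq 1 200); do touch "$src/file$i"; done

# Move 50 files per 'mv' call, with up to 4 calls in flight at once.
# xargs passes each batch as arguments; the sh -c trick makes "$0" the
# destination so the batch lands as: mv -- file... "$dst"
printf '%s\n' "$src"/* | xargs -n 50 -P 4 sh -c 'mv -- "$@" "$0"' "$dst"
```

The batch size and parallelism (-n, -P) are assumptions to tune; paths containing whitespace would need `-print0`-style handling instead of a newline list.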
Labels:
Apache Hadoop
10-23-2017
11:35 AM
1 Kudo
Thanks a lot Pavan. I've just changed ... $5 == "0" ... to ... $5 != "0" ... because I don't want to move files with size "0":
for f in $(hdfs dfs -ls /tmp/files | awk '$1 !~ /^d/ && $5 != "0" { print $8 }'); do hdfs dfs -mv "$f" /tmp/files/exclude-files; done
10-23-2017
06:40 AM
Except for the "." character and the timestamp, all the files have the same name, so it's impossible to use a name pattern.
10-19-2017
04:14 PM
Hello, I would like to move a lot of files out of an HDFS directory, but not the files with size 0 and names like ".*". For example, move only the files "file3", "file4" and "file5", but not "file1" and "file2": those haven't yet been entirely written to the HDFS directory when I execute the "hdfs dfs -mv" command.

hdfs@host:~> hadoop dfs -ls /mydirectory
Found 1942 items
-rw-r----- 3 xagcla02 hdfs    0 2017-10-19 18:07 /mydirectory/.file1
-rw-r----- 3 xagcla02 hdfs    0 2017-10-19 18:07 /mydirectory/.file2
-rw-r----- 3 xagcla02 hdfs 2540 2017-10-19 18:07 /mydirectory/file3
-rw-r----- 3 xagcla02 hdfs 2540 2017-10-19 18:07 /mydirectory/file4
-rw-r----- 3 xagcla02 hdfs 5252 2017-10-19 18:07 /mydirectory/file5
…

Thanks for your feedback
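One way to express both exclusions is an awk filter over the `hdfs dfs -ls` output, as suggested later in this thread. A sanity-check sketch that runs on sample listing text (the lines mirror the example above; in practice the sample would come from `hdfs dfs -ls /mydirectory`):

```shell
# Sample 'hdfs dfs -ls' lines: field 1 is the mode, 5 the size, 8 the path.
sample='-rw-r----- 3 xagcla02 hdfs    0 2017-10-19 18:07 /mydirectory/.file1
-rw-r----- 3 xagcla02 hdfs    0 2017-10-19 18:07 /mydirectory/.file2
-rw-r----- 3 xagcla02 hdfs 2540 2017-10-19 18:07 /mydirectory/file3
-rw-r----- 3 xagcla02 hdfs 2540 2017-10-19 18:07 /mydirectory/file4
-rw-r----- 3 xagcla02 hdfs 5252 2017-10-19 18:07 /mydirectory/file5'

# Keep regular files ($1 not starting with 'd') with a non-zero size
# ($5 != "0") whose last path component does not start with a dot.
kept=$(echo "$sample" | awk '$1 !~ /^d/ && $5 != "0" && $8 !~ /\/\.[^\/]*$/ { print $8 }')
echo "$kept"
```

Only /mydirectory/file3, file4 and file5 are printed; the dot-name test is belt-and-braces here, since the in-flight files are also zero-byte and already excluded by the size test.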
Labels:
Apache Hadoop