Member since: 07-14-2017
Posts: 99
Kudos Received: 5
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1446 | 09-05-2018 09:58 AM
 | 1967 | 07-31-2018 12:59 PM
 | 1440 | 01-15-2018 12:07 PM
 | 1344 | 11-23-2017 04:19 PM
08-31-2017
12:02 PM
@Bryan Bende Thanks for the explanation. I have a use case where a syslog listener has to receive 10M messages/sec. I am worried whether this can be achieved, because processing those messages takes a fair amount of time: I have to extract data out of each message and store it in HDFS as CSV, so I use ExtractText -> ReplaceText -> RouteOnContent -> PutHDFS. Can you suggest whether 10 million msg/sec is achievable with this flow?
08-15-2017
03:39 PM
@Shawn Weeks I created a dummy test file with INSERT statements and it works fine; the data is inserted into the table. There is nothing new in the PutHiveQL configuration: I just selected the Hive connection pool and gave the database connection URL. Thanks
08-15-2017
02:36 PM
@Matt Clarke The file is only 1-2 KB. In the configuration, Concurrent Tasks is set to 1; the rest is unchanged. Thank you
08-15-2017
01:25 PM
I am getting data using GetHDFS, doing some processing, and writing it back to HDFS. Everything up to PutHDFS processes quickly, but PutHDFS writes the data to HDFS slowly. Could you please let me know how to improve the speed?
Labels:
- Apache NiFi
08-09-2017
01:54 PM
Hi, I want to understand the below.
1. I have placed a log file into HDFS.
2. Then processed the log file into CSV format; say the location of the CSV is "/my/loc/" (both steps 1 and 2 using NiFi).
3. Created an external table in Hive: create external table load_table (col1, col2, ...) fields terminated by ',' lines terminated by '\n' stored as textfile location '/my/loc'.
What I want to do next:
4. Copy the data from load_table into orig_table (an internal table).
5. Remove the CSV file (since data keeps streaming into the CSV, it may grow endlessly).
Things that need clarification: I used the PutHiveQL processor for step 4 with insert overwrite table orig_table select col1, col2, ... from load_table; but I got the error "extraneous input ';' expecting EOF near <EOF>". I thought it was because of the ";" at the end of the statement, so I removed it, but then got an org.apache.hadoop.hive.ql.exec.MoveTask error. Also, for step 5, how can I remove the CSV? Streaming data keeps arriving and being stored in it, so it may grow huge. Thank you
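For reference, a minimal sketch of steps 3 and 4 in HiveQL (table and column names are the placeholders from the post; the CREATE would be run once in the Hive CLI/beeline, while the INSERT is the statement handed to PutHiveQL, without a trailing semicolon):

```sql
-- Step 3 (sketch, placeholder columns): external table over the CSV directory written by NiFi.
CREATE EXTERNAL TABLE load_table (col1 STRING, col2 STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/my/loc';

-- Step 4 (sketch): the statement given to PutHiveQL. Note there is no trailing
-- semicolon, which is what produced the "extraneous input ';'" parse error above.
INSERT OVERWRITE TABLE orig_table SELECT col1, col2 FROM load_table
```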
Labels:
- Apache Hive
- Apache NiFi
08-08-2017
01:50 AM
Hi, I am getting an error when inserting into Hive tables using the PutHiveQL processor. Scenario: I have a file which I managed to convert into an INSERT INTO statement (I copied it to another location to check for syntax mistakes and found none). Assume: INSERT INTO mytable values ('a', '', 'c', '', 'd'); When I send the above to PutHiveQL, I get the error below. I thought it was because of the ";" at the end of the INSERT INTO statement, so I removed it, but then I get the other error below. Please help me. Thank you.
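For reference, this is the shape of the statement after removing the trailing semicolon, as described above (mytable and the values are just the placeholders from the scenario; a sketch, not the actual file contents):

```sql
-- Placeholder statement sent to PutHiveQL, with no trailing semicolon.
-- Note that '' inserts an empty string into the column, not a NULL.
INSERT INTO mytable VALUES ('a', '', 'c', '', 'd')
```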
Labels:
- Apache NiFi
08-03-2017
10:58 PM
I have the below data in HDFS:
a="alphabet_123_a" b="alphabetb" c="alphabet"c" is third one"
b="newb" d="alphabet@/d" a="new a"
a="changed a", b="changed b" c="changed c" e="alphabet e"
My idea is:
1. Make a table in Hive as ORC, with columns a, b, c, d, e.
2. Extract the attributes from the above data.
3. Map the attributes to the column names in Hive and store them in Hive.
4. The first line has a, b, c; the second line has b, d, a; the third line has a, b, c, e.
5. After extracting all the lines and storing them in Hive, the values not present in a line (e.g. the first line doesn't have "d" and "e", the second line doesn't have "c" and "e", the third line doesn't have "d") should be NULL by the time they are stored in Hive.
Approach:
1. Table "details" is created with columns a, b, c, d, e.
2. The ExtractText processor is configured with custom properties such as (a=)(.*?(?=\s\w+=|$)) --- [this extracts "alphabet_123_a" in line 1, along with the quotes (") at the beginning and end of the value] and (b=)(.*?(?=\s\w+=|$)) --- [this extracts "alphabetb" in line 1, along with the quotes ...].
3. I am confused about the ReplaceText processor: 1. How do I remove the double quotes? 2. How do I insert NULL values when the corresponding column name is missing in the line? 3. How do I generalize the ReplaceText search value?
Also let me know how I should change the regex in the ExtractText processor (if necessary). Please help me. Thanks
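One way to picture steps 1-5 end to end is to do the regex parsing in Hive itself rather than in ReplaceText; the sketch below only illustrates that alternative. It assumes a hypothetical external table raw_lines with a single STRING column named line over the raw data, and the patterns would need loosening for values containing embedded double quotes, like the c value in the first line.

```sql
-- Step 1: target ORC table with columns a..e.
CREATE TABLE details (a STRING, b STRING, c STRING, d STRING, e STRING)
STORED AS ORC;

-- Steps 2-5 (sketch): extract each key's quoted value. When a key is absent from
-- a line, the CASE expression yields NULL; the capture group excludes the quotes.
INSERT INTO TABLE details
SELECT
  CASE WHEN line RLIKE 'a="' THEN regexp_extract(line, 'a="([^"]*)"', 1) END AS a,
  CASE WHEN line RLIKE 'b="' THEN regexp_extract(line, 'b="([^"]*)"', 1) END AS b,
  CASE WHEN line RLIKE 'c="' THEN regexp_extract(line, 'c="([^"]*)"', 1) END AS c,
  CASE WHEN line RLIKE 'd="' THEN regexp_extract(line, 'd="([^"]*)"', 1) END AS d,
  CASE WHEN line RLIKE 'e="' THEN regexp_extract(line, 'e="([^"]*)"', 1) END AS e
FROM raw_lines;
```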
Labels:
- Apache NiFi
08-03-2017
08:50 AM
@Michael Young Thanks for the suggestion, I started trying the approach. 1. I used GetHDFS to get the file. 2. Split the file into lines (line split count = 1). Here I have a doubt about the extraction: if I am not wrong, I need to extract each attribute using the ExtractText processor. Today I have 10 attributes; suppose I want to extend that to 1000 attributes, is the same approach to be followed? It becomes lengthy, doesn't it? Also, the K:V pairs are not comma-separated, they are space-separated, and any value could have a space in the middle of it, e.g.: source="abc def ghi jkl" destination="abcdefabc". I am a bit confused, please advise.
08-02-2017
09:33 PM
@Michael Young I think I have confused you. My intention: the HDFS file has data (say, log messages) in lines, i.e. log message 1 on line 1, log message 2 on line 2, etc. Basically all messages are in K:V (key:value) format, with around 10 K:V pairs per line. It is not mandatory that all 10 K:V pairs are present in a line (i.e. sometimes fewer than 10 is also possible), e.g.: k1="v1" k2="v2" k3="v3" ... k10="v10". Also it is not mandatory that the K:V pairs are in order, i.e. k1="v1" k10="v10" k3="v3" k2="v2" ... is also possible. Now, my idea is to: 1. make a Hive table with all the keys (k1, k2, ...) as column names and v1, v2, ... as their column values; 2. make a NiFi flow to read the lines (messages) from the HDFS file; 3. split the lines; 4. match every key with its column name and insert the values into the corresponding columns. Hope I made the question clear. Can you please help me approach this? Thank you