Member since: 07-14-2017
Posts: 99
Kudos Received: 5
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1446 | 09-05-2018 09:58 AM
 | 1967 | 07-31-2018 12:59 PM
 | 1440 | 01-15-2018 12:07 PM
 | 1344 | 11-23-2017 04:19 PM
08-31-2017
12:02 PM
@Bryan Bende Thanks for the explanation. I have a use case where a syslog listener has to receive 10M messages/sec. I am worried whether this can be achieved, because processing those messages takes a fair amount of time: I have to extract data out of each message and store it in HDFS as CSV, so I use ExtractText -> ReplaceText -> RouteOnContent -> PutHDFS. Can you suggest whether 10 million msg/sec is achievable with this flow?
08-15-2017
03:39 PM
@Shawn Weeks I created a dummy test file with INSERT statements and it works fine; the data is inserted into the table. There is nothing new in the PutHiveQL configuration: I just selected the Hive connection pool and gave the database connection URL. Thanks
08-15-2017
02:36 PM
@Matt Clarke The file is only 1-2 KB. In the configuration, Concurrent Tasks is set to 1; the rest is unchanged. Thank you
08-15-2017
01:25 PM
I am getting data using GetHDFS, doing some processing, and writing it back to HDFS. Everything up to PutHDFS processes quickly, but PutHDFS writes the data to HDFS slowly. Could you please let me know how to improve the speed?
Labels:
- Apache NiFi
08-09-2017
01:54 PM
Hi, I want to understand the below.
1. I have placed a log file into HDFS.
2. Then processed the log file into CSV format; say the location of the CSV is "/my/loc/" (both steps 1 and 2 using NiFi).
3. Created an external table in Hive: create external table load_table (col1, col2, ...) fields terminated by ',' lines terminated by '\n' stored as textfile location '/my/loc'.
What I want to do next:
4. Copy the data from load_table into orig_table (an internal table).
5. Remove the CSV file (since data keeps streaming into the CSV, it may grow endlessly).
Things that need clarification: I used the PutHiveQL processor for step 4 with insert overwrite table orig_table select col1, col2, ... from load_table; but I got the error "extraneous input ';' expecting EOF near <EOF>". I thought it was because of the ";" at the end of the statement, so I removed it, but then got an org.apache.hadoop.hive.ql.exec.MoveTask error. Also, for step 5, how can I remove the CSV? Streaming data keeps arriving and being stored in it, so it may grow huge. Thank you
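For reference, a minimal sketch of steps 3 and 4 in HiveQL (table and column names are the placeholders from the post; the CREATE would be run once in the Hive CLI/beeline, while the INSERT is the statement handed to PutHiveQL, without a trailing semicolon):

```sql
-- Step 3 (sketch, placeholder columns): external table over the CSV directory written by NiFi.
CREATE EXTERNAL TABLE load_table (col1 STRING, col2 STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/my/loc';

-- Step 4 (sketch): the statement given to PutHiveQL. Note there is no trailing
-- semicolon, which is what produced the "extraneous input ';'" parse error above.
INSERT OVERWRITE TABLE orig_table SELECT col1, col2 FROM load_table
```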
Labels:
- Apache Hive
- Apache NiFi
08-08-2017
01:50 AM
Hi, I am getting an error when inserting into Hive tables using the PutHiveQL processor. Scenario: I have a file which I managed to convert into an INSERT INTO statement (I copied it to another location to check for syntax mistakes and found none). Assume: INSERT INTO mytable values ('a', '', 'c', '', 'd'); When I send the above to PutHiveQL, I get the error below. I thought it was because of the ";" at the end of the INSERT INTO statement, so I removed it, but then I get the other error below. Please help me. Thank you.
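For reference, this is the shape of the statement after removing the trailing semicolon, as described above (mytable and the values are just the placeholders from the scenario; a sketch, not the actual file contents):

```sql
-- Placeholder statement sent to PutHiveQL, with no trailing semicolon.
-- Note that '' inserts an empty string into the column, not a NULL.
INSERT INTO mytable VALUES ('a', '', 'c', '', 'd')
```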
Labels:
- Apache NiFi
08-03-2017
10:58 PM
I have the below data in HDFS:
a="alphabet_123_a" b="alphabetb" c="alphabet"c" is third one"
b="newb" d="alphabet@/d" a="new a"
a="changed a", b="changed b" c="changed c" e="alphabet e"
My idea is:
1. Make a table in Hive as ORC, with columns a, b, c, d, e.
2. Extract the attributes from the above data.
3. Map the attributes to the column names in Hive and store them in Hive.
4. The first line has a, b, c; the second line has b, d, a; the third line has a, b, c, e.
5. After extracting all the lines and storing them in Hive, the values not present in a line (e.g. the first line doesn't have "d" and "e", the second line doesn't have "c" and "e", the third line doesn't have "d") should be NULL by the time they are stored in Hive.
Approach:
1. Table "details" is created with columns a, b, c, d, e.
2. The ExtractText processor is configured with custom properties such as (a=)(.*?(?=\s\w+=|$)) --- [this extracts "alphabet_123_a" in line 1, along with the quotes (") at the beginning and end of the value] and (b=)(.*?(?=\s\w+=|$)) --- [this extracts "alphabetb" in line 1, along with the quotes ...].
3. I am confused about the ReplaceText processor: 1. How do I remove the double quotes? 2. How do I insert NULL values when the corresponding column name is missing in the line? 3. How do I generalize the ReplaceText search value?
Also let me know how I should change the regex in the ExtractText processor (if necessary). Please help me. Thanks
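One way to picture steps 1-5 end to end is to do the regex parsing in Hive itself rather than in ReplaceText; the sketch below only illustrates that alternative. It assumes a hypothetical external table raw_lines with a single STRING column named line over the raw data, and the patterns would need loosening for values containing embedded double quotes, like the c value in the first line.

```sql
-- Step 1: target ORC table with columns a..e.
CREATE TABLE details (a STRING, b STRING, c STRING, d STRING, e STRING)
STORED AS ORC;

-- Steps 2-5 (sketch): extract each key's quoted value. When a key is absent from
-- a line, the CASE expression yields NULL; the capture group excludes the quotes.
INSERT INTO TABLE details
SELECT
  CASE WHEN line RLIKE 'a="' THEN regexp_extract(line, 'a="([^"]*)"', 1) END AS a,
  CASE WHEN line RLIKE 'b="' THEN regexp_extract(line, 'b="([^"]*)"', 1) END AS b,
  CASE WHEN line RLIKE 'c="' THEN regexp_extract(line, 'c="([^"]*)"', 1) END AS c,
  CASE WHEN line RLIKE 'd="' THEN regexp_extract(line, 'd="([^"]*)"', 1) END AS d,
  CASE WHEN line RLIKE 'e="' THEN regexp_extract(line, 'e="([^"]*)"', 1) END AS e
FROM raw_lines;
```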
Labels:
- Apache NiFi
08-03-2017
08:50 AM
@Michael Young Thanks for the suggestion, I started trying the approach. 1. I used GetHDFS to get the file. 2. Split the file into lines (line split count = 1). Here I have a doubt about the extraction: if I am not wrong, I need to extract each attribute using the ExtractText processor. Today I have 10 attributes; suppose I want to extend that to 1000 attributes, is the same approach to be followed? It becomes lengthy, doesn't it? Also, the K:V pairs are not comma-separated, they are space-separated, and any value could have a space in the middle of it, e.g.: source="abc def ghi jkl" destination="abcdefabc". I am a bit confused, please advise.
08-02-2017
09:33 PM
@Michael Young I think I have confused you. My intention: the HDFS file has data (say, log messages) in lines, i.e. log message 1 on line 1, log message 2 on line 2, etc. Basically all messages are in K:V (key:value) format, with around 10 K:V pairs per line. It is not mandatory that all 10 K:V pairs are present in a line (i.e. sometimes fewer than 10 is also possible), e.g.: k1="v1" k2="v2" k3="v3" ... k10="v10". Also it is not mandatory that the K:V pairs are in order, i.e. k1="v1" k10="v10" k3="v3" k2="v2" ... is also possible. Now, my idea is to: 1. make a Hive table with all the keys (k1, k2, ...) as column names and v1, v2, ... as their column values; 2. make a NiFi flow to read the lines (messages) from the HDFS file; 3. split the lines; 4. match every key with its column name and insert the values into the corresponding columns. Hope I made the question clear. Can you please help me approach this? Thank you