Member since: 06-03-2019
Posts: 7
Kudos Received: 0
Solutions: 0
07-15-2020
01:38 AM
@MattWho Your guide was really helpful and it worked well. However, I think there are some problems when "Full Path" points deeper in the hierarchy. First, I set the full path like this: aa/bb/cc/dd/. It created only one flow file, for "dd", whose type was of course a directory. I removed "dd" from the path and set a dir filter and file filter to get only the files I wanted. After changing the full path from "aa/bb/cc/dd" to "aa/bb/cc", it produced flow file information for everything under the cc directory. Thanks for your advice. Cheers.
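As a side note for anyone tuning those filters: the dir/file filter properties on GetHDFSFileInfo take regular expressions, so it can help to sanity-check a pattern before putting it into the processor. A minimal sketch (the pattern and file names here are invented; NiFi uses Java regex, but Python's `re` is close enough for a quick check):

```python
import re

# Hypothetical file filter: match only .txt files whose names start with "Weekly_"
file_filter = re.compile(r"Weekly_.*\.txt")

candidates = ["Weekly_GfK_02.txt", "Daily_GfK_02.txt", "Weekly_GfK_02.csv"]

# fullmatch mirrors how NiFi applies the filter to the whole file name
matches = [name for name in candidates if file_filter.fullmatch(name)]
print(matches)  # only the first candidate should match
```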
07-08-2020
04:36 PM
@MattWho Hi. First, I appreciate your detailed guide. I changed the Group Results option as you suggested, but I still see only one set of attributes, which describes the directory. These are my properties on GetHDFSFileInfo, and this is what I got as a result:
hdfs.objectName : gfk <-- directory
hdfs.path : /paxatadata/export/prod <-- parent directory
hdfs.type : directory
Could you advise me further?
07-07-2020
08:57 PM
Hi. I'm trying to use GetHDFSFileInfo (not GetHDFS) so it can run after a previous step. This is exactly what I want to do (https://community.cloudera.com/t5/Support-Questions/NiFi-fetchHDFS-without-ListHDFS/td-p/211708):
flow: streamcommand -> GetHDFSFileInfo -> FetchHDFS -> PutSFTP
I set up GetHDFSFileInfo first, and then tried to read the flow file info from that step in FetchHDFS. However, GetHDFSFileInfo produced only one flow file of attributes, and I don't know how to fetch all the files in FetchHDFS. These are the attributes from GetHDFSFileInfo; I want to fetch all of the listed files:
filename : faaa~~~~~
hdfs.count.dirs : 1
hdfs.count.files : 44
hdfs.full.tree : {"objectName":"gfk",""...., "content":[{"objectName":"Weekly_GfK_02_Merge_F_HP_AT.txt",".....]}
hdfs.objectName : gfk
....
Could you please tell me how to use it, or how to solve this problem?
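For reference, the hdfs.full.tree attribute is a JSON document, so outside NiFi the individual file names can be pulled out of it with ordinary JSON parsing. A sketch using a made-up, simplified tree of the shape shown above (the real attribute has more fields):

```python
import json

# Hypothetical, simplified example of an hdfs.full.tree attribute value
full_tree = json.loads("""
{
  "objectName": "gfk",
  "type": "directory",
  "content": [
    {"objectName": "Weekly_GfK_02_Merge_F_HP_AT.txt", "type": "file"},
    {"objectName": "Weekly_GfK_03_Merge_F_HP_DE.txt", "type": "file"}
  ]
}
""")

def list_files(node):
    """Recursively collect file names from a GetHDFSFileInfo-style tree."""
    if node.get("type") == "file":
        return [node["objectName"]]
    files = []
    for child in node.get("content", []):
        files.extend(list_files(child))
    return files

print(list_files(full_tree))  # the two .txt file names
```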
Labels: Apache NiFi
06-03-2019
07:06 PM
Hi Ben, I appreciate your quick and kind response. I have one more question. After reading your answer, I started developing against the "time-series" API. While programming, I found a problem with data older than about 30 days. I'd like to obtain data at one-minute resolution for the whole period, but only 10-minute (or coarser) data seems to be available when I query older resource data. I used the API below. Could you help me figure out how to get every single minute's data, such as CPU/memory usage?

import time
import datetime
import cm_client

# api_client is assumed to be configured earlier
api_instance = cm_client.TimeSeriesResourceApi(api_client)

from_time = datetime.datetime.fromtimestamp(time.time() - 7776000)  # 90 days ago
to_time = datetime.datetime.fromtimestamp(time.time())
query = "select cpu_user_rate where entityname = 'xx'"

# Retrieve time-series data from the Cloudera Manager (CM) time-series data store using a tsquery.
result = api_instance.query_time_series(_from=from_time, query=query, to=to_time)  # , desired_rollup='RAW', must_use_desired_rollup='true'

ts_list = result.items[0]
for ts in ts_list.time_series:
    print(ts.metadata.attributes['entityName'], ts.metadata.metric_name)
    for point in ts.data:
        print(point.timestamp, point.value)

Appreciate your response.
Ben,
06-03-2019
02:19 AM
Hello. I'm new to Cloudera. I have a very simple question about logs. I need to check every single minute's resource metrics, such as CPU usage, memory usage, disk I/O, etc., but it's hard to find that kind of data in the log files. To be more specific, I want to use the resource logs, shaped as a data frame, to recognize whether the server is running well or not. So, can I obtain this resource data? Apologies for the simple question. Appreciate your response.
Labels: Cloudera Manager