Member since: 08-08-2023
Posts: 9
Kudos Received: 1
Solutions: 0
10-23-2023
01:08 AM
Can someone help me with this?
10-20-2023
01:22 AM
Good morning, I am having trouble with the Ranger Service configuration for HDFS. In the Cloudera Manager panel I go to Clusters -> HDFS -> Configuration, then I search for "ranger" to enable the "Ranger Service" setting. The issue is that right after I save the change, whenever I move to another webpage the setting is lost, and Microsoft Edge pops up a message saying "if you leave now some settings could be lost". This happens ONLY when I try to change this setting (Ranger Service for HDFS); it doesn't happen when I modify other values for HDFS or other services. Basically I can't use Ranger for HDFS. Can someone help me with this? Why won't Cloudera Manager save this setting properly?
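As a workaround I am thinking of setting the dependency through the Cloudera Manager REST API instead of the wizard, to see if the same change sticks there. A minimal sketch in Python; the API version, host, credentials, cluster/service names, and the config key `ranger_service` are all assumptions on my part, not verified against my cluster:

```python
import requests

# Assumptions: CM host/port, API version (v41), cluster name "cluster",
# service name "hdfs", and the config key "ranger_service" are placeholders
# to adapt to the actual deployment.
CM = "https://cm-host:7183/api/v41"
AUTH = ("admin", "admin")

# Set the Ranger Service dependency on HDFS via the service config endpoint.
resp = requests.put(
    f"{CM}/clusters/cluster/services/hdfs/config",
    auth=AUTH,
    json={"items": [{"name": "ranger_service", "value": "ranger"}]},
    verify=False,  # only acceptable on a lab setup with self-signed certs
)
resp.raise_for_status()
print(resp.json())
```

If the API call persists the value but the wizard does not, that would at least narrow the problem down to the Cloudera Manager UI.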
Labels:
- Apache Ranger
09-08-2023
12:33 AM
Up! Can someone take a look at this issue?
09-06-2023
02:55 AM
Good morning everyone! I am having trouble with the PutKudu processor. It should write to a Kudu table, previously created using Impala on HUE, inside HDFS. Everything is configured, but the PutKudu processor keeps showing the error message below. Kudu is configured with just one master (leader). I suspect the error could be caused by my one-node configuration, since the documentation seems to suggest using at least a 3-node setup. Is this the case? The error message is too generic to be useful for any kind of debugging. The pipeline is composed of two processors: GenerateFlowFile and PutKudu. Below you can have a look at the error messages and the PutKudu properties. If you need more information, please let me know. Thank you and have a nice day.
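One thing I could try, if the default replication factor of 3 is the problem, is recreating the table with a single tablet replica. A minimal sketch using the impyla client; the host, table name, and columns are placeholders (my real schema differs), but `kudu.num_tablet_replicas` is a real Impala/Kudu table property:

```python
from impala.dbapi import connect

# Placeholders: Impala daemon host and the table/schema definition.
conn = connect(host="impala-host", port=21050)
cur = conn.cursor()

# Recreate the Kudu table with a single tablet replica, which is the
# only replication factor a one-node Kudu cluster can satisfy.
cur.execute("""
    CREATE TABLE IF NOT EXISTS sensor_data (
        id BIGINT,
        reading DOUBLE,
        PRIMARY KEY (id)
    )
    PARTITION BY HASH (id) PARTITIONS 2
    STORED AS KUDU
    TBLPROPERTIES ('kudu.num_tablet_replicas' = '1')
""")

cur.close()
conn.close()
```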
Labels:
- Apache Kudu
- Apache NiFi
08-09-2023
01:33 AM
Hey Matt! I am truly grateful for your answer. You made me realize what I was doing wrong, and I learned a lot while reading through your explanation. I am going to revise the documentation again, as I realize I did not have some concepts crystal clear before. Thanks again and I wish you a good day.
08-09-2023
01:30 AM
1 Kudo
Hey Steven! Thank you again for your reply. As @MattWho said in his reply, it looks like I was misusing the List-Fetch tandem processors. If the source directory has more than one file to be listed, linking the List processor directly to the Fetch instructs the Fetch processor to produce N files (with N being the number of files in the source directory) that all share the same content (the content of the file specified in the Fetch processor's properties) but carry the attributes of the original files. I thought the Fetch processor would do a sort of "filtering" by means of the "path" and "filename" variables in its properties, but in reality it does not filter anything; see the sketch below.

The documentation does not cover this potential pitfall. On the contrary, to be honest, it seems to suggest using them linked together. Anyway, thanks for your advice: I will follow the workflow you suggested whenever I face other issues or bugs. Have a nice day!
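To double-check my understanding, I wrote a tiny Python simulation of what I was seeing. This is not NiFi code, just a toy model of the behavior: each FlowFile coming out of List keeps its own attributes, but a Fetch with a hard-coded remote file replaces every FlowFile's content with that one file.

```python
# Toy model of the List -> Fetch behavior (not NiFi code).
listed = [
    {"path": "/share", "filename": "a.png", "content": None},
    {"path": "/share", "filename": "b.csv", "content": None},
    {"path": "/share", "filename": "c.txt", "content": None},
]

def fetch(flowfile, remote_file):
    # Fetch overwrites the content; the original attributes survive.
    flowfile["content"] = f"<bytes of {remote_file}>"
    return flowfile

# Hard-coding the remote file: N flowfiles, all with identical content
# but each keeping the attributes of a different original file.
hardcoded = [fetch(dict(f), "/share/20211021_PL_GIO.zip") for f in listed]

# Resolving the per-flowfile attributes (what ${path}/${filename} does):
# each flowfile gets its own file's content.
correct = [fetch(dict(f), f"{f['path']}/{f['filename']}") for f in listed]

for f in hardcoded:
    print(f["filename"], "->", f["content"])
for f in correct:
    print(f["filename"], "->", f["content"])
```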
08-08-2023
05:40 AM
Hey Steven! Thanks for your reply. The dataflow is pretty simple: ListSMB -> FetchSMB -> PutHDFS. (On ListSMB I am using "no tracking" just to make testing easier and faster; in prod we will have to set up a tracking strategy.) [Screenshots: the ListSMB, FetchSMB, and PutHDFS property panels; the queue between the List and Fetch processors; the queue between the Fetch and PutHDFS processors.] As you can see, the "file size" shown is over 60 MB, but the file is just a .png that should be no more than 300 KB. 60 MB is the size of the file specified in the Fetch processor's properties (20211021_PL_GIO.zip). I hope it's clearer now! If you need other info or screenshots, just let me know. Have a nice day.
08-08-2023
01:14 AM
Quoting directly from the official documentation for FetchSMB: "Fetches files from a SMB Share. Designed to be used in tandem with ListSmb." My workflow is pretty simple: ListSMB reads from a shared network directory and is connected to FetchSMB, which should get one file (the processor forces you to put the path of a specific file or it won't work), and finally PutHDFS writes the files to a distributed file system (Hadoop).

I don't understand why, whenever I use List and Fetch in tandem, the pipeline takes all the files it can find in the directory specified in the List processor, and instead of writing to HDFS only the single file I specified in the Fetch, it writes all the files from the List. The Fetch, directly connected to PutHDFS, should pass only a single file. What am I missing here? What is the purpose of the Fetch if it is "bypassed" by the List processor?

On top of that, all the files written to HDFS end up with the same size (KB or MB) as the single file I specified in the Fetch, resulting in many files being broken/corrupted, as you can see from the picture above. They SHOULD NOT have the same size. I thought List + Fetch were used primarily to move big data, but then why does the Fetch processor require me to indicate the path of a specific file, only to ignore it? I didn't find the documentation helpful at all in this regard. Thank you and have a nice day.

EDIT: Added some screenshots for clarity. The dataflow is pretty simple: ListSMB -> FetchSMB -> PutHDFS. (On ListSMB I am using "no tracking" just to make testing easier and faster; in prod we will have to set up a tracking strategy.) [Screenshots: the queue between the List and Fetch processors; the queue between the Fetch and PutHDFS processors.] As you can see, the "file size" shown is over 60 MB, but the file is just a .png that should be no more than 300 KB. 60 MB is the size of the file specified in the Fetch processor's properties (20211021_PL_GIO.zip). I hope it's clearer now! If you need other info or screenshots, just let me know. Have a nice day.
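EDIT 2, for anyone landing here later: the takeaway from the thread is that the Fetch processor's file property should not point at one fixed file; it should reference each incoming FlowFile's attributes through the NiFi Expression Language. A sketch of the two configurations (the exact FetchSMB property name may differ in your NiFi version; this is illustrative, not a literal copy of the processor config):

```python
# Sketch of the two FetchSMB configurations (property name is illustrative).
# Hard-coded: every FlowFile from ListSMB fetches the same zip file.
fetch_wrong = {"Remote File": "share-dir/20211021_PL_GIO.zip"}

# Expression Language: each FlowFile fetches the file that List discovered,
# via the "path" and "filename" attributes List wrote onto it.
fetch_right = {"Remote File": "${path}/${filename}"}
```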
Labels:
- HDFS
- NiFi Registry