Member since
02-07-2019
2748
Posts
241
Kudos Received
31
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| 2511 | 08-21-2025 10:43 PM | |
| 2801 | 04-15-2025 10:34 PM | |
| 7393 | 10-28-2024 12:37 AM | |
| 2497 | 09-04-2024 07:38 AM | |
| 4528 | 06-10-2024 10:24 PM |
11-15-2023
07:50 AM
1 Kudo
Hello @one4like , Pushing every local file of a job to HDFS will cause issues, especially in larger clusters. Local directories are used as scratch location. Spills of mappers are written there and moving that over to the network will have performance impacts. The local storage of the scratch files and shuffle files is done exactly to prevent this. It also has security impacts as the NM now pushes the keys for each application on to a network location which could be accessible for others. A far better solution is to use the fact that the value of yarn.nodemanager.local-dirs can point to multiple mount points and thus spreading the load over all mount points. So the answer is NO. local-dirs must contain a list of local paths. There's an explicit check in code which only allows local FS to be used. See here: https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java#L224 Please note that an exception is thrown when a non local file system is referenced. If you found this response assisted with your query, please take a moment to log in and click on KUDOS 🙂 & ”Accept as Solution" below this post. Thank you. Bjagtap
... View more
11-13-2023
10:39 PM
We did this at our end and ended up re-cycling the provenance repository much faster than usual. The huge amount of data that an output of a tailfile generates can fill up both your content and provenance repositories.
... View more
11-11-2023
06:40 AM
There is no magic solution for those scenarios and no one solution fits all out of Nifi that I can think of. You have to understand the nature of the input before you start consuming it and you have to provide the solution catered to this input. Sometimes if you are lucky you can combine multiple scenarios into one flow but that still depends on the complexity of the input. Even thought in your first scenario the second option I proposed seem to be simple enough and it did the job, for your second example its more complex and I dont think the out of the box GrokReader will be able to handle such complexity, therefore the first option of using the ExtractText Processor will work better because you can customize your regex as needed. For example, based on the text you provided: JohnCena32 Male New York USA813668 I can use the following regex: [A-Z][a-z]+[A-Z][a-z]+\d+\s(?:Male|Female|M|F)\s[A-Z][a-z]+(?:\s[A-Z][a-z]+)?\s[A-Za-z]+\d+ In the ExtractText processor I will define a dynamic property for each attribute (city, age, firstname...etc.) and surround the segment of the pattern that corresponds to the value with a parenthesis to extract as matching group. For Example: Age: [A-Z][a-z]+[A-Z][a-z]+(\d+)\s(?:Male|Female|M|F)\s[A-Z][a-z]+(?:\s[A-Z][a-z]+)?\s[A-Za-z]+\d+ FirstName: ([A-Z][a-z]+)[A-Z][a-z]+\d+\s(?:Male|Female|M|F)\s[A-Z][a-z]+(?:\s[A-Z][a-z]+)?\s[A-Za-z]+\d+ Gender: [A-Z][a-z]+[A-Z][a-z]+\d+\s((?:Male|Female|M|F))\s[A-Z][a-z]+(?:\s[A-Z][a-z]+)?\s[A-Za-z]+\d+ Country: [A-Z][a-z]+[A-Z][a-z]+\d+\s(?:Male|Female|M|F)\s[A-Z][a-z]+(?:\s[A-Z][a-z]+)?\s([A-Za-z]+)\d+ And so on... This should give you the attribute you need. Then you can use the AttributeToJson processor to get the json output and finally if you want to convert the data to the proper type you can either user JoltTransformation or QueryRecord with cast as shown above. One final note: If you know how to use some external libraries in python for example or groovy or any of the supported code script in the ExecuteScript processor then you can use that to write your custom code to create the required fllowfile\attributes that will help you downstream to generate the final output. If that helps please accept solution. Thanks
... View more
11-10-2023
01:33 AM
@Venin, Welcome to the Cloudera Community. This post may be helpful with your query: https://community.cloudera.com/t5/Support-Questions/Downloading-and-Installing-HDP-for-Windows-Hortonworks/m-p/372948
... View more
11-08-2023
09:23 PM
@Wadok88, Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
... View more
11-08-2023
04:03 AM
1 Kudo
Hello Vidya, would you support me in this as well? regards, Mahrous Badr
... View more
11-08-2023
01:17 AM
1 Kudo
To add to the point of @ggangadharan, there are lots of good articles/posts why the float and even the double datatype has these problems. Note that this is not Hive / Hadoop or Java specific. https://stackoverflow.com/questions/3730019/why-not-use-double-or-float-to-represent-currency https://dzone.com/articles/never-use-float-and-double-for-monetary-calculatio https://www.red-gate.com/hub/product-learning/sql-prompt/the-dangers-of-using-float-or-real-datatypes Miklos
... View more
10-26-2023
07:32 PM
I do not want to be harsh, but I do not know better ways to show my disappointment with your support. All I wanted was to locate VMs for use with training and you failed on multiple occasions to point me to the site. Instead, I get links with limited explanations, or replied asking me if the problem was solved with no prior attempts to answer the questions. I did get that the training was relevant, but that was it and all that I now expect. I have no confidence that those who replied even checked the information about which I inquired. If there is not time to answer the question, then do not answer the question with half hearted responses. I looks bad for a company I once had interest and thought well of. You can mark this closed and unresolved as you have not answered the question and I no longer care to hear from your support team. I have wasted enough time for what should be a simple answer.
... View more
10-23-2023
01:04 AM
1 Kudo
the issue is resolve by adding user into dlhue group which was also present on ldap. the group is been given previleges by sentry to access hive tables.
... View more