Member since: 02-01-2022
Posts: 125
Kudos Received: 29
Solutions: 24
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 211 | 03-16-2023 05:29 AM |
| 91 | 03-02-2023 06:44 AM |
| 214 | 03-01-2023 04:13 AM |
| 95 | 02-24-2023 05:55 AM |
| 79 | 02-22-2023 08:28 AM |
03-27-2023
06:24 AM
@bennour This error is an issue with your database endpoint's SSL configuration. My first suggestion is to add the following argument to your sqoop command: useSSL=false. If that doesn't work, focus on getting a proper SSL certificate for the host.
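For clarity, the flag goes inside the JDBC connect string. A minimal sketch, with the host and database names as placeholders rather than values from your environment:

```bash
# Hypothetical connect string: append useSSL=false as a JDBC URL parameter
sqoop import --connect "jdbc:mysql://your-db-host:3306/your_db?useSSL=false" ...
```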
03-21-2023
06:10 AM
The file you need to send or upload to the API should be the content of the flowfile that routes into InvokeHTTP. Some upstream processor should read the file and pass its content through the success relationship to InvokeHTTP. For example, if the file is JSON, I use a GenerateFlowFile processor, put the JSON contents in it, and then send it to InvokeHTTP.
03-21-2023
05:46 AM
@Techie123 To add headers in InvokeHTTP, just click the + sign to add a new dynamic property, and set the key to "Content-Type" and the value to "multipart/form-data". You will also need to make sure the rest of your InvokeHTTP is configured correctly. Depending on your version, you can also create and send an attribute called "Content-Type" with the same value. The newest versions of NiFi expose a "Request Content-Type" property, which defaults to the ${mime.type} attribute and can also be set to suit your API. One other suggestion: test remote API calls with something like Postman first to confirm all the required values, then work with NiFi after you have a known operational understanding of the API.
03-20-2023
05:48 AM
@Fahmihamzah84 This appears to be an issue with your schema. The BigQuery error suggests a problem trying to cast a string into a collection (array/list/etc.). It's hard to tell which array is causing the issue since there are many. My suggestion is to set the processor to log level DEBUG and see if you can get a more verbose error; that will help you figure out which field or fields are the culprit. Keep in mind it could be one of the empty arrays, too. I do not suggest the following as a solution, just as a path to figuring out where the problem is: sometimes when I have issues with type casting, I temporarily make everything a string during development. If you do this carefully, one field at a time, the point where the error goes away tells you which field it is. This also gives you a known working state for your flow, so you can work from that operational base toward a solution with the end schema in the format you need.
03-17-2023
06:08 AM
1 Kudo
@anton123 Please check these references; there is important information here that everyone needs to master in order to work with ExecuteScript:
https://community.hortonworks.com/articles/75032/executescript-cookbook-part-1.html
https://community.hortonworks.com/articles/75545/executescript-cookbook-part-2.html
https://community.hortonworks.com/articles/77739/executescript-cookbook-part-3.html
Some suggestions: make sure you have all the imports you need, and make sure you are using the correct variable name for "filter". You have it defined, but it's not used.
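As a rough sketch of the Jython skeleton those articles walk through (the class name and the placeholder transform are mine, not from the original question; session, REL_SUCCESS, and REL_FAILURE are provided to the script by ExecuteScript):

```python
# Typical ExecuteScript (Jython) structure, per the cookbook articles above
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class PyStreamCallback(StreamCallback):
    def __init__(self):
        pass
    def process(self, inputStream, outputStream):
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        # placeholder transform: make sure the variable you define ("filter") is the one you use
        filtered = '\n'.join([line for line in text.split('\n') if line.strip()])
        outputStream.write(bytearray(filtered.encode('utf-8')))

flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, PyStreamCallback())
    session.transfer(flowFile, REL_SUCCESS)
```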
03-16-2023
08:16 AM
1 Kudo
Awesome news, +2 solutions here.
03-16-2023
05:29 AM
1 Kudo
@Sivagopal Check out this post for a similar scenario. It includes a solution: https://community.cloudera.com/t5/Support-Questions/Nifi-1-16-fails-to-start-with-Decryption-exception/m-p/358190
03-13-2023
07:30 AM
@Meeran Going out on a limb, but I think the conflict is related to the "logicalType" of uuid not preparing the downstream value in the way Cassandra expects. The error seems to think the logicalType of the string is in the wrong format, and I think the error is at the Cassandra driver level, not with the schema or NiFi itself. One suggestion: make that field a plain string (without the logicalType) and see if the driver accepts it. If it does, then you know the issue is just related to the logicalType logic. As long as your UUID is not manipulated in the NiFi data flow, you do not need to re-confirm that it is actually a UUID.
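For instance, if your Avro schema currently declares the field like the first form below, the experiment is to switch it to the second (the field name here is only an example):

```json
{ "name": "id", "type": { "type": "string", "logicalType": "uuid" } }
```

versus a plain string with no logical type:

```json
{ "name": "id", "type": "string" }
```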
03-13-2023
07:18 AM
@larsfrancke Unfortunately I do not have the exact solution or information you need. However, I do have multiple customers who have gotten their CDP on Isilon kerberized and into production. There were some tickets on our support side walking through the Kerberos setup, but the specific technical solution came from Dell's side, since this is a supported solution for Isilon. My recommendation is to work with Cloudera Support to see if they have suggestions, and then work with Dell Support coming out of that. Your Cloudera account team and Dell partner should have access to deeper resources if both support teams cannot resolve it.
03-02-2023
06:44 AM
1 Kudo
@fahed What you see with the CDP Public Cloud Data Hubs using GCS (or any object store) is a modernization of the platform around object storage. This removes differences across AWS, Azure, and on-prem (when Ozone is used). It is a change driven by customer demand, so that workloads can be built and deployed with minimal changes from on-prem to cloud or from cloud to cloud. Unfortunately that creates the difference you describe above, but those are risks we are willing to take ourselves in favor of a modern data architecture. If you are looking for performance, you should take a look at some of the newer options for databases: Impala and Kudu (the latter uses local disk). We also have Iceberg coming into this space.
03-01-2023
04:14 AM
Nice and Quick! Excellent!
03-01-2023
04:13 AM
@Pierro6AS The first thing you should do is increase the back pressure thresholds on the connection's queue; the defaults are quite low (10,000 flowfiles and 1 GB). It is possible to see this error if flowfiles have been sitting in the queue for too long. It is also possible to see this error if the file system has other usage outside of NiFi. For best performance, NiFi's backing repositories (content and flowfile repositories) should live on dedicated disks that are larger than the demand of the flow, especially during heavy unexpected volume. You can find more about this in these posts:
https://community.cloudera.com/t5/Support-Questions/Unable-to-write-flowfile-content-to-content-repository/td-p/346984
https://community.cloudera.com/t5/Support-Questions/Problem-with-Merge-Content-Processor-after-switch-to-v-1-16/m-p/346096/highlight/true#M234750
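If you do move the repositories to their own disks, the relevant nifi.properties entries look roughly like this (the /data paths are example mount points, not values from your environment):

```properties
# nifi.properties: point each repository at its own dedicated disk (paths are examples)
nifi.flowfile.repository.directory=/data1/nifi/flowfile_repository
nifi.content.repository.directory.default=/data2/nifi/content_repository
nifi.provenance.repository.directory.default=/data3/nifi/provenance_repository
```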
02-24-2023
05:59 AM
@saketa Magic sauce right here, great article!!
02-24-2023
05:55 AM
1 Kudo
@kishan1 In order to restart a specific process group you will need to use some command-line magic against the NiFi API. For example, this could be done by using a command to stop the process group, then restarting NiFi, then starting the process group again. You can certainly be creative in how you handle that approach once you have experimented with the API. https://nifi.apache.org/docs/nifi-docs/rest-api/index.html
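As a rough sketch of that sequence with curl (the host, port, and process group ID are placeholders, and this assumes an unsecured NiFi; a secured instance also needs an access token):

```bash
# Stop all components in the process group
curl -X PUT -H 'Content-Type: application/json' \
  -d '{"id":"<process-group-id>","state":"STOPPED"}' \
  http://nifi-host:8080/nifi-api/flow/process-groups/<process-group-id>

# ...restart the NiFi service here (systemctl, Ambari, Cloudera Manager, etc.)...

# Start the process group again
curl -X PUT -H 'Content-Type: application/json' \
  -d '{"id":"<process-group-id>","state":"RUNNING"}' \
  http://nifi-host:8080/nifi-api/flow/process-groups/<process-group-id>
```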
02-24-2023
05:51 AM
@mmaher22 You may want to run the Python job inside of ExecuteScript. That way, you can send output to a flowfile during your loop's iterations with: session.commit(). ExecuteScript implicitly commits the session at the end of the script to send output to the next processor (one flowfile), so if you commit in line with your loop, the script will send a flowfile for every iteration. For a full rundown of how to use ExecuteScript, be sure to see these great articles:
https://community.hortonworks.com/articles/75032/executescript-cookbook-part-1.html
https://community.hortonworks.com/articles/75545/executescript-cookbook-part-2.html
https://community.hortonworks.com/articles/77739/executescript-cookbook-part-3.html
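A minimal sketch of that pattern in ExecuteScript (Jython), where the loop body is only a placeholder for your real work:

```python
# Hypothetical ExecuteScript (Jython) body: emit one flowfile per loop iteration
from org.apache.nifi.processor.io import OutputStreamCallback

class WriteText(OutputStreamCallback):
    def __init__(self, text):
        self.text = text
    def process(self, outputStream):
        outputStream.write(bytearray(self.text.encode('utf-8')))

for i in range(5):  # placeholder loop
    flowFile = session.create()
    flowFile = session.write(flowFile, WriteText('result %d\n' % i))
    session.transfer(flowFile, REL_SUCCESS)
    session.commit()  # committing inside the loop releases each flowfile immediately
```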
02-23-2023
05:09 AM
1 Kudo
@fahed That size is there so the cluster can grow and serve traffic in a production manner; at first, the disk usage may be low. For Data Hubs, my recommendation is to start small and grow as needed. Most of your workload data should be in object store(s) for the Data Hubs, so don't think of that "HDFS" disk as limiting the hub to the size it had at initial creation.
02-22-2023
08:28 AM
1 Kudo
@merlioncurry Lacking a bit of detail here, so I am assuming you used the Ambari UI to upload to HDFS. In that case those files are going to be in HDFS at /users/maria_dev, not at the same path on the actual machine. You will need to use hdfs commands to view them; if those do not show the files, then the path you uploaded to may be different. From the sandbox prompt:
hdfs dfs -ls /users/
hdfs dfs -ls /users/maria_dev
02-22-2023
08:16 AM
1 Kudo
@fahed The HDFS service inside the Data Lake supports the environment and its services, for example Atlas, Ranger, Solr, and HBase. Its size is based on the environment's scale. You are correct in the assumption that your end-user HDFS service is part of the Data Hubs deployed around the environment. You should not try to use the environment's HDFS service for applications and workloads that belong on the Data Hubs.
02-09-2023
06:31 AM
Click into that doc and check out the other escape option. I think you need to handle the quotes too.
02-09-2023
06:22 AM
1 Kudo
@Techie123 Well, like I said, you have to learn the AWS side of providing access to a bucket. Starting with a public bucket will show you what you have to do, inside the bucket configuration, to allow other systems to access it; from there you can move from a fully open bucket to whatever access control level you ultimately need. Getting lost in that space is not necessarily a "nifi" thing, so my recommendation is to build the NiFi flow against a public bucket, THEN, once it works, start testing the deeper access requirements. The controller service configuration provides multiple ways to access a bucket and a bunch of settings; make sure you have working access/secret key credentials tested directly in the processor before moving to the controller service.
02-09-2023
06:14 AM
@codiste_m By default Hive will be using static partitioning. With Hive you can do dynamic partitioning, but I am not sure how well that works with existing data in existing folders; I believe it creates the correct partitions based on the schema and creates those partition folders as the data is inserted into the storage path. It sounds like you will need to execute a load data command for each partition you want to query.
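As a hedged illustration (the table name, partition column, and paths below are placeholders, not taken from your setup): either load each partition explicitly, or, if the folders already follow Hive's partition naming under the table location, let the metastore discover them:

```sql
-- Load one partition explicitly (repeat per partition you want to query)
LOAD DATA INPATH '/landing/sales/dt=2023-02-01'
INTO TABLE sales PARTITION (dt='2023-02-01');

-- Or, if the directories are already laid out as .../dt=YYYY-MM-DD under the table path,
-- register all of them in the metastore in one pass
MSCK REPAIR TABLE sales;
```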
02-09-2023
05:53 AM
@ShobhitSingh You need to handle the escape with another option: .option("escape", "\\"). You may need to experiment with the actual escape string to suit your needs. Be sure to check the Spark docs specific to your version, for example: https://spark.apache.org/docs/latest/sql-data-sources-csv.html
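A minimal PySpark sketch, where the file path and the quote/escape characters are assumptions about your data rather than values from your post:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-escape-example").getOrCreate()

# Hypothetical read: adjust quote/escape to match how the file was written
df = (spark.read
      .option("header", "true")
      .option("quote", '"')
      .option("escape", "\\")
      .csv("/path/to/input.csv"))

df.show(truncate=False)
```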
02-09-2023
05:44 AM
@Iwantkakao There are two things I see right off the bat:

Tue Jan 31 19:47:13 UTC 2023 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.

^^ Consider the recommendation: add useSSL=false to your sqoop command.

23/01/31 19:47:14 ERROR manager.SqlManager: Error executing statement: java.sql.SQLException: Access denied for user ''@'localhost' (using password: NO) java.sql.SQLException: Access denied for user ''@'localhost' (using password: NO)

^^ This error is saying that your user does not have access to MySQL. You are going to need to provide a specific user, password, and host, with the permissions and grants set accordingly. If your user is root, add the username and password to the command.

Last but not least, the Sqoop project is now "in the Attic", which means it is no longer actively getting support and development from the open source community. I recommend that you learn other techniques to accomplish the same outcome.
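Putting both fixes together, the command would look roughly like this (the host, database, table, and target directory are placeholders):

```bash
sqoop import \
  --connect "jdbc:mysql://localhost:3306/your_db?useSSL=false" \
  --username root \
  -P \
  --table your_table \
  --target-dir /user/your_user/your_table
```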
02-09-2023
05:34 AM
@Techie123 You are going to need to provide credentials for the NiFi calls against any S3 bucket with access controls. I would recommend working in a lower NiFi dev environment and using a public S3 bucket to get comfortable. This will test the basics of your flow without access issues. It will also remove confusion between NiFi flow functionality and AWS access issues, and help you learn when and where to use the different ways (keys directly, or a controller service with credentials) of providing access from NiFi to S3.
02-07-2023
05:02 AM
@Abdulrahmants If you need to talk to someone about getting those added, please reach out in a direct message. Another approach could be to create an API input endpoint on NiFi (HandleHttpRequest/HandleHttpResponse) and make a scripted (Python, Java, etc.) process to send the file to the NiFi endpoint.
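For the scripted side, a minimal sketch in Python (the URL, port, file name, and content type are placeholders; the listener itself is whatever you configure in HandleHttpRequest):

```python
import requests

# Hypothetical endpoint exposed by a HandleHttpRequest processor
url = "http://nifi-host:7070/ingest"

with open("example.csv", "rb") as f:
    resp = requests.post(url, data=f, headers={"Content-Type": "text/csv"})

print(resp.status_code, resp.text)
```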
02-06-2023
07:25 AM
1 Kudo
I believe this is a job for MiNiFi: https://nifi.apache.org/minifi/index.html Basically, you create a small MiNiFi flow to run on the server/network with privileged access, and that flow sends its results to NiFi.
02-02-2023
06:40 AM
@samrathal Per the docs you need to provide the ID:

Name | Location | Type | Description |
---|---|---|---|
id | path | string | The connection id. |
flowfile-uuid | path | string | The flowfile uuid. |
clusterNodeId | query | string | The id of the node where the content exists, if clustered. |

You should be getting that ID from the previous GET call (Gets a FlowFile from a Connection). Within the response is a FlowFileEntity object, and inside it is the "clusterNodeId" corresponding to the node where that flowfile exists:

{
  "flowFile": {
    "uri": "value",
    "uuid": "value",
    "filename": "value",
    "position": 0,
    "size": 0,
    "queuedDuration": 0,
    "lineageDuration": 0,
    "penaltyExpiresIn": 0,
    "clusterNodeId": "value",
    "clusterNodeAddress": "value",
    "attributes": { "name": "value" },
    "contentClaimSection": "value",
    "contentClaimContainer": "value",
    "contentClaimIdentifier": "value",
    "contentClaimOffset": 0,
    "contentClaimFileSize": "value",
    "contentClaimFileSizeBytes": 0,
    "penalized": true
  }
}

Those details are exposed in the NiFi API doc; just be sure to click into the entity.
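As a hedged example of the resulting content call with curl, assuming the flowfile-queues content endpoint the doc snippet above describes (the connection id, flowfile uuid, node id, host, and port are placeholders; a secured NiFi also needs an access token):

```bash
curl "http://nifi-host:8080/nifi-api/flowfile-queues/<connection-id>/flowfiles/<flowfile-uuid>/content?clusterNodeId=<node-id>" \
  -o flowfile-content.bin
```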
01-23-2023
06:52 AM
@phaelax ExecuteScript, ExecuteProcess, ExecuteStreamCommand, etc. are some of the hardest processors to configure in NiFi, and it is very hard to give guidance without exact templates, configs, scripts, and so on. That said, I would recommend ExecuteScript and Python over bash. If that is interesting to you, you should spend some time consuming the three-part series on ExecuteScript by @MattWho. I believe the first part explains how to get flowfile attributes (filename) or flowfile content from the previous processor's flowfile (GetFile) into the script.
https://community.hortonworks.com/articles/75032/executescript-cookbook-part-1.html
https://community.hortonworks.com/articles/75545/executescript-cookbook-part-2.html
https://community.hortonworks.com/articles/77739/executescript-cookbook-part-3.html
01-23-2023
06:43 AM
@prakashrulez This sounds like a job for the newer NiFi relationship retry feature, which allows you to indicate the number of retries before a flowfile is routed to failure.
01-23-2023
06:31 AM
@BRinxen First, I feel your pain, as this sandbox was always an issue; some advice is below. Second, I would highly recommend you find a way to do something with Hive/Spark in another, more modern form factor (not the old Hortonworks sandbox). That said, you are going to need something like 32 GB of RAM on a very beefy machine to run the whole sandbox, and even then it will struggle. If you have fewer resources, you will only be able to run a few services, not the whole stack. Turn everything else off or put it in maintenance mode. Start YARN, MapReduce, and HDFS first, then begin to start Hive. Expect things to take a long time, so be patient, and make sure nothing else is running on the machine.