Member since: 07-19-2018
Posts: 613
Kudos Received: 101
Solutions: 117
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 4901 | 01-11-2021 05:54 AM |
|  | 3337 | 01-11-2021 05:52 AM |
|  | 8642 | 01-08-2021 05:23 AM |
|  | 8157 | 01-04-2021 04:08 AM |
|  | 36034 | 12-18-2020 05:42 AM |
12-22-2020
11:42 AM
Amazing work here sir!
12-21-2020
09:01 AM
We have some background on schema evolution in Parquet in the docs: https://docs.cloudera.com/runtime/7.2.2/impala-reference/topics/impala-parquet.html (see "Schema Evolution for Parquet Tables"). Some of the details are specific to Impala, but the concepts are the same across engines that use Parquet tables, including Hive and Spark.

At a high level, you can think of the data files as immutable while the table schema evolves. If you add a new column at the end of the table, for example, that updates the table schema but leaves the Parquet files unchanged. When the table is queried, the table schema and the Parquet file schema are reconciled, and the new column's values will all be NULL.

If you want to modify the existing rows and include new non-NULL values, that requires rewriting the data, e.g. with an INSERT OVERWRITE statement for a partition or a CREATE TABLE .. AS SELECT to create an entirely new table. Keep in mind that traditional Parquet tables are not optimized for workloads with updates; Apache Kudu in particular, and also transactional tables in Hive 3+, support row-level updates more conveniently and efficiently. We definitely don't require rewriting the whole table every time you want to add a column; that would be impractical for large tables!
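To make that concrete, here is a minimal SQL sketch (table and column names are hypothetical; Impala/Hive-style syntax):

```sql
-- Hypothetical Parquet table with existing data files.
CREATE TABLE events (id BIGINT, payload STRING) STORED AS PARQUET;

-- Adding a column only changes the table metadata; the existing
-- Parquet files are untouched, so old rows read back NULL here.
ALTER TABLE events ADD COLUMNS (category STRING);

-- Backfilling real values means rewriting the data, e.g. into a new table:
CREATE TABLE events_v2 STORED AS PARQUET AS
SELECT id, payload, 'unknown' AS category FROM events;
```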
12-18-2020
05:42 AM
1 Kudo
Excellent news. Can you accept the first 2 responses to close this solution?
12-18-2020
05:40 AM
1 Kudo
@hakansan The error is stating that your disk is full:

could not write to file "pg_logical/replorigin_checkpoint.tmp": No space left on device

The solution is to investigate: clean out some files to free up space, expand the disk, etc.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post. Thanks, Steven
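For instance, standard Linux tools can confirm which filesystem is full and where the space went (the data-directory path below is an example; adjust it for your installation):

```bash
# Show filesystem usage to confirm which mount is full
df -h

# Rank the largest items under the PostgreSQL data directory
du -sh /var/lib/pgsql/data/* | sort -rh | head -20
```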
12-15-2020
05:06 AM
@jainN Great looking flow. The modification you need is to simply remove the json route that is combined with the csv route, and connect the json route from Notify to FetchFile. You may need to adjust Wait/Notify so that the csv files are released when you want. Wait/Notify is often tricky, so I would recommend experimenting with it until you understand its behavior. Here is a good article: https://community.cloudera.com/t5/Community-Articles/Trigger-based-Serial-Data-processing-in-NiFi-using-Wait-and/ta-p/248308. You may find other articles/posts here if you do some deeper research on Wait/Notify.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post. Thanks, Steven
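As a rough illustration of how the pairing works, both processors share a cache service and a release-signal identifier (property names per the standard Wait/Notify processors; the identifier expression and service name below are examples):

```
Notify
  Release Signal Identifier : ${filename}
  Distributed Cache Service : DistributedMapCacheClientService

Wait
  Release Signal Identifier : ${filename}
  Distributed Cache Service : DistributedMapCacheClientService
  Expiration Duration       : 10 min
```

A flowfile held at Wait is released only after Notify records a matching signal in the cache, which is what makes the csv side wait on the json side.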
12-14-2020
01:12 PM
@jainN If you are looking to route flowfiles whose filenames end in .json separately from those that don't, check out RouteOnAttribute with a dynamic property similar to json => ${filename:endsWith('.json')}. You would use this after your method of choice to list/fetch the files, which provides a ${filename} attribute for every flowfile. With this json property added to RouteOnAttribute, you can drag the json route to a triggering flow and send everything else (not json: the unmatched relationship) to a holding flow. NiFi Wait/Notify should be able to provide the trigger logic, but there are many other ways to do it without Wait/Notify, using another datastore, a map cache, etc. For example, your non-json flow could simply write to a new location and finish; then your json flow can process that new location some known amount of time later. The logic there depends on your use case, of course; the point is to use RouteOnAttribute to split your flow.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post. Thanks, Steven
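A minimal RouteOnAttribute configuration for this might look like the following (a sketch; the dynamic property name json is your choice):

```
RouteOnAttribute
  Routing Strategy : Route to Property name
  json             : ${filename:endsWith('.json')}
```

Flowfiles matching the expression go to the json relationship; everything else goes to unmatched.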
12-14-2020
07:05 AM
@toutou From your HDFS cluster you need hdfs-site.xml and the correct configuration for PutHDFS. You may also need to create a user with permissions on the HDFS location. Please share your PutHDFS processor configuration and the error information so that community members can respond with the specific feedback required to solve your issue.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post. Thanks, Steven
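As a rough sketch, the key PutHDFS properties look like this (paths are examples; point them at the files copied from your cluster, and note that core-site.xml is usually needed alongside hdfs-site.xml):

```
PutHDFS
  Hadoop Configuration Resources : /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
  Directory                      : /user/nifi/landing
```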
12-13-2020
10:05 PM
Yes, that's the correct answer and it works. But do we have any other workaround? We have disabled exec for security reasons, so how can we achieve this?
12-04-2020
05:16 AM
@SandeepG01 Ah, no fun with bad filenames. Spaces in filenames are strongly discouraged these days. That said, a solution you might try is to escape the space with a backslash (\), especially in the context of passing the filename in flowfile attributes. If you still need to allow spaces and cannot resolve the issue upstream (by not using spaces), I might suggest submitting your experience as a bug on the NiFi Jira: https://issues.apache.org/jira/projects/NIFI/issues

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post. Thanks, Steven
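For example, an UpdateAttribute processor could rewrite the attribute before it is passed downstream (a sketch; I'm assuming the standard filename attribute and NiFi Expression Language's replace function, and the backslash escaping shown may need adjustment for your environment):

```
UpdateAttribute
  filename : ${filename:replace(' ', '\\ ')}
```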
12-02-2020
12:26 AM
@stevenmatison The solution that I found is to get the OAuth2 token from Salesforce using curl, as explained on this page: https://www.jitendrazaa.com/blog/salesforce/using-curl-with-salesforce-rest-api/

So I created an ExecuteProcess NiFi processor. As a parameter I pass the file C:/loginInfo.txt, which contains:

grant_type=password&
client_id=3MVG9iTxZANhwsdsdsdsdspr0LstjR3sRat&
client_secret=21961212323233121943&
username=jitendra.zaa@demo.com&
password=myPWDAndSecurityToken

and then I get a response with the authentication token 🙂 (You can use the command curl -X POST -d @loginInfo.txt https://test.salesforce.com/.... to test the connection between the local machine and Salesforce.)
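Cleaned up, the test command looks roughly like this (the original path was truncated; I'm assuming Salesforce's standard OAuth2 token endpoint, /services/oauth2/token, for the sandbox host):

```bash
# POST the credentials file to the Salesforce sandbox token endpoint
curl -X POST -d @loginInfo.txt https://test.salesforce.com/services/oauth2/token
```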