Member since: 07-19-2018
Posts: 613
Kudos Received: 101
Solutions: 117
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 4901 | 01-11-2021 05:54 AM |
|  | 3337 | 01-11-2021 05:52 AM |
|  | 8642 | 01-08-2021 05:23 AM |
|  | 8157 | 01-04-2021 04:08 AM |
|  | 36034 | 12-18-2020 05:42 AM |
12-22-2020
11:42 AM
Amazing work here sir!
12-21-2020
09:01 AM
We have some background on schema evolution in Parquet in the docs: https://docs.cloudera.com/runtime/7.2.2/impala-reference/topics/impala-parquet.html (see "Schema Evolution for Parquet Tables"). Some of the details are specific to Impala, but the concepts are the same across engines that use Parquet tables, including Hive and Spark.

At a high level, you can think of the data files as immutable while the table schema evolves. If you add a new column at the end of the table, for example, that updates the table schema but leaves the Parquet files unchanged. When the table is queried, the table schema and the Parquet file schema are reconciled, and the new column's values will all be NULL.

If you want to modify the existing rows and include new non-NULL values, that requires rewriting the data, e.g. with an INSERT OVERWRITE statement for a partition or a CREATE TABLE .. AS SELECT to create an entirely new table. Keep in mind that traditional Parquet tables are not optimized for workloads with updates; Apache Kudu in particular, and also transactional tables in Hive 3+, support row-level updates more conveniently and efficiently. We definitely don't require rewriting the whole table every time you want to add a column; that would be impractical for large tables!
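To make that concrete, here is a minimal SQL sketch (table and column names are hypothetical; Impala/Hive-style syntax):

```sql
-- Hypothetical Parquet table with existing data files.
CREATE TABLE events (id BIGINT, payload STRING) STORED AS PARQUET;

-- Adding a column only changes the table metadata; the existing
-- Parquet files are untouched, so old rows read back NULL here.
ALTER TABLE events ADD COLUMNS (category STRING);

-- Backfilling real values means rewriting the data, e.g. into a new table:
CREATE TABLE events_v2 STORED AS PARQUET AS
SELECT id, payload, 'unknown' AS category FROM events;
```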
12-18-2020
05:42 AM
1 Kudo
Excellent news. Can you accept the first 2 responses to close this solution?
12-18-2020
05:40 AM
1 Kudo
@hakansan The error is stating that your disk is full:

could not write to file "pg_logical/replorigin_checkpoint.tmp": No space left on device

The solution is to investigate: clean out some files to free up space, expand the disk, etc.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post. Thanks, Steven
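For instance, standard Linux tools can confirm which filesystem is full and where the space went (the data-directory path below is an example; adjust it for your installation):

```bash
# Show filesystem usage to confirm which mount is full
df -h

# Rank the largest items under the PostgreSQL data directory
du -sh /var/lib/pgsql/data/* | sort -rh | head -20
```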
12-15-2020
05:06 AM
@jainN Great looking flow. The modification you need is to simply remove the json route that is combined with the csv route, and connect the json route from Notify to FetchFile. You may need to adjust Wait/Notify so that the csv files are released when you want. Wait/Notify is often tricky, so I would recommend experimenting with it until you understand its behavior. Here is a good article: https://community.cloudera.com/t5/Community-Articles/Trigger-based-Serial-Data-processing-in-NiFi-using-Wait-and/ta-p/248308. You may find other articles/posts here if you do some deeper research on Wait/Notify.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post. Thanks, Steven
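As a rough illustration of how the pairing works, both processors share a cache service and a release-signal identifier (property names per the standard Wait/Notify processors; the identifier expression and service name below are examples):

```
Notify
  Release Signal Identifier : ${filename}
  Distributed Cache Service : DistributedMapCacheClientService

Wait
  Release Signal Identifier : ${filename}
  Distributed Cache Service : DistributedMapCacheClientService
  Expiration Duration       : 10 min
```

A flowfile held at Wait is released only after Notify records a matching signal in the cache, which is what makes the csv side wait on the json side.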
12-14-2020
01:12 PM
@jainN If you are looking to route flowfiles whose filenames end in .json separately from those that don't, check out RouteOnAttribute with a dynamic property similar to json => ${filename:endsWith('.json')}. You would use this after your method of choice to list/fetch the files, which provides a ${filename} attribute for every flowfile. With this json property added to RouteOnAttribute, you can drag the json route to a triggering flow and send everything else (not json: the unmatched relationship) to a holding flow. NiFi Wait/Notify should be able to provide the trigger logic, but there are many other ways to do it without Wait/Notify, using another datastore, a map cache, etc. For example, your non-json flow could simply write to a new location and finish; then your json flow can process that new location some known amount of time later. The logic there depends on your use case, of course; the point is to use RouteOnAttribute to split your flow.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post. Thanks, Steven
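A minimal RouteOnAttribute configuration for this might look like the following (a sketch; the dynamic property name json is your choice):

```
RouteOnAttribute
  Routing Strategy : Route to Property name
  json             : ${filename:endsWith('.json')}
```

Flowfiles matching the expression go to the json relationship; everything else goes to unmatched.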
12-14-2020
07:05 AM
@toutou From your HDFS cluster you need hdfs-site.xml and the correct configuration for PutHDFS. You may also need to create a user with permissions on the HDFS location. Please share your PutHDFS processor configuration and the error information so that community members can respond with the specific feedback required to solve your issue.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post. Thanks, Steven
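As a rough sketch, the key PutHDFS properties look like this (paths are examples; point them at the files copied from your cluster, and note that core-site.xml is usually needed alongside hdfs-site.xml):

```
PutHDFS
  Hadoop Configuration Resources : /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
  Directory                      : /user/nifi/landing
```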
12-13-2020
10:05 PM
Yes, that's the correct answer and it works. But do we have any other workaround? We have disabled exec for security reasons, so how can we achieve this?
12-04-2020
05:16 AM
@SandeepG01 Ah, no fun with bad filenames. Spaces in filenames are strongly discouraged these days. That said, a solution you might try is to escape the space with a backslash (\), especially in the context of passing the filename in flowfile attributes. If you still need to allow spaces and cannot resolve the issue upstream (by not using spaces), I might suggest submitting your experience as a bug on the NiFi Jira: https://issues.apache.org/jira/projects/NIFI/issues

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post. Thanks, Steven
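For example, an UpdateAttribute processor could rewrite the attribute before it is passed downstream (a sketch; I'm assuming the standard filename attribute and NiFi Expression Language's replace function, and the backslash escaping shown may need adjustment for your environment):

```
UpdateAttribute
  filename : ${filename:replace(' ', '\\ ')}
```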
12-02-2020
12:26 AM
@stevenmatison The solution that I found is to get the OAuth2 token from Salesforce using curl, as explained on this page: https://www.jitendrazaa.com/blog/salesforce/using-curl-with-salesforce-rest-api/

So I created an ExecuteProcess NiFi processor. As a parameter I pass the file C:/loginInfo.txt, which contains:

grant_type=password&
client_id=3MVG9iTxZANhwsdsdsdsdspr0LstjR3sRat&
client_secret=21961212323233121943&
username=jitendra.zaa@demo.com&
password=myPWDAndSecurityToken

and then I get a response with the authentication token 🙂 (You can use the command curl -X POST -d @loginInfo.txt https://test.salesforce.com/.... to test the connection between the local machine and Salesforce.)
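Cleaned up, the test command looks roughly like this (the original path was truncated; I'm assuming Salesforce's standard OAuth2 token endpoint, /services/oauth2/token, for the sandbox host):

```bash
# POST the credentials file to the Salesforce sandbox token endpoint
curl -X POST -d @loginInfo.txt https://test.salesforce.com/services/oauth2/token
```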