Member since: 01-08-2021 · Posts: 12 · Kudos Received: 2 · Solutions: 0
04-12-2021 12:15 AM
Hi, thank you. However, that is the mechanism of every List processor: it keeps track of the files already processed. What I was wondering is whether there is a way to trigger the flow, or send it a notification, when new files arrive in the blob storage. Thanks
04-09-2021 09:54 PM
Hello, according to the documentation on state management, it will only pull files that are new compared to the last run: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-azure-nar/1.5.0/org.apache.nifi.processors.azure.storage.ListAzureBlobStorage/

State management:
Scope: CLUSTER
Description: After performing a listing of blobs, the timestamp of the newest blob is stored. This allows the Processor to list only blobs that have been added or modified after this date the next time that the Processor is run. State is stored across the cluster so that this Processor can be run on Primary Node only and if a new Primary Node is selected, the new node can pick up where the previous node left off, without duplicating the data.
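For illustration, here is a minimal Python sketch of that timestamp-based listing pattern, using the azure-storage-blob v12 SDK. This is not NiFi's actual implementation; the connection string, container name, and local state file are placeholders.

```python
# Sketch of the CLUSTER-state listing pattern described above:
# remember the newest blob timestamp from the previous run and
# return only blobs modified after it.
import json
from pathlib import Path
from azure.storage.blob import BlobServiceClient

STATE_FILE = Path("list_state.json")  # stand-in for NiFi's cluster state

def list_new_blobs(conn_str: str, container: str) -> list[str]:
    service = BlobServiceClient.from_connection_string(conn_str)
    container_client = service.get_container_client(container)

    # Load the timestamp of the newest blob seen on the last run.
    last_seen = None
    if STATE_FILE.exists():
        last_seen = json.loads(STATE_FILE.read_text())["newest"]

    new_blobs, newest = [], last_seen
    for blob in container_client.list_blobs():
        ts = blob.last_modified.isoformat()  # tz-aware UTC, sorts lexically
        if last_seen is None or ts > last_seen:
            new_blobs.append(blob.name)
        if newest is None or ts > newest:
            newest = ts

    # Persist the newest timestamp so the next run skips older blobs.
    if newest is not None:
        STATE_FILE.write_text(json.dumps({"newest": newest}))
    return new_blobs
```

On the first run (no stored state) every blob is returned, which matches the processor's behavior; after that, only blobs added or modified since the stored timestamp are listed.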
03-08-2021 10:10 AM
Yes, that's right! I added a volume in the Docker Compose file that connects a directory on the VM to the container. Thank you so much!
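For reference, a volume mapping of that kind might look like the following docker-compose excerpt. The image tag and both paths are placeholders, not taken from the original post.

```yaml
# Hypothetical excerpt: bind-mounts a directory on the VM into the
# container so files written on the host are visible inside NiFi.
services:
  nifi:
    image: apache/nifi:latest
    volumes:
      - /home/user/data:/opt/nifi/input   # host path : container path
```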
01-21-2021 07:06 AM
@Lallagreta Make sure you do not have any line returns in the values of the dynamic properties you added in the UpdateAttribute processor. When you click on the value field for each property, you should not see a second line (line "2") in the value editor; a value spanning two lines would result in the FlowFile attribute containing a line return. (The original post illustrated this with a screenshot of the value editor.) If this is the case, edit the property value(s) to remove the line returns so that only one line (line "1") remains. Hope this helps, Matt
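As a defensive alternative (not from Matt's post), NiFi's Expression Language trim() function strips leading and trailing whitespace, including stray line returns, when the value is built from another attribute; the attribute name here is a placeholder:

```
${my_attribute:trim()}
```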
01-11-2021 05:54 AM
You likely have the record reader configured incorrectly for your CSV schema.
01-11-2021 05:52 AM
@Lallagreta You should be able to define the filename, or change it to whatever you want. That said, the filename doesn't dictate the type, so you can have Parquet saved as .txt.

One recommendation I have is to use the parquet command-line tools while testing your use case. This is the best way to validate that the files look right and have the right schema and results: https://pypi.org/project/parquet-tools/

I apologize that I do not have exact samples, but from my recollection of a year ago, you should be able to find a simple command to check the schema of a file and another to show the data. You may have to copy your HDFS files to the local file system to inspect them from the command line.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post. Thanks, Steven
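The kind of commands Steven describes might look like this; the file paths are placeholders, and you should check the parquet-tools documentation for the exact subcommands in your installed version:

```shell
# Copy the file out of HDFS to the local file system first.
hdfs dfs -get /user/nifi/output/part-00000.parquet .

# Install the Python parquet-tools package linked above.
pip install parquet-tools

# Inspect the schema and metadata of the file.
parquet-tools inspect part-00000.parquet

# Print the rows of data to verify the results.
parquet-tools show part-00000.parquet
```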