Member since: 07-21-2021
Posts: 405
Kudos Received: 10
Solutions: 17

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 784 | 05-06-2022 11:10 AM |
| | 1202 | 04-12-2022 11:59 PM |
| | 925 | 03-17-2022 09:57 AM |
| | 398 | 03-17-2022 09:54 AM |
| | 722 | 03-14-2022 08:49 AM |
01-20-2023
09:38 PM
You can create a custom NAR file, place it in the lib folder of the $NIFI_HOME directory, and restart your NiFi server. Add the dependencies below to the processor module, write the Java code, then build and create your NAR file.

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>4.1.2</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>4.1.2</version>
</dependency>
<dependency>
    <groupId>com.opencsv</groupId>
    <artifactId>opencsv</artifactId>
    <version>5.1</version>
    <exclusions>
        <exclusion>
            <artifactId>commons-logging</artifactId>
            <groupId>commons-logging</groupId>
        </exclusion>
    </exclusions>
</dependency>

Sample processor code:

/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.anoop.converter;
import com.opencsv.CSVReader;
import com.opencsv.exceptions.CsvValidationException;
import org.apache.nifi.annotation.behavior.*;
import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.annotation.lifecycle.OnScheduled;
import org.apache.nifi.annotation.documentation.CapabilityDescription;
import org.apache.nifi.annotation.documentation.SeeAlso;
import org.apache.nifi.annotation.documentation.Tags;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.ProcessorInitializationContext;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.io.StreamCallback;
import org.apache.nifi.processor.util.StandardValidators;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
@Tags({"csvToExcel"})
@CapabilityDescription("Converts the content of a CSV FlowFile into an Excel (XLSX) FlowFile")
@SeeAlso({})
@WritesAttributes({@WritesAttribute(attribute = "convertedIntoExcel", description = "Set to true once the CSV content has been converted to Excel")})
@InputRequirement(InputRequirement.Requirement.INPUT_REQUIRED)
public class CsvToExcel extends AbstractProcessor {

    public static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .description("FlowFiles whose CSV content has been converted to Excel")
            .build();
    private List<PropertyDescriptor> descriptors;
    private Set<Relationship> relationships;

    @Override
    protected void init(final ProcessorInitializationContext context) {
        descriptors = Collections.emptyList();
        relationships = new HashSet<>();
        relationships.add(REL_SUCCESS);
        relationships = Collections.unmodifiableSet(relationships);
    }
    @Override
    public Set<Relationship> getRelationships() {
        return this.relationships;
    }

    @Override
    public final List<PropertyDescriptor> getSupportedPropertyDescriptors() {
        return descriptors;
    }

    @OnScheduled
    public void onScheduled(final ProcessContext context) {}
    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }
        // session.write() and session.putAttribute() return updated FlowFile
        // references; those must be kept, or the transfer below would fail.
        flowFile = session.write(flowFile, new Converter());
        flowFile = session.putAttribute(flowFile, "convertedIntoExcel", "true");
        session.transfer(flowFile, REL_SUCCESS);
    }
}
class Converter implements StreamCallback {

    @Override
    public void process(InputStream in, OutputStream out) throws IOException {
        try {
            streamConversion(in, out);
        } catch (CsvValidationException e) {
            throw new RuntimeException(e);
        }
    }

    private void streamConversion(InputStream in, OutputStream out) throws IOException, CsvValidationException {
        // try-with-resources ensures the reader and workbook are closed even if conversion fails
        try (CSVReader csvReader = new CSVReader(new InputStreamReader(in));
             XSSFWorkbook workbook = new XSSFWorkbook()) {
            XSSFSheet sheet = workbook.createSheet("Sheet1");
            String[] rowData;
            int rowNum = 0;
            // Copy each CSV record into a spreadsheet row, one cell per field
            while ((rowData = csvReader.readNext()) != null) {
                Row row = sheet.createRow(rowNum++);
                int colNum = 0;
                for (String cellData : rowData) {
                    Cell cell = row.createCell(colNum++);
                    cell.setCellValue(cellData);
                }
            }
            workbook.write(out);
        }
    }
}
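To build the NAR itself, the Maven module that wraps the processor uses NiFi's NAR packaging. A minimal sketch of the relevant pom entries, assuming the standard nifi-nar-maven-plugin (the version shown is illustrative):

<packaging>nar</packaging>

<build>
    <plugins>
        <plugin>
            <!-- produces the .nar that gets dropped into $NIFI_HOME/lib -->
            <groupId>org.apache.nifi</groupId>
            <artifactId>nifi-nar-maven-plugin</artifactId>
            <version>1.3.4</version> <!-- illustrative; use a current release -->
            <extensions>true</extensions>
        </plugin>
    </plugins>
</build>

After mvn clean install, copy the generated .nar from the NAR module's target directory into $NIFI_HOME/lib and restart NiFi, as described above.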
10-28-2022
01:06 PM
@D5ha Not all processors write to the content repository, and the content of a FlowFile is never modified after it is created. Once a FlowFile is created in NiFi, its content exists as-is until it is terminated. A NiFi FlowFile consists of two parts: FlowFile attributes (metadata about the FlowFile, which includes details about the location of the FlowFile's content in the content_repository) and the FlowFile content itself. When a downstream processor modifies the content of a FlowFile, what really happens is that new content is written to a new content claim in the content_repository; the original content remains unchanged.

From what you shared, you appear to have just one content_repository. Within that single content_repository, NiFi creates a number of sub-directories. NiFi does this for better indexing and seeking, because of the massive number of content claims a user's dataflow(s) may hold.

It is also very important to understand that a content claim in the content_repository can hold the content for one or more FlowFiles; it is not always one content claim per FlowFile's content. It is also very possible to have multiple queued FlowFiles pointing to the exact same content claim and offset (the same exact content). This happens when your dataflow clones a FlowFile (for example, routing the same outbound relationship from a processor multiple times). So you should never manually delete claims from any content repository, as you may delete content for multiple FlowFiles.

That being said, you can use data provenance to locate the content_repository (Container), the subdirectory (Section), the content claim filename (Identifier), the byte where the content begins in that claim (Offset), and the number of bytes from the offset to the end of the content in the claim (Size).

Right-click on a processor and select "View data provenance" from the displayed context menu. This will list all FlowFiles processed by this processor for which provenance still holds index data. Click the Show Lineage icon (it looks like three connected circles) to the far right of a FlowFile. You can right-click on "clone" and "join" events to find/expand any parent FlowFiles in the lineage (the event dot created for the processor on which you selected data provenance will be colored red in the lineage graph). Each white circle is a different FlowFile; clicking on a white circle highlights the dataflow path for that FlowFile. Right-clicking on an event like "create" and selecting "View details" will tell you everything that is known about that FlowFile, including a tab about the content:

Container corresponds to the following property in the nifi.properties file: nifi.content.repository.directory.default=
Section corresponds to a subdirectory within the above content repository path.
Identifier is the content claim filename.
Offset is the byte at which the content for this FlowFile begins within that Identifier.
Size is the number of bytes from the Offset to the end of that FlowFile's content in the Identifier.

I also created an article on how to index the Content Identifier. Indexing this field allows you to take a content claim filename and search for it in your data provenance to find all FlowFile(s) that pointed at it. You can then view the details of all those FlowFile(s) to see the full content claim details as above: https://community.cloudera.com/t5/Community-Articles/How-to-determine-which-FlowFiles-are-associated-to-the-same/ta-p/249185

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you,
Matt
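For illustration, once you have those values from the content tab, the raw bytes can be read directly (read-only; never modify or delete claim files, per the warning above). A minimal sketch in Java, where the repository path, section, identifier, offset, and size are all hypothetical examples:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadContentClaim {
    public static void main(String[] args) throws IOException {
        // All values below are hypothetical; take the real ones from the
        // "content" tab of the provenance event details in the NiFi UI.
        Path claim = Paths.get("/var/lib/nifi/content_repository", "512", "1671123456-512");
        long offset = 3072;  // Offset: byte where this FlowFile's content begins
        long size = 1024;    // Size: number of bytes belonging to this FlowFile
        try (InputStream in = Files.newInputStream(claim);
             OutputStream out = Files.newOutputStream(Paths.get("claim-extract.bin"))) {
            long toSkip = offset;
            while (toSkip > 0) {              // skip() may skip fewer bytes than asked
                long skipped = in.skip(toSkip);
                if (skipped <= 0) break;
                toSkip -= skipped;
            }
            byte[] buf = new byte[8192];
            long remaining = size;
            int read;
            while (remaining > 0
                    && (read = in.read(buf, 0, (int) Math.min(buf.length, remaining))) != -1) {
                out.write(buf, 0, read);      // copy exactly 'size' bytes, no more
                remaining -= read;
            }
        }
    }
}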
10-26-2022
11:19 PM
Thanks for your response. I tried calling sales to purchase an individual license; they do not sell individual licenses. Is there any way to install an open-source Azure quickstart? https://docs.cloudera.com/cdp-public-cloud/cloud/azure-quickstart/topics/mc-azure-quickstart.html#mc-azure-quickstart I finally installed a Docker version of Cloudera QuickStart; however, File Browsing is missing from HDFS.
05-17-2022
01:15 AM
The linked thread is a walkthrough on how to secure a NiFi Registry instance locally. I’m looking for instructions on how to connect to a secure NiFi Registry deployed on CDP Data Hub. I’m running on AWS infrastructure. The Data Hub is deployed using default settings and resides in a private subnet.
04-13-2022
12:05 AM
Hello, please refer to https://community.cloudera.com/t5/Community-Articles/Using-RStudio-as-an-Editor-with-ML-Runtimes/ta-p/325166 Was your question answered on the Cloudera community portal? Make sure to mark the answer as the accepted solution. If you find a reply useful, say thanks by clicking on the thumbs up button.
03-17-2022
11:54 AM
Hello @Koffi The balancer will do the job for you; please refer to the official docs below before configuring it:
1. Overview of the HDFS Balancer
2. Configuring the Balancer
Was your question answered? Make sure to mark the answer as the accepted solution. If you find a reply useful, say thanks by clicking on the thumbs up button.
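Once configured, the balancer is typically kicked off from the command line; a minimal sketch, where the threshold percentage is an illustrative choice:

hdfs balancer -threshold 10

The threshold is the maximum allowed difference, in percent, between each DataNode's utilization and the cluster average.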
03-17-2022
09:54 AM
Hello @Soa
Hive partitioning divides a table into a number of partitions, and these partitions can be further subdivided into more manageable parts known as buckets (or clusters). Bucketing is based on a hash function, which depends on the type of the bucketing column; records with the same value in the bucketing column are always saved in the same bucket. The CLUSTERED BY clause is used to divide the table into buckets. Each partition is created as a directory, but each bucket is created as a file. Bucketing can also be used without partitioning a Hive table.
Bucketed tables allow much more efficient sampling than non-bucketed tables, allowing queries over a section of the data for testing and debugging when the original data sets are very large. The user can fix the number of buckets according to need, and can also keep the records in each bucket sorted by one or more columns. Since the data files are equally sized parts, map-side joins are faster on bucketed tables.
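As an illustration, a sketch of a partitioned and bucketed table DDL (the table name, columns, and bucket count here are hypothetical examples):

-- Hypothetical example: orders bucketed by customer id within each date partition
CREATE TABLE orders (
    order_id BIGINT,
    customer_id INT,
    amount DECIMAL(10,2)
)
PARTITIONED BY (order_date STRING)
CLUSTERED BY (customer_id) SORTED BY (order_id) INTO 32 BUCKETS
STORED AS ORC;

-- Each order_date partition is a directory; within it, rows hash on
-- customer_id into 32 bucket files.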
Was your question answered? Make sure to mark the answer as the accepted solution. If you find a reply useful, say thanks by clicking on the thumbs up button.
03-15-2022
08:41 AM
Hello @Azhar_Shaikh Thanks for the reply; as it turns out, it wasn't a service account problem. We found that ListS3's output included a 'key' field, and this is what was required in the FetchS3Object processor for 'Object Key'. So the fix I applied was to split the JSON into individual records (SplitJson), pull the keys out as attributes (EvaluateJsonPath), then pass ${key} into the FetchS3Object processor. Worked a treat.
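For anyone hitting the same issue, the relevant settings would look roughly like this (the $.key JsonPath and the key attribute name are assumptions based on the flow described above):

EvaluateJsonPath
    Destination: flowfile-attribute
    key: $.key              # dynamic property; writes a 'key' attribute per record

FetchS3Object
    Object Key: ${key}      # reads the attribute written above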
03-14-2022
08:49 AM
@RajeshReddy For tag-based policies, you can refer to https://docs.cloudera.com/runtime/7.2.10/security-ranger-authorization/topics/security-ranger-tag-based-policies.html Was your question answered? Make sure to mark the answer as the accepted solution. If you find a reply useful, say thanks by clicking on the thumbs up button.
03-10-2022
03:18 AM
1 Kudo
@mehmetersoy CM does not have a dependency on Samba and does not use any Samba packages. Was your question answered? Make sure to mark the answer as the accepted solution. If you find a reply useful, say thanks by clicking on the thumbs up button.
03-10-2022
03:05 AM
Hello @vishal_ Yes, you are right: machine users in CDP have programmatic access. If you have IdP integration with CDP, you can create a user in Azure, add the user to the Azure AD group that is mapped to CDP, and ask the user to log in from the Azure end to access the CDP application. If you are using CDP local users (users created directly in CDP), you can reach out to your accounts team or open an administrative case from the support portal to add the user to CDP, and then you can manage access accordingly. I hope I have answered your question. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post.
03-08-2022
01:37 AM
Hello @corestack We hope the Post by @Azhar_Shaikh pointing to Link [1] helps your Team as there has been no further response from your side. As such, We shall mark the Post as Resolved. Feel free to share any concerns with your Team's CDP Adoption via a Post in Community & We shall help your Team. Regards, Smarak [1] https://community.cloudera.com/t5/Community-Articles/How-to-configure-Single-Sign-On-SSO-for-CDP-Public-Cloud-the/ta-p/300222
03-06-2022
09:31 PM
@andrea_pretotto, Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
01-14-2022
09:37 AM
Hello @jludac Thanks for letting us know. Yes, to access the archive repos you will need to access them through the shared wall credentials.
01-10-2022
01:20 AM
Hi @Pravin93, Has the reply from @Azhar_Shaikh helped resolve your issue? If so, can you please mark the reply as the solution? It will make it easier for others to find the answer in the future.
01-05-2022
10:09 AM
@Talavera Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
12-23-2021
12:06 PM
@Sumita Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. If you are still experiencing the issue, can you provide the information @Azhar_Shaikh has requested? Thanks
12-17-2021
03:47 AM
Hi Nidhin, thanks for your support. With your solution I solved the problem. Regards, Khang
12-13-2021
10:38 AM
@IAJ Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thank you.
12-08-2021
08:35 AM
Hello @KhangNguyen, Was there any recent commissioning/decommissioning of nodes? Do you see any health alerts on data nodes related to space? Can you share the output of the hdfs dfsadmin -report command?
12-08-2021
02:56 AM
Hello @ChandravadanD
I see you have registered for the free trial and have not received the access details.
You will need to reach out to the Cloudera Sales team to check this.
12-02-2021
09:11 PM
@Jarinek, Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
11-30-2021
10:08 PM
@SimonBergerard, Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
11-23-2021
07:25 AM
@MattWho You were right, the root CA and intermediate certificates missing from the NiFi nodes' truststores were the cause of the problem! As soon as I added them to each NiFi node's truststore, the nodes were able to communicate and transmit heartbeats through port 11443! Thanks a lot for your time and your help! Best regards, Emmanuel
10-14-2021
11:15 AM
1 Kudo
Can you please check the ZooKeeper logs and share any errors you find there?
09-20-2021
10:16 PM
Can you take a backup of the existing flow.xml.gz and then remove it? Once you have removed flow.xml.gz, try restarting NiFi from the backend.
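A minimal sketch of those steps from the command line, assuming a default $NIFI_HOME layout (paths may differ in your install):

cd $NIFI_HOME/conf
cp flow.xml.gz flow.xml.gz.bak   # back up the existing flow
rm flow.xml.gz                   # NiFi will start with an empty flow
cd ../bin
./nifi.sh restart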