Posts: 2206
Kudos Received: 230
Solutions: 82
About
My expertise is not in Hadoop but rather in online communities, support, and social media. Interests include photography, travel, movies, and watching sports.
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 438 | 05-07-2025 11:41 AM |
| | 904 | 02-27-2025 12:49 PM |
| | 2776 | 06-29-2023 05:42 AM |
| | 2361 | 05-22-2023 07:03 AM |
| | 1725 | 05-22-2023 05:42 AM |
06-23-2023
06:04 AM
1 Kudo
Welcome to the community, @samans. I'm not an expert, but I did some searching and may have found something for you. I would review the Cloudera ODBC Connector for Apache Impala documentation, as I see some references there to transaction statements not being supported and how to work around that. Not sure if that is the issue here, but I hope it helps.
06-22-2023
07:44 AM
Hi @rki_ / @cjervis, I forgot to ask: the cluster already has a very large number of blocks in HDFS, more than 23 million. After configuring racks in the cluster, will HDFS recognize the racks and start moving blocks across them to increase block availability, or will I have to rebalance HDFS?
06-15-2023
11:41 AM
This simple InvokeScriptedProcessor script will look for a FlowFile attribute called "ip_address", attempt the reverse lookup, and create a new attribute called "host_name" with the resolved value.
import groovy.json.JsonOutput
import groovy.json.JsonSlurper
import java.net.InetAddress
import java.net.UnknownHostException
import java.nio.charset.StandardCharsets
import org.apache.commons.io.IOUtils
// NiFi API classes referenced by the scripted processor below
import org.apache.nifi.components.PropertyDescriptor
import org.apache.nifi.components.ValidationContext
import org.apache.nifi.components.ValidationResult
import org.apache.nifi.flowfile.FlowFile
import org.apache.nifi.logging.ComponentLog
import org.apache.nifi.processor.ProcessContext
import org.apache.nifi.processor.ProcessSession
import org.apache.nifi.processor.ProcessSessionFactory
import org.apache.nifi.processor.Processor
import org.apache.nifi.processor.ProcessorInitializationContext
import org.apache.nifi.processor.Relationship
import org.apache.nifi.processor.exception.ProcessException
import org.apache.nifi.processor.util.StandardValidators
class GroovyProcessor implements Processor {
PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
.name("BATCH_SIZE")
.displayName("Batch Size")
.description("The number of incoming FlowFiles to process in a single execution of this processor.")
.required(true)
.defaultValue("1000")
.addValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR)
.build()
Relationship REL_SUCCESS = new Relationship.Builder()
.name("success")
.description('FlowFiles that were successfully processed are routed here')
.build()
Relationship REL_FAILURE = new Relationship.Builder()
.name("failure")
.description('FlowFiles that were not successfully processed are routed here')
.build()
ComponentLog log
void initialize(ProcessorInitializationContext context) { log = context.logger }
Set<Relationship> getRelationships() { return [REL_FAILURE, REL_SUCCESS] as Set }
Collection<ValidationResult> validate(ValidationContext context) { null }
PropertyDescriptor getPropertyDescriptor(String name) { null }
void onPropertyModified(PropertyDescriptor descriptor, String oldValue, String newValue) { }
List<PropertyDescriptor> getPropertyDescriptors() { Collections.unmodifiableList([BATCH_SIZE]) as List<PropertyDescriptor> }
String getIdentifier() { null }
JsonSlurper jsonSlurper = new JsonSlurper()
JsonOutput jsonOutput = new JsonOutput()
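// Resolve an IP address to its canonical host name, returning "Unknown" if the reverse lookup fails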
def reverseDnsLookup(String ipAddress) {
try {
InetAddress inetAddress = InetAddress.getByName(ipAddress)
String hostName = inetAddress.getCanonicalHostName()
return hostName
} catch (UnknownHostException e) {
return "Unknown"
}
}
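// Pull up to BATCH_SIZE FlowFiles per run, add a host_name attribute wherever an ip_address attribute is present, and route them to success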
void onTrigger(ProcessContext context, ProcessSessionFactory sessionFactory) throws ProcessException {
ProcessSession session = sessionFactory.createSession()
try {
List<FlowFile> flowFiles = session.get(context.getProperty(BATCH_SIZE).asInteger())
if (!flowFiles) return
Map customAttributes = [:]
flowFiles.each { flowFile ->
String ipAddress = flowFile.getAttribute("ip_address")
if (ipAddress) {
String hostName = reverseDnsLookup(ipAddress)
customAttributes["host_name"] = hostName
flowFile = session.putAllAttributes(flowFile, customAttributes)
}
session.transfer(flowFile, REL_SUCCESS)
}
session.commit()
} catch (final Throwable t) {
log.error('{} failed to process due to {}; rolling back session', [this, t] as Object[])
session.rollback(true)
throw t
}
}
}
processor = new GroovyProcessor()
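To try this out, the script above would typically be pasted into an InvokeScriptedProcessor's Script Body property with the Script Engine set to Groovy; InvokeScriptedProcessor picks up the implementation through the "processor" variable assigned on the last line. The attribute names "ip_address" and "host_name" are simply the ones this example assumes on incoming and outgoing FlowFiles.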
06-14-2023
02:33 PM
Does ExecuteSQL erase some of the attributes that could be used to associate the FlowFiles further downstream?
06-09-2023
10:24 AM
@naveenb Your query will get better visibility by starting a new question in the community rather than asking on an already-solved question.

NiFi's ListSFTP and GetSFTP (the latter deprecated in favor of ListSFTP and FetchSFTP) processors only list/get files. When a NiFi FlowFile is generated from a file found recursively within the source SFTP server's configured base directory, a "path" attribute is added to that FlowFile. That "path" attribute holds the absolute path to the file. So based on your configuration, the results you are seeing are expected, since you configured your PutSFTP with "/home/ubuntu/samplenifi/${path}", where the "path" attribute on your FlowFiles resolves to "/home/nifiuser/nifitest/sample" for files found in that source subdirectory.

You can use NiFi Expression Language (NEL) to modify that "path" attribute string to get rid of the "/home/nifiuser" portion: /home/ubuntu/samplenifi/${path:substringAfter('/home/nifiuser')} With the values above, ${path:substringAfter('/home/nifiuser')} evaluates to "/nifitest/sample", so PutSFTP would write to "/home/ubuntu/samplenifi/nifitest/sample".

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
05-26-2023
12:29 PM
@Gutao When interacting with the NiFi rest-api, I'd recommend creating a client certificate to use in your automation. A secured NiFi will always WANT a client certificate and will only try another configured auth method if a client certificate is not provided in the TLS exchange. Using a certificate for your rest-api automation removes the need for obtaining a token completely; you simply pass your client certificate with every rest-api call.

Another advantage over token-based auth is expiration: with no token involved in certificate-based auth, your certificate will continue to work until it expires (typical default is 1 or 2 years). You'll need to set up authorization policies for your certificate user (certificate DN used as the user identity) for the various endpoints you are trying to interact with through the rest-api. A minimal sketch of such a call is shown below.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
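As a rough illustration (not from the original post), here is a certificate-authenticated call to the NiFi rest-api from Groovy, with no token involved. The host name, keystore/truststore paths, and passwords are placeholders, and /nifi-api/flow/about is just a convenient endpoint to verify access.
// Minimal sketch: call the NiFi rest-api using a client certificate instead of a token.
// The keystore holds the client certificate and private key; the truststore trusts
// the NiFi server certificate. All paths, passwords, and the host below are placeholders.
System.setProperty('javax.net.ssl.keyStore', '/path/to/client-cert.p12')
System.setProperty('javax.net.ssl.keyStorePassword', 'changeit')
System.setProperty('javax.net.ssl.keyStoreType', 'PKCS12')
System.setProperty('javax.net.ssl.trustStore', '/path/to/truststore.jks')
System.setProperty('javax.net.ssl.trustStorePassword', 'changeit')
// The certificate is presented during the TLS handshake, so no token request is needed.
def conn = new URL('https://nifi.example.com:8443/nifi-api/flow/about').openConnection()
println conn.inputStream.text   // JSON response if the certificate user is authorized for this endpoint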
05-24-2023
07:31 AM
Hi @ChuckE, yes, I had seen that note before and imagine they fixed a similar issue but didn't catch this specific case. I am posting an issue in Jira to hopefully find a fix.
05-23-2023
07:13 PM
Thanks for the reply. In my case the question is: I build a model and then I train the model again; how do I create a new build of the model with the same ID as the model already created before? I need to keep the same access key for the model. I use cmlapi. Can I do cmlapi.CreateModelBuildRequest with an existing model ID? Do you understand what I mean?
05-23-2023
05:39 AM
@suhaa I've sent you a private message to discuss
05-23-2023
05:37 AM
@suhaa Just sent you a private message to discuss