Posts: 2213
Kudos Received: 231
Solutions: 82
About
My expertise is not in Hadoop but rather in online communities, support, and social media. Interests include photography, travel, movies, and watching sports.
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 781 | 05-07-2025 11:41 AM |
| | 1642 | 02-27-2025 12:49 PM |
| | 3454 | 06-29-2023 05:42 AM |
| | 3007 | 05-22-2023 07:03 AM |
| | 2118 | 05-22-2023 05:42 AM |
06-29-2023
06:03 AM
1 Kudo
Welcome to the community @chenjun. While you wait for someone more experienced than me, I wanted to add a quick Google translation of your post's subject in case it increases your ability to find help.
Submitting the sparksql task through Livy actually reports a kafka kerberos error, please help!
06-29-2023
05:30 AM
Welcome to the community @harry_su. As this question is several years old, I would suggest starting a new one. That would allow you to add details specific to your situation.
06-27-2023
01:54 PM
Thanks, it solved the problem.
06-26-2023
06:20 AM
Welcome to the community @Arui. I see you are facing an error copying installation files when adding a new host to a cluster (if Google Translate is correct). I'll refer you to this post, which may explain the reason.
06-23-2023
07:20 AM
Welcome to the community @sencae. While you wait for a more knowledgeable person to respond, I did find this older post that hopefully gets you closer to where you need to be.
https://community.cloudera.com/t5/Support-Questions/How-to-retrieve-Latest-Uploaded-records-from-Hive-In-my/td-p/187070
06-23-2023
06:04 AM
1 Kudo
Welcome to the community @samans. I'm not an expert, but I did some searching and may have found something for you. I would review the Cloudera ODBC Connector for Apache Impala documentation, as I see some references to transaction statements not being supported and how to work around them. Not sure if that is the issue here, but I hope it helps.
06-22-2023
07:44 AM
Hi @rki_ / @cjervis, I forgot to ask: the cluster already has more than 23 million blocks in HDFS. After configuring the racks in the cluster, will HDFS recognize the racks and start moving blocks across them to increase block availability, or will I have to rebalance HDFS?
06-15-2023
11:41 AM
This simple InvokeScriptedProcessor looks for a FlowFile attribute called "ip_address", attempts a reverse DNS lookup, and writes the resolved value to a new attribute called "host_name".

```groovy
import java.net.InetAddress
import java.net.UnknownHostException

import org.apache.nifi.components.PropertyDescriptor
import org.apache.nifi.components.ValidationContext
import org.apache.nifi.components.ValidationResult
import org.apache.nifi.flowfile.FlowFile
import org.apache.nifi.logging.ComponentLog
import org.apache.nifi.processor.ProcessContext
import org.apache.nifi.processor.ProcessSession
import org.apache.nifi.processor.ProcessSessionFactory
import org.apache.nifi.processor.Processor
import org.apache.nifi.processor.ProcessorInitializationContext
import org.apache.nifi.processor.Relationship
import org.apache.nifi.processor.exception.ProcessException
import org.apache.nifi.processor.util.StandardValidators

class GroovyProcessor implements Processor {

    // Maximum number of FlowFiles pulled from the queue per onTrigger invocation
    PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
        .name("BATCH_SIZE")
        .displayName("Batch Size")
        .description("The number of incoming FlowFiles to process in a single execution of this processor.")
        .required(true)
        .defaultValue("1000")
        .addValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR)
        .build()

    Relationship REL_SUCCESS = new Relationship.Builder()
        .name("success")
        .description("FlowFiles that were successfully processed are routed here")
        .build()

    Relationship REL_FAILURE = new Relationship.Builder()
        .name("failure")
        .description("FlowFiles that were not successfully processed are routed here")
        .build()

    ComponentLog log

    void initialize(ProcessorInitializationContext context) { log = context.logger }

    Set<Relationship> getRelationships() { return [REL_FAILURE, REL_SUCCESS] as Set }

    Collection<ValidationResult> validate(ValidationContext context) { null }

    PropertyDescriptor getPropertyDescriptor(String name) { null }

    void onPropertyModified(PropertyDescriptor descriptor, String oldValue, String newValue) { }

    List<PropertyDescriptor> getPropertyDescriptors() { Collections.unmodifiableList([BATCH_SIZE]) as List<PropertyDescriptor> }

    String getIdentifier() { null }

    // Resolve an IP address to its canonical host name; return "Unknown" if the lookup fails
    String reverseDnsLookup(String ipAddress) {
        try {
            return InetAddress.getByName(ipAddress).getCanonicalHostName()
        } catch (UnknownHostException e) {
            return "Unknown"
        }
    }

    void onTrigger(ProcessContext context, ProcessSessionFactory sessionFactory) throws ProcessException {
        ProcessSession session = sessionFactory.createSession()
        try {
            List<FlowFile> flowFiles = session.get(context.getProperty(BATCH_SIZE).asInteger())
            if (!flowFiles) return
            flowFiles.each { flowFile ->
                String ipAddress = flowFile.getAttribute("ip_address")
                if (ipAddress) {
                    // Write the resolved name back onto the FlowFile as "host_name"
                    flowFile = session.putAttribute(flowFile, "host_name", reverseDnsLookup(ipAddress))
                }
                session.transfer(flowFile, REL_SUCCESS)
            }
            session.commit()
        } catch (final Throwable t) {
            log.error('{} failed to process due to {}; rolling back session', [this, t] as Object[])
            session.rollback(true)
            throw t
        }
    }
}

processor = new GroovyProcessor()
```
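For a quick sanity check outside NiFi, the same lookup logic can be run as a plain Groovy script (a minimal sketch; "8.8.8.8" is just an example address, and the resolved name depends entirely on your DNS resolver):

```groovy
// Standalone check of the reverse-lookup logic used by the processor above.
// The address and the output are illustrative; results depend on your resolver.
String ip = "8.8.8.8"
try {
    println InetAddress.getByName(ip).canonicalHostName   // e.g. "dns.google"
} catch (UnknownHostException e) {
    println "Unknown"
}
```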
06-14-2023
02:33 PM
Does ExecuteSQL erase some of the attributes that could be used to associate the FlowFiles further downstream?