Member since: 08-08-2013
Posts: 339
Kudos Received: 132
Solutions: 27

My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 11357 | 01-18-2018 08:38 AM |
| 756 | 05-11-2017 06:50 PM |
| 6710 | 04-28-2017 11:00 AM |
| 2483 | 04-12-2017 01:36 AM |
| 2063 | 02-14-2017 05:11 AM |
02-28-2019
08:25 AM
Hi @Rodrigo Hjort, did you solve this problem, and if so, how?
11-07-2018
08:04 PM
Hi @Michael Bronson, any insights into why you think "retention does not work as it should"? It would also be helpful if you could provide some more details about the usage of your Kafka cluster: is data flooding in steadily, are there heavy spikes which lead to _partition full_, how many producers run in parallel, how many topics + replication, etc.? How did you configure the retention? Regards, Gerd
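PS: to make that last question concrete, this is roughly how you would check which retention is currently in effect (the HDP-style path and the ZooKeeper string are just placeholders, adjust to your environment):
# broker-wide default, set in the broker config (Ambari => Kafka), e.g.
# log.retention.hours=168
# topic-level overrides, if any, show up under "Configs:" here:
/usr/hdp/current/kafka-broker/bin/kafka-configs.sh --zookeeper <zk-host>:2181 --entity-type topics --entity-name <topic> --describe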
06-19-2018
08:49 PM
Hi @SATHIYANARAYANA KUMAR.N , you can keep your pipeline if you need to and write the intermediate output (after each processing step) either with Spark into HDFS again, or via Hive into another table. From what you are describing, it sounds like a huge (and needless) overhead to split your huge files, put them into an RDBMS, grab them from there into AMQ and process them from there... that is way too expensive/complicated/error-prone. Just upload your huge files to HDFS and e.g. create a directory structure which reflects your processing pipeline, like /data/raw, /data/layer1, /data/layer2, ... and put the output of each processing step into it accordingly. HTH, Gerd
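PS: purely as an illustration (directory and file names are made up), the layout and a first upload could look like:
# one directory per processing layer
hdfs dfs -mkdir -p /data/raw /data/layer1 /data/layer2
# land the raw input, then let each processing step write its output into the next layer
hdfs dfs -put /local/path/huge_input.csv /data/raw/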
06-19-2018
08:40 PM
Hi @Harshali Patel , regarding _HDFS_ there is no need to use RAID at all. In addition to @Aayush Kasliwal 's answer, I'd highly recommend configuring NameNode HA to avoid any single point of failure for HDFS. This also ensures that the NameNode metadata is written in multiple copies across the JournalNodes (e.g. you can configure multiple directories, and you should use e.g. 3 JournalNodes). Where I do see RAID as beneficial is for the partitions for OS, logs, ... but of course, this is "below" HDFS. HTH, Gerd
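PS: Ambari's NameNode HA wizard configures all of this for you; just for reference, the resulting hdfs-site.xml properties look roughly like the following (the nameservice name, hostnames and paths are placeholders):
dfs.nameservices=mycluster
dfs.ha.namenodes.mycluster=nn1,nn2
dfs.namenode.shared.edits.dir=qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster
dfs.journalnode.edits.dir=/hadoop/hdfs/journal
dfs.namenode.name.dir=/hadoop/hdfs/namenode1,/hadoop/hdfs/namenode2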
06-05-2018
06:36 PM
Hi @SATHIYANARAYANA KUMAR.N , some details are missing in your post, but as a general answer: if you want to do batch processing of some huuuge files, Kafka is the wrong tool to use. Kafka's strength is managing STREAMING data. Based on your description I am assuming that your use case is bringing huge files to HDFS and processing them afterwards. For that I wouldn't split the files at all, just upload them as a whole (e.g. via WebHDFS). Then you can use tools like Hive/Tez, Spark, ... to process your data (whatever you mean by "process": clean/filter/aggregate/merge/... or, at the end, "analyze" in an SQL-like manner). HTH, Gerd
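PS: a rough sketch of a WebHDFS upload (the usual two-step PUT; hostname, port and path are placeholders, and port 50070 assumes a plain non-HA HDP 2.x NameNode):
# step 1: the NameNode answers with a redirect (Location header) pointing to a DataNode
curl -i -X PUT "http://<namenode>:50070/webhdfs/v1/data/raw/huge_file.csv?op=CREATE&user.name=hdfs"
# step 2: PUT the file content against the Location URL returned in step 1
curl -i -X PUT -T huge_file.csv "<location-url-from-step-1>"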
06-04-2018
06:39 PM
Hi @Robin Dong, there is a slight dependency on your overall cluster layout. If e.g. this node is your single master node of the cluster, then you have to shut down the cluster first (Ambari => Stop all services). But if e.g. this node is one of your workers, you are most probably fine putting that particular node into "Maintenance mode", then performing your maintenance work, starting it up again and finally disabling "Maintenance mode". HTH, Gerd
05-10-2018
06:24 PM
Hi @Mudit Kumar , for adding your users you need to create principals for them in the Kerberos database, e.g. connect to the node where the MIT KDC is running, then:
sudo kadmin.local -q "addprinc <username>"   # replace <username> by your real usernames
That way you are able to grab a valid Kerberos ticket for those 5 users. You can verify this by executing
kinit <username>
which should ask for the corresponding password of that user (!! the password you provided at creation time of the principal above !!), followed by
klist
After grabbing a Kerberos ticket you can start executing commands against the cluster, like "hdfs dfs -ls". If you have enabled authorization as well, you have to add those new users to the ACLs appropriately.
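A small sketch for creating all five principals in one go (the usernames are placeholders; addprinc prompts for a password each time):
for u in user1 user2 user3 user4 user5; do
  sudo kadmin.local -q "addprinc ${u}"
done
# verify for one of them:
kinit user1
klist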
01-18-2018
08:38 AM
1 Kudo
Hi, that indicates your OS user "root" is not the superuser of HDFS (root is just the superuser of the operating system). Try to do the same as user "hdfs" (which by default is the HDFS superuser); as root do:
su - hdfs
hdfs dfsadmin -report
Basically, the HDFS superuser is the user under whose account the NameNode is started. Alternatively you can add the OS user "root" to the group which is set as the HDFS supergroup: check the property dfs.permissions.superusergroup and add "root" to the OS group it points to. HTH, Gerd
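PS: a minimal sketch for the supergroup route, assuming the property points to an OS group named 'hdfs' (check the output of the first command before changing anything):
# which OS group is the HDFS supergroup?
hdfs getconf -confKey dfs.permissions.superusergroup
# add root to that group (re-login afterwards so the new membership takes effect)
usermod -aG hdfs root
# now this should work as root as well
hdfs dfsadmin -report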
12-21-2017
12:14 AM
Hi experts, any feedback/hints/... highly appreciated 😄
11-14-2017
04:06 PM
Hi, after enabling a SASL_PLAINTEXT listener on Kafka it is no longer possible to use the console-consumer/-producer. Whereas a simple Java snippet that creates a producer and adds some messages works fine, using the exact same user/password as for the console clients:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("Enter topic name");
            return;
        }
        String topicName = args[0];
        Properties props = new Properties();
        props.put("bootstrap.servers", "<brokernode>:6666");
        props.put("acks", "1");
        props.put("retries", 0);
        props.put("batch.size", 16384);
        props.put("linger.ms", 1);
        props.put("buffer.memory", 33554432);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        ////// AUTHENTICATION
        props.put("security.protocol", "SASL_PLAINTEXT");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required\n" +
            "username=\"kafka\"\n" +
            "password=\"kafkaSecure\";");
        ////// END AUTHENTICATION
        Producer<String, String> producer = new KafkaProducer<String, String>(props);
        System.out.println("producer created");
        for (int i = 0; i < 10; i++) {
            System.out.println("message" + i);
            producer.send(new ProducerRecord<String, String>(topicName,
                Integer.toString(i), Integer.toString(i)));
        }
        System.out.println("Messages sent successfully");
        producer.close();
    }
}

After starting e.g. the console-producer and trying to add a message, the following message is shown (endlessly):

[2017-11-14 16:48:23,039] WARN Bootstrap broker <brokernode>:6666 disconnected (org.apache.kafka.clients.NetworkClient)
[2017-11-14 16:48:23,091] WARN Bootstrap broker <brokernode>:6666 disconnected (org.apache.kafka.clients.NetworkClient)
[2017-11-14 16:48:23,143] WARN Bootstrap broker <brokernode>:6666 disconnected (org.apache.kafka.clients.NetworkClient)
[2017-11-14 16:48:23,195] WARN Bootstrap broker <brokernode>:6666 disconnected (org.apache.kafka.clients.NetworkClient)

The Kafka config looks like:

listeners=PLAINTEXT://<brokernode>:6667,SASL_PLAINTEXT://<brokernode>:6666
sasl.enabled.mechanisms=PLAIN
sasl.mechanism.inter.broker.protocol=PLAIN
security.inter.broker.protocol=SASL_PLAINTEXT

The console-producer gets started via:

export KAFKA_OPTS="-Djava.security.auth.login.config=/etc/kafka/conf/user_kafka_jaas.conf" ; /usr/hdf/current/kafka-broker/bin/kafka-console-producer.sh --broker-list <brokernode>:6666 --topic gk-test --producer.config /etc/kafka/conf/producer.properties

where the property files look like:

/etc/kafka/conf/user_kafka_jaas.conf
KafkaClient {
    org.apache.kafka.common.security.plain.PlainLoginModule required
    username="kafka"
    password="kafkaSecure";
};

/etc/kafka/conf/producer.properties
security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN

Any hint on what is going wrong with the console-producer and console-consumer so that they cannot produce/consume from the topic? ...but the Java snippet works... Thanks
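PS: newer Kafka clients (0.10.2+) also accept the JAAS section inline in the client properties instead of the external file set via KAFKA_OPTS; whether that is available here depends on the HDF/Kafka version, so treat this producer.properties variant only as something to try:
security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="kafka" password="kafkaSecure";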
Labels:
- Apache Kafka
11-09-2017
11:22 PM
Hi @Vikasreddy , you should have a look at KafkaConnect (here or here). You can use e.g. the JDBC sink to directly push data from Kafka into an RDBMS. Regards...
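Purely as an illustration (this assumes Confluent's kafka-connect-jdbc sink is installed; all names and the JDBC URL are placeholders), a minimal sink connector config could look like:
name=rdbms-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=my-topic
connection.url=jdbc:postgresql://dbhost:5432/mydb
connection.user=dbuser
connection.password=dbpass
auto.create=true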
11-09-2017
10:36 AM
Bryan, many thanks for your explanation. Do you have any resources/hints regarding "creating a dynamic JAAS file" and what this would look like? ...assuming Kerberos is enabled 😉 ...or do you mean by 'dynamic' the possibility to specify principal & keytab within the Kafka processor? Thanks!
11-08-2017
01:07 PM
Hi, how can I enable Kafka SASL_PLAINTEXT authentication without enabling Kerberos in general?! Right now I added the additional "listener" entry and populated "advanced kafka_jaas_conf" as well as "advanced kafka_client_jaas_conf". After that the Kafka brokers won't start up because of this error:
FATAL [Kafka Server 1001], Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.apache.kafka.common.KafkaException: java.lang.IllegalArgumentException: Could not find a 'KafkaServer' entry in the JAAS configuration. System property 'java.security.auth.login.config' is not set
What else needs to be done to provide the required properties at broker startup and to distribute the .jaas files? It also looks like the .jaas files are not being deployed to the Kafka nodes; they are not under /usr/hdp/current/kafka-broker/config. Is this functionality missing because Kerberos is disabled?! I am sure that after enabling Kerberos the .jaas entries defined in Ambari would be deployed to the nodes, hence there must be some "hidden" functionality missing in non-Kerberos mode... Any help appreciated, thanks in advance...
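For context, a broker-side JAAS file for SASL/PLAIN (no Kerberos) looks roughly like the sketch below: the user_<name>="<password>" entries define the accounts the broker accepts, and the file has to exist on every broker and be referenced via -Djava.security.auth.login.config (e.g. appended to KAFKA_OPTS in kafka-env). Names and passwords are placeholders:
KafkaServer {
  org.apache.kafka.common.security.plain.PlainLoginModule required
  username="kafka"
  password="kafkaSecure"
  user_kafka="kafkaSecure"
  user_alice="alice-secret";
};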
Labels:
- Apache Ambari
- Apache Kafka
11-08-2017
10:01 AM
Hello, there is an HDF setup done (HDF 3.0) and now SASL_PLAINTEXT needs to be added to the Kafka listeners (without Kerberos, just plain SASL). To be able to authenticate, user:pw tuples need to be provided in the .jaas file. But this looks very static. How can the end user (who is logged in to NiFi) be used in a Kafka processor to authenticate against Kafka? Is there a possibility, via user-defined properties, to ensure that the current user is used for authenticating against Kafka, or to dynamically decide which .jaas file is used based on the currently logged-in user? Kerberos and SSL are currently not an option, hence I need a solution for SASL_PLAINTEXT 😉 Thanks in advance...
10-20-2017
06:33 AM
Hi @Matt Clarke , thanks for your reply. I will dive back into this with the release you mentioned. You're saying "no support of Ranger or LDAP groups", but support of Ranger is already there, although limited to user-based policies. Or did I misunderstand something here?!
10-19-2017
08:35 AM
Hi, I set up HDF (in particular NiFi & Ranger) to fetch users & groups from AD and authenticate against AD. Defining policies in Ranger for NiFi based on AD users works as expected after logging in to NiFi with AD credentials. The only thing that is not working are the policies which grant access based on AD groups. There is this article from almost a year ago. Does it still apply, @Bryan Bende? Meaning, NiFi policies based on AD group membership do not work? Thanks in advance...
Labels:
- Apache NiFi
- Apache Ranger
08-16-2017
12:46 PM
Hello, I set up HDF 3 including Ranger and Kerberos... everything is green in Ambari so far. The Ranger plugins for Kafka and NiFi have been enabled, and in the Ranger UI I can see that the default policy for Kafka has been created and some audit entries are there, see below. The problem now is that I can list and describe Kafka topics with my user account although it is not allowed by the Ranger ACL, and I do not even see any entry in the audit log for the accesses under my own user account. It looks like the Ranger ACLs don't get applied to Kafka at all, no idea why?! I created a dedicated policy for topic 'foo', just granting my user 'consume' access => in a terminal I can still 'describe' that topic, and in the Ranger audit there is NO entry for that access => Any ideas why access is still allowed and why there is no audit being recorded?! PS: in the Ranger UI the Kafka policy is shown as updated... it refreshes right after being updated.
08-14-2017
08:21 AM
Hi @Lucky_Luke, the script "kafka-topics.sh" with the parameter "--describe" is what you are looking for. To get details for a certain topic, e.g. "test-topic", you would call (adjust the ZooKeeper connect string according to your environment):
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper sandbox.hortonworks.com:2181/kafka --describe --topic test-topic
The output contains (amongst others) the number of partitions, the leading broker for each partition, and the in-sync replicas. The topic-level configuration properties are listed under "Configs:". If this is blank, then the default (broker-wide) settings apply and you should check your broker config file (or the Ambari section) for the property "log.retention.hours" ...assuming you mean the retention time by "TTL". HTH, Gerd
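PS: if you want a topic-level override instead of the broker-wide default, something along these lines should work (the retention value is just an example, 24 hours in ms):
# set retention for 'test-topic' to 24 hours
/usr/hdp/current/kafka-broker/bin/kafka-configs.sh --zookeeper sandbox.hortonworks.com:2181/kafka --entity-type topics --entity-name test-topic --alter --add-config retention.ms=86400000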
08-14-2017
08:02 AM
Hi @Bharadwaj Bhimavarapu , how did you solve the "java.lang.NoSuchMethodError"? I am facing the same in the HDP 2.6 sandbox when trying to start connect-standalone.
08-02-2017
03:00 PM
Hello @Alexandru Anghel , many thanks... works brilliantly!
07-30-2017
07:42 PM
Hello @Avinash Reddy , if the directory is missing, just create it as the first statement after becoming user hdfs (after 'sudo su - hdfs'):
hdfs dfs -mkdir /user/anonymous
...then proceed with 'hdfs dfs -chown ...'
07-29-2017
06:11 PM
Hi, I am setting up HDF 3.0 via blueprint. The installation of the services works nicely, but starting NiFi fails because it expects a password to be provided for decrypting flow.xml.gz under the /nifi directory (calling the encrypt-config tool). Two questions here: 1.) Which properties need to be provided in the blueprint so that NiFi starts successfully without being asked for a password, which obviously cannot be provided interactively? 2.) From where does NiFi populate the subdirectories under /nifi at startup time? Before re-deploying the blueprint I deleted the whole /nifi directory, just to ensure the main error is not caused by some old/previous files, but when starting up NiFi this folder gets recreated incl. subdirectories. Any other hint towards getting the service startup to succeed as well when applying the blueprint is highly appreciated 😉
07-25-2017
06:53 PM
1 Kudo
Hello @Avinash Reddy , assuming you are not using Ranger for defining ACLs, the HDFS commands to change ownership to anonymous are:
# become user 'hdfs'
sudo su - hdfs
# change ownership
hdfs dfs -chown anonymous /user/anonymous
# optional, if you want to restrict the user directory to user 'anonymous' only
hdfs dfs -chmod -R 700 /user/anonymous
Regards, Gerd
07-11-2017
06:27 PM
@Sami Ahmad , no, Ranger does not install Solr for you. What @vperiasamy was referring to is the service "Ambari Infra". This service runs a SolrCloud under the hood, and if you configure Ranger to store audits in SolrCloud, it will by default pick up this SolrCloud instance. To put it in a nutshell: you need the "Ambari Infra" service if you do not want to set up and maintain a dedicated, additional SolrCloud environment. HTH, Gerd
07-11-2017
09:53 AM
1 Kudo
Hi, I wanted to set up Ranger-on-NiFi in the HDF 3.0 sandbox, according to https://community.hortonworks.com/articles/58769/hdf-20-enable-ranger-authorization-for-hdf-compone.html. But after adding Ranger, enabling the NiFi plugin and restarting the required services, this plugin doesn't appear in the Ranger UI; its "Audit" => "Plugin" overview is just empty. The default policy for NiFi has been created, though.
Any ideas what is going wrong?
07-06-2017
07:02 AM
Hi @Robin Dong , if you try standalone mode, there is no configuration via REST at all, hence you do NOT need any curl command to provide the connector config to your worker. In standalone mode you pass the connector config as a second command-line parameter when starting your worker; see here for an example of how to start the standalone setup including the connector config. Maybe it is worth providing both configurations, the standalone worker as well as the distributed one. If you start the distributed worker, you will find the URL of the REST interface at the end of the command-line output. Can you paste that terminal output as well? Do you execute the curl command from the same node where you started the Connect worker, or is it from a remote host where maybe the AWS network/security settings prevent you from talking to the REST interface? Regards, Gerd
07-03-2017
07:03 PM
Hi @Robin Dong , port 8083 is the default port of the KafkaConnect worker if started in distributed mode... which is the case in the URL you are referring to. You can set this port to another one in the properties file you provide as a parameter to the connect-distributed.sh command-line call (the property is called rest.port, see here). In distributed mode you have to use the REST API to configure your connectors, that's the only option. You can of course also start investigating Connect in standalone mode; then you do not need a REST call to configure your connector, you can just provide the connector.properties file as an additional parameter to the connect-standalone.sh script when starting the Connect worker (ref. here). Please try to replace 'localhost' by the FQDN of the host where the Connect worker was started, and of course check whether this start was successful by looking at the listening ports, e.g.
netstat -tulpn | grep 8083
HTH, Gerd --------- If you find this post useful, any vote/reward is highly appreciated 😉
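PS: to illustrate both variants (paths and file names are examples, adjust to your installation):
# standalone: worker properties plus one or more connector property files on the command line
/usr/hdp/current/kafka-broker/bin/connect-standalone.sh /etc/kafka/conf/connect-standalone.properties /etc/kafka/conf/my-connector.properties
# distributed: start the worker, then register the connector via the REST API
/usr/hdp/current/kafka-broker/bin/connect-distributed.sh /etc/kafka/conf/connect-distributed.properties
curl -X POST -H "Content-Type: application/json" --data @my-connector.json "http://<connect-worker-fqdn>:8083/connectors"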
06-27-2017
07:12 AM
1 Kudo
Hi @mel mendoza , maybe it is worth checking out Flume to ingest multiple files into Kafka. Alternatively you can use HDF (particularly NiFi) to do so.
06-23-2017
08:44 AM
Hi @Karan Alang , does your Kafka topic have several partitions (which are then spread across the brokers)? If you ingest data that does not belong to the same partition, you'll see the behaviour you reported, since ordering is only guaranteed within a partition. HTH, Gerd
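PS: if you need strict ordering across all messages, one option is a topic with a single partition (at the cost of parallelism); otherwise, producing related messages with the same key keeps them in one partition and therefore in order. Just as an illustration (topic name and replication factor are placeholders):
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper <zk-host>:2181 --create --topic ordered-topic --partitions 1 --replication-factor 2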
06-23-2017
07:18 AM
1 Kudo
Hello @Adhishankar Nanjundan , just an architectural idea for your approach: what about putting all your metrics into one dedicated topic, and then using Kafka Connect to insert the data from that topic into Elasticsearch, using the Kafka Connect Elasticsearch sink?! Regards, Gerd
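A rough sketch of such an Elasticsearch sink config (this assumes Confluent's kafka-connect-elasticsearch connector is available; topic, URL and names are placeholders):
name=metrics-es-sink
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=1
topics=metrics
connection.url=http://eshost:9200
type.name=metrics
key.ignore=true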