Member since 05-20-2016 · 155 Posts · 220 Kudos Received · 30 Solutions
10-06-2018
03:17 PM
Thanks @Santhosh B Gowda. We have around 15 nodes in the cluster, running Ambari 2.6.1.5 and HDP 2.6.3. We are preparing the OS upgrade plan for production and are currently trying it out in the test region. I have the following questions:
a. Can you confirm whether we have to upgrade Ambari 2.6.1.5 to Ambari 2.7 before starting the OS migration? This would let us use the 'Recover Host' option introduced in Ambari 2.7.
b. Or should we start with Ambari 2.6.1.5, upgrade all servers to RHEL 7 while keeping the Ambari server and agents at 2.6.1.5, and only upgrade Ambari to 2.7 once every server is on RHEL 7?
What would be your approach to this?
08-24-2018
09:55 AM
1 Kudo
At times we need to move the Kerberos database to a different node or upgrade the OS of the KDC node (for example, CentOS 6 to CentOS 7). Obviously you would not want to lose the KDC principals, especially if your HDP cluster is configured to use this KDC. Follow the steps below to back up and restore the Kerberos database.
Prerequisites
* Back up the keytabs from the HDP cluster under /etc/security/keytabs on all nodes.
* Note down your KDC admin principal and password.
* Back up /etc/krb5.conf.
* Back up the /var/kerberos directory.
Backup
* Take a dump of the Kerberos database using the command below (to be executed on the node running the KDC):
kdb5_util dump kdb5_dump.txt
* Safely store the kdb5_dump.txt file.
Restore
* Restore the Kerberos database by executing the command below:
kdb5_util load kdb5_dump.txt
* Restore the /etc/krb5.conf from backup
* Restore /var/kerberos/krb5kdc/kdc.conf from backup
* Restore /var/kerberos/krb5kdc/kadm5.acl from backup
* Run the command below to store the master key in a stash file (the KDC database master password is required):
kdb5_util stash
* Start the KDC server using the command below:
service krb5kdc start
08-09-2018
11:51 PM
We have NiFi secure S2S enabled. How do we get the {{ token }}, given that NiFi is not enabled for username/password authentication and we only use certs/PEM files?
06-30-2017
09:57 AM
4 Kudos
While this article provides a mechanism to set up Spark with HiveContext, there are some limitations when using Spark with HiveContext. For example, Hive supports writing a query result to HDFS using "INSERT OVERWRITE DIRECTORY", i.e.
INSERT OVERWRITE DIRECTORY 'hdfs://cl1/tmp/query'
SELECT * FROM REGION
The above command writes the result of the query to HDFS. However, if the same query is passed to Spark with HiveContext, it fails because "INSERT OVERWRITE DIRECTORY" is not a supported feature in Spark. This is tracked via this jira. The same result can be achieved in Spark by using the Spark CSV library (required in the case of Spark 1). Below is a code snippet showing how:
DataFrame df = hiveContext.sql("SELECT * FROM REGION");
df.write()
.format("com.databricks.spark.csv")
.option("delimiter", "\u0001")
.save("hdfs://cl1/tmp/query");
The above code saves the result in HDFS under the directory /tmp/query. Please note the delimiter used ("\u0001", i.e. Ctrl-A); it matches Hive's default field delimiter. Also, the dependency below needs to be added to pom.xml:
<dependency>
<groupId>com.databricks</groupId>
<artifactId>spark-csv_2.10</artifactId>
<version>1.5.0</version>
</dependency>
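If you later need to read the saved output back into Spark, the same spark-csv format and delimiter can be used. Below is a minimal sketch under the same assumptions as the snippet above (same hiveContext and output path); it is an illustrative addition, not part of the original example.
// Read the Ctrl-A delimited files written above back into a DataFrame.
DataFrame result = hiveContext.read()
    .format("com.databricks.spark.csv")
    .option("delimiter", "\u0001")
    .load("hdfs://cl1/tmp/query");
result.show(10); // print the first 10 rows to verify the output
On Spark 2.x the CSV data source is built in, so the external library is only needed for Spark 1; the built-in writer, e.g. df.write().option("sep", "\u0001").csv("hdfs://cl1/tmp/query"), serves the same purpose.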
03-08-2017
12:20 PM
10 Kudos
Storm 1.1.x provides an external storm-kafka-client module that we can use to build Storm topologies. Please note this supports Kafka 0.10 onwards. Below is a step-by-step guide on how to use the APIs. Add the dependency below to your pom.xml:
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-kafka-client</artifactId>
<version>1.1.1-SNAPSHOT</version>
</dependency>
The Kafka spout implementation for the topology is configured using KafkaSpoutConfig. Below is a sample config object creation:
KafkaSpoutConfig spoutConf = KafkaSpoutConfig.builder(bootStrapServers, topic)
.setGroupId(consumerGroupId)
.setOffsetCommitPeriodMs(10_000)
.setFirstPollOffsetStrategy(UNCOMMITTED_LATEST)
.setMaxUncommittedOffsets(1000000)
.setRetry(kafkaSpoutRetryService)
.setRecordTranslator(new TupleBuilder(), outputFields, topic)
.build();
The above class follows the builder pattern. bootStrapServers is the Kafka broker endpoint from which consumer records are polled. topic is the Kafka topic name; it can also be a collection of topics or a Pattern (regular expression). consumerGroupId sets the Kafka consumer group id (group.id). setFirstPollOffsetStrategy lets you control from where consumer records are fetched. It takes an enum as input, described below.
EARLIEST - the spout fetches from the first offset of the partition, irrespective of any previous commit.
LATEST - the spout fetches records after the last offset in the partition, irrespective of any previous commit.
UNCOMMITTED_EARLIEST - the spout fetches from the first offset of the partition if there is no previous commit.
UNCOMMITTED_LATEST - the spout fetches records after the last offset if there is no previous commit.
The kafkaSpoutRetryService implementation is provided below; it makes use of exponential back-off. setRetry provides a pluggable interface in case you want failed tuples to be retried differently.
KafkaSpoutRetryService kafkaSpoutRetryService = new KafkaSpoutRetryExponentialBackoff(KafkaSpoutRetryExponentialBackoff.TimeInterval.microSeconds(500),
KafkaSpoutRetryExponentialBackoff.TimeInterval.milliSeconds(2), Integer.MAX_VALUE, KafkaSpoutRetryExponentialBackoff.TimeInterval.seconds(10));
setRecordTranslator provides a mechanism through which we can specify how Kafka consumer records should be converted to tuples. In the example above, TupleBuilder implements the Func interface. Below is a sample implementation of the apply method that needs to be overridden. outputFields is the list of fields that will be emitted in the tuple. Please note there are multiple ways to translate records to tuples; please go through the storm-kafka-client documentation for more details.
public List<Object> apply(ConsumerRecord<String, String> consumerRecord) {
try {
String[] records = consumerRecord.value().split("\\|"); // split() takes a regex, so the pipe delimiter must be escaped
return Arrays.asList(records);
} catch (Exception e) {
LOGGER.debug("Failed to Parse {}. Throwing Exception {}", consumerRecord.value() , e.getMessage() );
e.printStackTrace();
}
return null;
}
Once the above steps are complete, the topology can use the spoutConf created above as below:
TopologyBuilder builder = new TopologyBuilder();
Config conf = new Config();
conf.setNumWorkers(1);
builder.setSpout(KAFKA_SPOUT, new KafkaSpout(spoutConf), 1);
Reference: https://github.com/apache/storm/blob/1.x-branch/docs/storm-kafka-client.md
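For completeness, below is a minimal sketch of attaching a bolt to this spout and submitting the topology. The LogBolt class, the bolt and topology names, and the logging are illustrative assumptions, not part of the original article.
// A trivial bolt that just logs each tuple emitted by the Kafka spout.
// Uses org.apache.storm.topology.base.BaseBasicBolt and org.apache.storm.StormSubmitter from storm-core.
public static class LogBolt extends BaseBasicBolt {
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        LOGGER.info("Received tuple: {}", tuple.getValues());
    }
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // this bolt is a sink and emits nothing further
    }
}
// Wire the bolt to the Kafka spout and submit the topology to the cluster.
builder.setBolt("log-bolt", new LogBolt(), 1).shuffleGrouping(KAFKA_SPOUT);
StormSubmitter.submitTopology("kafka-client-topology", conf, builder.createTopology());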
12-22-2016
10:59 AM
2 Kudos
Ever come across the error below while starting an HDFS DataNode in an unsecured cluster?
2016-12-22 09:00:41,045 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:0
2016-12-22 09:00:41,152 INFO datanode.DataNode (DataNode.java:shutdown(1915)) - Shutdown complete.
2016-12-22 09:00:41,153 ERROR datanode.DataNode (DataNode.java:secureMain(2630)) - Exception in secureMain
java.net.SocketException: Permission denied
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:463)
at sun.nio.ch.Net.bind(Net.java:455)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:475)
at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1021)
at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:455)
at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:440)
at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:844)
at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:194)
at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:340)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:745)
2016-12-22 09:00:41,157 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2016-12-22 09:00:41,159 INFO datanode.DataNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG:
This error means the DataNode is unable to bind to the configured socket. Reason? This is most likely because the HTTP port (or another DataNode port) is configured to a value below 1024. Ports below 1024 are privileged and require root, and in an unsecured cluster the DataNode runs as a non-root user, so the bind fails. For example, the HTTP port property is set as below:
dfs.datanode.http.address=1022
Solution: Change the port to something greater than 1024, such as the default 50075. The DataNode should be able to start after this change.
09-21-2016
02:37 PM
4 Kudos
Ambari 2.4.0.0 officially supports the Log Search component [Tech Preview]. To learn more about the Log Search component, please refer to the Ambari LogSearch link. While the Log Search component does support pushing logs to a Kafka topic [on top of which real-time log analytics can be performed], this is not officially supported in Ambari 2.4.0.0 and might be addressed in Ambari 2.5.0.0. This article provides the details on how to configure Log Search [the LogFeeder component] to push to a Kafka topic if there is a need to capture and perform real-time analytics on the logs in your cluster.
1. After installing the Log Search component from Ambari 2.4.0.0, go to the Log Search config screen and append the value below to the property "logfeeder.config.files" under "Advanced logfeeder-properties":
{default_config_files},kafka-output.json
2. Create the kafka-output.json file with the content below under the directory /etc/ambari-logsearch-logfeeder/conf/ on the nodes that run LogFeeder [ideally all nodes in your cluster]:
{
"output": [
{
"is_enabled": "true",
"destination": "kafka",
"broker_list": "ctr-e25-1471039652053-0001-01-000006.test.domain:6667",
"topic": "log-streaming",
"conditions": {
"fields": {
"rowtype": [
"service"
]
}
}
}
]
}
3. If the cluster is Kerberized, configure a Kafka PLAINTEXT listener as below, because there is no workaround for pushing to a PLAINTEXTSASL listener. Make sure the broker endpoint configured in step 2 is the PLAINTEXT one.
PLAINTEXT://localhost:6667,PLAINTEXTSASL://localhost:6668
4. Create the Kafka topic and grant ACLs to the ANONYMOUS user. The commands below help with that:
./bin/kafka-topics.sh --zookeeper zookeeper-node:2181 --create --topic log-streaming --partitions 1 --replication-factor 1
./bin/kafka-acls.sh --authorizer kafka.security.auth.SimpleAclAuthorizer --authorizer-properties zookeeper.connect=zookeeper-node:2181 --add --allow-principal User:ANONYMOUS --operation Read --operation Write --operation Describe --topic log-streaming
5. Restart the Log Search service from Ambari and that's it! Logs should be flowing by now. Below is a command to check:
./bin/kafka-console-consumer.sh --zookeeper zookeeper-node:2181 --topic log-streaming --from-beginning --security-protocol PLAINTEXT
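Since the point of pushing logs to Kafka is real-time analytics, a natural next step is to consume the topic programmatically. Below is a minimal Java consumer sketch; the group id, deserializers and polling loop are illustrative assumptions, not part of the original article. The bootstrap server is the PLAINTEXT broker endpoint from step 2.
// Minimal Kafka consumer for the log-streaming topic (new consumer API, Kafka 0.10+).
Properties props = new Properties();
props.put("bootstrap.servers", "ctr-e25-1471039652053-0001-01-000006.test.domain:6667");
props.put("group.id", "log-analytics"); // any consumer group id
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
consumer.subscribe(Arrays.asList("log-streaming"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(1000);
    for (ConsumerRecord<String, String> record : records) {
        System.out.println(record.value()); // each value is a log event pushed by LogFeeder
    }
}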
08-11-2016
10:05 PM
4 Kudos
Kafka 0.9 onwards supports SASL_PLAINTEXT (authenticated but non-encrypted) communication between brokers, and between consumers/producers and brokers. To know more about SASL, please refer to this link.
1. Maven Dependency
Add the Maven dependency below to your pom.xml:
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>0.10.0.0</version>
</dependency>
2. Kerberos Setup
Configure a JAAS configuration file with contents as below:
KafkaClient {
com.sun.security.auth.module.Krb5LoginModule required
useTicketCache=true
principal="user@EXAMPLE.COM"
useKeyTab=true
serviceName="kafka"
keyTab="/etc/security/keytabs/user.headless.keytab";
};
The above configuration is set to use both a keytab and the ticket cache. The Kafka producer client uses this information to obtain a TGT and authenticate with the Kafka broker.
Note:
a] Make sure /etc/krb5.conf has a realms mapping for "EXAMPLE.COM" and that default_realm is set to "EXAMPLE.COM" under the [libdefaults] section. Please refer to this link for more information.
b] Run the command below and make sure it succeeds:
kinit -kt /etc/security/keytabs/user.headless.keytab user@EXAMPLE.COM
3. Initialize the Kafka Producer
The Kafka producer client needs certain information to initialize itself. This can be provided either from a properties file or built up in code as below.
Properties properties = new Properties();
properties.put("bootstrap.servers","comma-seperated-list-of-brokers");
properties.put("key.serializer","org.apache.kafka.common.serialization.StringSerializer");// key serializer
properties.put("value.serializer","org.apache.kafka.common.serialization.StringSerializer"); //value serializer
properties.put("acks","1"); //message durability -- 1 mean ack after writing to leader is success. value of "all" means ack after replication.
properties.put("security.protocol","SASL_PLAINTEXT"); // Security protocol to use for communication.
properties.put("batch.size","16384");// maximum size of message
KafkaProducer<String,String> producer = new KafkaProducer<String, String>(properties);
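As mentioned above, the same configuration can also be supplied from a properties file instead of being set in code. Below is a minimal sketch of that variant; the file name producer.properties is an illustrative assumption.
// Load the producer settings from a file containing the same keys as above
// (bootstrap.servers, serializers, acks, security.protocol, batch.size).
Properties properties = new Properties();
properties.load(new FileInputStream("producer.properties")); // java.io.FileInputStream; may throw IOException
KafkaProducer<String, String> producer = new KafkaProducer<String, String>(properties);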
4. Push a Message
The snippet below sends a record to a topic; "topic-name" is a placeholder for your topic.
producer.send(new ProducerRecord<String, String>("topic-name", "key", "value"), new Callback() {
public void onCompletion(RecordMetadata metadata, Exception e) {
if (e != null) {
LOG.error("Send failed for record: {}", metadata);
}
else {
LOG.info("Message delivered to topic {} and partition {}. Message offset is {}",metadata.topic(),metadata.partition(),metadata.offset());
}
}
});
producer.close();
The above code pushes a message to the Kafka broker, and on completion (once it is acked) the onCompletion method is invoked.
5. Run
While running this code, add the VM parameters below:
-Djava.security.auth.login.config=<PATH_TO_JAAS_FILE_CREATED_IN_STEP2> -Djava.security.krb5.conf=/etc/krb5.conf