Member since: 07-10-2017
Posts: 68
Kudos Received: 30
Solutions: 5

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4082 | 02-20-2018 11:18 AM |
| | 3334 | 09-20-2017 02:59 PM |
| | 17847 | 09-19-2017 02:22 PM |
| | 3481 | 08-03-2017 10:34 AM |
| | 2167 | 07-28-2017 10:01 AM |
09-19-2017
08:34 AM
@Funamizu Koshi

```python
import re
from pyspark.sql.functions import UserDefinedFunction
from pyspark.sql.types import StringType

# Strip commas from every column by applying the same UDF to each one.
udf = UserDefinedFunction(lambda x: re.sub(',', '', x), StringType())
new_df = df.select(*[udf(column).alias(column) for column in df.columns])
new_df.collect()
```
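For context, here is a self-contained run of the same approach; the SparkSession setup and the sample data are illustrative assumptions, not part of the original answer:

```python
import re
from pyspark.sql import SparkSession
from pyspark.sql.functions import UserDefinedFunction
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("strip-commas").getOrCreate()
# Hypothetical sample frame standing in for the asker's df.
df = spark.createDataFrame([("1,000", "2,5"), ("30", "4,4,4")], ["a", "b"])

strip_commas = UserDefinedFunction(lambda x: re.sub(',', '', x), StringType())
cleaned = df.select(*[strip_commas(c).alias(c) for c in df.columns])
print(cleaned.collect())  # [Row(a='1000', b='25'), Row(a='30', b='444')]
```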
09-19-2017
07:28 AM
1 Kudo
Hi Jasper, I don't think the login itself is failing. After skimming through https://github.com/streamsets/datacollector/blob/master/apache-kafka_0_9-lib/src/main/java/org/apache/kafka/common/security/kerberos/Login.java, I'd expect any error with obtaining a valid ticket to be logged by KerberosLogin itself.

What principal are you using? Can you check the contents of kafka_client_jaas.conf? Is it of the form below:

```
KafkaClient {
    com.sun.security.auth.module.Krb5LoginModule required
    useTicketCache=true;
};
```

Or do you have a keytab configuration? If the former, check kafka_jaas.conf for the Client section and kinit with the user/keytab mentioned there. Then try running the command again as:

```
/usr/hdp/2.5.5.0-157/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $BROKER_LIST --security-protocol PLAINTEXTSASL --new-consumer --describe --group spoutconsumer -Djava.security.auth.login.config=/etc/kafka/kafka_jaas.conf
```

If the above command does not work, try exporting that variable instead.
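By "exporting that variable," a likely reading is passing the JAAS path through the JVM options that the Kafka shell scripts pick up; a sketch, assuming the standard KAFKA_OPTS hook:

```
# Assumption: the HDP kafka-consumer-groups.sh wrapper honors KAFKA_OPTS.
export KAFKA_OPTS="-Djava.security.auth.login.config=/etc/kafka/kafka_jaas.conf"
/usr/hdp/2.5.5.0-157/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $BROKER_LIST \
  --security-protocol PLAINTEXTSASL --new-consumer --describe --group spoutconsumer
```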
09-04-2017
06:59 AM
2 Kudos
Hi @pp z,

1. Check whether the console consumer works with --bootstrap-server <broker> instead of --zookeeper.
2. Check the advertised listener in server.properties. Is it of the form PLAINTEXTSASL://host:port?
3. Check whether security_protocol = PLAINTEXTSASL exists in server.properties.
4. Are you getting this exception: "WARN SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file"? If yes, modify kafka_client_jaas.conf to include a Client section; you'll most likely find one in kafka_jaas.conf (a sketch follows after this list). If not, ignore this step.
5. Use --security-protocol PLAINTEXTSASL instead of SASL_PLAINTEXT.
6. Check whether you're able to open an authenticated connection with zookeeper-client. If not, pass the path of the conf file that has the Client section as a JVM param, for example: export JVMFLAGS="-Djava.security.auth.login.config=/usr/hdp/2.6.1.0-129/kafka/conf/kafka_client_jaas.conf"

Try these and let me know. Thanks.
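A minimal sketch of the Client section mentioned in step 4, modeled on the ticket-cache style used elsewhere in this thread; whether your cluster authenticates here via a ticket cache or a keytab is an assumption to verify against your kafka_jaas.conf:

```
Client {
    com.sun.security.auth.module.Krb5LoginModule required
    useTicketCache=true;
};
```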
08-13-2017
08:50 PM
@Sofian Benabdelhak Please check the answer here; it may help: https://community.hortonworks.com/questions/114024/invalid-kdc-administrator-credentials.html?childToView=117774#answer-117774 Also, use that API and set the credentials to persisted.
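For reference, a hedged sketch of the kind of call the linked answer describes, using Ambari's credential resource with type "persisted" so Ambari stores the KDC admin credential instead of prompting each time; the host, cluster name, principal, and password below are placeholders, and the endpoint should be verified against your Ambari version:

```
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT \
  -d '{ "Credential" : { "principal" : "admin/admin@EXAMPLE.COM", "key" : "<kdc-admin-password>", "type" : "persisted" } }' \
  http://<ambari-host>:8080/api/v1/clusters/<cluster-name>/credentials/kdc.admin.credential
```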
08-12-2017
02:20 AM
@Mugdha One more thing: I was looking at the config versions for YARN, and in v1 (just after installation), yarn.acl.enable is set to false. Was this property set to true while Kerberizing?
08-11-2017
11:31 AM
Hi, I set up Ambari server security per https://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.1/bk_Ambari_Security_Guide/content/_configuring_http_authentication_for_HDFS_YARN_MapReduce2_HBase_Oozie_Falcon_and_Storm.html and followed https://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.1/bk_ambari_views_guide/content/section_kerberos_setup_tez_view.html to set up the Tez view with Kerberos. However, I'm no longer seeing any data in the query tab, though the same query gets listed under DAGs and all of its information is available there. Before Kerberizing the cluster, the query tab was populated with both old and new queries. Does anyone know why this may be happening? Thanks.

Attachments: screen-shot-2017-08-11-at-45052-pm.png, screen-shot-2017-08-11-at-45032-pm.png, screen-shot-2017-08-11-at-45121-pm.png
Labels:
- Apache Hive
- Apache Tez
08-03-2017
10:34 AM
IPC is a generic concept; it's not particular to Hive. In fact, several Hadoop services communicate this way: https://wiki.apache.org/hadoop/ipc Using IPC, clients connect to server components at a certain port and invoke methods exposed by the server. See the ipc.client.* properties here: https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/core-default.xml
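To make the pattern concrete, a small illustration using Python's standard-library RPC modules; this is not Hadoop's actual IPC wire protocol, only the same client/server shape of connecting to a port and invoking an exposed method:

```python
# Generic RPC pattern: a server exposes methods on a port; a client invokes them.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def echo(message):
    return "server got: " + message

server = SimpleXMLRPCServer(("localhost", 9000), logRequests=False)
server.register_function(echo)  # method exposed to remote callers
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client connects to the server's port and calls the exposed method.
client = ServerProxy("http://localhost:9000")
print(client.echo("hello"))  # server got: hello
```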
08-03-2017
06:56 AM
@Ann A You can use the concept of a time window. These two links may help you: http://blog.madhukaraphatak.com/introduction-to-spark-two-part-5/ https://stackoverflow.com/questions/37632238/how-to-group-by-time-interval-in-spark-sql
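A brief sketch of the idea, using Spark's built-in window function on a timestamp column; the column names, sample data, and the 15-minute interval are illustrative assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, avg

spark = SparkSession.builder.appName("time-window").getOrCreate()
# Hypothetical events: (timestamp, value).
df = spark.createDataFrame(
    [("2017-08-03 06:01:00", 1.0), ("2017-08-03 06:07:00", 3.0),
     ("2017-08-03 06:16:00", 5.0)],
    ["ts", "value"],
).selectExpr("cast(ts as timestamp) as ts", "value")

# Bucket rows into 15-minute windows and aggregate within each bucket.
df.groupBy(window("ts", "15 minutes")).agg(avg("value")).show(truncate=False)
```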
08-01-2017
02:58 PM
Naveen, can you check the Kerberos ACL?

- RHEL/CentOS/Oracle Linux: vi /var/kerberos/krb5kdc/kadm5.acl
- SLES: vi /var/lib/kerberos/krb5kdc/kadm5.acl
- Ubuntu/Debian: vi /etc/krb5kdc/kadm5.acl

The default setting is similar to */admin@EXAMPLE.COM, or in your case */admin@DEV.DATAQUEST.COM. This means that only principals matching that pattern are treated as admins. So try changing your principal to kadmin/admin@DEV.DATAQUEST.COM instead, or add a line to the ACL granting permissions to your kadmin principal (a sketch follows below). Let me know if this works.
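A hedged sketch of such an ACL line; the principal name and the full-permissions "*" are assumptions to adapt to your policy:

```
# kadm5.acl: grant this principal all admin permissions (illustrative).
kadmin@DEV.DATAQUEST.COM    *
```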
07-29-2017
04:23 AM
Hi, I'm not very sure, but you could use Flume to get data into HDFS by using an HDFS sink: https://flume.apache.org/FlumeUserGuide.html The location in HDFS is set in the Flume agent's .conf file, for example:

```
agent_foo.sinks.hdfs-Cluster1-sink.hdfs.path = hdfs://namenode/flume/webdata
```

You could write a script that modifies this directory with a timestamp and restarts the Flume agent, and then run it every week through cron.
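For reference, a minimal sketch of such an agent configuration; the source and channel here are illustrative assumptions, and note that the HDFS sink can also expand time escapes such as %Y-%m-%d in hdfs.path (given a timestamp header on each event), which may avoid the weekly restart altogether:

```
# Illustrative agent "agent_foo": spooling-directory source feeding an HDFS sink.
agent_foo.sources = src1
agent_foo.channels = ch1
agent_foo.sinks = hdfs-Cluster1-sink

agent_foo.sources.src1.type = spooldir
agent_foo.sources.src1.spoolDir = /var/log/incoming
agent_foo.sources.src1.channels = ch1
# Stamp each event so the sink can expand the date escape below.
agent_foo.sources.src1.interceptors = ts
agent_foo.sources.src1.interceptors.ts.type = timestamp

agent_foo.channels.ch1.type = memory

agent_foo.sinks.hdfs-Cluster1-sink.type = hdfs
agent_foo.sinks.hdfs-Cluster1-sink.channel = ch1
agent_foo.sinks.hdfs-Cluster1-sink.hdfs.path = hdfs://namenode/flume/webdata/%Y-%m-%d
```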