Member since
02-27-2023
37
Posts
3
Kudos Received
4
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 9729 | 05-09-2023 03:20 AM |
| | 5182 | 05-09-2023 03:16 AM |
| | 3833 | 03-30-2023 10:41 PM |
| | 26091 | 03-30-2023 07:25 PM |
04-03-2023
01:45 AM
Hi all, I am practicing Kafka on my CDP 7.1.8 cluster with Kerberos enabled. I can create topics under Kerberos authentication. However, when I test producing and consuming messages, the consumer side never receives a message. Here is the console output:
Consumer:
kafka-console-consumer --bootstrap-server host2.my.cloudera.lab:9092 --topic topic001 --from-beginning --consumer.config /root/kafka/krb-client.properties
23/04/03 04:37:00 INFO utils.Log4jControllerRegistration$: [main]: Registered kafka:type=kafka.Log4jController MBean
23/04/03 04:37:01 INFO consumer.ConsumerConfig: [main]: ConsumerConfig values:
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [host2.my.cloudera.lab:9092]
check.crcs = true
client.dns.lookup = use_all_dns_ips
client.id = console-consumer
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = console-consumer-82044
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
internal.throw.on.fetch.stable.offset.unsupported = false
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 500
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor, class org.apache.kafka.clients.consumer.CooperativeStickyAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = [hidden]
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = kafka
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.connect.timeout.ms = null
sasl.login.read.timeout.ms = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.login.retry.backoff.max.ms = 10000
sasl.login.retry.backoff.ms = 100
sasl.mechanism = GSSAPI
sasl.oauthbearer.clock.skew.seconds = 30
sasl.oauthbearer.expected.audience = null
sasl.oauthbearer.expected.issuer = null
sasl.oauthbearer.jwks.endpoint.refresh.ms = 3600000
sasl.oauthbearer.jwks.endpoint.retry.backoff.max.ms = 10000
sasl.oauthbearer.jwks.endpoint.retry.backoff.ms = 100
sasl.oauthbearer.jwks.endpoint.url = null
sasl.oauthbearer.scope.claim.name = scope
sasl.oauthbearer.sub.claim.name = sub
sasl.oauthbearer.token.endpoint.url = null
security.protocol = SASL_PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
session.timeout.ms = 45000
socket.connection.setup.timeout.max.ms = 30000
socket.connection.setup.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2]
ssl.endpoint.identification.algorithm = https
ssl.engine.factory.class = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.certificate.chain = null
ssl.keystore.key = null
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.2
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.certificates = null
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
23/04/03 04:37:01 INFO authenticator.AbstractLogin: [main]: Successfully logged in.
23/04/03 04:37:01 INFO kerberos.KerberosLogin: [kafka-kerberos-refresh-thread-null]: [Principal=null]: TGT refresh thread started.
23/04/03 04:37:01 INFO kerberos.KerberosLogin: [kafka-kerberos-refresh-thread-null]: [Principal=null]: TGT valid starting at: 2023-04-03T02:52:45.000-0400
23/04/03 04:37:01 INFO kerberos.KerberosLogin: [kafka-kerberos-refresh-thread-null]: [Principal=null]: TGT expires: 2023-04-04T02:52:45.000-0400
23/04/03 04:37:01 INFO kerberos.KerberosLogin: [kafka-kerberos-refresh-thread-null]: [Principal=null]: TGT refresh sleeping until: 2023-04-03T22:11:03.897-0400
23/04/03 04:37:01 INFO utils.AppInfoParser: [main]: Kafka version: 3.1.1.7.1.8.0-801
23/04/03 04:37:01 INFO utils.AppInfoParser: [main]: Kafka commitId: 15839ba4eb998a33
23/04/03 04:37:01 INFO utils.AppInfoParser: [main]: Kafka startTimeMs: 1680511021242
23/04/03 04:37:01 INFO consumer.KafkaConsumer: [main]: [Consumer clientId=console-consumer, groupId=console-consumer-82044] Subscribed to topic(s): topic001
23/04/03 04:37:01 INFO clients.Metadata: [main]: [Consumer clientId=console-consumer, groupId=console-consumer-82044] Resetting the last seen epoch of partition topic001-0 to 0 since the associated topicId changed from null to MyVuTpA9Tfayosq_QihlwA
23/04/03 04:37:01 INFO clients.Metadata: [main]: [Consumer clientId=console-consumer, groupId=console-consumer-82044] Cluster ID: 7vkx3ceERrKii_vcW_gViQ
Producer:
[root@host1 ~]# kafka-console-producer --broker-list host1.my.cloudera.lab:9092 host2.my.cloudera.lab:9092 --topic topic001 --producer.config /root/kafka/krb-client.properties
23/04/03 04:37:44 INFO utils.Log4jControllerRegistration$: [main]: Registered kafka:type=kafka.Log4jController MBean
23/04/03 04:37:44 INFO producer.ProducerConfig: [main]: ProducerConfig values:
acks = -1
batch.size = 16384
bootstrap.servers = [host1.my.cloudera.lab:9092]
buffer.memory = 33554432
client.dns.lookup = use_all_dns_ips
client.id = console-producer
compression.type = none
connections.max.idle.ms = 540000
delivery.timeout.ms = 120000
enable.idempotence = true
interceptor.classes = []
key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
linger.ms = 1000
max.block.ms = 60000
max.in.flight.requests.per.connection = 5
max.request.size = 1048576
metadata.max.age.ms = 300000
metadata.max.idle.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 1500
retries = 3
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = [hidden]
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = kafka
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.connect.timeout.ms = null
sasl.login.read.timeout.ms = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.login.retry.backoff.max.ms = 10000
sasl.login.retry.backoff.ms = 100
sasl.mechanism = GSSAPI
sasl.oauthbearer.clock.skew.seconds = 30
sasl.oauthbearer.expected.audience = null
sasl.oauthbearer.expected.issuer = null
sasl.oauthbearer.jwks.endpoint.refresh.ms = 3600000
sasl.oauthbearer.jwks.endpoint.retry.backoff.max.ms = 10000
sasl.oauthbearer.jwks.endpoint.retry.backoff.ms = 100
sasl.oauthbearer.jwks.endpoint.url = null
sasl.oauthbearer.scope.claim.name = scope
sasl.oauthbearer.sub.claim.name = sub
sasl.oauthbearer.token.endpoint.url = null
security.protocol = SASL_PLAINTEXT
security.providers = null
send.buffer.bytes = 102400
socket.connection.setup.timeout.max.ms = 30000
socket.connection.setup.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2]
ssl.endpoint.identification.algorithm = https
ssl.engine.factory.class = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.certificate.chain = null
ssl.keystore.key = null
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.2
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.certificates = null
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
transaction.timeout.ms = 60000
transactional.id = null
value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
23/04/03 04:37:44 INFO producer.KafkaProducer: [main]: [Producer clientId=console-producer] Instantiated an idempotent producer.
23/04/03 04:37:44 INFO authenticator.AbstractLogin: [main]: Successfully logged in.
23/04/03 04:37:44 INFO kerberos.KerberosLogin: [kafka-kerberos-refresh-thread-null]: [Principal=null]: TGT refresh thread started.
23/04/03 04:37:44 INFO kerberos.KerberosLogin: [kafka-kerberos-refresh-thread-null]: [Principal=null]: TGT valid starting at: 2023-04-03T02:52:45.000-0400
23/04/03 04:37:44 INFO kerberos.KerberosLogin: [kafka-kerberos-refresh-thread-null]: [Principal=null]: TGT expires: 2023-04-04T02:52:45.000-0400
23/04/03 04:37:44 INFO kerberos.KerberosLogin: [kafka-kerberos-refresh-thread-null]: [Principal=null]: TGT refresh sleeping until: 2023-04-03T23:06:05.063-0400
23/04/03 04:37:44 INFO utils.AppInfoParser: [main]: Kafka version: 3.1.1.7.1.8.0-801
23/04/03 04:37:44 INFO utils.AppInfoParser: [main]: Kafka commitId: 15839ba4eb998a33
23/04/03 04:37:44 INFO utils.AppInfoParser: [main]: Kafka startTimeMs: 1680511064283
>23/04/03 04:37:44 INFO clients.Metadata: [kafka-producer-network-thread | console-producer]: [Producer clientId=console-producer] Cluster ID: 7vkx3ceERrKii_vcW_gViQ
23/04/03 04:37:44 INFO internals.TransactionManager: [kafka-producer-network-thread | console-producer]: [Producer clientId=console-producer] ProducerId set to 5 with epoch 0
23/04/03 04:37:48 INFO clients.Metadata: [kafka-producer-network-thread | console-producer]: [Producer clientId=console-producer] Resetting the last seen epoch of partition topic001-0 to 0 since the associated topicId changed from null to MyVuTpA9Tfayosq_QihlwA
>
>hello
>world
Please help me out with this issue, and feel free to tell me if I need to provide more information. Thank you.
Labels:
- Apache Kafka
- Cloudera on premises
- Kerberos
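One detail is visible in the logs of the Kafka post above: the producer command passes two brokers separated by a space, and the logged ProducerConfig then lists only host1.my.cloudera.lab:9092 under bootstrap.servers. The post does not establish that this is why the consumer receives nothing. The following is only a minimal sketch, assuming the intent was to hand both brokers to the producer, of assembling the same command with a comma-separated broker list. The helper name build_producer_command is hypothetical; the flags and values are the ones quoted in the post.

```python
import shlex


def build_producer_command(brokers, topic, config_path):
    """Return the kafka-console-producer argument list.

    Brokers are joined with commas so that the whole list is passed
    as a single value of --broker-list.
    """
    return [
        "kafka-console-producer",
        "--broker-list", ",".join(brokers),
        "--topic", topic,
        "--producer.config", config_path,
    ]


cmd = build_producer_command(
    ["host1.my.cloudera.lab:9092", "host2.my.cloudera.lab:9092"],
    "topic001",
    "/root/kafka/krb-client.properties",
)
# Print a shell-safe command line that can be copied into a terminal.
print(shlex.join(cmd))
```

Building the argument list in one place keeps the broker list a single token, so a stray space cannot split it into separate arguments.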
04-02-2023
08:11 PM
@ChethanYM Could you please explain further why Spark can read a Hive managed table when this parameter is passed? Thank you very much.
04-02-2023
08:10 PM
Hi all, I have the HDFS service running on my CDP 7.1.8 Private Cloud Base cluster with Kerberos enabled. Recently, I ran into two issues with my HDFS NameNode. Here are the screen captures:
The first one:
The second one:
When looking into the role log, it shows:
Could anyone point out the root cause and the solution for this issue, please? Thanks in advance. Please let me know if I need to provide more information.
Labels:
- Cloudera on premises
- HDFS
04-02-2023
07:40 PM
@ChethanYM Thank you for your reply. I tried your suggestion by recreating the Spark session:
>>> conf = spark.sparkContext._conf.setAll([('spark.sql.htl.check','false'), ('mapreduce.input.fileinputformat.input.dir.recursive','true')])
>>> spark.sparkContext.stop()
>>> spark = SparkSession.builder.config(conf=conf).getOrCreate()
It works fine. Thank you very much.
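For anyone doing this from a script rather than the interactive shell, here is a minimal sketch that applies the same two settings from the reply above while building the session. The function names and the application name are hypothetical, and enabling Hive support on the builder is an assumption about the environment, not something stated in the thread.

```python
def hive_read_overrides():
    """Return the two settings quoted in the reply as (key, value) pairs."""
    return [
        ("spark.sql.htl.check", "false"),
        ("mapreduce.input.fileinputformat.input.dir.recursive", "true"),
    ]


def build_session(app_name="hive-managed-read"):
    """Build a SparkSession with the overrides applied (app_name is illustrative)."""
    # Imported lazily so hive_read_overrides() is usable without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).enableHiveSupport()
    for key, value in hive_read_overrides():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Note that getOrCreate() returns an already running session if one exists, which is why the reply stops the existing context before building a new session.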
03-30-2023
10:50 PM
Hi all, I am practicing Spark. When using PySpark to query tables in Hive, I can retrieve data from an external table but cannot query an internal (managed) table. Here is the error message:
>>> spark.read.table("exams").count()
23/03/30 22:28:50 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
Hive Session ID = eb0a9583-da34-4c85-9a1b-db790d126fb1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.30990532/lib/spark/python/pyspark/sql/readwriter.py", line 301, in table
return self._df(self._jreader.table(tableName))
File "/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.30990532/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py",
File "/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.30990532/lib/spark/python/pyspark/sql/utils.py", line 69, in deco
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u'\nSpark has no access to table `default`.`exams`. Clients can access this table only ifMANAGEDINSERTWRITE,HIVEMANAGESTATS,HIVECACHEINVALIDATE,CONNECTORWRITE.\nThis table may be a Hive-managed ACID table, or require some other capability that Spark\ncurrently does not implement;'
I know that Spark cannot read an ACID Hive table. Is there any workaround? Thanks in advance.
Labels:
- Apache Hive
- Apache Spark
03-30-2023
10:41 PM
1 Kudo
I managed to fix the issue. As I am using the CDSW template, the library installed by default is "sklearn". The correct library name should be "scikit-learn".
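The mismatch described here is between the name used in an import statement and the name of the package to install. Below is a minimal sketch of a pre-flight check that reports which packages to install for a list of top-level import names. The mapping table and both function names are hypothetical and cover only the case from this thread.

```python
import importlib.util

# Import name -> package name to install, for cases where the two differ.
# Only the case from this thread is listed; extend as needed.
PIP_NAMES = {"sklearn": "scikit-learn"}


def pip_name_for(import_name):
    """Return the package name to install for a top-level import name."""
    return PIP_NAMES.get(import_name, import_name)


def missing_requirements(import_names):
    """Return install names for top-level modules that cannot be found."""
    return [
        pip_name_for(name)
        for name in import_names
        if importlib.util.find_spec(name) is None
    ]


# Lists the packages to install before unpickling a scikit-learn model,
# or an empty list if the module is already importable.
print(missing_requirements(["sklearn"]))
```

Running a check like this at the start of a build script surfaces the problem at build time instead of when the model is unpickled.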
03-30-2023
07:25 PM
1 Kudo
Thank you @RangaReddy, I managed to solve the problem using your advice. Thank you very much.
03-29-2023
08:58 PM
@nikhilm Thank you for your reply. May I know what I should input for nameserviceXYZ? Please give me an example if possible.
03-29-2023
08:52 PM
Hi all, I am trying out the Experiments feature in CDSW. When running an experiment, I got the error below in the session:
-------error-------
import pickle
import cdsw
model = pickle.load(open('model.pkl', 'rb'))
ModuleNotFoundError: No module named 'sklearn'
ModuleNotFoundError Traceback (most recent call last)
/tmp/ipykernel_115/2912545845.py in <module>
----> 1 model = pickle.load(open('model.pkl', 'rb'))
ModuleNotFoundError: No module named 'sklearn'
Engine exited with status 1.
-------error-------
However, in the Build section of the experiment, I see that sklearn is installed in the Docker container.
-----build-log-----
Step 1/5 : FROM docker.repository.cloudera.com/cloudera/cdsw/ml-runtime-workbench-python3.8-standard:2022.11.2-b2
---> 7ffae291c607
Step 2/5 : WORKDIR /home/cdsw
---> 22e8b9772338
Removing intermediate container 404b7da93746
Step 3/5 : COPY sources /home/cdsw
---> 90d2c92ea316
Removing intermediate container 959005050d3c
Step 4/5 : RUN su cdsw -c "mkdir -p ${R_LIBS_USER:-/home/cdsw/R}" && chown -R cdsw:cdsw /home/cdsw && printf "%s\n" 'export ALL_PROXY="" HTTPS_PROXY="" HTTP_PROXY="" MAX_TEXT_LENGTH="9999999" NO_PROXY="" PYTHONPATH="/usr/local/lib/python2.7/site-packages:/usr/local/lib/python3.6/site-packages:/usr/local/lib/anaconda_python3/site-packages" all_proxy="" http_proxy="" https_proxy="" no_proxy="" && /bin/bash --login -c "${1}"' > /tmp/.buildenv && chmod u+x /tmp/.buildenv && chown cdsw:cdsw /tmp/.buildenv && chmod u+x "/home/cdsw/cdsw-build.sh" && su cdsw -c "PATH=${PATH} /tmp/.buildenv /home/cdsw/cdsw-build.sh" && :
---> Running in 0c052f60686b
Collecting sklearn
Downloading sklearn-0.0.post1.tar.gz (3.6 kB)
Building wheels for collected packages: sklearn
Building wheel for sklearn (setup.py): started
Building wheel for sklearn (setup.py): finished with status 'done'
Created wheel for sklearn: filename=sklearn-0.0.post1-py3-none-any.whl size=2343 sha256=8dbc420f70aee919b1d2719ee53b89f5822502f59707d87c14362581489fdccb
Stored in directory: /home/cdsw/.cache/pip/wheels/14/25/f7/1cc0956978ae479e75140219088deb7a36f60459df242b1a72
Successfully built sklearn
Installing collected packages: sklearn
Successfully installed sklearn-0.0.post1
---> 04c71dc6db3d
Removing intermediate container 0c052f60686b
Step 5/5 : CMD /bin/bash
---> Running in 561a448372ae
---> 4dcc7434f811
Removing intermediate container 561a448372ae
Successfully built 4dcc7434f811
Start Pushing image to [100.77.0.117:5000/4fcc178a-6e6e-4ed1-bd5b-7743775f9a5a]
Finish Pushing image to [100.77.0.117:5000/4fcc178a-6e6e-4ed1-bd5b-7743775f9a5a]
-----build-log-----
Therefore, I have no idea why it fails. Could someone point out the reason, please? Thank you.
Labels:
- Cloudera AI Workbench