About dmueller1607

dmueller1607 · ‎11-25-2019

To answer my own question: Since I'm using multiple partitions for the Kafka topic, Spark uses more executors to process the data. Also Hive/Tez creates as many worker containers as the topic contains partitions.

dmueller1607 · ‎11-24-2019

I wrote a Kafka producer, that sends some simulated data to a Kafka stream (replication-factor 3, one partition). Now, I want to access this data by using Hive and/or Spark Streaming. First approach: Using an external Hive table with KafkaStorageHandler: CREATE EXTERNAL TABLE mydb.kafka_timeseriestest ( description string, version int, ts timestamp, varname string, varvalue float ) STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler' TBLPROPERTIES ( "kafka.topic" = "testtopic", "kafka.bootstrap.servers"="server1:6667,server2:6667,server3:6667" ); -- e.g. SELECT max(varvalue) from mydb.kafka_timeseriestest; -- takes too long, and only one Tez task is running Second approach: Writing a Spark Streaming app, that accesses the Kafka topic: // started with 10 executors, but only one executor is active ... JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(jssc, LocationStrategies.PreferConsistent(), ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams)); ... In both cases, only one Tez/Spark worker is active. Therefore reading all data (~500 million entries) takes a very long time. How can I increase the performance? Is the issue caused by the one-partition topic? If yes, is there a rule of thumb according to which the number of partitions should be determined? I'm using a HDP 3.1 cluster, running Spark, Hive and Kafka on multiple nodes: dataNode1 - dataNode3: Hive + Spark + Kafka broker dataNode4 - dataNode8: Hive + Spark

dmueller1607 · ‎11-06-2017

Hi @Bryan Bende this tutorial is really helpful, thank you! For me, everything is working, except the "SPNEGO" part: a) When I open "https://myhost.de:9445/nifi" (without a kinit before), my Kerberos Client asks for authentication (which looks good). When I enter the principal and password, it continues with step b. b) When I already made a "kinit" before opening my browser and entering "https://myhost.de:9445/nifi", I always get the username / password prompt as shown in section "Kerberos Login". What am I missing here? I configured the following settings in my Firefox browser: [Windows only] network.auth.use-sspi = false network.negotiate-auth.delegation-uris = https://myhost.de:9445 network.negotiate-auth.trusted-uris = https://myhost.de:9445 Tested it on Centos7, Ubuntu and Windows, I always get the login screen, instead of skipping it after the "kinit". Can you help?

Online	Offline
Last Visited	‎11-25-2019 04:11 AM

Member Since	‎04-24-2017 12:08 PM
Last Visited	‎11-25-2019 04:11 AM
Posts	106
Kudos received	13

Cloudera Community

Re: Spark Streaming / Hive + Kafka: Only one Worke...

Re: Spark Streaming / Hive + Kafka: Only one Worke...

Spark Streaming / Hive + Kafka: Only one Worker ac...

Re: Apache NiFi 1.0.0 Kerberos Authentication