Member since: 12-29-2019
Posts: 11
Kudos Received: 0
Solutions: 0
04-15-2021
05:47 AM
I need to understand why we cannot create a parameterized view in Impala, whereas in Hive we can. Can anyone explain why? Is there any workaround for this limitation?
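Impala views cannot take arguments, but one common workaround is variable substitution in impala-shell: pass the "parameter" with --var and reference it as ${var:name} inside the query. A sketch, assuming a hypothetical table `sales` with a column `region`:

```shell
# Hypothetical table/column names. --var substitution is an impala-shell
# feature (query-time templating), not a true parameterized view.
impala-shell --var=region_name=WEST \
  -q 'SELECT * FROM sales WHERE region = "${var:region_name}"'
```

Note the single quotes: they stop the local shell from expanding ${var:region_name} before impala-shell sees it.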
Labels:
- Apache Impala
02-12-2020
05:43 AM
Thanks @stevenmatison. I am using Parquet format; I tried ORC and did not see a significant difference. I then changed the following settings based on my research (I don't know a lot about them yet, and I am not using partitions yet):

set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.vectorized.execution.enabled=true;

I also changed the execution engine:

set hive.execution.engine=spark;

I think switching the engine to Spark made the biggest difference: the query now runs in 15 seconds instead of 2.48 minutes. I am quite satisfied with the current performance, but I would appreciate any other advice, for me and for the community. Thanks, and I appreciate your response. Andy
02-11-2020
11:35 AM
Question: I get a CSV file, convert it to Parquet in HDFS, and copy that file to the Hive table location. The Hive table can see the file and I can query the data successfully.
All is good, but my question is: how do I make that Hive table perform faster? empid is the primary key on the source.
The table is an external table. If I were to create a partition, do I create it on empid (the primary key)? Do I have to create a new table? What are my options?
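For what it's worth, partitioning on a high-cardinality primary key like empid would create one tiny partition per employee, which usually hurts rather than helps. The more typical approach is a new external table partitioned on a low-cardinality column. A sketch in HiveQL; the table and the `load_date` column are hypothetical:

```sql
-- Sketch only: table and column names are hypothetical.
CREATE EXTERNAL TABLE employees_part (
  empid INT,
  name  STRING
)
PARTITIONED BY (load_date STRING)
STORED AS PARQUET
LOCATION '/data/employees_part';

-- Backfill from the existing unpartitioned table using dynamic partitioning.
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO employees_part PARTITION (load_date)
SELECT empid, name, load_date FROM employees;
```

Partitioning only pays off when queries filter on the partition column, so choose it from your common WHERE clauses, not from the primary key.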
Labels:
- Apache Hive
01-28-2020
11:41 AM
Need help with Kafka on Cloudera. I wrote a program in PySpark in PyCharm and it works well.
from kafka import KafkaProducer
from kafka.errors import KafkaError

producer = KafkaProducer(bootstrap_servers=['192.168.56.103:9092'])
# KafkaProducer expects the message value as bytes
tes = producer.send('my-first-topic', b'this message from pyspark')
producer.flush()
but when I run in my Linux Cloudera machine I get
File "/home/cloudera/kafka/kproducer.py", line 1, in <module>
    from kafka import KafkaProducer
ImportError: No module named kafka
using command spark2-submit kproducer.py
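That ImportError usually means the kafka-python package is installed in the PyCharm interpreter but not on the Cloudera machine, or not for the Python that spark2-submit uses; the usual fix is to install it there (`pip install kafka-python`) or ship it with `--py-files`. A minimal diagnostic sketch, assuming Python 3 (on the Python 2 Quickstart VM, `pkgutil.find_loader` plays the same role):

```python
import importlib.util

def kafka_available():
    # True if the "kafka" package (from kafka-python) is importable by
    # this interpreter; if False, install it on the node where
    # spark2-submit runs, or ship it via --py-files.
    return importlib.util.find_spec("kafka") is not None

print(kafka_available())
```

Run this with the same interpreter spark2-submit uses, not the one in PyCharm.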
Labels:
- Apache Kafka
- Apache Spark
01-22-2020
07:41 AM
[cloudera@quickstart ~]$ kafka-console-consumer --bootstrap-server 192.168.56.103:9092 --topic my-first-topic --from-beginning
20/01/22 07:19:07 INFO utils.Log4jControllerRegistration$: Registered kafka:type=kafka.Log4jController MBean
20/01/22 07:19:07 INFO consumer.ConsumerConfig: ConsumerConfig values:
    auto.commit.interval.ms = 5000
    auto.offset.reset = earliest
    bootstrap.servers = [192.168.56.103:9092]
    check.crcs = true
    client.dns.lookup = default
    client.id =
    connections.max.idle.ms = 540000
    default.api.timeout.ms = 60000
    enable.auto.commit = false
    exclude.internal.topics = true
    fetch.max.bytes = 52428800
    fetch.max.wait.ms = 500
    fetch.min.bytes = 1
    group.id = console-consumer-12294
    heartbeat.interval.ms = 3000
    interceptor.classes = []
    internal.leave.group.on.close = true
    isolation.level = read_uncommitted
    key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
    max.partition.fetch.bytes = 1048576
    max.poll.interval.ms = 300000
    max.poll.records = 500
    metadata.max.age.ms = 300000
    metric.reporters = []
    metrics.num.samples = 2
    metrics.recording.level = INFO
    metrics.sample.window.ms = 30000
    partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
    receive.buffer.bytes = 65536
    reconnect.backoff.max.ms = 1000
    reconnect.backoff.ms = 50
    request.timeout.ms = 30000
    retry.backoff.ms = 100
    sasl.client.callback.handler.class = null
    sasl.jaas.config = null
    sasl.kerberos.kinit.cmd = /usr/bin/kinit
    sasl.kerberos.min.time.before.relogin = 60000
    sasl.kerberos.service.name = null
    sasl.kerberos.ticket.renew.jitter = 0.05
    sasl.kerberos.ticket.renew.window.factor = 0.8
    sasl.login.callback.handler.class = null
    sasl.login.class = null
    sasl.login.refresh.buffer.seconds = 300
    sasl.login.refresh.min.period.seconds = 60
    sasl.login.refresh.window.factor = 0.8
    sasl.login.refresh.window.jitter = 0.05
    sasl.mechanism = GSSAPI
    security.protocol = PLAINTEXT
    send.buffer.bytes = 131072
    session.timeout.ms = 10000
    ssl.cipher.suites = null
    ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
    ssl.endpoint.identification.algorithm = null
    ssl.key.password = null
    ssl.keymanager.algorithm = SunX509
    ssl.keystore.location = null
    ssl.keystore.password = null
    ssl.keystore.type = JKS
    ssl.protocol = TLS
    ssl.provider = null
    ssl.secure.random.implementation = null
    ssl.trustmanager.algorithm = PKIX
    ssl.truststore.location = null
    ssl.truststore.password = null
    ssl.truststore.type = JKS
    value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
20/01/22 07:19:07 INFO utils.AppInfoParser: Kafka version: 2.2.1-kafka-4.1.0
20/01/22 07:19:07 INFO utils.AppInfoParser: Kafka commitId: unknown
20/01/22 07:19:07 INFO consumer.KafkaConsumer: [Consumer clientId=consumer-1, groupId=console-consumer-12294] Subscribed to topic(s): my-first-topic
20/01/22 07:19:08 INFO clients.Metadata: Cluster ID: o9wVlIHdTT6naUOAtWJRRw
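For what it's worth, "Kafka commitId: unknown" is an informational log line, not an error; the consumer in that log subscribed successfully and is simply waiting for messages. One way to verify the topic end to end is to run a console producer against the same broker in a second terminal while the consumer above is running (this Kafka version's console producer takes --broker-list):

```shell
# Terminal 1: type messages, one per line, Ctrl+C to stop
kafka-console-producer --broker-list 192.168.56.103:9092 --topic my-first-topic

# Terminal 2: read everything from the start of the topic
kafka-console-consumer --bootstrap-server 192.168.56.103:9092 \
  --topic my-first-topic --from-beginning
```

If the consumer prints nothing while the producer is sending, check that the broker's advertised listener matches the 192.168.56.103:9092 address you are connecting to.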
01-22-2020
07:39 AM
Folks, I just installed Kafka on my Cloudera Quickstart VM and am trying to set up topics, producers, and consumers. I can create the topic, but when I start the producer CLI and the consumer CLI I get:
utils.AppInfoParser: Kafka commitId: unknown
I am not using Kerberos or Sentry.
Please help.
Labels:
- Apache Kafka