Member since: 12-29-2019
Posts: 11
Kudos Received: 0
Solutions: 0
04-15-2021
05:47 AM
I need to understand why we cannot create a parameterized view in Impala, whereas in Hive we can. Can anyone explain why? Is there any workaround for this limitation?
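Impala views cannot take arguments, but one common workaround is variable substitution in impala-shell: pass the "parameter" with --var and reference it as ${var:name} inside the query. A sketch, assuming a hypothetical table `sales` with a column `region`:

```shell
# Hypothetical table/column names. --var substitution is an impala-shell
# feature (query-time templating), not a true parameterized view.
impala-shell --var=region_name=WEST \
  -q 'SELECT * FROM sales WHERE region = "${var:region_name}"'
```

Note the single quotes: they stop the local shell from expanding ${var:region_name} before impala-shell sees it.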
Labels:
- Apache Impala
02-12-2020
05:43 AM
Thanks @stevenmatison. I am using Parquet format; I tried ORC and did not see a significant difference. I then changed the following settings based on my research (I don't know a lot about them yet, and I am not using partitions yet):

set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.vectorized.execution.enabled=true;

I also changed the execution engine:

set hive.execution.engine=spark;

I think switching the engine to Spark made the biggest difference: the query now runs in 15 seconds instead of 2.48 minutes. I am quite satisfied with the current performance, but I would appreciate any other advice, for me and for the community. Thanks, and I appreciate your response. Andy
02-11-2020
11:35 AM
Question: I get a CSV file, convert it to Parquet in HDFS, and copy that file to the Hive table location. The Hive table can see the file and I can query the data successfully.
All is good, but my question is: how do I make that Hive table perform faster? empid is the primary key on the source.
The table is an external table. If I were to create a partition, do I create it on empid (the primary key)? Do I have to create a new table? What are my options?
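For what it's worth, partitioning on a high-cardinality primary key like empid would create one tiny partition per employee, which usually hurts rather than helps. The more typical approach is a new external table partitioned on a low-cardinality column. A sketch in HiveQL; the table and the `load_date` column are hypothetical:

```sql
-- Sketch only: table and column names are hypothetical.
CREATE EXTERNAL TABLE employees_part (
  empid INT,
  name  STRING
)
PARTITIONED BY (load_date STRING)
STORED AS PARQUET
LOCATION '/data/employees_part';

-- Backfill from the existing unpartitioned table using dynamic partitioning.
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO employees_part PARTITION (load_date)
SELECT empid, name, load_date FROM employees;
```

Partitioning only pays off when queries filter on the partition column, so choose it from your common WHERE clauses, not from the primary key.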
Labels:
- Apache Hive
01-28-2020
11:41 AM
Need help with Kafka on Cloudera. I wrote a program in PySpark in PyCharm and it works well.
from kafka import KafkaProducer
from kafka.errors import KafkaError

producer = KafkaProducer(bootstrap_servers=['192.168.56.103:9092'])
# KafkaProducer expects the message value as bytes
tes = producer.send('my-first-topic', b'this message from pyspark')
producer.flush()
but when I run in my Linux Cloudera machine I get
File "/home/cloudera/kafka/kproducer.py", line 1, in <module>
    from kafka import KafkaProducer
ImportError: No module named kafka
using command spark2-submit kproducer.py
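That ImportError usually means the kafka-python package is installed in the PyCharm interpreter but not on the Cloudera machine, or not for the Python that spark2-submit uses; the usual fix is to install it there (`pip install kafka-python`) or ship it with `--py-files`. A minimal diagnostic sketch, assuming Python 3 (on the Python 2 Quickstart VM, `pkgutil.find_loader` plays the same role):

```python
import importlib.util

def kafka_available():
    # True if the "kafka" package (from kafka-python) is importable by
    # this interpreter; if False, install it on the node where
    # spark2-submit runs, or ship it via --py-files.
    return importlib.util.find_spec("kafka") is not None

print(kafka_available())
```

Run this with the same interpreter spark2-submit uses, not the one in PyCharm.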
Labels:
- Apache Kafka
- Apache Spark
01-22-2020
07:41 AM
[cloudera@quickstart ~]$ kafka-console-consumer --bootstrap-server 192.168.56.103:9092 --topic my-first-topic --from-beginning
20/01/22 07:19:07 INFO utils.Log4jControllerRegistration$: Registered kafka:type=kafka.Log4jController MBean
20/01/22 07:19:07 INFO consumer.ConsumerConfig: ConsumerConfig values:
    auto.commit.interval.ms = 5000
    auto.offset.reset = earliest
    bootstrap.servers = [192.168.56.103:9092]
    check.crcs = true
    client.dns.lookup = default
    client.id =
    connections.max.idle.ms = 540000
    default.api.timeout.ms = 60000
    enable.auto.commit = false
    exclude.internal.topics = true
    fetch.max.bytes = 52428800
    fetch.max.wait.ms = 500
    fetch.min.bytes = 1
    group.id = console-consumer-12294
    heartbeat.interval.ms = 3000
    interceptor.classes = []
    internal.leave.group.on.close = true
    isolation.level = read_uncommitted
    key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
    max.partition.fetch.bytes = 1048576
    max.poll.interval.ms = 300000
    max.poll.records = 500
    metadata.max.age.ms = 300000
    metric.reporters = []
    metrics.num.samples = 2
    metrics.recording.level = INFO
    metrics.sample.window.ms = 30000
    partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
    receive.buffer.bytes = 65536
    reconnect.backoff.max.ms = 1000
    reconnect.backoff.ms = 50
    request.timeout.ms = 30000
    retry.backoff.ms = 100
    sasl.client.callback.handler.class = null
    sasl.jaas.config = null
    sasl.kerberos.kinit.cmd = /usr/bin/kinit
    sasl.kerberos.min.time.before.relogin = 60000
    sasl.kerberos.service.name = null
    sasl.kerberos.ticket.renew.jitter = 0.05
    sasl.kerberos.ticket.renew.window.factor = 0.8
    sasl.login.callback.handler.class = null
    sasl.login.class = null
    sasl.login.refresh.buffer.seconds = 300
    sasl.login.refresh.min.period.seconds = 60
    sasl.login.refresh.window.factor = 0.8
    sasl.login.refresh.window.jitter = 0.05
    sasl.mechanism = GSSAPI
    security.protocol = PLAINTEXT
    send.buffer.bytes = 131072
    session.timeout.ms = 10000
    ssl.cipher.suites = null
    ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
    ssl.endpoint.identification.algorithm = null
    ssl.key.password = null
    ssl.keymanager.algorithm = SunX509
    ssl.keystore.location = null
    ssl.keystore.password = null
    ssl.keystore.type = JKS
    ssl.protocol = TLS
    ssl.provider = null
    ssl.secure.random.implementation = null
    ssl.trustmanager.algorithm = PKIX
    ssl.truststore.location = null
    ssl.truststore.password = null
    ssl.truststore.type = JKS
    value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
20/01/22 07:19:07 INFO utils.AppInfoParser: Kafka version: 2.2.1-kafka-4.1.0
20/01/22 07:19:07 INFO utils.AppInfoParser: Kafka commitId: unknown
20/01/22 07:19:07 INFO consumer.KafkaConsumer: [Consumer clientId=consumer-1, groupId=console-consumer-12294] Subscribed to topic(s): my-first-topic
20/01/22 07:19:08 INFO clients.Metadata: Cluster ID: o9wVlIHdTT6naUOAtWJRRw
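For what it's worth, "Kafka commitId: unknown" is an informational log line, not an error; the consumer in that log subscribed successfully and is simply waiting for messages. One way to verify the topic end to end is to run a console producer against the same broker in a second terminal while the consumer above is running (this Kafka version's console producer takes --broker-list):

```shell
# Terminal 1: type messages, one per line, Ctrl+C to stop
kafka-console-producer --broker-list 192.168.56.103:9092 --topic my-first-topic

# Terminal 2: read everything from the start of the topic
kafka-console-consumer --bootstrap-server 192.168.56.103:9092 \
  --topic my-first-topic --from-beginning
```

If the consumer prints nothing while the producer is sending, check that the broker's advertised listener matches the 192.168.56.103:9092 address you are connecting to.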
01-22-2020
07:39 AM
Folks, I just installed Kafka on my Cloudera Quickstart VM and am trying to set up topics, producers, and consumers. I can create the topic, but when I start the producer CLI and the consumer CLI I get:
utils.AppInfoParser: Kafka commitId: unknown
I am not using Kerberos or Sentry.
Please help.
Labels:
- Apache Kafka