Member since 04-25-2016

579 Posts | 609 Kudos Received | 111 Solutions
        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2922 | 02-12-2020 03:17 PM |
| | 2136 | 08-10-2017 09:42 AM |
| | 12470 | 07-28-2017 03:57 AM |
| | 3407 | 07-19-2017 02:43 AM |
| | 2520 | 07-13-2017 11:42 AM |
06-02-2016 11:09 AM | 1 Kudo

numpy is missing here; install it with `pip install numpy`.
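After installing, a quick sanity check (a minimal sketch; the printed version will vary with your environment):

```python
# confirm numpy is now importable
import numpy as np

print(np.__version__)
```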
06-01-2016 04:29 PM

Is there a ZooKeeper client running on your local machine that is trying to connect to the zk-server running on the VM?
06-01-2016 04:19 PM | 3 Kudos

All of these values are picked up from your environment variables (see http://grepcode.com/file/repo1.maven.org/maven2/org.apache.zookeeper/zookeeper/3.3.1/org/apache/zookeeper/Environment.java#Environment). Please check your environment variables.
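A quick way to inspect them, sketched in Python (the variable names listed here are illustrative; the Environment.java link above shows what ZooKeeper actually records):

```python
import os

# print the environment variables most often involved in ZooKeeper startup
for name in ("JAVA_HOME", "CLASSPATH", "ZOOCFGDIR", "ZOO_LOG_DIR"):
    print(f"{name} = {os.environ.get(name, '<not set>')}")
```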
06-01-2016 12:17 PM | 3 Kudos

If you commit the offset based on a timestamp, you can start consuming from Kafka at the next batch cycle, like this:

```python
from kafka import KafkaConsumer, TopicPartition

# commit the last consumed offset
consumer = KafkaConsumer(bootstrap_servers='localhost:9092')
tp = TopicPartition(topic, partition)
consumer.assign([tp])    # the partition must be assigned before seeking
consumer.seek(tp, end)   # 'end' is the last consumed offset
consumer.commit()

# now start consuming from the committed offset when the job restarts at the next batch cycle
consumer.assign([tp])
start = consumer.committed(tp)
```
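The snippet above assumes the target offset is already known; for the lookup by timestamp itself, kafka-python provides offsets_for_times (a minimal sketch; the topic name, partition, and epoch-millisecond timestamp are made up, and the broker must support it, 0.10.1+):

```python
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers='localhost:9092')
tp = TopicPartition('my-topic', 0)  # hypothetical topic and partition
consumer.assign([tp])

# map an epoch-millisecond timestamp to the first offset at or after it
offsets = consumer.offsets_for_times({tp: 1464800000000})
if offsets[tp] is not None:
    consumer.seek(tp, offsets[tp].offset)
    consumer.commit()  # commits the sought position for the next run
```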
06-01-2016 11:06 AM

With "transactional"="true" you should not be able to compile this DDL statement; a transactional table won't allow a sorted column. Are you able to execute this statement successfully?
06-01-2016 10:50 AM | 5 Kudos

@Sanjeev Verma you can use the following ways to get external configuration inside the topology:

1. Pass the arguments on the command line, like this: `storm jar storm-jar topology-name -c sKey=sValue -c key1=value1 -c key2=value2 >/tmp/storm.txt`
2. Create a simple Java resource file (a properties file) and pass it as an argument to your topology's main class; in the main method, read the properties from the file and build the Storm configuration object using conf.put().
3. Create a separate YAML file and read it through the Utils method provided by the Storm API, Utils.findAndReadConfigFile(); for more documentation see https://nathanmarz.github.io/storm/doc/backtype/storm/utils/Utils.html
06-01-2016 10:09 AM | 4 Kudos

As you are using a transactional table, you cannot take advantage of SORTED BY on the fechaoprcnf column. Apart from partitioning, try creating a storage index on the table using `tblproperties ("orc.create.index"="true", "orc.compress"="ZLIB", "orc.stripe.size"="268435456", "orc.row.index.stride"="10000")`. The ORC stripe size and index stride used here are the defaults; try tuning these values and compare the performance results.
06-01-2016 09:44 AM | 2 Kudos

Considering you are using an ORC table: if you are not using an ACID table, it would be good to modify the table DDL to `clustered by (codnrbeenf) sorted by (fechaoprcnf)`. Further to this, you can create a storage-based index on the ORC table by specifying `orc.create.index=true`.
06-01-2016 05:16 AM | 4 Kudos

@akeezhadath it seems you are not calling an action, so the job is never actually triggered. Spark transformations are lazily evaluated; can you run a terminal operation such as count or collect on filterwords and check whether you then see the incremented accumulator values?
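A minimal PySpark sketch of the same point (the names and input data are made up for illustration):

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "accumulator-demo")
acc = sc.accumulator(0)

def keep(word):
    acc.add(1)                 # side effect runs only when an action forces evaluation
    return len(word) > 3

filter_words = sc.parallelize(["spark", "is", "lazily", "evaluated"]).filter(keep)
print(acc.value)               # 0 -- filter() is a transformation, nothing has run yet
filter_words.count()           # action: triggers the job
print(acc.value)               # 4 -- keep() was called once per element
```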
05-31-2016 05:01 PM | 1 Kudo

Looking at this exception:

java.lang.NoSuchMethodError: org.apache.hadoop.hive.shims.HadoopShims.setHadoopSessionContext(Ljava/lang/String;)V

it seems a wrong version of the HadoopShims jar is on your classpath, one that either has no setHadoopSessionContext implementation or has a different method signature. To troubleshoot:

```sh
# list every jar the HiveServer2 process has open
lsof -p <HS2 process id> | grep -i jar | awk '{ print $9 }' > class-jar.txt

# find which jars contain the HadoopShims class (there may be multiple shim jars)
for jar in `cat class-jar.txt`; do
  echo "$jar"
  jar -tvf "$jar" | grep --color 'org/apache/hadoop/hive/shims/HadoopShims'
done
```

For each jar that contains HadoopShims, extract the class and inspect it:

```sh
jar xvf <jar> org/apache/hadoop/hive/shims/HadoopShims.class
javap -classpath . org.apache.hadoop.hive.shims.HadoopShims
```

to verify the availability and signature of the setHadoopSessionContext method.