About vmurakami

vmurakami · ‎07-05-2018

Hi @heta desai! Yes you can 🙂 Here's a link with more details: http://druid.io/docs/latest/ingestion/data-formats.html Hope this helps!

vmurakami · ‎07-04-2018

Hey @Javert Kirilov! Sorry for the long delay. Regarding your issue: if you really need to clean up both the data and the table structure, then to be sure I'd truncate the table and then drop it. As for the issue itself, it's kind of strange to me. You mentioned that you're using pyspark, right? I did some research and found something interesting (not sure if this is your case, as you're using SQLContext): https://spark.apache.org/docs/1.6.1/sql-programming-guide.html#saving-to-persistent-tables Anyway, hope this helps you!

vmurakami · ‎07-02-2018

Good to know @Souveek Ray! If the issue is solved, I'd kindly ask you to accept the answer. Doing this helps other users find the answer and encourages contributors to keep up the good work 🙂

vmurakami · ‎07-02-2018

Hey @Pankaj Singh! Not sure if I get it right, but, how are you consuming/producing to/from kafka topics? Through kafka-console-consumer/kafka-console-producer? Could you share with us the output from the following commands?
[root@node2 ~]# kafka-topics.sh --zookeeper node1:2181,node2:2181,node3:2181 --describe --topic vini
Topic:vini PartitionCount:3 ReplicationFactor:3 Configs:message.timestamp.type=LogAppendTime
    Topic: vini Partition: 0 Leader: 1001 Replicas: 1001,1002,1003 Isr: 1002,1003,1001
    Topic: vini Partition: 1 Leader: 1002 Replicas: 1002,1003,1001 Isr: 1002,1003,1001
    Topic: vini Partition: 2 Leader: 1003 Replicas: 1003,1001,1002 Isr: 1002,1003,1001
[root@node2 ~]# kafka-console-producer.sh --broker-list node2:6667 --topic vini
>testing testing
>this is a test
>hcc
[root@node2 ~]# kafka-console-consumer.sh --bootstrap-server node2:6667 --topic vini --from-beginning
this is a test
testing testing
hcc
[root@node2 ~]# zookeeper-client
[zk: localhost:2181(CONNECTED) 9] ls /brokers/topics/vini/partitions
[0, 1, 2]
Hope this helps!
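When comparing the describe output across topics, it can help to check mechanically whether every replica of each partition is in the ISR. The following is only a minimal sketch in Python: the sample text is copied from the output above, the function name `under_replicated` is made up for illustration, and the regular expression assumes the line layout shown in this post (other Kafka versions may print additional fields).

```python
import re

# Sample text copied from the `kafka-topics.sh --describe` output in the post.
DESCRIBE_OUTPUT = """\
Topic:vini PartitionCount:3 ReplicationFactor:3 Configs:message.timestamp.type=LogAppendTime
Topic: vini Partition: 0 Leader: 1001 Replicas: 1001,1002,1003 Isr: 1002,1003,1001
Topic: vini Partition: 1 Leader: 1002 Replicas: 1002,1003,1001 Isr: 1002,1003,1001
Topic: vini Partition: 2 Leader: 1003 Replicas: 1003,1001,1002 Isr: 1002,1003,1001
"""

# Assumed layout of a per-partition line; adjust if your version prints more fields.
PARTITION_LINE = re.compile(
    r"Partition:\s*(\d+)\s+Leader:\s*(-?\d+)\s+Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]*)"
)


def under_replicated(describe_text):
    """Return the partition ids whose ISR does not contain every replica."""
    bad = []
    for match in PARTITION_LINE.finditer(describe_text):
        partition = int(match.group(1))
        replicas = set(match.group(3).split(","))
        isr = set(filter(None, match.group(4).split(",")))
        if not replicas <= isr:
            bad.append(partition)
    return bad


print(under_replicated(DESCRIBE_OUTPUT))
```

For the topic above the list is empty, since each partition has all three brokers in its ISR.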

vmurakami · ‎07-02-2018

Hey @Santanu Ghosh! I've never used Jethro, but have you tried setting JD_HIVE_JDBC_CLASSPATH manually in your session? Like this:
export JD_HIVE_JDBC_CLASSPATH=/usr/hdp/current/hive-client/lib/hive-jdbc*.jar
Is there anything else in the logs? And are the other variables from "jd-hadoop-env.sh" being set? Hope this helps!

vmurakami · ‎06-29-2018

Hey @Simon Jespersen! Try changing your //tns:root/tns:second/tns:third/tns:four to root/second/four. I made a simple test here:
hive> select xpath_string('<tns:root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tns="http://test.com" xmlns="http://xmlns.oracle.com/pcbpel/adapter/noname"> <tns:second><tns:third>10379</tns:third><tns:four>stats</tns:four><tns:five>1</tns:five><tns:six><tns:DokumentFilIndhold>K</tns:DokumentFilIndhold></tns:six><tns:seven>2018-06-28T12:57:36</tns:seven><tns:eight>2018-06-28T13:02:28</tns:eight></tns:second></tns:root>','root/second/four');
OK
stats
Hope this helps!

vmurakami · ‎06-28-2018

Hi @Prathamesh H! Sorry for my delay. Regarding your issue: for a partitioned table, afaik, you'll have to summarize per partition, unfortunately 😞 I've heard that it's possible to get the size on Hive 2.0, but I'm not sure, as I haven't tested it. One last thing: in this case (for a partitioned table), the command that I sent you would be:
analyze table <table> partition(col1,col2) compute statistics;
Hope this helps!
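If the statement has to be issued for several tables, it can be generated rather than typed by hand. This is a minimal sketch in Python that only builds the statement text in the form quoted above. The helper name `analyze_statement` and the table and column names in the usage line are hypothetical, and the helper does no identifier quoting or validation.

```python
def analyze_statement(table, partition_cols=()):
    """Build an ANALYZE TABLE statement, adding a PARTITION clause when
    partition columns are given (same shape as the command in the post)."""
    clause = f" partition({','.join(partition_cols)})" if partition_cols else ""
    return f"analyze table {table}{clause} compute statistics;"


# Hypothetical table and partition columns, for illustration only.
print(analyze_statement("sales", ["year", "month"]))
```

The resulting string can then be pasted into beeline or passed to whatever client you already use to run Hive statements.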

vmurakami · ‎06-28-2018

Hi @Satish Anjaneyappa! Hm, what about the ANALYZE TABLE <TBL_NAME> COMPUTE STATISTICS? I made a test here, and it's doing good so far:
--TABLE HAS 50 ROWS!
0: jdbc:hive2://node3:10000/default> CREATE EXTERNAL TABLE `salaries`(
0: jdbc:hive2://node3:10000/default> `gender` string,
0: jdbc:hive2://node3:10000/default> `age` int,
0: jdbc:hive2://node3:10000/default> `salary` double,
0: jdbc:hive2://node3:10000/default> `zip` int)
0: jdbc:hive2://node3:10000/default> ROW FORMAT DELIMITED
0: jdbc:hive2://node3:10000/default> FIELDS TERMINATED BY ','
0: jdbc:hive2://node3:10000/default> STORED AS INPUTFORMAT
0: jdbc:hive2://node3:10000/default> 'org.apache.hadoop.mapred.TextInputFormat'
0: jdbc:hive2://node3:10000/default> OUTPUTFORMAT
0: jdbc:hive2://node3:10000/default> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
0: jdbc:hive2://node3:10000/default> LOCATION
0: jdbc:hive2://node3:10000/default> 'hdfs://Admin-TrainingNS/apps/hive/warehouse/salaries'
0: jdbc:hive2://node3:10000/default> TBLPROPERTIES (
0: jdbc:hive2://node3:10000/default> 'COLUMN_STATS_ACCURATE'='true',
0: jdbc:hive2://node3:10000/default> 'numFiles'='1',
0: jdbc:hive2://node3:10000/default> 'numRows'='0',
0: jdbc:hive2://node3:10000/default> 'rawDataSize'='732',
0: jdbc:hive2://node3:10000/default> 'totalSize'='781',
0: jdbc:hive2://node3:10000/default> 'transient_lastDdlTime'='1529819960');
No rows affected (0.443 seconds)
0: jdbc:hive2://node3:10000/default> explain select count(1) from salaries;
+------------------------------------------------------------------------------------------------------+--+
| Explain |
+------------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES: |
|   Stage-1 is a root stage |
|   Stage-0 depends on stages: Stage-1 |
| |
| STAGE PLANS: |
|   Stage: Stage-1 |
|     Map Reduce |
|       Map Operator Tree: |
|           TableScan |
|             alias: salaries |
|             Statistics: Num rows: 1 Data size: 732 Basic stats: COMPLETE Column stats: COMPLETE |
|             Select Operator |
|               Statistics: Num rows: 1 Data size: 732 Basic stats: COMPLETE Column stats: COMPLETE |
|               Group By Operator |
|                 aggregations: count(1) |
|                 mode: hash |
|                 outputColumnNames: _col0 |
|                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE |
|                 Reduce Output Operator |
|                   sort order: |
|                   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE |
|                   value expressions: _col0 (type: bigint) |
|       Reduce Operator Tree: |
|         Group By Operator |
|           aggregations: count(VALUE._col0) |
|           mode: mergepartial |
|           outputColumnNames: _col0 |
|           Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE |
|           File Output Operator |
|             compressed: false |
|             Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE |
|             table: |
|                 input format: org.apache.hadoop.mapred.TextInputFormat |
|                 output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
|                 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
|   Stage: Stage-0 |
|     Fetch Operator |
|       limit: -1 |
|       Processor Tree: |
|         ListSink |
| |
+------------------------------------------------------------------------------------------------------+--+
42 rows selected (0.255 seconds)
0: jdbc:hive2://node3:10000/default> explain select * from salaries;
+----------------------------------------------------------------------------------------------------------+--+
| Explain |
+----------------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES: |
|   Stage-0 is a root stage |
| |
| STAGE PLANS: |
|   Stage: Stage-0 |
|     Fetch Operator |
|       limit: -1 |
|       Processor Tree: |
|         TableScan |
|           alias: salaries |
|           Statistics: Num rows: 6 Data size: 732 Basic stats: COMPLETE Column stats: NONE |
|           Select Operator |
|             expressions: gender (type: string), age (type: int), salary (type: double), zip (type: int) |
|             outputColumnNames: _col0, _col1, _col2, _col3 |
|             Statistics: Num rows: 6 Data size: 732 Basic stats: COMPLETE Column stats: NONE |
|             ListSink |
| |
+----------------------------------------------------------------------------------------------------------+--+
17 rows selected (0.232 seconds)
0: jdbc:hive2://node3:10000/default> desc salaries;
+-----------+------------+----------+--+
| col_name  | data_type  | comment  |
+-----------+------------+----------+--+
| gender    | string     |          |
| age       | int        |          |
| salary    | double     |          |
| zip       | int        |          |
+-----------+------------+----------+--+
4 rows selected (0.426 seconds)
0: jdbc:hive2://node3:10000/default> explain select age from salaries;
+------------------------------------------------------------------------------------------------+--+
| Explain |
+------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES: |
|   Stage-0 is a root stage |
| |
| STAGE PLANS: |
|   Stage: Stage-0 |
|     Fetch Operator |
|       limit: -1 |
|       Processor Tree: |
|         TableScan |
|           alias: salaries |
|           Statistics: Num rows: 183 Data size: 732 Basic stats: COMPLETE Column stats: NONE |
|           Select Operator |
|             expressions: age (type: int) |
|             outputColumnNames: _col0 |
|             Statistics: Num rows: 183 Data size: 732 Basic stats: COMPLETE Column stats: NONE |
|             ListSink |
| |
+------------------------------------------------------------------------------------------------+--+
0: jdbc:hive2://node3:10000/default> analyze table salaries compute statistics ;
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
INFO : number of splits:1
INFO : Submitting tokens for job: job_1529940007017_0004
INFO : The url to track the job: http://node4:8088/proxy/application_1529940007017_0004/
INFO : Starting Job = job_1529940007017_0004, Tracking URL = http://node4:8088/proxy/application_1529940007017_0004/
INFO : Kill Command = /usr/hdp/2.6.5.0-292/hadoop/bin/hadoop job -kill job_1529940007017_0004
INFO : Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
INFO : 2018-06-28 16:38:56,357 Stage-0 map = 0%, reduce = 0%
INFO : 2018-06-28 16:39:02,796 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 2.93 sec
INFO : MapReduce Total cumulative CPU time: 2 seconds 930 msec
INFO : Ended Job = job_1529940007017_0004
INFO : Table default.salaries stats: [numFiles=1, numRows=50, totalSize=781, rawDataSize=732]
No rows affected (16.338 seconds)
0: jdbc:hive2://node3:10000/default> explain select * from salaries;
+----------------------------------------------------------------------------------------------------------+--+
| Explain |
+----------------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES: |
|   Stage-0 is a root stage |
| |
| STAGE PLANS: |
|   Stage: Stage-0 |
|     Fetch Operator |
|       limit: -1 |
|       Processor Tree: |
|         TableScan |
|           alias: salaries |
|           Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE |
|           Select Operator |
|             expressions: gender (type: string), age (type: int), salary (type: double), zip (type: int) |
|             outputColumnNames: _col0, _col1, _col2, _col3 |
|             Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE |
|             ListSink |
| |
+----------------------------------------------------------------------------------------------------------+--+
17 rows selected (0.226 seconds)
0: jdbc:hive2://node3:10000/default> explain select age from salaries;
+-----------------------------------------------------------------------------------------------+--+
| Explain |
+-----------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES: |
|   Stage-0 is a root stage |
| |
| STAGE PLANS: |
|   Stage: Stage-0 |
|     Fetch Operator |
|       limit: -1 |
|       Processor Tree: |
|         TableScan |
|           alias: salaries |
|           Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE |
|           Select Operator |
|             expressions: age (type: int) |
|             outputColumnNames: _col0 |
|             Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE |
|             ListSink |
| |
+-----------------------------------------------------------------------------------------------+--+
17 rows selected (0.227 seconds)
And also, try to set the following:
set hive.stats.autogather=true;
Hope this helps!
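To confirm that the statistics were refreshed, the "stats:" line that ANALYZE TABLE logs can be pulled out of the job output and compared with the expected row count. A minimal sketch in Python follows. It assumes the log line has the same shape as the one printed in this test, and the function name `parse_table_stats` is made up for illustration.

```python
import re

# Log line copied from the ANALYZE TABLE output in this post.
STATS_LINE = (
    "INFO : Table default.salaries stats: "
    "[numFiles=1, numRows=50, totalSize=781, rawDataSize=732]"
)


def parse_table_stats(line):
    """Return (table_name, {stat: int}) from a 'Table ... stats: [...]' line,
    or None if the line does not have that shape."""
    match = re.search(r"Table (\S+) stats: \[(.*?)\]", line)
    if match is None:
        return None
    table, body = match.groups()
    stats = {}
    for pair in body.split(","):
        key, _, value = pair.strip().partition("=")
        stats[key] = int(value)
    return table, stats


table, stats = parse_table_stats(STATS_LINE)
print(table, stats["numRows"])
```

Here numRows comes back as 50, which matches the real size of the table and the "Num rows: 50" shown by the explain plans after the analyze.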

vmurakami · ‎06-28-2018

Hi @Hamilton Castro! Could you check your ZK namespace under the hiveserver2 path? It would look like this:
[root@node3 ~]# zookeeper-client
Connecting to localhost:2181
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0] ls /hiveserver2
[serverUri=node3:10000;version=1.2.1000.2.6.5.0-292;sequence=0000000013]
Btw, if your ZK hosts are not the same hosts as HS2, check whether HS2 can reach port 2181 on the ZK host. Hope this helps!
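Both checks can be scripted if that's easier than doing them by hand. The sketch below, in Python, splits a znode name like the one listed above into its fields and offers a plain TCP reachability test for the ZooKeeper port. The helper names `parse_hs2_znode` and `can_connect` are hypothetical, and the parsing assumes the `key=value;key=value` layout shown in this post.

```python
import socket

# Znode name copied from the `ls /hiveserver2` output in this post.
ZNODE = "serverUri=node3:10000;version=1.2.1000.2.6.5.0-292;sequence=0000000013"


def parse_hs2_znode(name):
    """Split a HiveServer2 znode name into (host, port, version, sequence)."""
    fields = dict(part.split("=", 1) for part in name.split(";"))
    host, _, port = fields["serverUri"].rpartition(":")
    return host, int(port), fields.get("version"), fields.get("sequence")


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(parse_hs2_znode(ZNODE))
```

Running `can_connect("<zk-host>", 2181)` from the HS2 host, with your own ZooKeeper hostname filled in, gives a quick answer on whether the port is reachable. It only tests the TCP connection, not whether ZooKeeper itself is healthy.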

vmurakami · ‎06-28-2018

Hi @Javert Kirilov! Could you share the describe formatted output from your table? And just asking, but is it a managed table or an external one?

Last Visited	‎12-23-2018 04:33 AM

Member Since	‎05-07-2018 06:05 PM
Posts	331
Kudos received	45

Cloudera Community

Re: Minifi not connecting to Nifi - remote instanc...

Re: getsnmp attribute

Re: XML and Hive parsing error with Serde.

Re: Ranger and HDFS over SSL

Re: livy2 zepplin issue

Re: can we store csv file into druid ?

Re: HIVE: dropping the table does not remove data

Re: Date Interval in Hive

Re: How to setup kafka cluster on hortonworks plat...

Re: Connect Hive from Jethro Client

Re: hive query with xpath on xml with namespace

Re: Unable to check size of Hive Table

Re: Hive explain plan fetching wrong row numbers

Re: How to resolve Unable to read HiveServer2 conf...

Re: HIVE: dropping the table does not remove data