Member since 05-07-2018
331 Posts
45 Kudos Received
35 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 9637 | 09-12-2018 10:09 PM |
| | 3763 | 09-10-2018 02:07 PM |
| | 11554 | 09-08-2018 05:47 AM |
| | 4098 | 09-08-2018 12:05 AM |
| | 4942 | 08-15-2018 10:44 PM |
07-05-2018
05:41 AM
Hi @heta desai! Yes you can 🙂 Here's a link with more details: http://druid.io/docs/latest/ingestion/data-formats.html Hope this helps!
07-04-2018
03:37 PM
Hey @Javert Kirilov! Sorry for the long delay. Regarding your issue: if you really need to clean up both your data and the table structure, then to be safe I'd truncate the table and then drop it. As for the issue itself, it's kind of strange to me. You mentioned that you're using PySpark, right? I did some research and saw something interesting (not sure if it applies to your case, since you're using SQLContext): https://spark.apache.org/docs/1.6.1/sql-programming-guide.html#saving-to-persistent-tables Anyway, hope this helps!
07-02-2018
07:17 AM
Good to know, @Souveek Ray! If the issue is solved, I'd kindly ask you to accept this as the answer. Doing so helps other users find the answer and encourages contributors to keep up the good work 🙂
07-02-2018
06:50 AM
Hey @Pankaj Singh! I'm not sure I understood correctly, but how are you producing to and consuming from your Kafka topics? Through kafka-console-producer/kafka-console-consumer? Could you share the output of the following commands? Here's what they return in my environment:
[root@node2 ~]# kafka-topics.sh --zookeeper node1:2181,node2:2181,node3:2181 --describe --topic vini
Topic:vini PartitionCount:3 ReplicationFactor:3 Configs:message.timestamp.type=LogAppendTime
Topic: vini Partition: 0 Leader: 1001 Replicas: 1001,1002,1003 Isr: 1002,1003,1001
Topic: vini Partition: 1 Leader: 1002 Replicas: 1002,1003,1001 Isr: 1002,1003,1001
Topic: vini Partition: 2 Leader: 1003 Replicas: 1003,1001,1002 Isr: 1002,1003,1001
[root@node2 ~]# kafka-console-producer.sh --broker-list node2:6667 --topic vini
>testing testing
>this is a test
>hcc
[root@node2 ~]# kafka-console-consumer.sh --bootstrap-server node2:6667 --topic vini --from-beginning
this is a test
testing testing
hcc
[root@node2 ~]# zookeeper-client
[zk: localhost:2181(CONNECTED) 9] ls /brokers/topics/vini/partitions
[0, 1, 2]
Hope this helps!
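If you want to check the describe output quickly (for example, whether every partition's ISR still contains all of its assigned replicas), a small Python sketch like the one below could help. It assumes the plain-text `--describe` format shown above, which may differ between Kafka versions; the function name `under_replicated` is hypothetical.

```python
from __future__ import annotations

import re

# Matches per-partition lines such as:
# "Topic: vini Partition: 0 Leader: 1001 Replicas: 1001,1002,1003 Isr: 1002,1003,1001"
# (the "Topic:vini PartitionCount:3 ..." header line does not match).
PARTITION_LINE = re.compile(
    r"Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>\S+)\s+"
    r"Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def under_replicated(describe_output: str) -> list[int]:
    """Return partition ids whose ISR is missing at least one assigned replica."""
    lagging = []
    for line in describe_output.splitlines():
        match = PARTITION_LINE.search(line)
        if match is None:
            continue  # header line or anything else we don't recognize
        replicas = set(match["replicas"].split(","))
        isr = {broker for broker in match["isr"].split(",") if broker}
        if not replicas <= isr:
            lagging.append(int(match["partition"]))
    return lagging


sample = """\
Topic:vini PartitionCount:3 ReplicationFactor:3 Configs:message.timestamp.type=LogAppendTime
Topic: vini Partition: 0 Leader: 1001 Replicas: 1001,1002,1003 Isr: 1002,1003,1001
Topic: vini Partition: 1 Leader: 1002 Replicas: 1002,1003,1001 Isr: 1002,1003,1001
Topic: vini Partition: 2 Leader: 1003 Replicas: 1003,1001,1002 Isr: 1002,1003,1001
"""
print(under_replicated(sample))  # → []
```

An empty list means every partition in the sample is fully in sync, which matches the output above.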
07-02-2018
06:33 AM
1 Kudo
Hey @Santanu Ghosh! I've never used Jethro, but have you tried setting JD_HIVE_JDBC_CLASSPATH manually in your session? For example:
export JD_HIVE_JDBC_CLASSPATH=/usr/hdp/current/hive-client/lib/hive-jdbc*.jar
Is there anything else in the logs? And are the other variables from "jd-hadoop-env.sh" being set? Hope this helps!
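I can't confirm whether Jethro expands the `*` in that path on its own. If it turns out to need an explicit list of jars, a small Python helper like this one could expand the glob and print an export line to paste into your session. `build_classpath` is a hypothetical name, and the jar pattern is the one from the example above.

```python
import glob
import os
import shlex


def build_classpath(pattern: str) -> str:
    """Expand a jar glob into an explicit, os.pathsep-separated classpath.

    Hypothetical helper for tools that do not expand wildcards like hive-jdbc*.jar.
    """
    jars = sorted(glob.glob(pattern))
    if not jars:
        raise FileNotFoundError(f"no jars match {pattern!r}")
    return os.pathsep.join(jars)


if __name__ == "__main__":
    pattern = "/usr/hdp/current/hive-client/lib/hive-jdbc*.jar"
    try:
        # Quote the value so the printed line is safe to paste into a shell.
        print("export JD_HIVE_JDBC_CLASSPATH=" + shlex.quote(build_classpath(pattern)))
    except FileNotFoundError as err:
        print(err)
```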
06-29-2018
09:20 PM
Hey @Simon Jespersen! Try changing your //tns:root/tns:second/tns:third/tns:four to root/second/four (note that in your XML, four is a sibling of third, not a child of it). I made a simple test here:
hive> select xpath_string('<tns:root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tns="http://test.com" xmlns="http://xmlns.oracle.com/pcbpel/adapter/noname"> <tns:second><tns:third>10379</tns:third><tns:four>stats</tns:four><tns:five>1</tns:five><tns:six><tns:DokumentFilIndhold>K</tns:DokumentFilIndhold></tns:six><tns:seven>2018-06-28T12:57:36</tns:seven><tns:eight>2018-06-28T13:02:28</tns:eight></tns:second></tns:root>','root/second/four');
OK
stats
Hope this helps!
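To double-check the structure of the XML itself outside Hive, here's a quick sanity check with Python's xml.etree.ElementTree. This is not how Hive's xpath_string resolves names: ElementTree is namespace-aware, so the tns prefix has to be bound to its URI explicitly. The document is trimmed to the relevant elements.

```python
import xml.etree.ElementTree as ET

# The document from the Hive test above, trimmed to the relevant elements.
XML = """<tns:root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:tns="http://test.com"
    xmlns="http://xmlns.oracle.com/pcbpel/adapter/noname">
  <tns:second>
    <tns:third>10379</tns:third>
    <tns:four>stats</tns:four>
    <tns:five>1</tns:five>
  </tns:second>
</tns:root>"""

# Bind the tns prefix to its namespace URI so paths can use it.
NS = {"tns": "http://test.com"}

root = ET.fromstring(XML)
# four sits directly under second (next to third), so this finds it...
print(root.findtext("tns:second/tns:four", namespaces=NS))  # → stats
# ...while nesting it under third finds nothing.
print(root.findtext("tns:second/tns:third/tns:four", namespaces=NS))  # → None
```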
06-28-2018
07:29 PM
Hi @Prathamesh H! Sorry for the delay. Regarding your issue: for a partitioned table, as far as I know, you'll unfortunately have to sum up the stats per partition 😞 I've heard that it's possible to get the total size in Hive 2.0, but I'm not sure; I haven't tested it. One last thing: for a partitioned table, the command I sent you earlier would become:
analyze table <table> partition(col1,col2) compute statistics;
Hope this helps!
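If you collect the per-partition stats lines (the ones ending in `stats: [numFiles=..., numRows=..., totalSize=..., rawDataSize=...]`), a Python sketch like this could add them up into table totals. The function names and sample lines are hypothetical; it assumes only that each line carries that bracketed `key=value` list, and ignores whatever prefix comes before it.

```python
from __future__ import annotations

import re

# Matches the bracketed part of a Hive stats line, e.g.
# "... stats: [numFiles=1, numRows=50, totalSize=781, rawDataSize=732]"
STATS = re.compile(r"stats:\s*\[(?P<body>[^\]]*)\]")


def parse_stats(line: str) -> dict[str, int]:
    """Return the numeric key=value pairs inside 'stats: [...]', or {} if absent."""
    match = STATS.search(line)
    if match is None:
        return {}
    stats = {}
    for pair in match["body"].split(","):
        key, _, value = pair.strip().partition("=")
        if value.isdigit():
            stats[key] = int(value)
    return stats


def sum_partition_stats(lines: list[str]) -> dict[str, int]:
    """Add up each stat (numRows, totalSize, ...) across all per-partition lines."""
    totals = {}
    for line in lines:
        for key, value in parse_stats(line).items():
            totals[key] = totals.get(key, 0) + value
    return totals


# Hypothetical per-partition lines for a table partitioned by dt.
lines = [
    "Partition default.t{dt=2018-06-27} stats: [numFiles=1, numRows=50, totalSize=781, rawDataSize=732]",
    "Partition default.t{dt=2018-06-28} stats: [numFiles=2, numRows=70, totalSize=1000, rawDataSize=900]",
]
print(sum_partition_stats(lines))
# → {'numFiles': 3, 'numRows': 120, 'totalSize': 1781, 'rawDataSize': 1632}
```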
06-28-2018
04:44 PM
Hi @Satish Anjaneyappa! Hmm, what about ANALYZE TABLE <TBL_NAME> COMPUTE STATISTICS? I made a test here, and it's working well so far:
--TABLE HAS 50 ROWS!
0: jdbc:hive2://node3:10000/default> CREATE EXTERNAL TABLE `salaries`(
0: jdbc:hive2://node3:10000/default> `gender` string,
0: jdbc:hive2://node3:10000/default> `age` int,
0: jdbc:hive2://node3:10000/default> `salary` double,
0: jdbc:hive2://node3:10000/default> `zip` int)
0: jdbc:hive2://node3:10000/default> ROW FORMAT DELIMITED
0: jdbc:hive2://node3:10000/default> FIELDS TERMINATED BY ','
0: jdbc:hive2://node3:10000/default> STORED AS INPUTFORMAT
0: jdbc:hive2://node3:10000/default> 'org.apache.hadoop.mapred.TextInputFormat'
0: jdbc:hive2://node3:10000/default> OUTPUTFORMAT
0: jdbc:hive2://node3:10000/default> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
0: jdbc:hive2://node3:10000/default> LOCATION
0: jdbc:hive2://node3:10000/default> 'hdfs://Admin-TrainingNS/apps/hive/warehouse/salaries'
0: jdbc:hive2://node3:10000/default> TBLPROPERTIES (
0: jdbc:hive2://node3:10000/default> 'COLUMN_STATS_ACCURATE'='true',
0: jdbc:hive2://node3:10000/default> 'numFiles'='1',
0: jdbc:hive2://node3:10000/default> 'numRows'='0',
0: jdbc:hive2://node3:10000/default> 'rawDataSize'='732',
0: jdbc:hive2://node3:10000/default> 'totalSize'='781',
0: jdbc:hive2://node3:10000/default> 'transient_lastDdlTime'='1529819960');
No rows affected (0.443 seconds)
0: jdbc:hive2://node3:10000/default> explain select count(1) from salaries;
+------------------------------------------------------------------------------------------------------+--+
| Explain |
+------------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES: |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1 |
| |
| STAGE PLANS: |
| Stage: Stage-1 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: salaries |
| Statistics: Num rows: 1 Data size: 732 Basic stats: COMPLETE Column stats: COMPLETE |
| Select Operator |
| Statistics: Num rows: 1 Data size: 732 Basic stats: COMPLETE Column stats: COMPLETE |
| Group By Operator |
| aggregations: count(1) |
| mode: hash |
| outputColumnNames: _col0 |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE |
| Reduce Output Operator |
| sort order: |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE |
| value expressions: _col0 (type: bigint) |
| Reduce Operator Tree: |
| Group By Operator |
| aggregations: count(VALUE._col0) |
| mode: mergepartial |
| outputColumnNames: _col0 |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE |
| table: |
| input format: org.apache.hadoop.mapred.TextInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| ListSink |
| |
+------------------------------------------------------------------------------------------------------+--+
42 rows selected (0.255 seconds)
0: jdbc:hive2://node3:10000/default> explain select * from salaries;
+----------------------------------------------------------------------------------------------------------+--+
| Explain |
+----------------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES: |
| Stage-0 is a root stage |
| |
| STAGE PLANS: |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| TableScan |
| alias: salaries |
| Statistics: Num rows: 6 Data size: 732 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: gender (type: string), age (type: int), salary (type: double), zip (type: int) |
| outputColumnNames: _col0, _col1, _col2, _col3 |
| Statistics: Num rows: 6 Data size: 732 Basic stats: COMPLETE Column stats: NONE |
| ListSink |
| |
+----------------------------------------------------------------------------------------------------------+--+
17 rows selected (0.232 seconds)
0: jdbc:hive2://node3:10000/default> desc salaries;
+-----------+------------+----------+--+
| col_name | data_type | comment |
+-----------+------------+----------+--+
| gender | string | |
| age | int | |
| salary | double | |
| zip | int | |
+-----------+------------+----------+--+
4 rows selected (0.426 seconds)
0: jdbc:hive2://node3:10000/default> explain select age from salaries;
+------------------------------------------------------------------------------------------------+--+
| Explain |
+------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES: |
| Stage-0 is a root stage |
| |
| STAGE PLANS: |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| TableScan |
| alias: salaries |
| Statistics: Num rows: 183 Data size: 732 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: age (type: int) |
| outputColumnNames: _col0 |
| Statistics: Num rows: 183 Data size: 732 Basic stats: COMPLETE Column stats: NONE |
| ListSink |
| |
+------------------------------------------------------------------------------------------------+--+
0: jdbc:hive2://node3:10000/default> analyze table salaries compute statistics ;
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
INFO : number of splits:1
INFO : Submitting tokens for job: job_1529940007017_0004
INFO : The url to track the job: http://node4:8088/proxy/application_1529940007017_0004/
INFO : Starting Job = job_1529940007017_0004, Tracking URL = http://node4:8088/proxy/application_1529940007017_0004/
INFO : Kill Command = /usr/hdp/2.6.5.0-292/hadoop/bin/hadoop job -kill job_1529940007017_0004
INFO : Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
INFO : 2018-06-28 16:38:56,357 Stage-0 map = 0%, reduce = 0%
INFO : 2018-06-28 16:39:02,796 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 2.93 sec
INFO : MapReduce Total cumulative CPU time: 2 seconds 930 msec
INFO : Ended Job = job_1529940007017_0004
INFO : Table default.salaries stats: [numFiles=1, numRows=50, totalSize=781, rawDataSize=732]
No rows affected (16.338 seconds)
0: jdbc:hive2://node3:10000/default> explain select * from salaries;
+----------------------------------------------------------------------------------------------------------+--+
| Explain |
+----------------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES: |
| Stage-0 is a root stage |
| |
| STAGE PLANS: |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| TableScan |
| alias: salaries |
| Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: gender (type: string), age (type: int), salary (type: double), zip (type: int) |
| outputColumnNames: _col0, _col1, _col2, _col3 |
| Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE |
| ListSink |
| |
+----------------------------------------------------------------------------------------------------------+--+
17 rows selected (0.226 seconds)
0: jdbc:hive2://node3:10000/default> explain select age from salaries;
+-----------------------------------------------------------------------------------------------+--+
| Explain |
+-----------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES: |
| Stage-0 is a root stage |
| |
| STAGE PLANS: |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| TableScan |
| alias: salaries |
| Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: age (type: int) |
| outputColumnNames: _col0 |
| Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE |
| ListSink |
| |
+-----------------------------------------------------------------------------------------------+--+
17 rows selected (0.227 seconds)
Also, try setting hive.stats.autogather to true:
set hive.stats.autogather=true;
Hope this helps!
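The odd row estimates before ANALYZE (6 rows for `select *`, 183 for `select age`) are consistent with Hive falling back to rawDataSize divided by an estimated row width when numRows isn't available. The sketch below reproduces that arithmetic under assumptions I haven't checked against the Hive source: int = 4 bytes, double = 8 bytes, and string = 100 bytes (what I believe is the hive.stats.max.variable.length default).

```python
from __future__ import annotations

# Assumed per-type width estimates in bytes (string width is an assumed default).
TYPE_WIDTH = {"int": 4, "double": 8, "string": 100}


def estimate_rows(raw_data_size: int, column_types: list[str]) -> int:
    """Estimate rows as data size divided by the summed column widths."""
    row_width = sum(TYPE_WIDTH[t] for t in column_types)
    return raw_data_size // row_width


# rawDataSize=732 for the salaries table above.
print(estimate_rows(732, ["int"]))  # select age → 183
print(estimate_rows(732, ["string", "int", "double", "int"]))  # select * → 6
```

After ANALYZE sets numRows=50, both plans show 50 rows, so this fallback no longer applies.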
06-28-2018
04:28 PM
Hi @Hamilton Castro! Could you check your ZK namespace under the hiveserver2 path? It should look like this:
[root@node3 ~]# zookeeper-client
Connecting to localhost:2181
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0] ls /hiveserver2
[serverUri=node3:10000;version=1.2.1000.2.6.5.0-292;sequence=0000000013]
By the way, if your ZK servers aren't on the same hosts as HS2, check whether HS2 can reach port 2181 on the ZK hosts. Hope this helps!
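To check both points from the HS2 host, a Python sketch like this one could help: it splits a znode name such as the one shown above into its fields, and tests whether a TCP connection to port 2181 succeeds. The function names and the "zk-host" placeholder are hypothetical, and a successful connect only shows the port is reachable, not that ZooKeeper is healthy.

```python
from __future__ import annotations

import socket


def parse_hs2_znode(name: str) -> dict[str, str]:
    """Split 'serverUri=host:port;version=...;sequence=...' into a dict."""
    fields = {}
    for part in name.split(";"):
        key, sep, value = part.partition("=")
        if sep:
            fields[key] = value
    return fields


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure or timeout
        return False


znode = "serverUri=node3:10000;version=1.2.1000.2.6.5.0-292;sequence=0000000013"
print(parse_hs2_znode(znode)["serverUri"])  # → node3:10000

# Run on the HS2 host; replace "zk-host" with one of your ZK servers.
# print(can_reach("zk-host", 2181))
```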
06-28-2018
04:14 PM
Hi @Javert Kirilov! Could you share the DESCRIBE FORMATTED output for your table? Also, just asking: is it a managed table or an external one?