Member since
02-06-2018
47
Posts
5
Kudos Received
0
Solutions
03-12-2019
05:02 PM
@Jordan Moore This is what we have done. However, I am not able to see any consumer metrics, only broker metrics. Is there something I am doing wrong?
03-07-2019
06:32 PM
I am using the Confluent HDFS sink connector and would like to know how to expose the connector's consumer metrics through either JMX or a REST API. I checked the following two properties files; however, I don't know how to expose the metrics on a JMX port: 1. connect-standalone.properties 2. consumer.properties
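Those two files configure the worker and its consumers, but the consumer metrics themselves are exposed as JMX MBeans by the worker JVM. A minimal sketch of one way to surface them, assuming the stock Kafka launcher scripts (which honor the JMX_PORT environment variable); the port number and property file names are illustrative:

# kafka-run-class.sh (used by connect-standalone.sh) turns JMX_PORT into
# -Dcom.sun.management.jmxremote.port=... with remote JMX enabled.
export JMX_PORT=9999

# Start the standalone worker; the sink connector's embedded consumers then
# register MBeans such as kafka.consumer:type=consumer-fetch-manager-metrics,
# which jconsole or a JMX exporter can read on port 9999.
bin/connect-standalone.sh connect-standalone.properties hdfs-sink.properties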
Tags:
- Kafka
Labels:
- Apache Kafka
01-24-2019
08:01 PM
Repo Description
I have created a set of NiFi utilities for the following tasks.
Find stopped processor groups: the following script finds processor groups in which no processor is running. find_stopped_processor_groups.py
Find invalid processor groups: the following script finds processor groups that have at least one invalid processor, since a processor group with even a single invalid processor cannot be run. find_invalid_processor_groups.py
Find duplicate processor groups: the following script finds duplicate processor groups based on their names. find_duplicate_processor_groups.py
Find controller services: the following script finds all the database controller services used by NiFi. find_duplicate_processor_groups.py
Instructions on how to install and run are on the repo page.
Repo Info
Github Repo URL: https://github.com/Gaurang033/nifapi
Github account name: Gaurang033
Repo name: nifapi
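A minimal sketch (not the repo's actual scripts) of the NiFi REST calls such utilities boil down to, assuming an unsecured NiFi at localhost:8080 and the standard /nifi-api endpoints:

# Child process groups of the root group; each entry includes runningCount,
# stoppedCount and invalidCount, which is enough to flag stopped or invalid groups.
curl -s http://localhost:8080/nifi-api/process-groups/root/process-groups

# Controller services visible from the root group; filter the output for
# database services such as DBCPConnectionPool.
curl -s http://localhost:8080/nifi-api/flow/process-groups/root/controller-services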
Tags:
- nifi-api
- nifi-controller-service
- nifi-processor
- nifi-templates
- solutions
01-18-2019
10:51 PM
Not exactly. As I mentioned, the state can only be either STARTED or INSTALLED. I want to see whether the service is facing any issue.
01-18-2019
07:34 PM
Hi guys, I am trying to check service status with the Ambari REST API; however, I am not able to find any documentation that explains it in detail. For example, if I hit the following REST URL http://localhost:8080/api/v1/clusters/Sandbox/services/HDFS I get the following output:
  "maintenance_state": "OFF",
  "repository_state": "CURRENT",
  "service_name": "HDFS",
  "state": "STARTED"
},
"alerts_summary": {
  "CRITICAL": 0,
  "MAINTENANCE": 8,
  "OK": 293,
  "UNKNOWN": 4,
  "WARNING": 0
},
However, I am not sure how to interpret this. Should I care about MAINTENANCE, UNKNOWN, and WARNING, or is just checking that nothing is CRITICAL good enough? This is mainly for developers to understand and track how long any service is down.
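A minimal sketch of how such a check could be scripted, assuming default admin/admin credentials (an assumption) and using only the ServiceInfo/state and alerts_summary fields shown above; the pass/fail rule (STARTED and no CRITICAL alerts) is just an illustration:

#!/bin/bash
AMBARI=http://localhost:8080
CLUSTER=Sandbox
SERVICE=HDFS

# Ask Ambari only for the service state and the alert summary.
RESP=$(curl -s -u admin:admin \
  "$AMBARI/api/v1/clusters/$CLUSTER/services/$SERVICE?fields=ServiceInfo/state,alerts_summary")

# Crude text checks: healthy means state STARTED and a CRITICAL alert count of 0.
echo "$RESP" | grep -q '"state"[ ]*:[ ]*"STARTED"' || { echo "$SERVICE is not STARTED"; exit 1; }
echo "$RESP" | grep -q '"CRITICAL"[ ]*:[ ]*0'      || { echo "$SERVICE has CRITICAL alerts"; exit 1; }
echo "$SERVICE looks healthy"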
Labels:
- Labels:
-
Apache Ambari
10-01-2018
07:30 PM
I am trying to understand the Hive query plan for a simple DISTINCT query, and I have a small confusion regarding the output of one of the stages. I have a simple table with just two columns, id and value, and just 4 rows as shown below. Data:
hive> select * from temp.test_distinct;
OK
1 100
2 100
3 100
4 150
Plan:
hive> explain select distinct value from temp.test_distinct;
OK
Plan not optimized by CBO.
Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Stage-0
Fetch Operator
limit:-1
Stage-1
Reducer 2
File Output Operator [FS_6]
compressed:false
Statistics:Num rows: 2 Data size: 10 Basic stats: COMPLETE Column stats: NONE
table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}
Group By Operator [GBY_4]
| keys:KEY._col0 (type: string)
| outputColumnNames:["_col0"]
| Statistics:Num rows: 2 Data size: 10 Basic stats: COMPLETE Column stats: NONE
|<-Map 1 [SIMPLE_EDGE]
Reduce Output Operator [RS_3]
key expressions:_col0 (type: string)
Map-reduce partition columns:_col0 (type: string)
sort order:+
Statistics:Num rows: 4 Data size: 20 Basic stats: COMPLETE Column stats: NONE
Group By Operator [GBY_2]
keys:value (type: string)
outputColumnNames:["_col0"]
Statistics:Num rows: 4 Data size: 20 Basic stats: COMPLETE Column stats: NONE
Select Operator [SEL_1]
outputColumnNames:["value"]
Statistics:Num rows: 4 Data size: 20 Basic stats: COMPLETE Column stats: NONE
TableScan [TS_0]
alias:test_distinct
Statistics:Num rows: 4 Data size: 20 Basic stats: COMPLETE Column stats: NONE
Time taken: 0.181 seconds, Fetched: 35 row(s)
Confusion: the TableScan, Select Operator, and Group By Operator show that they processed 4 rows, which makes sense to me. But shouldn't the next stage after the Group By Operator receive only 2 rows to process, since the group by removes the other rows? In my DAG I can see the output of the mapper is just two rows and not four; however, that doesn't seem to match the plan. Am I looking at it wrong?
Labels:
- Apache Hadoop
- Apache Hive
09-12-2018
09:27 PM
@Venkatesh Kancharla Please open a new question with all the details and logs.
09-07-2018
03:08 PM
Compaction works only on transactional tables, and to make a table transactional it should meet the following properties:
- It should be an ORC table
- It should be bucketed
- It should be a managed table
So, as you can see, you can't run compaction on a non-transactional table; if you do it from Hive you will definitely get an error, though I am not sure about Spark.
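A minimal sketch of those requirements in practice, assuming ACID transactions are already enabled on the cluster; the table name and columns are made up for illustration:

hive -e "
CREATE TABLE demo_txn (id INT, value STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
-- Only legal on a transactional table like the one above.
ALTER TABLE demo_txn COMPACT 'major';
SHOW COMPACTIONS;
"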
09-05-2018
05:54 PM
You are not getting the desired result because your compaction has failed. Please check the YARN logs to understand what might have gone wrong.
08-13-2018
07:10 PM
I found the solution, so I am posting it here. The problem I was having was that I was just stopping the Docker container after changing the command I use to start the HDP image; I didn't realize I needed to remove the container as well. The following steps helped.
Save the Docker work: docker commit <hdp_container_id> <hdp_container_id>
Stop and remove the container:
docker stop <hdp_container_id>
docker rm <hdp_container_id>
Open port 9083 (the Hive metastore) by modifying start-sandbox-hdp-standalone_2-6-4.sh:
#!/bin/bash
echo "Waiting for docker daemon to start up:"
until docker ps 2>&1| grep STATUS>/dev/null; do sleep 1; done; >/dev/null
docker ps -a | grep sandbox-hdp
if [ $? -eq 0 ]; then
docker start sandbox-hdp
else
docker pull hortonworks/sandbox-hdp-standalone:2.6.4
docker run --name sandbox-hdp --hostname "sandbox-hdp.hortonworks.com" --privileged -d \
-p 9083:9083 \
Start Docker: ./start-sandbox-hdp-standalone_2-6-4.sh
08-10-2018
02:15 PM
@Sandeep Nemuri How do I check whether the metastore is up and reachable from my local machine? I logged into the Docker container and I can telnet to port 9083 there. However, if I try to do that from my local machine, it doesn't work. The one thing I realized is that the port is not exposed in the Docker image or mentioned anywhere on the webpage: https://hortonworks.com/tutorial/hortonworks-sandbox-guide/section/3/ I exposed the port and restarted the Docker container; however, I am still not able to connect to that port using telnet from my local machine or from the Presto server (which is also on my local machine).
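A minimal sketch of checking the mapping and reachability from the host, assuming the container is named sandbox-hdp as in the start script from this thread:

# Does the container actually publish 9083 to the host?
docker port sandbox-hdp 9083
docker ps --format '{{.Names}}: {{.Ports}}' | grep sandbox-hdp
# Is anything answering on the host side? (-z only tests that the connection opens)
nc -vz localhost 9083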
08-10-2018
12:59 AM
I am trying to connect to the Hive metastore of my HDP sandbox (from Presto); however, it's throwing the following error.
Hive catalog:
connector.name=hive-hadoop2
hive.metastore.uri=thrift://sandbox-hdp.hortonworks.com:9083
hive.metastore.authentication.type=NONE
Error: Query 20180810_005352_00000_umgac failed: Failed connecting to Hive metastore: [sandbox-hdp.hortonworks.com:9083]
I tried using the following values for hive.metastore.uri; however, I am getting the same error:
thrift://localhost:9083
thrift://127.0.0.1:9083
thrift://<IP of docker container>:9083
thrift://<IP of local machine>:9083
Labels:
- Apache Hive
07-24-2018
07:17 PM
I am trying to connect to HBase using the Java HBase REST client; however, it's giving the following error. The HBase REST server uses Kerberos authentication, so I have created a Kerberos ticket and am trying to authenticate using that ticket. Code:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.rest.client.Client;
import org.apache.hadoop.hbase.rest.client.Cluster;
import org.apache.hadoop.hbase.rest.client.RemoteAdmin;
import org.apache.hadoop.security.UserGroupInformation;

public class RestExample {
public static void main(String[] args) throws IOException {
Configuration conf = HBaseConfiguration.create();
UserGroupInformation.setConfiguration(conf);
String projectDir = System.getProperty("user.dir");
System.out.println(projectDir);
UserGroupInformation.loginUserFromKeytab("gaurang.shah@mydomain.com", projectDir+"/gaurang.shah.keytab");
// vv RestExample
Cluster cluster = new Cluster();
cluster.add("hbase_host.mydomain.com", 17000);
Client client = new Client(cluster);
TableName tableName = TableName.valueOf("bda:aaa");
RemoteAdmin remoteAdmin = new RemoteAdmin(client, conf);
HTableDescriptor tableDesc = new HTableDescriptor(tableName);
remoteAdmin.createTable(tableDesc);
}
} StackTrace: Exception in thread "main" java.io.IOException: org.apache.hadoop.security.authentication.client.AuthenticationException: Authentication failed, URL: http://hbase_host.mydomain.com:17000/bda:aaa/schema?user.name=gaurang.shah, status: 403, message: Forbidden
at org.apache.hadoop.hbase.rest.client.Client.negotiate(Client.java:285)
at org.apache.hadoop.hbase.rest.client.Client.executeURI(Client.java:239)
at org.apache.hadoop.hbase.rest.client.Client.executePathOnly(Client.java:204)
at org.apache.hadoop.hbase.rest.client.Client.execute(Client.java:265)
at org.apache.hadoop.hbase.rest.client.Client.put(Client.java:557)
at org.apache.hadoop.hbase.rest.client.Client.put(Client.java:504)
at org.apache.hadoop.hbase.rest.client.Client.put(Client.java:474)
at org.apache.hadoop.hbase.rest.client.RemoteAdmin.createTable(RemoteAdmin.java:294)
at ca.cantire.RestExample.main(RestExample.java:42)
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: Authentication failed, URL: http://hbase_host.mydomain.com:17000/bda:aaa/schema?user.name=gaurang.shah, status: 403, message: Forbidden
at org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:281)
at org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:77)
at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:212)
at org.apache.hadoop.hbase.rest.client.Client.negotiate(Client.java:280)
... 8 more
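Side note: a minimal sketch of verifying SPNEGO access to the same REST endpoint outside of Java, assuming the keytab from the code above and a curl build with GSS support; a 200 response would confirm that the ticket and the endpoint's Kerberos setup line up.

# Get a ticket from the keytab, then let curl do the SPNEGO handshake.
kinit -kt gaurang.shah.keytab gaurang.shah@mydomain.com
curl --negotiate -u : -v "http://hbase_host.mydomain.com:17000/version/cluster"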
Labels:
- Apache HBase
07-11-2018
02:26 PM
Hi guys,
I am trying to load data files into my Hive table and facing an issue: if the files are located on the local filesystem it doesn't work, but if I move the file to HDFS then it works without any issue. The following command is not working in beeline; however, it works perfectly in the hive CLI:
load data local inpath '/home/gaurang.shah/test.json' into table temp.test;
The data is located on the node where one of the HiveServer2 instances is running, and I have given it all the permissions as well.
[gaurang.shah@aa ~] pwd
/home/gaurang.shah
[gaurang.shah@aa ~]$ ll test.json
-rwxrwxrwx 1 gaurang.shah domain users 56 Jul 11 13:54 test.json
Labels:
- Apache Hive
06-08-2018
02:40 PM
Hi guys, I am thinking of using the HBase REST API to interact with the HBase REST server. Would someone please let me know if there is a REST client available for the HBase REST server? I found the following two for Python; if someone has used either of them, would you please share the experience? https://github.com/barseghyanartur/starbase https://github.com/wbolster/happybase
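For reference, a minimal sketch of talking to the HBase REST server directly with curl (no client library), assuming an unsecured REST server; host, port, and table name are illustrative:

# Cluster version and status.
curl -s -H "Accept: application/json" http://hbase-rest-host:8080/version/cluster
curl -s -H "Accept: application/json" http://hbase-rest-host:8080/status/cluster
# Schema of an existing table.
curl -s -H "Accept: application/json" http://hbase-rest-host:8080/my_table/schema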
Tags:
- Data Processing
- HBase
Labels:
- Apache HBase
06-05-2018
08:01 PM
It's throwing the following error if I have multiple column families in my HBase table. Does this approach work only for a single column family?
java.lang.RuntimeException: Hive Runtime Error while closing operators: java.io.IOException: Multiple family directories found in hdfs://hadoopdev/apps/hive/warehouse/temp.db/employee_details/_temporary/0/_temporary/attempt_1527799542731_1180_r_000000_0
04-17-2018
08:21 PM
@Naresh P R I am using the following repo and it resolved my issue: http://repo.hortonworks.com/content/groups/public/
04-16-2018
02:20 PM
@Naresh P R Thanks, now I can resolve the dependency; however, I am not able to compile the code. I am getting the following error. I tried to add the dependency mentioned; however, it's not helping either. [ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.2-beta-5:single (default-cli) on project hive-normalize: Failed to create assembly: Failed to resolve dependencies for project: ca.abc:hive-normalize:jar:1.0: Missing:
[ERROR] ----------
[ERROR] 1) org.mortbay.jetty:jetty-util:jar:6.1.26.hwx
[ERROR]
[ERROR] Try downloading the file manually from the project website.
[ERROR]
[ERROR] Then, install it using the command:
[ERROR] mvn install:install-file -DgroupId=org.mortbay.jetty -DartifactId=jetty-util -Dversion=6.1.26.hwx -Dpackaging=jar -Dfile=/path/to/file
[ERROR]
[ERROR] Alternatively, if you host your own repository you can deploy the file there:
[ERROR] mvn deploy:deploy-file -DgroupId=org.mortbay.jetty -DartifactId=jetty-util -Dversion=6.1.26.hwx -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]
[ERROR]
[ERROR] Path to dependency:
[ERROR] 1) ca.abc:hive-normalize:jar:1.0
[ERROR] 2) org.apache.hive:hive-exec:jar:1.2.1000.2.6.4.0-91
[ERROR] 3) org.apache.hive:hive-shims:jar:1.2.1000.2.6.4.0-91
[ERROR] 4) org.apache.hive.shims:hive-shims-0.23:jar:1.2.1000.2.6.4.0-91
[ERROR] 5) org.apache.hadoop:hadoop-hdfs:jar:2.7.3.2.6.4.0-91
[ERROR] 6) org.mortbay.jetty:jetty-util:jar:6.1.26.hwx
[ERROR]
[ERROR] 2) org.mortbay.jetty:jetty:jar:6.1.26.hwx
[ERROR]
[ERROR] Try downloading the file manually from the project website.
[ERROR]
[ERROR] Then, install it using the command:
[ERROR] mvn install:install-file -DgroupId=org.mortbay.jetty -DartifactId=jetty -Dversion=6.1.26.hwx -Dpackaging=jar -Dfile=/path/to/file
[ERROR]
[ERROR] Alternatively, if you host your own repository you can deploy the file there:
[ERROR] mvn deploy:deploy-file -DgroupId=org.mortbay.jetty -DartifactId=jetty -Dversion=6.1.26.hwx -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]
[ERROR]
[ERROR] Path to dependency:
[ERROR] 1) ca.abc:hive-normalize:jar:1.0
[ERROR] 2) org.apache.hive:hive-exec:jar:1.2.1000.2.6.4.0-91
[ERROR] 3) org.apache.hive:hive-shims:jar:1.2.1000.2.6.4.0-91
[ERROR] 4) org.apache.hive.shims:hive-shims-0.23:jar:1.2.1000.2.6.4.0-91
[ERROR] 5) org.apache.hadoop:hadoop-hdfs:jar:2.7.3.2.6.4.0-91
[ERROR] 6) org.mortbay.jetty:jetty:jar:6.1.26.hwx
[ERROR]
[ERROR] 3) org.mortbay.jetty:jetty-sslengine:jar:6.1.26.hwx
[ERROR]
[ERROR] Try downloading the file manually from the project website.
[ERROR]
[ERROR] Then, install it using the command:
[ERROR] mvn install:install-file -DgroupId=org.mortbay.jetty -DartifactId=jetty-sslengine -Dversion=6.1.26.hwx -Dpackaging=jar -Dfile=/path/to/file
[ERROR]
[ERROR] Alternatively, if you host your own repository you can deploy the file there:
[ERROR] mvn deploy:deploy-file -DgroupId=org.mortbay.jetty -DartifactId=jetty-sslengine -Dversion=6.1.26.hwx -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]
[ERROR]
[ERROR] Path to dependency:
[ERROR] 1) ca.abc:hive-normalize:jar:1.0
[ERROR] 2) org.apache.hive:hive-exec:jar:1.2.1000.2.6.4.0-91
[ERROR] 3) org.apache.hive:hive-shims:jar:1.2.1000.2.6.4.0-91
[ERROR] 4) org.apache.hive.shims:hive-shims-0.23:jar:1.2.1000.2.6.4.0-91
[ERROR] 5) org.apache.hadoop:hadoop-yarn-server-resourcemanager:jar:2.7.3.2.6.4.0-91
[ERROR] 6) org.apache.hadoop:hadoop-yarn-server-common:jar:2.7.3.2.6.4.0-91
[ERROR] 7) org.apache.hadoop:hadoop-yarn-registry:jar:2.7.3.2.6.4.0-91
[ERROR] 8) org.apache.hadoop:hadoop-common:jar:2.7.3.2.6.4.0-91
[ERROR] 9) org.mortbay.jetty:jetty-sslengine:jar:6.1.26.hwx
[ERROR]
[ERROR] ----------
[ERROR] 3 required artifacts are missing.
[ERROR]
[ERROR] for artifact:
[ERROR] ca.abc:hive-normalize:jar:1.0
[ERROR]
[ERROR] from the specified remote repositories:
[ERROR] hortonworks.extrepo (http://repo.hortonworks.com/content/repositories/releases, releases=true, snapshots=true),
[ERROR] central (https://repo.maven.apache.org/maven2, releases=true, snapshots=false)
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
04-12-2018
07:41 PM
Hi guys,
Could someone let me know how they decide which hive-exec jar version is compatible with their environment?
Here are the two approaches I have taken; the first one is failing and the second one passes. However, the second approach is really messy and I would like to make the first approach work somehow.
First approach - not working:
- Use a Maven project to compile and build a single (fat) jar.
- Check which version of hive-exec.jar Hadoop is using from the following directory:
/usr/hdp/current/hive-client/lib/hive-exec.jar -> hive-exec-1.2.1000.2.6.4.0-91.jar
- Use the matching version as a Maven dependency:
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>1.2.1</version>
</dependency>
Second approach - working:
- Create a simple Java project (not Maven).
- Copy the JAR from Hadoop and add it to the classpath.
- Compile and create the class files.
- Create a thin jar (without any dependencies) and provide the classpath in the manifest.mf file:
Class-path: /usr/hdp/current/hive-client/lib/hive-exec.jar
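A minimal sketch of how the first approach could be made to line up with the cluster, assuming the HDP layout from this post and the Hortonworks public repo mentioned elsewhere in this feed; the mvn dependency:get call just verifies the coordinate resolves:

# The symlink target carries the full HDP build version.
ls -l /usr/hdp/current/hive-client/lib/hive-exec.jar
# -> hive-exec-1.2.1000.2.6.4.0-91.jar, so the Maven coordinate to match would be
#    org.apache.hive:hive-exec:1.2.1000.2.6.4.0-91, resolvable from the Hortonworks repo.
mvn dependency:get -Dartifact=org.apache.hive:hive-exec:1.2.1000.2.6.4.0-91 \
  -DremoteRepositories=http://repo.hortonworks.com/content/groups/public/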
Labels:
- Labels:
-
Apache Hive
04-12-2018
05:20 AM
I am learning how to create a custom UDF in Hive. I have created a custom UDF which does nothing; it just returns the given text as it is. I am able to load this JAR in Hive without any issue, I am also able to create a function from this jar, and I am able to execute this function. Problem: if this jar is added, I can't load one table/HDFS file from another table; the following simple query fails.
insert into demo1 select * from demo;
Stacktrace:
Vertex failed, vertexName=Map 1, vertexId=vertex_1523501275422_0010_3_00, diagnostics=[Task failed, taskId=task_1523501275422_0010_3_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Container container_1523501275422_0010_01_000007 finished with diagnostics set to [Container completed. ]], TaskAttempt 1 failed, info=[Container container_1523501275422_0010_01_000008 finished with diagnostics set to [Container completed. ]], TaskAttempt 2 failed, info=[Container container_1523501275422_0010_01_000009 finished with diagnostics set to [Container completed. ]], TaskAttempt 3 failed, info=[Container container_1523501275422_0010_01_000010 finished with diagnostics set to [Container completed. ]]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1523501275422_0010_3_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
Please note that both tables have the same structure. If I remove the jar, then I can execute the above query without any issue. Code (ca.abc.demo):
package ca.abc;

import org.apache.hadoop.hive.ql.exec.UDF;

public class demo extends UDF {
public String evaluate(String s) {
if (s == null) {
return null;
}else{
return s;
}
}
}
pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>ca.cantire</groupId>
  <artifactId>hive-normalize</artifactId>
  <version>1.0</version>
  <dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-exec -->
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>2.3.3</version>
    </dependency>
  </dependencies>
  <build>
    <pluginManagement>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-surefire-plugin</artifactId>
          <version>2.8</version>
        </plugin>
        <plugin>
          <artifactId>maven-assembly-plugin</artifactId>
          <configuration>
            <archive>
              <manifest>
                <mainClass>ca.cantire.demo</mainClass>
              </manifest>
            </archive>
            <descriptorRefs>
              <descriptorRef>jar-with-dependencies</descriptorRef>
            </descriptorRefs>
          </configuration>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>
Labels:
- Apache Hive
04-12-2018
02:33 AM
Hi guys, is there any way to normalize Hive UTF-8 data? I am talking about NFC normalization. Currently I have written a custom UDF that does that; however, I wanted to know if there is a better, easier way to do it for a whole table/HDFS file.
Tags:
- Data Processing
- Hive
Labels:
- Apache Hive
04-11-2018
03:51 PM
Hi guys, I have written a custom UDF which works fine if I run it in a SELECT query. However, if I try to export data to HDFS using a SELECT query that contains the custom UDF, it fails. The following query gets executed successfully:
CREATE temporary FUNCTION normalize as 'ca.test.Normalize' USING JAR 'hdfs://hadoopdev/tmp/udf/normalizer-1.0-jar-with-dependencies.jar';
select normalize(test_desc, 'NFC') from temp.test_special_char;
The following query fails:
CREATE temporary FUNCTION normalize as 'ca.test.Normalize' USING JAR 'hdfs://hadoopdev/tmp/udf/normalizer-1.0-jar-with-dependencies.jar';
INSERT OVERWRITE DIRECTORY '/tmp/test_special_char/'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
NULL DEFINED AS ''
select normalize(test_desc, 'NFC') from temp.test_special_char;
StackTrace:
INFO : Tez session hasn't been created yet. Opening session
ERROR : Failed to execute tez graph.
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1519224124029_102976 failed 2 times due to AM Container for appattempt_1519224124029_102976_000002 exited with exitCode: 255
For more detailed output, check the application tracking page: http://abc:8088/cluster/app/application_1519224124029_102976 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e71_1519224124029_102976_02_000001
Exit code: 255
Stack trace: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Launch container failed
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:109)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:89)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:392)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Shell output: main : command provided 1
main : run as user is gaurang.shah
main : requested yarn user is gaurang.shah
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /data/10/hadoop/yarn/local/nmPrivate/application_1519224124029_102976/container_e71_1519224124029_102976_02_000001/container_e71_1519224124029_102976_02_000001.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...
Container exited with a non-zero exit code 255
Failing this attempt. Failing the application.
at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:779)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:217)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.updateSession(TezTask.java:272)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:152)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1745)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1491)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1289)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1156)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1151)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:197)
at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:76)
at org.apache.hive.service.cli.operation.SQLOperation$2$1.run(SQLOperation.java:253)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hive.service.cli.operation.SQLOperation$2.run(SQLOperation.java:264)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask (state=08S01,code=1)
Labels:
- Apache Hive
04-11-2018
01:06 PM
@bkosaraju The HDFS file has UTF-8 encoding and the Netezza table also has UTF-8 encoding; the problem is with NFC (normalization).
04-10-2018
12:45 AM
I am trying to export data from HDFS to Netezza, and a few French characters are giving me trouble. The only related post I found on the internet is the following: http://grokbase.com/t/sqoop/user/137gtanzx8/sqoop-utf-8-data-load-issue However, the problem is I am not sure which configuration file he is talking about. Would someone please let me know in which configuration file I need to provide the connection encoding?
Labels:
- Apache Hadoop
- Apache Sqoop
03-19-2018
06:16 PM
The issue was with the \n character at the end of the file. It works perfectly without issue on Netezza; however, it creates an issue on SQL Server. The following command, used to create a new password file, resolved the issue:
tr -d '\n' < sqlserver_password.pass > sqlserver.pass
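For completeness, a minimal sketch of producing a newline-free password file from scratch and locking it down on HDFS; the HDFS path is the one used in this thread, and the chmod is a general precaution rather than something required here:

# echo -n writes the password without a trailing newline.
echo -n 'MySecretPassword' > sqlserver.pass
# Put it where --password-file expects it and restrict access.
hdfs dfs -put -f sqlserver.pass /user/gaurang.shah/sqlserver.pass
hdfs dfs -chmod 400 /user/gaurang.shah/sqlserver.pass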
03-19-2018
03:05 PM
Hi guys, I am trying to export data from HDFS to SQL Server; it works fine if I provide the password as an argument. However, if I provide a password file, then it fails. The password file has only a single line, the password only, with no newline or special character at the end of the line. The same thing works for Netezza; it fails only for SQL Server.
Sqoop version: 1.4.6.2.5.3.0-37
Driver jar: sqljdbc4-2.0.jar
sqoop export --connect "jdbc:sqlserver://abc.com:58850;databaseName=IKB_PROD;schema=dbo;" --table "SQOOP_TEST_SMALL" --export-dir /tmp/SQOOP_TEST_SMALL_20180101_010101 --username HADOOP_USR --password-file /user/gaurang.shah/sqlserver_password.pass --verbose
Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/03/19 14:30:17 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
18/03/19 14:30:17 DEBUG tool.BaseSqoopTool: Enabled debug logging.
18/03/19 14:30:17 DEBUG password.FilePasswordLoader: Fetching password from specified path: /user/gaurang.shah/sqlserver_password.pass
18/03/19 14:30:18 DEBUG sqoop.ConnFactory: Loaded manager factory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
18/03/19 14:30:18 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
18/03/19 14:30:18 DEBUG sqoop.ConnFactory: Trying ManagerFactory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
18/03/19 14:30:18 DEBUG oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop can be called by Sqoop!
18/03/19 14:30:18 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
18/03/19 14:30:18 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:sqlserver:
18/03/19 14:30:18 INFO manager.SqlManager: Using default fetchSize of 1000
18/03/19 14:30:18 DEBUG sqoop.ConnFactory: Instantiated ConnManager org.apache.sqoop.manager.SQLServerManager@6f0628de
18/03/19 14:30:18 INFO tool.CodeGenTool: Beginning code generation
18/03/19 14:30:18 DEBUG manager.SqlManager: Execute getColumnInfoRawQuery : SELECT t.* FROM [SQOOP_TEST_SMALL] AS t WHERE 1=0
18/03/19 14:30:18 DEBUG manager.SqlManager: No connection paramenters specified. Using regular API for making connection.
18/03/19 14:30:18 ERROR manager.SqlManager: Error executing statement: com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user 'HADOOP_USR'.
com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user 'HADOOP_USR'.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:196)
at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onEOF(tdsparser.java:246)
Labels:
- Apache Sqoop
03-14-2018
11:15 PM
1 Kudo
Hi guys, I am considering Sqoop to import/export data between RDBMS and HDFS. I found the following issues with Sqoop:
- It still uses MapReduce as its execution engine, which is slowly dying.
- Tuning the number of mappers to speed up execution is a tiresome process; finding a column that can be evenly distributed is not easy when you don't have a primary key (Netezza) or when it's a combination of two columns.
- HCatalog is not supported in Sqoop version 2.
- Sqoop 2 is deprecated by Cloudera; is something similar going to happen with Hortonworks as well? https://www.cloudera.com/documentation/enterprise/5-6-x/topics/cdh_ig_sqoop_vs_sqoop2.html
Could someone also tell me the roadmap of Sqoop? Should I consider writing a Spark script which does the import/export? Would it be faster?
Labels:
- Apache Sqoop
03-07-2018
05:32 AM
@Rahul Soni Thanks for the great explanation. Just a quick question: is there a way I can modify that value, in case I need to restart the flow from some specific point?
03-06-2018
03:58 AM
1 Kudo
@Constantin Stanca Could you please explain the approach in detail?
03-05-2018
05:17 PM
1 Kudo
@Constantin Stanca Yes, "no activity" for me means 0 records. Could you please explain this approach in detail?