Member since
09-25-2015
230
Posts
274
Kudos Received
39
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 10785 | 07-05-2016 01:19 PM
 | 3253 | 04-01-2016 02:16 PM
 | 459 | 02-17-2016 11:54 AM
 | 1406 | 02-17-2016 11:50 AM
 | 3984 | 02-16-2016 02:08 AM
09-28-2017
08:42 PM
@Laura Ngo I've found a workaround to show more than 3 hops in the UI, which I desperately needed. I changed this file: /usr/hdp/2.6.0.3-8/atlas/server/webapp/atlas/js/collection/VLineageList.js and added the depth parameter that @Sarath Subramanian suggested: ... getLineage: function(id, options) { var url = UrlLinks.lineageApiUrl(id) + '?depth=99'; ... PS: although changing source code is not recommended, I don't see any risk in such a simple change, and the benefits are great!
02-02-2017
03:06 AM
1 Kudo
As discussed here: https://community.hortonworks.com/questions/74995/hive-column-level-lineage-roadmap.html, the next Atlas release will capture column-level lineage automatically for Hive. My questions: 1- When browsing column metadata in the current Atlas UI, I can see lineage spots for columns. Is it possible to manually add column lineage using the API and visualize it on the current Atlas release (HDP 2.5.3 / Atlas 0.7.0)? 2- Does anyone have REST API call examples for manually adding column lineage to Atlas? Thanks, Guilherme.
12-13-2016
10:25 PM
1 Kudo
@Kumar An easy way is to create a "Date Dimension" table and SELECT from it, using the month as a filter and/or joining with your other tables.
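A minimal Python sketch of the idea (toy schema; the column names here are hypothetical, not from any real table): build one row per day with a month attribute, then filter by month the same way the Hive query would with a WHERE clause.

```python
from datetime import date, timedelta

def build_date_dim(start, end):
    """One row per day between start and end, inclusive."""
    rows, d = [], start
    while d <= end:
        rows.append({"dt": d.isoformat(), "year": d.year, "month": d.month})
        d += timedelta(days=1)
    return rows

dim = build_date_dim(date(2016, 1, 1), date(2016, 3, 31))
# Equivalent of: SELECT ... FROM date_dim WHERE month = 2
feb = [r for r in dim if r["month"] == 2]
print(len(feb))  # 29 (2016 is a leap year)
```

In Hive you would materialize this once as a real table and join or filter against it.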
12-02-2016
12:07 AM
Look at this example:
<records>
<record customer_id="0000-JTALA">
<income>200000</income>
<demographics>
<gender>F</gender>
<agecat>1</agecat>
<edcat>1</edcat>
<jobcat>2</jobcat>
<empcat>2</empcat>
<retire>0</retire>
<jobsat>1</jobsat>
<marital>1</marital>
<spousedcat>1</spousedcat>
<residecat>4</residecat>
<homeown>0</homeown>
<hometype>2</hometype>
<addresscat>2</addresscat>
</demographics>
<financial>
<income>18</income>
<creddebt>1.003392</creddebt>
<othdebt>2.740608</othdebt>
<default>0</default>
</financial>
</record>
<record>
.....
</record>
</records>
CREATE TABLE xml_bank(customer_id STRING, income BIGINT, demographics map<string,string>, financial map<string,string>)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.customer_id"="/record/@customer_id",
"column.xpath.income"="/record/income/text()",
"column.xpath.demographics"="/record/demographics/*",
"column.xpath.financial"="/record/financial/*"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
"xmlinput.start"="<record customer",
"xmlinput.end"="</record>"
);
select customer_id, income, demographics, financial from xml_bank limit 1;
"0000-JTALA"
"200000"
{"gender":"F", "agecat":"1", "edcat":"1", "jobcat":"2", "empcat":"2", "retire":"0", "jobsat":"1", "marital":"1", "spousedcat":"1", "residecat":"4", "homeown":"0", "hometype":"2", "addresscat":"2"}
{"income": "18", "creddebt": "1.003392", "othdebt": "2.740608", "default": "0"}
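The SerDe's column.xpath mappings behave roughly like this Python sketch (a trimmed-down record, parsed with the standard library — not the SerDe's actual code): the customer_id attribute and the income text become scalar columns, and the children of demographics and financial become maps.

```python
import xml.etree.ElementTree as ET

record = """<record customer_id="0000-JTALA">
  <income>200000</income>
  <demographics><gender>F</gender><agecat>1</agecat></demographics>
  <financial><income>18</income><default>0</default></financial>
</record>"""

root = ET.fromstring(record)
row = {
    "customer_id": root.get("customer_id"),             # /record/@customer_id
    "income": root.findtext("income"),                  # /record/income/text()
    "demographics": {c.tag: c.text for c in root.find("demographics")},  # /record/demographics/*
    "financial": {c.tag: c.text for c in root.find("financial")},        # /record/financial/*
}
print(row["customer_id"], row["demographics"])
```

Note how the two income elements don't collide: the xpath for the scalar column only matches the direct child of record, while the one inside financial lands in the map.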
12-01-2016
09:28 PM
@Dinu S Can you check the YARN UI? I suspect you don't have enough resources available and the second Hive session is waiting for YARN to accept your session. If that's the case, you can use beeline instead of the Hive CLI: beeline connects to HiveServer2 and doesn't need to launch an AM to connect (it might still need one to run multiple queries in parallel, but at least it allows you to connect and run queries that don't require a YARN application).
12-01-2016
09:20 PM
@Neil Patel The most flexible way is to use the XML SerDe (https://github.com/dvasilen/Hive-XML-SerDe/wiki/XML-data-sources), which lets you read XML into maps and also nested maps/arrays. The problem is that the XML SerDe does not work when you already have a table/column containing XML; you would need to use its internal functions and write your own UDFs. The workaround would be to create a table with only your XML column, saved as text format, use the XML SerDe to read the text file and translate the XML content into maps/arrays, and finally join the result back with your original table. More examples here: https://community.hortonworks.com/questions/40979/hive-xml-parising-null-value-returned.html https://community.hortonworks.com/content/kbentry/972/hive-and-xml-pasring.html
10-12-2016
09:50 AM
It's related to "Allow Common Name Host Name Mismatch"; it works if I use the same hostname as in the Knox certificate. Does anyone know how to add that option in Tableau? I can find it in the ODBC settings, but not in Tableau. Or where can I change it by default in the ODBC driver?
10-11-2016
06:15 PM
I'm getting the error below when trying to connect Tableau to Hive through Knox on OS X. The same connection works from unixODBC + Excel, but not from Tableau. The error I get is: [Hortonworks][Hardy] (34) Error from server: authorize: cannot authorize peer. Here are my Tableau settings: Any idea how to troubleshoot it?
09-22-2016
08:51 PM
Check this: https://community.hortonworks.com/questions/1652/how-can-i-query-hbase-from-hive.html
09-22-2016
08:20 PM
@Vivek S try this: SELECT a.item_id as item_A, b.item_id as item_B , count(1) as count_pair
FROM items a
INNER JOIN items b ON a.transaction_id = b.transaction_id
WHERE a.item_id > b.item_id
GROUP BY a.item_id, b.item_id
HAVING count(1) > 1
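The same pair-counting logic sketched in Python on toy data (the item names are made up): the a.item_id > b.item_id condition in the SQL keeps each unordered pair exactly once, which sorting the items achieves here.

```python
from collections import Counter
from itertools import combinations

# transaction_id -> items bought together (toy data)
transactions = {
    1: ["bread", "milk", "eggs"],
    2: ["bread", "milk"],
    3: ["bread", "eggs"],
}

pair_counts = Counter()
for items in transactions.values():
    # sorted() + combinations() dedupes (A,B)/(B,A), like a.item_id > b.item_id
    for a, b in combinations(sorted(items), 2):
        pair_counts[(a, b)] += 1

# HAVING count(1) > 1
frequent = {pair: n for pair, n in pair_counts.items() if n > 1}
print(sorted(frequent.items()))
```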
09-14-2016
11:28 PM
19 Kudos
In this article, we cover the following:
Demonstrate how Hive sessions work
Clarify some misunderstandings about Hive session behavior (most people, myself included until recently, believe Hive sessions and queue allocation work differently than they actually do)
Propose a solution for small-to-medium environments (up to ~200 nodes) that allows concurrent users + multi-tenancy + soft allocation + preemption
Assumptions:
Tested with HDP 2.4.0
I'm using configuration suggested in best-practices and docs related to security and hive performance guide, which include following configurations:
hive engine (hive.execution.engine) = tez
hive do-as (hive.server2.enable.doAs) = false
hive default queues (hive.server2.tez.default.queues) = (queue-name1,queue-name2,etc)
hive number of sessions (hive.server2.tez.sessions.per.default.queue) = 1 (or up to 4)
hive start sessions (hive.server2.tez.initialize.default.sessions) = true
hive pre-warm containers (hive.prewarm.enabled) = true
hive num-containers (hive.prewarm.numcontainers) = (some number >1)
tez container reuse (tez.am.container.reuse.enabled) = true
all examples here relate to hiveserver2 + a JDBC client (beeline, another JDBC client, or the ODBC driver); they do not apply to the Hive CLI or other tools that interact directly with the hive metastore
Understanding #1 - default queues + number of sessions:
When you define hive default queues (hive.server2.tez.default.queues), hive number of sessions (hive.server2.tez.sessions.per.default.queue) and hive start sessions (hive.server2.tez.initialize.default.sessions) = true, hiveserver2 will create one Tez AM (application master) for each default queue x number of sessions when hiveserver2 service starts.
For example, if you define default queues = "queue1, queue2" and number of sessions = 2, hiveserver2 will create 4 Tez AM (2 for queue1 and 2 for queue2).
If you have continuous usage of hive server2, those Tez AM will keep running, but if your hiveserver2 is idle, those Tez AM will be killed based on timeout defined by tez.session.am.dag.submit.timeout.secs.
Understanding #2 - pre-warm containers
Pre-warm containers are different from session initialization.
The number of pre-warm containers is the number of YARN execution containers (not counting the AM container) attached to each Tez AM by default. Each AM holds this number of containers even when the Tez AM is idle (not executing queries).
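The AM and container math from Understandings #1 and #2, sketched in Python (the values are illustrative, not recommendations):

```python
# Illustrative values for the settings discussed above
default_queues = ["queue1", "queue2"]   # hive.server2.tez.default.queues
sessions_per_queue = 2                  # hive.server2.tez.sessions.per.default.queue
prewarm_containers = 3                  # hive.prewarm.numcontainers

tez_ams = len(default_queues) * sessions_per_queue  # AMs started with hiveserver2
held = tez_ams * prewarm_containers                 # containers held even while AMs are idle
print(tez_ams, held)  # 4 12
```

This is why an idle hiveserver2 with many default queues can still occupy a noticeable slice of the cluster until the AM timeout kicks in.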
Understanding #3 - when initialized Tez AM (AM pool) is used
A query WILL ONLY use a Tez AM from the pool (initialized as described above) if you DO NOT specify a queue name (tez.queue.name); in this case, hiveserver2 will pick an idle/available Tez AM (the queue name here may be effectively random). If you execute multiple queries over the same JDBC/ODBC connection, each query may be executed by a different Tez AM, and possibly in a different queue.
If you specify a queue name in your JDBC/ODBC connection, hiveserver2 will create a new Tez AM for that connection and won't use any of the initialized sessions.
PS: only JDBC/ODBC clients and the Hive CLI (command line) can use the initialized Tez AMs.
Understanding #4 - what if you have more concurrent queries than the number of Tez AM initialized?
If you DO NOT specify a queue name (as described above), hiveserver2 will hold your query until one of the default Tez AMs (the pool) is available to serve it. There won't be any message in the JDBC/ODBC client nor in the hiveserver2 log file. An uninformed user (or admin) may think the JDBC/ODBC connection or hiveserver2 is broken, but it's just waiting for a Tez AM to execute the query.
If you DO specify a queue name, it doesn't matter how many initialized Tez AMs are in use or idle: hiveserver2 will create a new Tez AM for this connection and the query can execute (if the queue has available resources).
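A toy Python model of Understandings #3 and #4 (the class and queue names are made up for illustration): queries without tez.queue.name borrow an idle pooled AM or silently wait; queries that name a queue always get a new AM.

```python
class ToyHiveServer2:
    def __init__(self, pooled_queues):
        # Tez AMs initialized at startup, one list entry per AM
        self.idle_pool = list(pooled_queues)

    def submit(self, queue_name=None):
        if queue_name is None:
            if self.idle_pool:
                return "pooled AM on " + self.idle_pool.pop(0)
            return "waiting (no idle pooled AM, nothing logged)"
        # A named queue always bypasses the pool
        return "new AM on " + queue_name

hs2 = ToyHiveServer2(["queue1", "queue1", "queue2"])
print(hs2.submit())        # pooled AM on queue1
print(hs2.submit("etl"))   # new AM on etl (pool untouched)
print(hs2.submit())        # pooled AM on queue1
print(hs2.submit())        # pooled AM on queue2
print(hs2.submit())        # waiting (no idle pooled AM, nothing logged)
```

The last call is the surprising case: the query just waits, with no feedback to the user.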
Understanding #5 - how to make a custom queue name (defined by the user) to use AM pool
There is no way to allow users to specify queue names and at the same time use the Tez AM pool (initialized sessions). If your use case requires a different/dedicated Tez AM pool for each group of users, you need dedicated hiveserver2 services, each with its own default queue names and number of sessions, and you must ask each group of users to use its respective hiveserver2.
Understanding #6 - number of containers for a single query
The number of containers each query will use is determined as described here (https://cwiki.apache.org/confluence/display/TEZ/How+initial+task+parallelism+works), which considers the resources available in the current queue; the resources available in a queue are defined by the minimum guaranteed capacity (yarn.scheduler.capacity.root._queuename_.capacity), not the maximum capacity (yarn.scheduler.capacity.root._queuename_.maximum-capacity).
In other words, if you are executing a heavy query, it will probably use all the resources available in the queue, leaving no room for other queries to execute.
Understanding #7 - capacity scheduler ordering policy (FIFO or FAIR) and preemption
There is no preemption inside queues, and the FAIR ordering policy is only effective when containers are released by the AM. If you are reusing containers (tez.am.container.reuse.enabled=true) as recommended, containers will only be released by the AM when the query finishes; if the first query being executed is heavy, all subsequent queries will wait for it to finish before receiving their "fair" share.
Understanding #8 - a scenario where it all works together
My understanding is that all these pieces and settings were initially designed and very well optimized to work in the following scenario:
Large Cluster with lots of resources dedicated to queries.
High number of concurrent queries/users.
Multiple hiveserver2 instances, for example, one for light queries and another for heavy queries.
Fixed number of sessions (Tez AM) and fixed number of containers for each session, all of them pre-warmed (these numbers can be different for each type of query / hiveserver2, as described in #3).
No need for FAIR ordering policy or preemption because all the queries always use only the pre-warmed containers
Small/Medium cluster most common needs
Although the scenario described in #8 is very interesting (and Hive+Tez is the best solution to run at that scale), it is not the reality in most small-to-medium Hadoop clusters, usually found during the Hadoop adoption phase and in POCs.
In my experience, what most of the users of such environments need is much simpler, something like this:
Multi-tenancy cluster, let's say Production and User layers sharing resources.
Soft limits and preemption (at night, when users are off, give 100% of resources to production, during the day keep production running with minimum allocation to accomplish SLA, but give most of the power to users).
Fair allocation (if just one user is running a query, give them all the resources; as soon as second, third, and subsequent users start executing queries, give them an equal distribution of resources, while also respecting the different queues' minimum shares, which represent priorities).
Proposed solution / settings
Hive+Tez is the perfect solution for both scenarios described above, but here I'm going to share details on how to adjust settings for the "Small/Medium cluster most common needs":
YARN - Enable preemption as described here. My settings for YARN are:
yarn.resourcemanager.scheduler.monitor.enable=true
yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity=0.01
yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill=1000
yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval=1000
yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor=1
yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round=1
yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
Capacity Scheduler - Create one queue for each concurrent session; for example, if you want 3 concurrent users, create 3 queues (this is needed to achieve full preemption):
yarn.scheduler.capacity.maximum-am-resource-percent=0.2
yarn.scheduler.capacity.maximum-applications=10000
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.queue-mappings-override.enable=false
yarn.scheduler.capacity.root.accessible-node-labels=*
yarn.scheduler.capacity.root.acl_administer_queue=*
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default.acl_submit_applications=*
yarn.scheduler.capacity.root.default.capacity=30
yarn.scheduler.capacity.root.default.maximum-capacity=100
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor=10
yarn.scheduler.capacity.root.hive-test.acl_administer_queue=*
yarn.scheduler.capacity.root.hive-test.acl_submit_applications=*
yarn.scheduler.capacity.root.hive-test.capacity=70
yarn.scheduler.capacity.root.hive-test.maximum-capacity=70
yarn.scheduler.capacity.root.hive-test.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.hive-test.ordering-policy=fifo
yarn.scheduler.capacity.root.hive-test.production.acl_administer_queue=*
yarn.scheduler.capacity.root.hive-test.production.acl_submit_applications=*
yarn.scheduler.capacity.root.hive-test.production.capacity=40
yarn.scheduler.capacity.root.hive-test.production.maximum-capacity=100
yarn.scheduler.capacity.root.hive-test.production.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.hive-test.production.ordering-policy=fifo
yarn.scheduler.capacity.root.hive-test.production.state=RUNNING
yarn.scheduler.capacity.root.hive-test.production.user-limit-factor=100
yarn.scheduler.capacity.root.hive-test.queues=production,user
yarn.scheduler.capacity.root.hive-test.state=RUNNING
yarn.scheduler.capacity.root.hive-test.user-limit-factor=10
yarn.scheduler.capacity.root.hive-test.user.acl_administer_queue=*
yarn.scheduler.capacity.root.hive-test.user.acl_submit_applications=*
yarn.scheduler.capacity.root.hive-test.user.capacity=60
yarn.scheduler.capacity.root.hive-test.user.maximum-capacity=100
yarn.scheduler.capacity.root.hive-test.user.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.hive-test.user.ordering-policy=fifo
yarn.scheduler.capacity.root.hive-test.user.queues=user1,user2,user3
yarn.scheduler.capacity.root.hive-test.user.state=RUNNING
yarn.scheduler.capacity.root.hive-test.user.user-limit-factor=100
yarn.scheduler.capacity.root.hive-test.user.user1.acl_administer_queue=*
yarn.scheduler.capacity.root.hive-test.user.user1.acl_submit_applications=*
yarn.scheduler.capacity.root.hive-test.user.user1.capacity=33
yarn.scheduler.capacity.root.hive-test.user.user1.maximum-capacity=100
yarn.scheduler.capacity.root.hive-test.user.user1.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.hive-test.user.user1.ordering-policy=fifo
yarn.scheduler.capacity.root.hive-test.user.user1.state=RUNNING
yarn.scheduler.capacity.root.hive-test.user.user1.user-limit-factor=10
yarn.scheduler.capacity.root.hive-test.user.user2.acl_administer_queue=*
yarn.scheduler.capacity.root.hive-test.user.user2.acl_submit_applications=*
yarn.scheduler.capacity.root.hive-test.user.user2.capacity=33
yarn.scheduler.capacity.root.hive-test.user.user2.maximum-capacity=100
yarn.scheduler.capacity.root.hive-test.user.user2.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.hive-test.user.user2.ordering-policy=fifo
yarn.scheduler.capacity.root.hive-test.user.user2.state=RUNNING
yarn.scheduler.capacity.root.hive-test.user.user2.user-limit-factor=10
yarn.scheduler.capacity.root.hive-test.user.user3.acl_administer_queue=*
yarn.scheduler.capacity.root.hive-test.user.user3.acl_submit_applications=*
yarn.scheduler.capacity.root.hive-test.user.user3.capacity=34
yarn.scheduler.capacity.root.hive-test.user.user3.maximum-capacity=100
yarn.scheduler.capacity.root.hive-test.user.user3.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.hive-test.user.user3.ordering-policy=fifo
yarn.scheduler.capacity.root.hive-test.user.user3.state=RUNNING
yarn.scheduler.capacity.root.hive-test.user.user3.user-limit-factor=10
yarn.scheduler.capacity.root.queues=default,hive-test
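One easy mistake with a queue tree like this is having sibling capacities that don't sum to 100 at some level, which the Capacity Scheduler rejects. A quick Python sanity check over the values above:

```python
# Sibling queue capacities at each level of the tree above; each level must sum to 100
capacities = {
    "root": {"default": 30, "hive-test": 70},
    "root.hive-test": {"production": 40, "user": 60},
    "root.hive-test.user": {"user1": 33, "user2": 33, "user3": 34},
}

for parent, children in capacities.items():
    total = sum(children.values())
    assert total == 100, f"{parent}: children sum to {total}, expected 100"
print("all queue levels sum to 100")
```

Note that user3 gets 34 rather than 33 precisely so that the user level closes at exactly 100.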
Hive/Tez - Define default queues and initialize 1 session per queue:
set hive.server2.tez.default.queues=user1,user2,user3
set hive.server2.tez.sessions.per.default.queue=1
set hive.server2.tez.initialize.default.sessions=true
set hive.prewarm.enabled=true
set hive.prewarm.numcontainers=2
set tez.am.container.reuse.enabled=true
Finally, it can also be useful to increase the release timeout for idle containers in the Tez settings; this keeps all containers running for a short period after a query finishes (not only the prewarm containers but also extra containers launched by the AM), making the initialization of subsequent queries faster. A suggestion would be:
tez.am.container.idle.release-timeout-min.millis=30000
tez.am.container.idle.release-timeout-max.millis=90000
PS: If you have multiple business units, each with different resource sharing/priorities, you need a dedicated hiveserver2 for each business unit. For example: one hiveserver2 for 'marketing', one for 'fraud', one for 'financial'. As described above, hiveserver2 will only use initialized sessions if you don't specify tez.queue.name.
09-14-2016
08:57 PM
Try killing the hue process (kill -9 7833), run the hue test server (/usr/lib/hue/build/env/bin/hue testserver), and then try to access hue from a web browser.
09-10-2016
11:31 PM
@Suneet patil Try killing the current hue service, then start the test server and try to access it from a browser: netstat -nap | grep 800
#copy pid
kill -9 [PID FROM PREVIOUS COMMAND]
/usr/lib/hue/build/env/bin/hue testserver
#ACCESS BROWSER ON PORT 8000
PS: don't try to run the default hue for now; let's just see if the test server runs OK.
09-09-2016
03:45 PM
What happens if you run the command below and try to access hue? /usr/lib/hue/build/env/bin/hue testserver
09-09-2016
02:36 PM
@Suneet patil Check this: https://community.hortonworks.com/questions/24806/connection-refused-trying-to-access-port-8000-hue.html
09-06-2016
06:21 PM
6 Kudos
I found it tricky to make Kafka work with SSL in a kerberized cluster. In this article I share the Ambari settings I used and sample console producer/consumer commands:
1- Install Ambari and deploy a cluster with Kafka
2- Kerberize the cluster using Ambari (AD wizard, MIT Kerberos, or manual keytabs)
3- Create a keystore and truststore with proper certificates; more details here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_security/content/ch_wire-ssl-certs.html
4- Change the Kafka settings in Ambari as below:
Kafka Brokers - Listeners: listeners=SASL_PLAINTEXT://localhost:6667,SASL_SSL://localhost:6668
Custom kafka-broker - security.inter.broker.protocol=SASL_PLAINTEXT
Custom kafka-broker - ssl.client.auth=none
Custom kafka-broker - ssl.key.password=YOUR-PASSWORD
Custom kafka-broker - ssl.keystore.location=/path-to-your-keystore
Custom kafka-broker - ssl.keystore.password=YOUR-PASSWORD
Custom kafka-broker - ssl.truststore.location=/path-to-your-truststore
Custom kafka-broker - ssl.truststore.password=YOUR-PASSWORD
5- Add the Kafka public certificate to the default JDK truststore (cacerts) on the node where you will test the console producer/consumer
6- Try the producer/consumer with SASL_PLAINTEXT:
#get kerberos ticket
kinit
<TYPE YOUR PASSWORD>
#producer plaintext
/usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list kafka-node1.domain:6667,kafka-node2.domain:6667 --topic test --security-protocol SASL_PLAINTEXT
#consumer plaintext
/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --zookeeper zk-node1.domain:2181,zk-node2.domain:2181 --topic test --from-beginning --security-protocol SASL_PLAINTEXT
7- Try the producer/consumer with SASL_SSL:
#kerberos ticket
kinit
<TYPE YOUR PASSWORD>
#producer ssl
/usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list kafka-node1.domain:6668,kafka-node2.domain:6668 --topic test --new-producer --producer-property "security.protocol=SASL_SSL"
#consumer ssl
/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --new-consumer --bootstrap-server kafka-node1.domain:6668,kafka-node2.domain:6668 --topic test --from-beginning --security-protocol SASL_SSL
Tested with HDP 2.4.2 + Ambari 2.2
08-31-2016
05:50 PM
@apappu Great article. dfs.https.enable=true is needed if you are using HA; otherwise Ambari fails to check HDFS. Without that parameter, Ambari tries to connect to the HTTP port (50070) instead of the HTTPS port (50470). If you see the error below, it's because you are missing dfs.https.enable=true: 2016-06-02 23:07:22,221 - Getting jmx metrics from NN failed. URL: http://test.support.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesys...
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 38, in get_value_from_jmx
_, data, _ = get_user_call_output(cmd, user=run_user, quiet=False)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/get_user_call_output.py", line 61, in get_user_call_output
raise Fail(err_msg)
Fail: Execution of 'curl --negotiate -u : -s 'http://test.support.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesys...' 1>/tmp/tmpOGkdi9 2>/tmp/tmpCUfr6B' returned 7.
08-29-2016
01:39 AM
@Michael Dennis "MD" Uanang I have the same problem; in my case, the hbase log file shows the error below: javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?
08-23-2016
07:35 AM
@kishore sanchina In this example I started the Spark Thrift server on port 10010 and connected with beeline to the same port. You can use the default port 10015 instead.
08-15-2016
03:37 PM
@Alberto Romero Really useful, solved my permissions problem!
07-10-2016
01:14 PM
@Sunile Manjee I would start by looking at Ambari Metrics. Another option is nmon (http://nmon.sourceforge.net/pmwiki.php), an easy tool to visualize and export all OS metrics (CPU, memory, disk, and network). @Randy Gelhausen is also working on a Zeppelin notebook to visualize nmon exported data (targeted at monitoring all servers in a data center that are not part of an HDP cluster). Another easy tool I like is iperf3 (http://software.es.net/iperf/), but it generates a lot of network traffic to measure maximum bandwidth; I don't think it's a good idea to run it during your job execution.
07-08-2016
03:50 AM
@Narayana Ganta I have had great experiences with the XML SerDe for deep and nested structures and also with huge data sizes. Regarding performance, note that the VTD (GPLv2 license) version (link below) can boost your performance: https://github.com/dvasilen/Hive-XML-SerDe-VTD
07-08-2016
01:39 AM
4 Kudos
Two simple steps to directly access custom Banana dashboards: 1- Copy your your_custom_dashboard.json file to the folder below on the Banana server (you can export the .json from your Banana UI): /opt/lucidworks-hdpsearch/solr/server/solr-webapp/webapp/banana/app/dashboards/ 2- Point your browser to a URL like: http://sandbox.hortonworks.com:8983/solr/banana/index.html#/dashboard/file/your_custom_dashboard.json
07-06-2016
10:04 PM
@vinita goyal This is not related to Ambari Metrics, but it can help you analyze your query execution / resource consumption: set hive.tez.exec.print.summary=true; Example output and more explanation can be found here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_performance_tuning/content/ch_query_optimization_hive.html
07-06-2016
01:52 PM
@Kuldeep Kulkarni @Ancil McBarnett I have an environment with HDP 2.4.2, Kerberos, and HiveServer2 SSL, and it's working: beeline -u "jdbc:hive2://fqdn.hostname:10001/;principal=hive/_HOST@DOMAIN;transportMode=http;httpPath=cliservice;ssl=true;sslTrustStore=/certs/truststore_test;trustStorePassword=XXXXXX" Additionally, it also works through Knox, using https in the topology URL for Hive: <service>
<role>HIVE</role>
<url>https://{{hive_server_host}}:{{hive_http_port}}/{{hive_http_path}}</url>
</service>
07-05-2016
01:19 PM
6 Kudos
An easier way: just run the command below from a Linux terminal on the Ambari server host: ambari-admin-password-reset
06-20-2016
11:52 AM
@Saurabh Kumar Sorry for the late reply. I used some code from the hdp_viz project, but the new version runs directly in Zeppelin; no need to install anything. Just import the Zeppelin notebook and follow the steps there. Let me know if you need any help.
04-27-2016
04:36 AM
Awesome! Thank you. This is much easier than writing a custom processor.