Member since
02-29-2016
37
Posts
48
Kudos Received
2
Solutions
05-08-2017
04:24 PM
Issue: A user with a valid Kerberos ticket tries to connect to the Hive CLI, and the session throws the error below.
[karthick@mrt1 ~]$ hive
WARNING: Use "yarn jar" to launch YARN applications.
Logging initialized using configuration in file:/etc/hive/2.4.3.0-227/0/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No cluster diagnostics found.
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:544)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:680)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No cluster diagnostics found.
at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:731)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:217)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:117)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:541)
Resolution: Tez is unable to launch the AM, so the customer needs to open the ports between the client and the datanodes. Also ask the customer to set the configs below:
tez.am.client.am.port-range (32000-65000)
yarn.app.mapreduce.am.job.client.port-range (32000-65000)
tez.app.mapreduce.am.job.client.port-range (32000-65000)
On Red Hat/CentOS, the ephemeral port range can also be configured in /etc/sysctl.conf:
# Allowed local port range
net.ipv4.ip_local_port_range = 32768 61000
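To check whether the network path between the client and a datanode is the problem, a quick TCP probe can help. The sketch below is only an illustration: the function name probe_port and the result labels are made up for this example, and it assumes that a refused connection means the packet reached the host while a timeout suggests the port is being filtered.

```python
import socket

def probe_port(host, port, timeout=2.0):
    """Classify a TCP port as 'open', 'refused', 'filtered' or 'error'.

    'refused'  : the host answered, but nothing is listening on the port.
    'filtered' : no answer within the timeout (often a firewall drop).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "filtered"
    except OSError:
        # DNS failure, unreachable network, and similar conditions.
        return "error"
```

Run from the client against a datanode and a port inside the configured range. The AM only listens while it is running, so "refused" still tells you the path is open, whereas "filtered" points at a firewall in between.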
03-31-2017
11:46 PM
1 Kudo
@godavari m Can you confirm that the Hive metastore is running OK? Enable debug logging for the Hive CLI to get more information on why it's failing:
hive --hiveconf hive.root.logger=DEBUG,DRFA --hiveconf hive.log.file=/tmp/debug_hive.log
03-31-2017
11:35 PM
2 Kudos
The following parameters decide whether ACID is enabled or disabled in your cluster.
ACID Enable:
hive.support.concurrency=true;
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
hive.enforce.bucketing=true;
hive.exec.dynamic.partition.mode=nostrict;
*** The following parameters are required for standalone hive metastore ***
hive.compactor.initiator.on=true;
hive.compactor.worker.threads=1;
NOTE: Even though HiveServer2 runs with an embedded metastore, a standalone Hive metastore is required for ACID support to function properly. If you are not using ACID support with HiveServer2, you do not need a standalone metastore.
ACID Disable: If you are not using ACID, make sure the parameters are set as below to avoid any Hive locking issues:
hive.support.concurrency=false;
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager;
hive.enforce.bucketing=false;
hive.exec.dynamic.partition.mode=strict;
hive.compactor.initiator.on=false;
hive.compactor.worker.threads=0;
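If you want to verify a cluster against the two lists above, a small check like the following may help. It is a sketch only: the names ACID_ON, ACID_OFF and acid_mismatches are invented for this example, and it assumes you have already loaded the effective Hive settings into a plain dict of strings.

```python
# Expected values copied from the two lists in this post.
ACID_ON = {
    "hive.support.concurrency": "true",
    "hive.txn.manager": "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager",
    "hive.enforce.bucketing": "true",
    "hive.exec.dynamic.partition.mode": "nostrict",
    "hive.compactor.initiator.on": "true",
    "hive.compactor.worker.threads": "1",
}
ACID_OFF = {
    "hive.support.concurrency": "false",
    "hive.txn.manager": "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager",
    "hive.enforce.bucketing": "false",
    "hive.exec.dynamic.partition.mode": "strict",
    "hive.compactor.initiator.on": "false",
    "hive.compactor.worker.threads": "0",
}

def acid_mismatches(conf, acid_enabled):
    """Return {key: (actual, expected)} for every setting that differs."""
    expected = ACID_ON if acid_enabled else ACID_OFF
    return {
        key: (conf.get(key), want)
        for key, want in expected.items()
        if conf.get(key) != want
    }
```

An empty result means the settings match the list. Note that the check is literal, so a worker thread count other than 1 is reported as a difference even though it may be a deliberate choice.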
03-31-2017
10:57 PM
2 Kudos
Issue: Below is a sample test.
CREATE DATABASE IF NOT EXISTS tmp;
DROP TABLE IF EXISTS tmp.orders;
CREATE TABLE tmp.orders (username STRING, order_creation_date TIMESTAMP, amount DOUBLE);
INSERT INTO TABLE tmp.orders VALUES ("jack", '2017-02-26 13:45:12', 88.2), ("jones", '2017-02-28 15:28:14', 92.4);
HDP 2.4.2
SELECT username FROM tmp.orders WHERE order_creation_date > '2017-02-27';
OK
jones
select * from tmp.orders;
OK
jack 2017-02-26 13:45:12 88.2
jones 2017-02-28 15:28:14 92.4
HDP 2.5.0 & 2.5.3
SELECT username FROM tmp.orders WHERE order_creation_date > '2017-02-27';
OK
SELECT username FROM tmp.orders WHERE order_creation_date > TO_DATE('2017-02-27');
OK
Note that the two queries above did not return any rows.
Resolution (workaround): Use the cast function in Hive, or include hh:mm:ss in the literal as below:
SELECT username FROM tmp.orders WHERE order_creation_date > '2017-02-27 00:00:00';
OK
jones
This is a known bug in Hive. Reference: Apache, Hortonworks
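If the queries are generated by a script, the hh:mm:ss workaround can be applied before the literal reaches Hive. This is a minimal sketch, assuming literals arrive either as 'YYYY-MM-DD' or as 'YYYY-MM-DD HH:MM:SS'; the helper name as_timestamp_literal is hypothetical.

```python
from datetime import datetime

def as_timestamp_literal(value):
    """Pad a date-only literal to midnight; pass full timestamps through."""
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next accepted format
        return parsed.strftime("%Y-%m-%d %H:%M:%S")
    raise ValueError("not a date or timestamp literal: %r" % value)
```

The returned string can then be placed in the WHERE clause instead of the date-only value.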
03-31-2017
10:20 PM
1 Kudo
Issue: Running "msck repair table <tablename>" gives the below error:
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. null
Resolution: The above error occurs when hive.mv.files.thread=0. Increasing the value of the parameter to 15 fixes the issue. This is a known bug.
03-31-2017
05:17 PM
1 Kudo
Pipelined Sorter

The pipelined sorter is a divide-and-conquer approach applied to the MapOutputBuffer’s sort, which operates on an entire chunk of 256 or 512 MB of data space. The primary assumption is that for a significant number of Hadoop installations the CPU is underutilized by the task tracker, with the fundamental limit for the map slots being the disks available.

The DefaultSorter uses two buffers, kvbuffer (data) and kvmeta (metadata), which grow from different parts of the large buffer allocated for operations.

The PipelinedSorter uses a series of similar buffer pairs which grow only in one direction and involve no looping around back to the front. Each buffer pair is wrapped in a container class named SortSpan.

The allocation system preallocates 16 MB for the first kvmeta and then marks the rest of the entire buffer for kvbuffer0. The collect proceeds normally until either the entire buffer is full or kvmeta0 acquires 1M items (16 bytes per item, 1024x1024 items).

At this point, kvbuffer0 is marked off as full. The remaining buffer is now allocated according to the per-item ratio from kvmeta0.size() vs kvbuffer0.size(). If the per-item size is larger than expected, then kvmeta1 will have fewer items in the reserved space (so <16 MB space). The collect thread switches to using only kvmeta1 and kvbuffer1 (which is again the length of the whole remaining buffer).

A separate executor service (sortmaster) hosts a thread pool for sorting the data that has been collected into kvmeta0 & kvbuffer0. sortmaster.submit returns a future result which is submitted to the merge heap before the result has been evaluated.

Since every single sort span is share-nothing in nature, the sort operations can happen in parallel and out of order for any number of spans without any dependency conflicts. The only criterion is that the comparator operates in a thread-safe manner. Most binary comparators do, but the default thread count is 1 to be safe.

The spill is triggered when the last span cannot hold any more items or the task is over. The spill needs to wait for all the sort futures to return before it can attempt a merge to disk. There is no lock here except for the implicit one in Future::get().

The merge operation is a simple java.util.PriorityQueue with one code fast path. The merger provides a partition filter, which is an iterator wrapper that returns false when the current partition != the filter provided. This means that the data to IFile is being streamed from the in-memory merge instead of being merged prior to the write.

The merge for the SortSpan case is unique among all the other merges in Hadoop, because it actually has random access (because it is in memory) into the various lists it is merging.

bisect: If two keys are read from the same sort span consecutively, it tries to do a bisect() operation with the least key from the second-least sort span. This does a binary search for second.key() in the first sort span, trying to avoid comparing against a large number of keys. The bisect returns the offset to which the least sort span can “gallop” before performing another compare. This is especially useful in scenarios where a huge majority of the keys are identical or if the input key list is already in order.

RLE: The number of operations that resulted in an == compare result is tracked during the sort and the merge, which is useful in the output phase. The data being dumped to disk is RLE encoded if key equality comes up at least 10% of the time. This optimization is useful when the data operation is something like a join, where the keys repeat a large number of times in the right table in the map-join case.

The PipelinedSorter thus brings two new features to the system: an ability to run multiple sort threads, and a slightly better way of dealing with already sorted lists through the bisect.

IFile modifications & future use

fast-forward-merge: The RLE key scenario actually shines when it comes to merging the various spills together (not implemented yet), since the number of key comparisons during the merge is unnecessarily large when the same key is repeated a large number of times. As long as the same key is being repeated, no further key comparisons are required, as we can fast-forward that data into the merged file. To turn on RLE on the merged output without forcing more comparisons, a new magic DataInputBuffer, IFile.REPEAT_KEY, has been added, which will forward the RLE information from the input file into the output file without any comparison operations when fast-forwarding through a merge.

bisect-bulk-merge: Similar to keeping the index record in the cache list during the merge, another approach to bring the bisect() behaviour to the spill merges is to keep track of 32 keys equally spaced in the spill file to check if we can gallop through any of the disk files while merging. This can potentially be held in the map container memory instead of being written to disk, as it is by necessity a random access feature. Files without a bisect cache entry will regress to the present situation, but the real win is in skipping comparison operations during a disk merge and being able to literally copy all data from the current position till the bisect position without ever doing per-record reads or compares.
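
The bisect/gallop idea described above can be illustrated with a small in-memory merge. This is a simplified sketch in Python and not the Tez code: merge_spans is a hypothetical name, spans are plain sorted lists, and for brevity it bisects on every pop rather than only after two consecutive reads from the same span. It also makes no attempt to keep equal keys in a stable order.

```python
import heapq
from bisect import bisect_right

def merge_spans(spans):
    """Merge already-sorted lists, bulk-copying runs found by bisect."""
    # Heap entries: (current key, span index, offset into that span).
    heap = [(span[0], i, 0) for i, span in enumerate(spans) if span]
    heapq.heapify(heap)
    out = []
    while heap:
        _, i, pos = heapq.heappop(heap)
        span = spans[i]
        if heap:
            # Binary-search for the second-least head key in the least span.
            # Everything up to that offset is <= every other pending key,
            # so it can be copied without further comparisons.
            end = bisect_right(span, heap[0][0], pos)
        else:
            # Only one span left: copy the remainder in one go.
            end = len(span)
        out.extend(span[pos:end])
        if end < len(span):
            heapq.heappush(heap, (span[end], i, end))
    return out
```

With inputs that are already in order or dominated by repeated keys, each pop copies a long run, which is the case the bisect is said to help with.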
03-31-2017
04:43 PM
1 Kudo
Issue: After enabling the preemption parameters in yarn-site.xml as per link, applications were still found waiting to run because other queues were using all of the available resources. With preemption enabled, under-served queues can begin to claim their allocated cluster resources almost immediately, without having to wait for other queues' applications to finish running.
Resolution: Along with the parameters listed below, configure "yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity" in yarn-site.xml; otherwise you won't see preemption even though it is enabled. The doc is missing this parameter.
The parameters to enable preemption are:
1) yarn.resourcemanager.scheduler.monitor.enable
2) yarn.resourcemanager.scheduler.monitor.policies
3) yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval
4) yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill
5) yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round
6) yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor
Reference: https://hortonworks.com/blog/better-slas-via-resource-preemption-in-yarns-capacityscheduler/
Addendum: Logged an internal bug to include this parameter in the official Hortonworks documentation.
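To confirm that none of these parameters is missing from a yarn-site.xml, something like the following could be used. It is a rough sketch: the function name missing_preemption_keys is made up, it only checks that each property is present (not its value), and it assumes the usual Hadoop layout of <property><name>...</name><value>...</value></property> entries.

```python
import xml.etree.ElementTree as ET

# The six parameters from the list plus the one the doc was missing.
PREEMPTION_KEYS = [
    "yarn.resourcemanager.scheduler.monitor.enable",
    "yarn.resourcemanager.scheduler.monitor.policies",
    "yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval",
    "yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill",
    "yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round",
    "yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor",
    "yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity",
]

def missing_preemption_keys(yarn_site_xml):
    """Return the preemption properties absent from the given XML text."""
    root = ET.fromstring(yarn_site_xml)
    present = {
        prop.findtext("name", "").strip()
        for prop in root.iter("property")
    }
    return [key for key in PREEMPTION_KEYS if key not in present]
```

Pass it the contents of the file, for example open("yarn-site.xml").read(), and an empty list means all seven properties are defined.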
03-10-2017
07:57 PM
Error message: Below error message from dmesg.txt
[1294523.092549] python2.6[11900]: segfault at 11 ip 00007f36b7cd4e90 sp 00007ffd31ce5830 error 4 in libcrypto.so.1.0.0[7f36b7baf000+1d5000]
[1294523.530806] python2.6[11901]: segfault at 11 ip 00007f33ef805e90 sp 00007ffe41443280 error 4 in libcrypto.so.1.0.0[7f33ef6e0000+1d5000]
[1294523.934818] python2.6[11902]: segfault at 11 ip 00007f29280d5e90 sp 00007fff7b5facc0 error 4 in libcrypto.so.1.0.0[7f2927fb0000+1d5000]
[1294524.379260] python2.6[11903]: segfault at 11 ip 00007f2cfd50be90 sp 00007ffe17504610 error 4 in libcrypto.so.1.0.0[7f2cfd3e6000+1d5000]
Below error message from /var/log/hue/supervisor.log
[15/Feb/2017 17:43:58 +0000] supervisor ERROR Exception in supervisor main loop
Traceback (most recent call last):
File "/usr/lib/hue/desktop/core/src/desktop/supervisor.py", line 414, in main
wait_loop(sups, options)
File "/usr/lib/hue/desktop/core/src/desktop/supervisor.py", line 424, in wait_loop
time.sleep(1)
File "/usr/lib/hue/desktop/core/src/desktop/supervisor.py", line 221, in sig_handler
raise SystemExit("Signal %d received. Exiting" % signum)
SystemExit: Signal 15 received. Exiting
Workaround: Enable the CherryPy web server in the hue.ini configuration file:
use_cherrypy_server=true
Reference: Internal bug BUG-49806
12-28-2016
10:25 AM
1 Kudo
Issue: After enabling Kerberos with the option 'Manage Kerberos principals and keytabs manually', adding a new component does not respond. This is due to a bug in Ambari 2.2. Refer link. Below is the error message reported on the web console:
Error: http://{ambari-server}:8080/api/v1/clusters/{cluster_name}/services/KERBEROS?fields=Services/attributes/kdc_validation_result,Services/attributes/kdc_validation_failure_details&_=1446149280683
Workaround: Add the KERBEROS service via the API as below, install the component, and then remove the KERBEROS service.
1. Run the below curl command on the ambari-server (enable the admin user if disabled):
curl -H "X-Requested-By:ambari" -u admin:admin -i -X POST http://<ambari-server-host>:8080/api/v1/clusters/<cluster-name>/services/KERBEROS
2. Add the new component that was failing with the above error, using the Add Service wizard.
3. Once the installation is complete, remove the KERBEROS service using the below curl command:
curl -H "X-Requested-By:ambari" -u admin:admin -i -X DELETE http://<ambari-server-host>:8080/api/v1/clusters/<cluster-name>/services/KERBEROS
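If you would rather script steps 1 and 3 than type the curl commands, the same two requests can be built with the Python standard library. This is a sketch only: kerberos_service_request is a hypothetical helper, the admin:admin credentials and port 8080 are simply carried over from the curl commands, and the function builds the request without sending it.

```python
import base64
import urllib.request

def kerberos_service_request(ambari_host, cluster, method,
                             user="admin", password="admin"):
    """Build (but do not send) the POST or DELETE for the KERBEROS service."""
    if method not in ("POST", "DELETE"):
        raise ValueError("method must be POST or DELETE")
    url = "http://%s:8080/api/v1/clusters/%s/services/KERBEROS" % (
        ambari_host, cluster)
    # Same basic auth that curl sends for -u user:password.
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    headers = {
        "X-Requested-By": "ambari",
        "Authorization": "Basic " + token,
    }
    return urllib.request.Request(url, method=method, headers=headers)
```

Sending is then a matter of calling urllib.request.urlopen() on the returned object: once with "POST" before adding the component and once with "DELETE" after the installation completes.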
12-27-2016
04:59 PM
1 Kudo
Static views or instances are created by default in Ambari. For SmartSense the default view is non-editable, so you may end up creating additional views in such situations. Using the below curl command, you can change the hostname of the HST server on the static instance.
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{ "ViewInstanceInfo" : { "properties" : { "hst.server.url" : "<hst host>:9000"} } }' http://<host>:<port>/api/v1/views/HORTONWORKS_SMARTSENSE/versions/<version no>/instances/SmartSense