Member since
03-01-2016
104
Posts
97
Kudos Received
3
Solutions
06-04-2017
12:36 AM
PROBLEM: Phoenix ODBC driver strips out "hint" part of "Select" statements. ODBC driver logs looks like below : Mar2016:15:00.601 INFO 6380Statement::SQLSetStmtAttrW:Attribute: SQL_ATTR_MAX_ROWS (1) ----Comments: the original query passed in has "hint" Mar2016:15:00.602 INFO 6380StatementState::InternalPrepare:Preparing query:select/*+ INDEX(c.more_xref_cad more_xref_acct_idx) */*from c.more_xref_cad where cad_acct_id =219980018 Mar2016:15:00.602 DEBUG 6380RESTAction::HMDebugCallback:Infor type: CURLINFO_TEXT ..... Mar2016:15:00.802 DEBUG 6380RESTAction::HMDebugCallback:Info data:Connected to localhost (127.0.0.1) port 8765(#1) Mar2016:15:00.802 DEBUG 6380RESTAction::HMDebugCallback:Infor type: CURLINFO_HEADER_OUT Mar2016:15:00.802 DEBUG 6380RESTAction::HMDebugCallback:Info data: POST / HTTP/1.1Host: localhost:8765Content-Type: application/octet-stream Accept: text/html, image/gif, image/jpeg,*; q=.2,*/*; q=.2 User-Agent: Phoenix ODBC Connection: keep-alive Content-Length: 160 Mar 20 16:15:00.803 DEBUG 6380 RESTAction::HMDebugCallback: Infor type: CURLINFO_DATA_OUT ----Comments: the query generated and submited to PQS has no "hint" part. Mar 20 16:15:00.803 DEBUG 6380 RESTAction::HMDebugCallback: Info data: {"request":"prepare","connectionId":"2166b30f-1bf8-1f9d-309e-4009877a1a62","sql":"SELECT * FROM c.more_xref_cad WHERE cad_acct_id = 219980018","maxRowCount":-1} Mar 20 16:15:00.803 DEBUG 6380 RESTAction::HMDebugCallback: Infor type: CURLINFO_TEXT Mar 20 16:15:00.803 DEBUG 6380 RESTAction::HMDebugCallback: Info data: upload completely sent off: 160 out of 160 bytes
ROOT CAUSE : As per Simba team,In the ODBC driver, there is a component that removes ODBC escape sequences such as {fn SIN(col_1)} and turns it into SIN(col_1). The reason for this is because Phoenix does not support such escape sequence but BI tools emits them. The problem here is that the component that removes escape sequence also removes the hint as it is currently being treated as comments. SOLUTION : Phoenix ODBC Driver GA (v1.0.6.1008) has the fix. Please raise a support case with HWX if you need additional assistance here.
... View more
Labels:
06-04-2017
12:09 AM
PROBLEM : Inner query does not recognize alias for the table and consider it as column family. Following exceptions are seen: 0: jdbc:phoenix:localhost:2181> SELECT a.tgb_id FROM "APC:TGB_CURRENT_STATUS" a WHERE a.utc_date_epoch in (SELECT b.utc_date_epoch FROM "APC:TGB_STATUS" b WHERE a.tgb_id = b.tgb_id ORDER BY b.utc_date_epoch DESC limit 2)
. . . . . . . . . . . . . . . > ;
Error: ERROR 1001 (42I01): Undefined column family. familyName=A.null (state=42I01,code=1001)
org.apache.phoenix.schema.ColumnFamilyNotFoundException: ERROR 1001 (42I01): Undefined column family. familyName=A.null
at org.apache.phoenix.schema.PTableImpl.getColumnFamily(PTableImpl.java:724)
at org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.resolveColumn(FromCompiler.java:365) STEPS TO REPRODUCE : 1. Create two tables from Phoenix with below schema CREATE TABLE "${namespace}:TGB_STATUS" (
tgb_id integer not null,
utc_date_epoch integer not null,
tgb_name varchar(40),
city_name varchar(25),
tgb_type_description varchar(64),
local_date_key integer,
utc_date_key integer,
enclosure_type integer,
door_tamper_flag boolean,
over_temp_flag boolean,
under_temp_flag boolean,
voltage_fault_flag boolean,
power_fail_flag boolean,
battery_low_flag boolean,
major_rectifier_alarm_flag boolean,
minor_rectifier_alarm_flag boolean,
env_fault_alarm_flag boolean,
fusion_proc_load integer,
fusion_proc_uptime integer,
pcc_installed_flag boolean,
vswr_alarm_flag boolean,
pcc_power_supply_flag boolean,
forward_power integer,
reflected_power integer,
alarm_bits integer,
alarm_bits_text varchar(8),
slot0_temperature integer,
slot0_channel integer,
slot0_active boolean,
slot1_temperature integer,
slot1_channel integer,
slot1_active boolean,
slot2_temperature integer,
slot2_channel integer,
slot2_active boolean,
slot3_temperature integer,
slot3_channel integer,
slot3_active boolean,
slot4_temperature integer,
slot4_channel integer,
slot4_active boolean,
slot5_temperature integer,
slot5_channel integer,
slot5_active boolean,
slot6_temperature integer,
slot6_channel integer,
slot6_active boolean,
slot7_temperature integer,
slot7_channel integer,
slot7_active boolean,
slot8_temperature integer,
slot8_channel integer,
slot8_active boolean,
slot9_temperature integer,
slot9_channel integer,
slot9_active boolean,
CONSTRAINT pk_tgb_status PRIMARY KEY (tgb_id ASC, utc_date_epoch DESC)
)SALT_BUCKETS = ${tgb_status};
And CREATE TABLE "${namespace}:TGB_CURRENT_STATUS" (
tgb_id integer not null,
utc_date_epoch integer,
tgb_name varchar(40),
city_name varchar(25),
tgb_type_description varchar(64),
local_date_key integer,
utc_date_key integer,
enclosure_type integer,
door_tamper_flag boolean,
over_temp_flag boolean,
under_temp_flag boolean,
voltage_fault_flag boolean,
power_fail_flag boolean,
battery_low_flag boolean,
major_rectifier_alarm_flag boolean,
minor_rectifier_alarm_flag boolean,
env_fault_alarm_flag boolean,
fusion_proc_load integer,
fusion_proc_uptime integer,
pcc_installed_flag boolean,
vswr_alarm_flag boolean,
pcc_power_supply_flag boolean,
forward_power integer,
reflected_power integer,
alarm_bits integer,
alarm_bits_text varchar(8),
slot0_temperature integer,
slot0_channel integer,
slot0_active boolean,
slot1_temperature integer,
slot1_channel integer,
slot1_active boolean,
slot2_temperature integer,
slot2_channel integer,
slot2_active boolean,
slot3_temperature integer,
slot3_channel integer,
slot3_active boolean,
slot4_temperature integer,
slot4_channel integer,
slot4_active boolean,
slot5_temperature integer,
slot5_channel integer,
slot5_active boolean,
slot6_temperature integer,
slot6_channel integer,
slot6_active boolean,
slot7_temperature integer,
slot7_channel integer,
slot7_active boolean,
slot8_temperature integer,
slot8_channel integer,
slot8_active boolean,
slot9_temperature integer,
slot9_channel integer,
slot9_active boolean,
CONSTRAINT pk_tgb_current_status PRIMARY KEY (tgb_id ASC)
);
2. Run the following query. SELECT a.tgb_id FROM "APC:TGB_CURRENT_STATUS" a WHERE a.utc_date_epoch in (SELECT b.utc_date_epoch FROM "APC:TGB_STATUS" b WHERE a.tgb_id = b.tgb_id ORDER BY b.utc_date_epoch DESC limit 2)
3. Try removing the alias in inner query as well : SELECT a.tgb_id FROM "APC:TGB_CURRENT_STATUS" a WHERE a.utc_date_epoch in (SELECT b.utc_date_epoch FROM "APC:TGB_STATUS" b WHERE "APC:TGB_CURRENT_STATUS".tgb_id = b.tgb_id ORDER BY b.utc_date_epoch DESC limit 2) ROOT CAUSE: This is a known issue reported in internal BUG-78975 with no workaround as of now.
... View more
Labels:
06-03-2017
11:36 PM
PROBLEM: When doing a select of a relatively large table (a few thousand rows) some rows return partially missing.When increasing the filter to return those specific rows, the values appear as expected. STEPS TO REPRODUCE: 1. Create a table CREATE TABLE IF NOT EXISTS TEST (
BUCKET VARCHAR,
TIMESTAMP_DATE TIMESTAMP,
TIMESTAMP UNSIGNED_LONG NOT NULL,
SRC VARCHAR,
DST VARCHAR,
ID VARCHAR,
ION VARCHAR,
IC BOOLEAN NOT NULL,
MI UNSIGNED_LONG,
AV UNSIGNED_LONG,
MA UNSIGNED_LONG,
CNT UNSIGNED_LONG,
DUMMY VARCHAR
CONSTRAINT pk PRIMARY KEY (BUCKET, TIMESTAMP DESC, SRC, DST, ID, ION, IC)
); 2. Use a python script to generate a CSV with 5000 rows for i in xrange(5000):
print "5SEC,2016-07-21 07:25:35.{i},146908593500{i},WWWWWWWW,AAA,BBBB,CCCCCCCC,false,{i}1181000,1788000{i},2497001{i},{i},aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa{i}".format(i=i) 3. Bulk inserting the csv in the table phoenix/bin/psql.py localhost -t TEST large.csv
4. Please see that one row that contains no TIMESTAMP_DATE and null values in MI and MA 0: jdbc:phoenix:localhost:2181> select * from TEST
....
+---------+--------------------------+-------------------+-----------+------+-------+-----------+--------+--------------+--------------+--------------+-------+----------------------------------------------------------------------------+
| BUCKET | TIMESTAMP_DATE | TIMESTAMP | SRC | DST | ID | ION | IC | MI | AV | MA | CNT | DUMMY |
+---------+--------------------------+-------------------+-----------+------+-------+-----------+--------+--------------+--------------+--------------+-------+----------------------------------------------------------------------------+
| 5SEC | 2016-07-21 07:25:35.100 | 1469085935001000 | WWWWWWWW | AAA | BBBB | CCCCCCCC | false | 10001181000 | 17880001000 | 24970011000 | 1000 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1000 |
| 5SEC | 2016-07-21 07:25:35.999 | 146908593500999 | WWWWWWWW | AAA | BBBB | CCCCCCCC | false | 9991181000 | 1788000999 | 2497001999 | 999 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa999 |
| 5SEC | 2016-07-21 07:25:35.998 | 146908593500998 | WWWWWWWW | AAA | BBBB | CCCCCCCC | false | 9981181000 | 1788000998 | 2497001998 | 998 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa998 |
| 5SEC | | 146908593500997 | WWWWWWWW | AAA | BBBB | CCCCCCCC | false | null | 1788000997 | null | 997 | |
| 5SEC | 2016-07-21 07:25:35.996 | 146908593500996 | WWWWWWWW | AAA | BBBB | CCCCCCCC | false | 9961181000 | 1788000996 | 2497001996 | 996 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa996 |
| 5SEC | 2016-07-21 07:25:35.995 | 146908593500995 | WWWWWWWW | AAA | BBBB | CCCCCCCC | false | 9951181000 | 1788000995 | 2497001995 | 995 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa995 |
| 5SEC | 2016-07-21 07:25:35.994 | 146908593500994 | WWWWWWWW | AAA | BBBB | CCCCCCCC | false | 9941181000 | 1788000994 | 2497001994 | 994 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa994 |
.... 5. When selecting that row specifically the values are correct 0: jdbc:phoenix:localhost:2181> select * from TEST where timestamp = 146908593500997;
+---------+--------------------------+------------------+-----------+------+-------+-----------+--------+-------------+-------------+-------------+------+---------------------------------------------------------------------------+
| BUCKET | TIMESTAMP_DATE | TIMESTAMP | SRC | DST | ID | ION | IC | MI | AV | MA | CNT | DUMMY |
+---------+--------------------------+------------------+-----------+------+-------+-----------+--------+-------------+-------------+-------------+------+---------------------------------------------------------------------------+
| 5SEC | 2016-07-21 07:25:35.997 | 146908593500997 | WWWWWWWW | AAA | BBBB | CCCCCCCC | false | 9971181000 | 1788000997 | 2497001997 | 997 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa997 |
+---------+--------------------------+------------------+-----------+------+-------+-----------+--------+-------------+-------------+-------------+------+---------------------------------------------------------------------------+
1 row selected (0.159 seconds) SOLUTION : This a known issue and is unresolved as of now. Please track it under PHOENIX-3112. WORKAROUND: Try increasing value of "hbase.client.scanner.max.result.size" which helped in many cases. But it has its own side effects of inducing memory pressure. REFERENCES: https://issues.apache.org/jira/browse/PHOENIX-3112
... View more
Labels:
06-03-2017
10:55 PM
PROBLEM : Customers create namespace and assign users "RX" permission , however users are not able to create tables in namespace and require "RWX" permission. CREATE TABLE example_datapipe.table_test1 ( id BIGINT not null primary key); Error: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions (user=g0123434@ABC.EXAMPLE.COM, scope=SYSTEM:CATALOG, family=0:SALT_BUCKETS, params=[table=SYSTEM:CATALOG,family=0:SALT_BUCKETS],action=WRITE)
ROOT CAUSE : Yes this is an expected behavior. To create a Phoenix table, you MUST have the ability to create the HBase table and you MUST have the ability to write to the SYSTEM.CATALOG table. WORKAROUND : If you do not want normal users to have write access on system.catalog, try one of the following alternatives :- 1) Let an admin manually create the Phoenix tables for the unprivileged users. 2) Temporarily grant them write access to the SYSTEM.CATALOG table and revoke later. REFERENCE: Internal discussion under EAR-5831
... View more
Labels:
06-03-2017
10:19 PM
STEPS TO REPRODUCE: 1. Setup table and local index: create table mytest (col1 varchar, col2 varchar constraint pk primary key (col1));
upsert into mytest values ('123','123');
upsert into mytest values ('456','456');
create local index myidx on mytest (col2); 2. Obtain current timestamp and prove rows can be seen: select cast(current_time() as bigint) from mytest limit 1;
+--------------------------------------------+
| TO_BIGINT(TIME '2017-04-19 09:57:47.867') |
+--------------------------------------------+
| 1492595867867 |
+--------------------------------------------+
sqlline.py "aw25k-3:2181/hbase-secure;currentSCN=1492595867867"
select * from mytest;
+-------+-------+
| COL1 | COL2 |
+-------+-------+
| 123 | 123 |
| 456 | 456 |
+-------+-------+ 3. Disconnect sqlline and reconnect without SCN. Drop the local index, insert more rows, recreate the index: sqlline.py
drop index myidx on mytest;
upsert into mytest values ('678','678');
upsert into mytest values ('456','654');
create local index myidx on mytest (col2); 4. Re-attempt flashback query: sqlline.py "example-3:2181/hbase-secure;currentSCN=1492595867867"
select * from mytest;
+-------+-------+
| COL1 | COL2 |
+-------+-------+
+-------+-------+
No rows selected (0.18 seconds)
select * from mytest where col2 = '123'
+-------+-------+
| COL1 | COL2 |
+-------+-------+
+-------+-------+
No rows selected (0.18 seconds)
SELECT /*+NO_INDEX*/ * from mytest;
select * from mytest;
+-------+-------+
| COL1 | COL2 |
+-------+-------+
| 123 | 123 |
| 456 | 456 |
+-------+-------+ ROOT CAUSE : This is because we create data for new indexes with the latest timestamp as in data table.Fix includes to create initial index also with the timestamp of the data. (at least for local index). SOLUTION: Fix is available in next maintenance release of HDP 2.6. Request for a hotfix by a support case if you are on an earlier version. REFERENCE: Internal BUG-79529
... View more
Labels:
04-03-2017
08:03 AM
@Rohan Pednekar Could you provide pointers for HBASE APIs which should be used during data writing so that the data reaches only desired nodes ?
... View more
03-30-2017
06:32 PM
Connecting to HBase throws the below exceptions: [root@hl1 hbase]# cd /usr/hdp/current/phoenix-client/bin/
[root@hl1 bin]# python sqlline.py localhost:2181:/hbase-unsecure
sun.misc.SignalHandler not found in gnu.gcj.runtime.SystemClassLoader{urls=[file:/etc/hbase/conf/,file:/usr/hdp/2.5.3.0-37/phoenix/bin/../phoenix-4.7.0.2.5.3.0-37-client.jar,file:./,file:/etc/hadoop/conf/,file:/usr/hdp/2.5.3.0-37/hadoop/conf/,file:/usr/hdp/2.5.3.0-37/hadoop-hdfs/./], parent=gnu.gcj.runtime.ExtensionClassLoader{urls=[], parent=null}}
Setting property: [incremental, false]
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix:localhost:2181:/hbase-unsecure none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:localhost:2181:/hbase-unsecurejava.lang.ClassFormatError: org.apache.phoenix.jdbc.PhoenixDriver (unrecognized class file version)
at java.lang.VMClassLoader.defineClass(libgcj.so.10)
at java.lang.ClassLoader.defineClass(libgcj.so.10)
at java.security.SecureClassLoader.defineClass(libgcj.so.10)
at java.net.URLClassLoader.findClass(libgcj.so.10)
at java.lang.ClassLoader.loadClass(libgcj.so.10)
at java.lang.ClassLoader.loadClass(libgcj.so.10)
at java.lang.Class.forName(libgcj.so.10)
at sqlline.DatabaseConnection.connect(DatabaseConnection.java:115)
at sqlline.DatabaseConnection.getConnection(DatabaseConnection.java:203)
at sqlline.Commands.connect(Commands.java:1064)
at sqlline.Commands.connect(Commands.java:996)
at java.lang.reflect.Method.invoke(libgcj.so.10)
at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:36)
at sqlline.SqlLine.dispatch(SqlLine.java:804)
at sqlline.SqlLine.initArgs(SqlLine.java:588)
at sqlline.SqlLine.begin(SqlLine.java:656)
at sqlline.SqlLine.start(SqlLine.java:398)
at sqlline.SqlLine.main(SqlLine.java:292)sqlline version 1.1.8java.lang.NullPointerException
at sqlline.SqlLine.begin(SqlLine.java:680)
at sqlline.SqlLine.start(SqlLine.java:398)
at sqlline.SqlLine.main(SqlLine.java:292)java.lang.NullPointerException
at sqlline.SqlLine.begin(SqlLine.java:680)
at sqlline.SqlLine.start(SqlLine.java:398)
at sqlline.SqlLine.main(SqlLine.java:292)java.lang.NullPointerException
at sqlline.SqlLine.begin(SqlLine.java:680)
at sqlline.SqlLine.start(SqlLine.java:398) ROOT CAUSE: We didn't have JAVA_HOME set. Script was referring GCJ. SOLUTION: export JAVA_HOME=/usr/jdk64/jdk1.8.0_77/ export HBASE_CONF_PATH=/etc/hbase/conf/ export PATH=$JAVA_HOME/bin:$PATH
... View more
Labels:
01-19-2017
04:47 PM
DESCRIPTION: CREATE TABLE EXAMPLE_REPORT1
> (
> key string,
> claim_type_code string,
> yearservice string,
> monthservice string
> )
> STORED BY "org.apache.hadoop.hive.hbase.HBaseStorageHandler"
> WITH SERDEPROPERTIES("hbase.columns.mapping" = ":key,claim_type_code:claim_type_code,
> yearservice:yearservice,monthservice:monthservice")
> TBLPROPERTIES("hbase.table.name"="EXAMPLE_REPORT1”); Exception received was as follows: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:208)at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:326)
at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:301)
at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:166)
at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:161)
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:794)
at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:602)
[...]
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.getMetaReplicaNodes(ZooKeeperWatcher.java:395)
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:562)
at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:61)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1192) ROOT CAUSE: Hive attempted to contact Zookeeper to get Meta location but could not retrieve required info. This is a known issue described in HBASE-16732 WORKAROUND In this instance of issue, restarting zookeeper fixed the issue as it reinitialized hbase znodes and Hive was able to get the required info.
... View more
Labels:
01-19-2017
08:40 AM
1 Kudo
DESCRIPTION: Received frequent alerts for connection timeout with journal node. Upon checking the connectivity, it results in below output. curl -v http://123.example.com:8480--max-time 4 | tail -4
* About to connect() to 123.example.com:8480 port 8480 (#0)
* Trying 10.24.16.11... connected * Connected to 123.example.com (10.24.16.11) port 8480 (#0)
> GET / HTTP/1.1 > User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: 123.example.com:8480 > Accept: */*
> % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 0 0 0 0 0 0 0 --:--:-- 0:00:04 --:--:-- 0* Operation timed out after 4000 milliseconds with 0 bytes received 0 0 0 0 0 0 0 0 --:--:-- 0:00:04 --:--:-- 0* Closing connection #0
curl: (28) Operation timed out after 4000 milliseconds with 0 bytes received Checking netstat command further for port 8480 gives us huge number of CLOSE_WAIT messages. [root@123 ~]# netstat -putane | grep -i 8480
tcp 0 0 0.0.0.0:8480 0.0.0.0:* LISTEN 72383 1586576877 1719/java
tcp 1 0 10.24.16.11:8480 10.24.17.11:46572 CLOSE_WAIT 72383 1587407492 1719/java
tcp 1 0 10.24.16.11:8480 10.24.17.11:57944 CLOSE_WAIT 72383 1586744345 1719/java
tcp 1 0 10.24.16.11:8480 10.24.17.11:57462 CLOSE_WAIT 72383 1586708412 1719/java Check the meaning of CLOSE_WAIT here link ROOT CAUSE: It was found that an edits_in_progress file was stuck as an orphan file since last two months while the edits recorded in it are already captured in other completed edits file. Due to this, the port 8480 of the respective Journal node process was coming up in CLOSE_WAIT as the socket is not closed properly. SOLUTION : Removed the orphan edits_in_progress file and restarted journal nodes.
... View more
Labels:
12-27-2016
06:30 PM
1 Kudo
SYMPTOMS: If MaxApplications=1000 set in configuration than jobs would start getting stuck in "accepted" state after 500+ submissions. The following error is seen in resource manager logs: caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1452018403088_0506 to YARN :
org.apache.hadoop.security.AccessControlException: Queue root.hive1 already has 1000 applications, cannot accept submission of application: application_1452018403088_0506
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:271)
at org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:291) at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:290)
... 18 more ROOT CAUSE: This a known issue reported in internal jira BUG-50642. As of date this issue is unresolved. However in Hadoop 2.8.0 this behavior is not seen. WORKAROUND: Please set yarn.scheduler.capacity.root.ordering-policy.fair.enable-size-based-weight=false
... View more
Labels: