Created 11-05-2019 08:42 AM
I just noticed that the SmartSense Activity Explorer Zeppelin notebooks have been failing to run on a production HDP 2.6.1 cluster. I'm not sure how long the issue has been occurring, since the dashboards haven't been used much until now. Whenever we try to run the paragraphs, we immediately get an error about being unable to establish a connection; no other information is given. We are able to connect to Phoenix through psql.py, so we know Phoenix itself is working, just not the dashboard. We've tried restarting Activity Explorer, which hasn't fixed the issue. Has anyone seen this issue? Any ideas? Below is the check we use to confirm Phoenix works outside Zeppelin, followed by the logs we are seeing.
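For reference, this is roughly how we confirm Phoenix is reachable outside Zeppelin. The psql.py location, port, and znode parent are placeholders from our install, so adjust them for your setup:

export HBASE_CONF_PATH=/etc/ams-hbase/conf
echo "SELECT 1 FROM SYSTEM.CATALOG LIMIT 1;" > /tmp/phoenix_check.sql
<PathTo>/psql.py <ams-fqdn>:61181:/<znode-parent> /tmp/phoenix_check.sql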
==> activity-explorer.log <==
2019-11-05 10:34:42,555  INFO [qtp1209702763-1653] NotebookServer:711 - New operation from 10.142.131.4 : 62057 : admin : GET_NOTE : 2BPD7951H
2019-11-05 10:34:42,558  WARN [qtp1209702763-1653] VFSNotebookRepo:292 - Get Note revisions feature isn't supported in class org.apache.zeppelin.notebook.repo.VFSNotebookRepo
2019-11-05 10:34:45,886  INFO [pool-2-thread-31] SchedulerFactory:131 - Job paragraph_1490380022011_880344082 started by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpretershared_session255064451
2019-11-05 10:34:45,887  INFO [pool-2-thread-31] Paragraph:366 - run paragraph 20160728-152731_1797959357 using null org.apache.zeppelin.interpreter.LazyOpenInterpreter@4a66e7be
 
==> zeppelin-interpreter-phoenix-phoenix--<HOSTNAME> <==
2019-11-05 10:34:45,889  INFO [pool-2-thread-4] SchedulerFactory:131 - Job remoteInterpretJob_1572971685889 started by scheduler org.apache.zeppelin.phoenix.PhoenixInterpreter717591913
2019-11-05 10:34:45,889  INFO [pool-2-thread-4] PhoenixInterpreter:192 - Run SQL command 'SELECT file_size_category  as "Size category",
       total_files         as "Total files",
       avg_file_size       as "Avg file size"
FROM (
     SELECT  CASE WHEN file_size_range_end <= 10000  THEN 'Tiny (0-10K)'
               WHEN file_size_range_end <= 1000000  THEN 'Mini (10K-1M)'
               WHEN file_size_range_end <= 30000000  THEN 'Small (1M-30M)'
               WHEN file_size_range_end <= 128000000  THEN 'Medium (30M-128M)'
               ELSE 'Large (128M+)'
          END as file_size_category,
          sum(file_count) as total_files,
          (sum(total_size) / sum(file_count)) as avg_file_size
     FROM ACTIVITY.HDFS_USER_FILE_SUMMARY
     WHERE analysis_date in ( SELECT MAX(analysis_date)
                              FROM ACTIVITY.HDFS_USER_FILE_SUMMARY)
     GROUP BY file_size_category
)'
2019-11-05 10:34:45,890  INFO [pool-2-thread-4] SchedulerFactory:137 - Job remoteInterpretJob_1572971685889 finished by scheduler org.apache.zeppelin.phoenix.PhoenixInterpreter717591913
==> activity-explorer.log <==
2019-11-05 10:34:45,891  WARN [pool-2-thread-31] NotebookServer:2067 - Job 20160728-152731_1797959357 is finished, status: ERROR, exception: null, result: %text ERROR 103 (08004): Unable to establish connection.
2019-11-05 10:34:45,909  INFO [pool-2-thread-31] SchedulerFactory:137 - Job paragraph_1490380022011_880344082 finished by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpretershared_session255064451
Created 11-05-2019 08:53 AM
The dashboards used by Activity Explorer use the Phoenix interpreter, which is configured to connect to AMS HBase. Verify that AMS HBase is accessible using sqlline.py:
#export HBASE_CONF_PATH=/etc/ams-hbase/conf
#<PathTo>/sqlline.py <ams-fqdn>:61181:/ams-hbase-secure (if AMS is running in embedded mode)
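Once connected, a quick smoke test against one of the activity tables will confirm that Phoenix reads work end to end, for example:

SELECT COUNT(*) FROM ACTIVITY.HDFS_USER_FILE_SUMMARY;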
Created 11-05-2019 09:33 AM
I've verified that I can access Phoenix through sqlline.py and psql.py using the configuration in /etc/ams-hbase/conf, and that I can run the same queries I'm trying to run through Zeppelin as the activity-explorer user.
One thing of note with all this: we've changed the ZNode parent from ams-hbase-secure1 to ams-hbase-secure2. I've verified that /etc/ams-hbase/conf/hbase-site.xml holds the new value, but /etc/ams-metrics-collector/conf/hbase-site.xml still has the old value and hasn't been updated recently. Since activity-env.sh points to /etc/ams-hbase/conf, I believe this shouldn't be an issue, but it was a bit confusing when I first came across it. The checks we ran are below.
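For the record, this is how we compared the two configs (zookeeper.znode.parent is the hbase-site.xml property that holds the ZNode parent; the -A 1 prints the <value> line that follows each <name> line):

grep -A 1 'zookeeper.znode.parent' /etc/ams-hbase/conf/hbase-site.xml
grep -A 1 'zookeeper.znode.parent' /etc/ams-metrics-collector/conf/hbase-site.xml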
Created 11-06-2019 12:32 PM
After looking into this some more, we found the error trace below the first time a paragraph was run after restarting the interpreter. It didn't show up originally because the earlier log only captured a paragraph run, not necessarily one immediately after an interpreter restart. As the trace shows, the root cause is a NoClassDefFoundError for a shaded WANdisco protobuf class. Once we made that WANdisco class accessible on the interpreter's classpath, everything started working properly; a sketch of the fix follows the trace.
2019-11-06 10:24:48,850 ERROR [pool-2-thread-2] PhoenixInterpreter:108 - Cannot open connection
java.sql.SQLException: ERROR 103 (08004): Unable to establish connection.
        at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:386)
        at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
        at org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:288)
        at org.apache.phoenix.query.ConnectionQueryServicesImpl.access$300(ConnectionQueryServicesImpl.java:171)
        at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1881)
        at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1860)
        at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:77)
        at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:1860)
        at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:162)
        at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:131)
        at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:133)
        at java.sql.DriverManager.getConnection(DriverManager.java:664)
        at java.sql.DriverManager.getConnection(DriverManager.java:247)
        at org.apache.zeppelin.phoenix.PhoenixInterpreter.open(PhoenixInterpreter.java:99)
        at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:493)
        at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
        at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
        at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:410)
        at org.apache.hadoop.hbase.client.ConnectionManager.createConnectionInternal(ConnectionManager.java:319)
        at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:144)
        at org.apache.phoenix.query.HConnectionFactory$HConnectionFactoryImpl.createConnection(HConnectionFactory.java:47)
        at org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:286)
        ... 22 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
        ... 27 more
Caused by: java.lang.NoClassDefFoundError: com/wandisco/shadow/com/google/protobuf/InvalidProtocolBufferException
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2573)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2586)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2625)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2607)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
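In case it helps anyone hitting the same NoClassDefFoundError, this is roughly how we found the jar that provides the shaded WANdisco protobuf class and made it visible to the interpreter. The hadoop-client lib directory and the activity-env.sh approach are specific to our environment, so treat this as a sketch rather than a recipe:

# search the node's Hadoop client libs for the missing shaded class
for j in /usr/hdp/current/hadoop-client/lib/*.jar; do
  unzip -l "$j" 2>/dev/null \
    | grep -q 'com/wandisco/shadow/com/google/protobuf/InvalidProtocolBufferException' \
    && echo "$j"
done
# append the matching jar to the classpath used by the Phoenix interpreter
# (we did this via activity-env.sh) and restart Activity Explorer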