Member since: 07-17-2017
Posts: 23
Kudos Received: 1
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
| 1067 | 04-30-2020 12:06 PM
| 1501 | 11-06-2019 12:32 PM
| 1320 | 08-22-2019 09:13 AM
| 828 | 08-22-2019 08:39 AM
10-08-2020
06:59 AM
Sofiane, I apologize, but sadly I no longer have notes from that point. If I come across anything, I'll definitely update this page. Sorry!
04-30-2020
12:06 PM
So after a few days of struggling, I was able to launch a trial CDP-DC cluster on my server and explore it myself. I installed the Runtime along with most of the CDF parcels. It looks as though the two services mentioned above (Zeppelin and Druid) are not currently available on CDP-DC. If anyone has information on how to add these services, or any timeline for when they'll be added, I'd love to know, especially for Druid!
04-24-2020
08:35 AM
I'm trying to discern what services will be available to us when we upgrade from CDH 5.16.2 to CDP-DC with CDF. From what I can tell, we should have access to any component that is part of CDF, as well as anything in the Cloudera Runtime. The thing is, I'm seeing some discrepancies.
In the maven artifacts for Cloudera Runtime 7.0.3, it mentions a lot of services, including Druid and Zeppelin: https://docs.cloudera.com/cdpdc/7.0/release-guide/topics/cdpdc-runtime-maven-703.html
But in the component versions list of Cloudera Runtime 7.0.3, Druid and Zeppelin are both not included: https://docs.cloudera.com/runtime/7.0.3/release-notes/topics/rt-runtime-component-versions.html
Additionally, in the documentation for CDF, I don't see Druid listed, even though in the past HDF has had Druid included:
https://www.cloudera.com/content/dam/www/marketing/images/diagrams/cdf-diagram.png
Can someone verify for me whether Druid and/or Zeppelin are available to users of CDP-DC with CDF? And how should I interpret the discrepancies between the first two links above?
Thanks!
11-21-2019
10:38 AM
I have an HDP 2.6.1 cluster where we'd had yarn.log-aggregation.retain-seconds set to 30 days for a while, and everything was working properly. Four days ago we changed the property to 15 days and restarted the services. The check interval is set to the default, so we expected logs older than 15 days to be deleted within 1.5 days. For some reason, we are still seeing 30 days of logs kept, even though the other properties all appear to be set correctly.

The only unusual setting I can find is that we are using LogAggregationIndexedFileController as our primary file controller class, with LogAggregationTFileController second in the list. I found YARN-8279 (https://issues.apache.org/jira/browse/YARN-8279), which seems related, except that logs are still being written to the correct suffix folder, and logs older than 30 days are still being deleted; the cutoff just doesn't seem to have updated to 15 days.

I've looked in the logs for the ResourceManager, Timeline Server, and one of the NameNodes, and nothing that would explain this has shown up. Any ideas where to look next? Also, can someone confirm which process the deletion service actually runs in? Is it the ResourceManager, the Timeline Server, or something else?
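For reference, here are the relevant yarn-site.xml properties as we have them set (the values below are illustrative; 15 days = 1,296,000 seconds, and per the YARN docs the default check interval of -1 means one-tenth of the retention time):

<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>1296000</value> <!-- 15 days -->
</property>
<property>
  <!-- default -1 = run the deletion check at 1/10 of the retention time -->
  <name>yarn.log-aggregation.retain-check-interval-seconds</name>
  <value>-1</value>
</property>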
11-06-2019
12:32 PM
After looking into this some more, we found the error trace below the first time a paragraph was run after the interpreter was restarted. It didn't show up before because the earlier log only covered running a paragraph, not one run immediately after an interpreter restart. As you can see, at the end there is an exception about a class not being accessible. Once we made sure the WANdisco class was on the interpreter's classpath, everything started working properly; a rough sketch of the fix follows the trace.
2019-11-06 10:24:48,850 ERROR [pool-2-thread-2] PhoenixInterpreter:108 - Cannot open connection
java.sql.SQLException: ERROR 103 (08004): Unable to establish connection.
at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:386)
at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:288)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.access$300(ConnectionQueryServicesImpl.java:171)
at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1881)
at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1860)
at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:77)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:1860)
at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:162)
at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:131)
at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:133)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at org.apache.zeppelin.phoenix.PhoenixInterpreter.open(PhoenixInterpreter.java:99)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:493)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:410)
at org.apache.hadoop.hbase.client.ConnectionManager.createConnectionInternal(ConnectionManager.java:319)
at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:144)
at org.apache.phoenix.query.HConnectionFactory$HConnectionFactoryImpl.createConnection(HConnectionFactory.java:47)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:286)
... 22 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
... 27 more
Caused by: java.lang.NoClassDefFoundError: com/wandisco/shadow/com/google/protobuf/InvalidProtocolBufferException
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2573)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2586)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2625)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2607)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
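For anyone hitting the same NoClassDefFoundError, here's a rough sketch of the kind of fix we applied. The jar name and both paths below are hypothetical, so adjust them to wherever your WANdisco client jar and your interpreter's lib directory actually live:

# Confirm the missing class is really in the WANdisco client jar (path hypothetical)
WD_JAR=/opt/wandisco/fusion/client/lib/fusion-client.jar
unzip -l "$WD_JAR" | grep InvalidProtocolBufferException

# Make the jar visible to the Phoenix interpreter, e.g. by copying it into the
# interpreter's lib directory (path hypothetical), then restart the interpreter
cp "$WD_JAR" /path/to/zeppelin/interpreter/phoenix/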
11-05-2019
09:33 AM
I've verified that I can access Phoenix through sqlline.py and psql.py using the configuration in /etc/ams-hbase/conf, and that I can run the queries I'm trying to run through Zeppelin as the activity-explorer user. One thing of note: we've changed the ZNode parent from ams-hbase-secure1 to ams-hbase-secure2. I've verified that /etc/ams-hbase/conf/hbase-site.xml holds the new value, but /etc/ams-metrics-collector/conf/hbase-site.xml still has the old value and hasn't been updated recently. activity-env.sh points to /etc/ams-hbase/conf, so I believe this shouldn't be an issue, but it was a bit confusing when I first came across it.
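In case it helps anyone else, this is roughly what I ran to compare the two config directories and re-check connectivity (the ZooKeeper host and port in the sqlline.py line are placeholders for our actual AMS quorum):

# Compare the znode parent across the two config directories
grep -A1 'zookeeper.znode.parent' /etc/ams-hbase/conf/hbase-site.xml
grep -A1 'zookeeper.znode.parent' /etc/ams-metrics-collector/conf/hbase-site.xml

# Connect through Phoenix against the new znode parent
sqlline.py <zk-host>:<zk-port>:/ams-hbase-secure2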
11-05-2019
08:42 AM
I just noticed that the SmartSense Activity Explorer Zeppelin notebooks have been failing to run on a production HDP 2.6.1 cluster. I'm not sure how long the issue has been occurring, since the dashboards hadn't been used much until now. Whenever we try to run the paragraphs, we immediately get an error about being unable to establish a connection; no other information is given. We are able to connect to Phoenix through psql.py, so we know Phoenix itself is working, just not the dashboard. We've tried restarting the Activity Explorer, which hasn't fixed the issue. Has anyone seen this before? Any ideas? I'm including the logs we are seeing below.
==> activity-explorer.log <==
2019-11-05 10:34:42,555 INFO [qtp1209702763-1653] NotebookServer:711 - New operation from 10.142.131.4 : 62057 : admin : GET_NOTE : 2BPD7951H
2019-11-05 10:34:42,558 WARN [qtp1209702763-1653] VFSNotebookRepo:292 - Get Note revisions feature isn't supported in class org.apache.zeppelin.notebook.repo.VFSNotebookRepo
2019-11-05 10:34:45,886 INFO [pool-2-thread-31] SchedulerFactory:131 - Job paragraph_1490380022011_880344082 started by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpretershared_session255064451
2019-11-05 10:34:45,887 INFO [pool-2-thread-31] Paragraph:366 - run paragraph 20160728-152731_1797959357 using null org.apache.zeppelin.interpreter.LazyOpenInterpreter@4a66e7be
==> zeppelin-interpreter-phoenix-phoenix--<HOSTNAME> <==
2019-11-05 10:34:45,889 INFO [pool-2-thread-4] SchedulerFactory:131 - Job remoteInterpretJob_1572971685889 started by scheduler org.apache.zeppelin.phoenix.PhoenixInterpreter717591913
2019-11-05 10:34:45,889 INFO [pool-2-thread-4] PhoenixInterpreter:192 - Run SQL command 'SELECT file_size_category as "Size category",
total_files as "Total files",
avg_file_size as "Avg file size"
FROM (
SELECT CASE WHEN file_size_range_end <= 10000 THEN 'Tiny (0-10K)'
WHEN file_size_range_end <= 1000000 THEN 'Mini (10K-1M)'
WHEN file_size_range_end <= 30000000 THEN 'Small (1M-30M)'
WHEN file_size_range_end <= 128000000 THEN 'Medium (30M-128M)'
ELSE 'Large (128M+)'
END as file_size_category,
sum(file_count) as total_files,
(sum(total_size) / sum(file_count)) as avg_file_size
FROM ACTIVITY.HDFS_USER_FILE_SUMMARY
WHERE analysis_date in ( SELECT MAX(analysis_date)
FROM ACTIVITY.HDFS_USER_FILE_SUMMARY)
GROUP BY file_size_category
)'
2019-11-05 10:34:45,890 INFO [pool-2-thread-4] SchedulerFactory:137 - Job remoteInterpretJob_1572971685889 finished by scheduler org.apache.zeppelin.phoenix.PhoenixInterpreter717591913
==> activity-explorer.log <==
2019-11-05 10:34:45,891 WARN [pool-2-thread-31] NotebookServer:2067 - Job 20160728-152731_1797959357 is finished, status: ERROR, exception: null, result: %text ERROR 103 (08004): Unable to establish connection.
2019-11-05 10:34:45,909 INFO [pool-2-thread-31] SchedulerFactory:137 - Job paragraph_1490380022011_880344082 finished by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpretershared_session255064451
08-22-2019
09:16 AM
So the exception you've listed indicates that the JDBC driver is installed properly, since the call stack includes code inside the MySQL driver. I would check that the hostname and port you supplied are correct, and that the node you're working on can actually reach that host. The commands below are a good starting point.
ping msl-dpe-perf80-100g.msl.lab
telnet msl-dpe-perf80-100g.msl.lab 3306
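If both of those succeed, you can test the MySQL listener itself, assuming the mysql client is installed on the node (the username is a placeholder):

mysql -h msl-dpe-perf80-100g.msl.lab -P 3306 -u <username> -p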
08-22-2019
09:13 AM
So EOL just means that Python 2 will no longer be maintained upstream; it doesn't mean everything using Python 2 stops working. Applications that currently run on Python 2.7 will keep working, but bugs in Python 2.7 itself won't be fixed. Additionally, pip's maintainers have announced that newer pip releases will drop Python 2.7 support. As long as the pip you have installed predates that cutoff, you'll still be able to use Python 2.7 for a while, until the package repository itself is taken down or reconfigured in a way that breaks old pip. See https://stackoverflow.com/questions/54915381/will-pip-work-for-python-2-7-after-its-end-of-life-on-1st-jan-2020 for more details on that. I am not sure of the plans for HDP/CDH/CDP support of Python 2.7 going forward, so that's a question for the dev team directly.
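For example, here's roughly how you'd check what you're running and pin pip so it keeps working with Python 2.7 (the version bound below is an assumption based on pip's announced deprecation plans; check pip's announcements for the actual cutoff):

python --version
pip --version
# Keep pip below the release that drops Python 2.7 support
pip install --upgrade 'pip<21'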
08-22-2019
08:39 AM
1. I'm not sure I understand what you mean by "communicate." When SSSD is first started, it makes the users and groups in AD available on the local node, so any existing users will be able to log in with the correct groups ready for them (assuming the configuration is set up properly).
2. Rolling back SSSD is possible but troublesome. It consists of stopping the service and uninstalling it from the node; a rough sketch of the steps is below. I'm not sure whether the synced users and groups would still be present on the node afterwards, but if so you would need to remove them as well. There may be some other pieces left around, but none that I'd expect to cause problems unless you tried to install SSSD again.
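A rough sketch of those rollback steps, assuming a RHEL/CentOS node with systemd (package and service names vary by distro, and the username at the end is a placeholder):

# Stop and remove the service
systemctl stop sssd
yum remove sssd

# Clear SSSD's local cache of users and groups
rm -rf /var/lib/sss/db/*

# Verify whether AD users still resolve on the node
id <ad_username>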