Member since: 03-04-2015
Posts: 96
Kudos Received: 12
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2952 | 01-04-2017 02:33 PM
 | 11027 | 07-17-2015 03:11 PM
10-12-2020
10:08 PM
I imported our existing v5.12 workflows via the command-line loaddata tool. They show up in the Hue 3 Oozie Editor, but not in Hue 4. We are using CDH 5.16. I find the new "everything is a document" paradigm confusing and misleading - Oozie workflows, Hive queries, Spark jobs, etc. are not physical documents in the Unix/HDFS sense that normal users would expect, with absolute paths that can be accessed and manipulated directly. The traditional-style Hue 3 UI lets one focus on working with the technology at hand, instead of imposing The Grand Unifying Design on the user.
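In case it helps, the import was along these lines (the parcel path and fixture file name below are placeholders rather than our exact values):
# Hue is a Django app, so its manage-style CLI accepts loaddata fixtures
/opt/cloudera/parcels/CDH/lib/hue/build/env/bin/hue loaddata /tmp/hue3_oozie_workflows.json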
07-01-2020
01:23 PM
The Phoenix-Hive storage handler as of v4.14.0 (CDH 5.12) seems buggy. I was able to get the Hive external wrapper table working for simple queries, after tweaking the column mapping around upper/lower-case gotchas. However, it fails when I try an "INSERT OVERWRITE DIRECTORY ... SELECT ..." command to export to a file:
org.apache.phoenix.schema.ColumnNotFoundException: ERROR 504 (42703): Undefined column. columnName=<table name>
This is a known problem that no one is apparently looking at: https://issues.apache.org/jira/browse/PHOENIX-4804
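For context, the failing statement was of this general shape (the target directory and the Phoenix-backed external wrapper table name are placeholders):
INSERT OVERWRITE DIRECTORY '/tmp/phoenix_export'
SELECT * FROM phoenix_wrapper_table;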
02-15-2019
06:40 PM
Try putting the <extraClassPath> settings into the global Spark config in Ambari instead, in the spark-defaults section (you may have to add them as custom properties). This works for us with Cloudera and Spark 1.6.
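Something like this under the custom spark-defaults properties (the /opt/ext-libs directory is a placeholder for wherever the library is installed):
spark.driver.extraClassPath=/opt/ext-libs/*
spark.executor.extraClassPath=/opt/ext-libs/*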
07-24-2018
01:29 PM
We were unable to access Spark app log files from either the YARN or Spark History Server UIs, getting the error "Error getting logs at <worker_node>:8041". We could still see the logs with the "yarn logs" command. It turns out our yarn.nodemanager.remote-app-log-dir = /tmp/logs, but the directory was owned by "yarn:yarn". Following your instructions fixed the issue. Thanks a lot! Miles
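For anyone else chasing this, the checks we ran were roughly as follows (the application id is a placeholder):
# the aggregated logs are readable through the CLI even when the UIs fail
yarn logs -applicationId application_1234567890123_0001
# inspect ownership of the directory configured in yarn.nodemanager.remote-app-log-dir
hdfs dfs -ls -d /tmp/logs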
04-20-2018
10:04 PM
Our Ambari server was getting HTTP 502 errors trying to query timeline metrics - this fixed it for us behind a corporate firewall. Thanks!
04-10-2018
08:57 AM
Hi Eric: Thanks for your explanation. Would you be able to point us to the formal licensing statements that say the same? Our corporate legal team would require them to approve CDK (and CDS2, for that matter) for production use. Miles
04-09-2018
03:58 PM
1 Kudo
You can choose to either compile the package into your application jar, or manually install it on every Spark/YARN worker node and include that directory in your <extraClassPath>. Sample pom.xml on HDP 2.6.3:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.6.3.2.6.3.0-235</version>
  <scope>provided</scope>
</dependency>
...
<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>spark-csv_2.10</artifactId>
  <version>1.5.0</version>
  <scope>provided</scope>
</dependency>
Use <scope>provided</scope> if you choose external installation; leave it out if you want to compile the package in. Compiling in is simpler, but if you have a large cluster or multiple Spark applications that will share such external libraries, the "provided" scope may be the better choice. In that case, you need to specify:
--conf "spark.driver.extraClassPath=...:<your ext lib path>/*"
--conf "spark.executor.extraClassPath=...:<your ext lib path>/*"
on your spark-submit command line.
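As a concrete picture, a spark-submit invocation using an externally installed library directory would look roughly like this (the class name, application jar, and /opt/ext-libs path are placeholders):
spark-submit --class com.example.MyApp --master yarn-client \
  --conf "spark.driver.extraClassPath=/opt/ext-libs/*" \
  --conf "spark.executor.extraClassPath=/opt/ext-libs/*" \
  myapp.jar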
03-30-2018
09:18 AM
We are interested in using CDK instead of the base Apache version for its additional operational features. However, I cannot find any info about its pricing and licensing terms, which my corporate legal team requires. There is nothing in the CDK documentation. The main Cloudera product and download pages seem to have been redesigned, and no longer even provide links to the individual component distros like Spark 2 and Kafka. Pricing has also changed to be based on high-level "solutions" instead of individual software products. Since CDK is essentially CM+ZooKeeper+Kafka in a parcel, would it be licensed on the same basis as base CDH? I believe (the simpler) Cloudera Spark 2 is indeed free, but I cannot find official info on that, either. Can the Cloudera corporate folks help answer? Thanks, Miles Yao
03-20-2018
07:48 PM
1 Kudo
Note that phoenix-spark2.jar MUST precede phoenix-client.jar in extraClassPath; otherwise the connection will fail with:
java.lang.NoClassDefFoundError: org/apache/spark/sql/DataFrame
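For illustration, with the standard HDP 2.6.3 jar locations (adjust the paths to your installation), the ordering would be:
--conf "spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar:/etc/hbase/conf"
--conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar:/etc/hbase/conf"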
02-01-2018
06:39 PM
Found the solution: place phoenix-spark2.jar before phoenix-client.jar, and everything works. The Spark2/Scala 2.11 versions of the org.apache.phoenix.spark classes need to overlay those bundled in the main phoenix-client.jar. Try it and let us know. 🙂
01-31-2018
06:07 PM
We tried this too on our HDP 2.6.3 cluster. Sure enough, we got the same issue:
/usr/hdp/current/spark2-client/bin/spark-shell --master yarn-client --driver-memory 3g --executor-memory 3g --num-executors 2 --executor-cores 2 --conf "spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-client.jar:/usr/hdp/current/phoenix-client/phoenix-spark2.jar:/etc/hbase/conf" --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-client.jar:/usr/hdp/current/phoenix-client/phoenix-spark2.jar:/etc/hbase/conf"
scala> val jobsDF = spark.read.format("org.apache.phoenix.spark").options(Map(
     | "table" -> "ns.Jobs", "zkUrl" -> zkUrl)).load
ivysettings.xml file not found in HIVE_HOME or HIVE_CONF_DIR,file:/usr/hdp/2.6.3.0-235/phoenix/phoenix-4.7.0.2.6.3.0-235-client.jar!/ivysettings.xml will be used
2018-01-30 16:24:33,254 INFO [main] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x79bb14d8 connecting to ZooKeeper ensemble=zkhost1:2181,zkhost2:2181,zkhost3:2181
java.lang.NoClassDefFoundError: org/apache/spark/sql/DataFrame
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.getDeclaredMethod(Class.java:2128)
at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1575)
...
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
... 49 elided
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.DataFrame
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 83 more
Tweaking extraClassPath and --jars with phoenix-client.jar, phoenix-4.7.0.2.6.3.0-235-spark2.jar, and spark-sql_2.11-2.2.0.2.6.3.0-235.jar made no difference. I am inclined to agree with the other poster that Hortonworks' phoenix-client.jar is not actually Spark2-compatible, the release notes notwithstanding.
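A diagnostic that may help narrow this down - checking which org.apache.phoenix.spark classes the fat client jar bundles (the path matches the HDP 2.6.3 layout above; the grep pattern is just an illustration):
unzip -l /usr/hdp/current/phoenix-client/phoenix-client.jar | grep "org/apache/phoenix/spark"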
01-30-2018
06:48 PM
If you look outside the Hortonworks distribution 😉, Cloudera is pushing Kudu, which is supposed to be a middle ground between Hive and Phoenix. There is also Splice Machine, an MVCC SQL engine on top of HBase that is now open source. Good luck!
01-08-2018
03:38 PM
URL: http://archive.cloudera.com/cloudera-labs/phoenix/parcels/1.2/
01-05-2018
08:50 AM
Updates on the CDH-compatibility effort in both Apache and Cloudera Labs are tracked in this thread.
01-05-2018
08:46 AM
I have submitted a pull request for our CDH 5.12 compatibility patch to CLAB PHOENIX as a new branch. Cloudera engineers, please review. Others who are interested can download it directly from my repo. At the same time, per Flavio, the new Apache Phoenix branch 4.13-cdh5.11.2 is near completion. Will Cloudera support it?
12-20-2017
04:48 PM
Still stuck on Debian 7 for HDP 2.6.3, while other OSes have moved up. Meanwhile, the required JDK has also moved up to 1.8.0_77. This makes the platform less and less viable for enterprise customers standardized on Debian. Any plans or commitments?
12-11-2017
11:14 AM
Excellent news!!! We had to work around this by building our own branch off the cloudera-labs phoenix1-4.7.0_1.3.0 branch (the actual latest version), modifying most of the pom.xml files and four phoenix-core classes to work with the HBase API changes in CDH 5.12 and to generate JARs that match the released package. Still, Phoenix being held back at 4.7 is becoming an architectural show-stopper for us, particularly the missing support for namespaces (needed for data migration) and Spark 2. From PHOENIX-4372, it sounds like this release is actually based on the Apache codebase branch 4.13-HBASE-1.2. So we can expect all the features listed in the Apache project release notes, is that correct? And should we expect the official Cloudera Labs codebase and parcel repo to be updated soon? Anyway, thanks again for the timely Christmas gift! Miles Yao
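For anyone needing the interim workaround, the rebuild went roughly like this (the repo URL and build flags are illustrative, not our literal steps; the compatibility patch itself is the part that matters):
git clone https://github.com/cloudera-labs/phoenix.git   # assumed Cloudera Labs repo location
cd phoenix
git checkout phoenix1-4.7.0_1.3.0
# apply the CDH 5.12 compatibility changes to the pom.xml files and the four phoenix-core classes, then rebuild
mvn clean package -DskipTests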
09-15-2017
09:08 AM
An attempt to run the official apache-phoenix-4.10.0-HBase-1.2 on CDH 5.12 failed, due to the tight coupling of CLAB_PHOENIX with HBase and what look like subtle API incompatibilities between the Phoenix-embedded and CDH HBase classes. Namespace mapping partially worked, however.
09-14-2017
02:54 PM
When you run OfflineMetaRepair, you will most likely run it from your own userid or as root. Afterwards you may get some opaque errors like "java.lang.AbstractMethodError: org.apache.hadoop.hbase.ipc.RpcScheduler.getWriteQueueLength()". If you check in HDFS, you may see that the meta directory is no longer owned by hbase:
$ hdfs dfs -ls /hbase/data/hbase/
Found 2 items
drwxr-xr-x - root hbase 0 2017-09-12 13:58 /hbase/data/hbase/meta
drwxr-xr-x - hbase hbase 0 2016-06-15 15:02 /hbase/data/hbase/namespace
Manually running chown -R on it and restarting HBase fixed it for me.
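Concretely, the fix would look something like this (run as the HDFS superuser; the hbase:hbase owner/group match the healthy directory in the listing above):
sudo -u hdfs hdfs dfs -chown -R hbase:hbase /hbase/data/hbase/meta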
09-07-2017
10:16 AM
1 Kudo
Same issue here. Spark 2 support was not added until Phoenix 4.10 (released this March) - this is a roadblock for our production plan. Can Cloudera give some ETA for the upgrade?
08-22-2017
08:38 PM
Thanks for the write-up. Does the above imply that the newly split region will always stay on the same RS, or is it configurable? If it's always local, then won't the load on the "hot" region server just get heavier and heavier, until the global load balancer thread kicks in? Shouldn't HBase just create the new daughter regions on the least-loaded RS instead? There was a lot of discussion related to this in HBASE-3373, but it isn't clear what the resulting implementation was.
08-22-2017
09:59 AM
HBase namespace support was added in Phoenix 4.8. Hortonworks has backported it to their version of 4.7.0 - the HDP 2.6 Ambari now supports the parameter switch "phoenix.schema.isNamespaceMappingEnabled". When will Cloudera upgrade its Phoenix distro (or better, integrate it into CDH and HBase)? Currently, this feature disparity has broken bulk data migration between CDH and HDP - the Phoenix Spark plugin cannot access tables on both clusters within the same application:
2017-08-21 16:12:38,048 INFO [main] impl.MetricsSystemImpl: phoenix metrics system started
java.sql.SQLException: ERROR 726 (43M10): Inconsistent namespace mapping properites.. Ensure that config phoenix.schema.isNamespaceMappingEnabled is consitent on client and server.
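For reference, the switch corresponds to this hbase-site.xml property, which must match on both the client and the server side (a minimal sketch of the HDP setting):
<property>
  <name>phoenix.schema.isNamespaceMappingEnabled</name>
  <value>true</value>
</property>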
08-13-2017
09:20 PM
This feature behaves unexpectedly when the table has been migrated from another HBase cluster. In that case, the table creation time can be much later than the row timestamps of all its data. A flashback query meant to select an earlier subset of the data returns the following failure instead:
scala> df.count
2017-08-11 20:12:40,550 INFO [main] mapreduce.PhoenixInputFormat: UseSelectColumns=true, selectColumnList.size()=3, selectColumnList=TIMESTR,DBID,OPTION
2017-08-11 20:12:40,550 INFO [main] mapreduce.PhoenixInputFormat: Select Statement: SELECT "TIMESTR","DBID","OPTION" FROM NS.USAGES
2017-08-11 20:12:40,558 ERROR [main] mapreduce.PhoenixInputFormat: Failed to get the query plan with error [ERROR 1012 (42M03): Table undefined. tableName=NS.USAGES]
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[count#13L])
+- TungstenExchange SinglePartition, None
+- TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[count#16L])
+- Project
+- Scan ExistingRDD[TIMESTR#10,DBID#11,OPTION#12]
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
at org.apache.spark.sql.execution.aggregate.TungstenAggregate.doExecute(TungstenAggregate.scala:80)
...
This apparently means that Phoenix considers the table nonexistent at that point in time. I tested the same approach in sqlline and, sure enough, the table is missing from "!tables". Any workaround?
08-09-2017
11:42 AM
My purpose is to get the module documentation's timestamp-based DataFrame query to work:
val df = sqlContext.read
  .options(Map(HBaseTableCatalog.tableCatalog -> writeCatalog, HBaseSparkConf.MIN_TIMESTAMP -> "0",
    HBaseSparkConf.MAX_TIMESTAMP -> oldMs.toString))
  .format("org.apache.hadoop.hbase.spark")
  .load()
It looks like I can substitute the constant HBaseTableCatalog.tableCatalog with its string value "catalog" without needing the class. However, Cloudera's version of HBaseSparkConf is also divergent - it doesn't have the official version's constants MIN_TIMESTAMP / MAX_TIMESTAMP, or their current incarnations TIMERANGE_START / TIMERANGE_END. Further substituting them with string literals compiled, but failed at run time:
val getRdd = sqlContext.read
  .options(Map("catalog" -> cat, "hbase.spark.query.timerange.start" -> startMs.toString,
    "hbase.spark.query.timerange.end" -> currMs.toString))
  .format("org.apache.hadoop.hbase.spark")
  .load()
Exception in thread "main" java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:313)
at scala.None$.get(Option.scala:311)
at org.apache.hadoop.hbase.spark.DefaultSource.createRelation(DefaultSource.scala:78)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
08-08-2017
02:00 PM
How about finding the HBaseTableCatalog class? We have a single CDH 5.7.1 cluster for production, and it is definitely not in the installed hbase-spark-1.2.0-cdh5.7.1.jar, nor in spark-sql_2.10-1.6.0-cdh5.7.1.jar. The Git history for this class suggests that it (and the whole data source) was not introduced until branch-2. Is this why the CDH version of the HBase-Spark module doesn't include it? When, then, will the module be brought up to date? Thanks, Miles
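One way to double-check is to list the jar contents directly (the parcel paths below are assumptions about the CDH layout; adjust to your install):
unzip -l /opt/cloudera/parcels/CDH/jars/hbase-spark-1.2.0-cdh5.7.1.jar | grep -i HBaseTableCatalog
unzip -l /opt/cloudera/parcels/CDH/jars/spark-sql_2.10-1.6.0-cdh5.7.1.jar | grep -i HBaseTableCatalog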
08-03-2017
09:31 PM
Same question here about accessing Phoenix tables from Spark 2 SQL on full HDP 2.6.
07-21-2017
01:29 PM
That's good news. But I think the requester would like to know when Cloudera plans to integrate Spark 2 into CDH proper, rather than shipping it as a separate install (as Hortonworks does). Thanks, Miles
07-12-2017
06:35 PM
On HDP 2.6, appending $CLASSPATH seems to break the Spark2 interpreter with:
"org.apache.zeppelin.interpreter.InterpreterException: Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;"
Is the included Phoenix-Spark driver (phoenix-spark-4.7.0.2.6.1.0-129.jar) certified to work with Spark2? I thought it was the preferred way, rather than going through JDBC. Thanks!