Member since: 03-04-2015
Posts: 96
Kudos Received: 12
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 7730 | 01-04-2017 02:33 PM
 | 14871 | 07-17-2015 03:11 PM
10-12-2020
10:08 PM
I imported our existing v5.12 workflows via the command-line loaddata. They show up in the Hue 3 Oozie Editor, but not in Hue 4. We are using CDH 5.16. I find the new "everything is a document" paradigm confusing and misleading: Oozie workflows, Hive queries, Spark jobs, etc. are not physical documents in the Unix/HDFS sense that normal users would expect, with absolute paths that can be accessed and manipulated directly. The traditional-style Hue 3 UI lets one focus on working with the technology at hand, instead of imposing The Grand Unifying Design on the user.
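For reference, the import was done with Hue's Django management commands; a rough sketch below (the Hue install path and the "oozie" app label are assumptions for a typical CDH parcel layout, so adjust for your environment):

# on the source cluster: export the Oozie editor objects from Hue's database
$ /opt/cloudera/parcels/CDH/lib/hue/build/env/bin/hue dumpdata oozie --indent 2 > oozie-workflows.json
# on the target cluster: load them back in
$ /opt/cloudera/parcels/CDH/lib/hue/build/env/bin/hue loaddata oozie-workflows.json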
07-01-2020
01:23 PM
The Phoenix-Hive storage handler as of v4.14.0 (CDH 5.12) seems buggy. I was able to get the Hive external wrapper table working for simple queries, after tweaking the column mapping around upper/lower-case gotchas. However, it fails when I try to export to a file with "INSERT OVERWRITE DIRECTORY ... SELECT ...":
org.apache.phoenix.schema.ColumnNotFoundException: ERROR 504 (42703): Undefined column. columnName=<table name>
This is a known problem that apparently no one is looking at: https://issues.apache.org/jira/browse/PHOENIX-4804
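For context, the wrapper table was created along these lines (a sketch only; the Phoenix table name, columns, row key and ZooKeeper quorum are placeholders, not our actual schema):

CREATE EXTERNAL TABLE phoenix_wrapper (
  id STRING,
  val STRING
)
STORED BY 'org.apache.phoenix.hive.PhoenixStorageHandler'
TBLPROPERTIES (
  "phoenix.table.name" = "MYSCHEMA.MYTABLE",
  "phoenix.zookeeper.quorum" = "zk1,zk2,zk3",
  "phoenix.rowkeys" = "id",
  "phoenix.column.mapping" = "id:ID, val:VAL"
);

-- simple SELECTs against phoenix_wrapper work; this export is what triggers the exception
INSERT OVERWRITE DIRECTORY '/tmp/phoenix_export' SELECT * FROM phoenix_wrapper;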
02-15-2019
06:40 PM
Try putting the <extraClassPath> settings into the global Spark config in Ambari instead, in the spark-defaults section (you may have to add them as custom properties). This works for us with Cloudera and Spark 1.6.
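That is, entries roughly like the following under the custom spark-defaults properties (the jar path is a placeholder for whatever you need on the classpath):

spark.driver.extraClassPath=/path/to/your-extra-lib.jar
spark.executor.extraClassPath=/path/to/your-extra-lib.jar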
07-24-2018
01:29 PM
We were unable to access the Spark application log files from either the YARN or Spark History Server UIs; both failed with "Error getting logs at <worker_node>:8041". We could still see the logs with the "yarn logs" command. It turned out our yarn.nodemanager.remote-app-log-dir = /tmp/logs, but the directory was owned by "yarn:yarn". Following your instructions fixed the issue. Thanks a lot! Miles
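For anyone hitting the same symptom, these are the checks that pointed us at the ownership problem (<application_id> is a placeholder):

# verify owner/group of the aggregated-log root (yarn.nodemanager.remote-app-log-dir)
$ hdfs dfs -ls -d /tmp/logs
# the logs remain readable from the CLI even when the UIs fail
$ yarn logs -applicationId <application_id>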
04-10-2018
08:57 AM
Hi Eric: Thanks for your explanation. Would you be able to point us to the formal licensing statements saying the same? Our corporate legal team would require them in order to approve CDK (and CDS2, for that matter) for production use. Miles
03-30-2018
09:18 AM
We are interested in using CDK instead of the base Apache version for its additional operational features. However, I cannot find any info about its pricing and licensing terms, which my corporate legal team requires. There is nothing in the CDK documentation. The main Cloudera product and download pages seem to have been redesigned and no longer provide links to the individual component distros like Spark 2 and Kafka. Pricing has also changed to be based on high-level "solutions" instead of individual software products. Since CDK is essentially CM + ZooKeeper + Kafka in a parcel, would it be licensed on the same basis as base CDH? I believe (the simpler) Cloudera Spark 2 is indeed free, but I cannot find official info on that, either. Can the Cloudera corporate folks help answer? Thanks, Miles Yao
Labels:
- Apache Kafka
- Apache Spark
03-20-2018
07:48 PM
1 Kudo
Note that phoenix-spark2.jar MUST precede phoenix-client.jar in extraClassPath; otherwise the connection will fail with: java.lang.NoClassDefFoundError: org/apache/spark/sql/DataFrame
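In other words, the spark-defaults entries should look roughly like this (jar locations are placeholders for wherever your parcel puts them):

spark.driver.extraClassPath=/path/to/phoenix-spark2.jar:/path/to/phoenix-client.jar
spark.executor.extraClassPath=/path/to/phoenix-spark2.jar:/path/to/phoenix-client.jar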
09-14-2017
02:54 PM
When you run OfflineMetaRepair, most likely you will run it as your own userid or as root. You may then get opaque errors like "java.lang.AbstractMethodError: org.apache.hadoop.hbase.ipc.RpcScheduler.getWriteQueueLength()". If you check in HDFS, you may see that the meta directory is no longer owned by hbase:
$ hdfs dfs -ls /hbase/data/hbase/
Found 2 items
drwxr-xr-x - root hbase 0 2017-09-12 13:58 /hbase/data/hbase/meta
drwxr-xr-x - hbase hbase 0 2016-06-15 15:02 /hbase/data/hbase/namespace
Manually running chown -R on it and restarting HBase fixed it for me.
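For reference, the sequence looked roughly like this (the sudo user for the chown is an assumption; use whatever account is your HDFS superuser):

# offline repair, typically run as root or an admin user (which is what flips the ownership)
$ hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair
# restore ownership of the meta directory to the hbase user, then restart HBase
$ sudo -u hdfs hdfs dfs -chown -R hbase:hbase /hbase/data/hbase/meta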
08-22-2017
08:38 PM
Thanks for the write-up. Does the above imply that the newly split daughter regions will always stay on the same RS, or is that configurable? If it's always local, won't the load on the "hot" region server just get heavier and heavier until the global load-balancer thread kicks in? Shouldn't HBase create the new daughter regions on the least-loaded RS instead? There was a lot of discussion related to this in HBASE-3373, but it isn't clear what the resulting implementation was.
08-13-2017
09:20 PM
This feature behaves unexpectedly when the table is migrated from another HBase cluster. In this case, the table creation time can be much later than the row timestamps of all its data. A flashback query meant to select an earlier subset of data will return the following failure instead:
scala> df.count
2017-08-11 20:12:40,550 INFO [main] mapreduce.PhoenixInputFormat: UseSelectColumns=true, selectColumnList.size()=3, selectColumnList=TIMESTR,DBID,OPTION
2017-08-11 20:12:40,550 INFO [main] mapreduce.PhoenixInputFormat: Select Statement: SELECT "TIMESTR","DBID","OPTION" FROM NS.USAGES
2017-08-11 20:12:40,558 ERROR [main] mapreduce.PhoenixInputFormat: Failed to get the query plan with error [ERROR 1012 (42M03): Table undefined. tableName=NS.USAGES]
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[count#13L])
+- TungstenExchange SinglePartition, None
+- TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[count#16L])
+- Project
+- Scan ExistingRDD[TIMESTR#10,DBID#11,OPTION#12]
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
at org.apache.spark.sql.execution.aggregate.TungstenAggregate.doExecute(TungstenAggregate.scala:80)
...
This apparently means that Phoenix considers the table nonexistent at that point in time. I tested the same approach in sqlline and, sure enough, the table is missing from "!tables". Any workaround?
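For context, the flashback read boils down to pinning the connection to an earlier timestamp via Phoenix's CurrentSCN property; a minimal JDBC sketch of the same check from the Scala shell (the ZooKeeper quorum and the SCN value are placeholders):

import java.sql.DriverManager
import java.util.Properties

val props = new Properties()
// pin reads to an earlier point in time (ms since epoch); placeholder value
props.setProperty("CurrentSCN", "1400000000000")
val conn = DriverManager.getConnection("jdbc:phoenix:zk1,zk2,zk3:2181:/hbase", props)
// with an SCN earlier than the table's creation time, Phoenix resolves NS.USAGES as undefined
val rs = conn.createStatement().executeQuery("SELECT COUNT(*) FROM NS.USAGES")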