Member since
07-02-2018
5
Posts
0
Kudos Received
0
Solutions
08-25-2019
05:25 AM
Hi,
1. When should I use GeoSpatial data in HDFS and when in HBase?
2. It seems that for HDFS I should use ESRI Geometry and Hive Spatial, and for HBase GeoMesa. Is that correct?
3. Sample query we require - For a defined polygon (GeoJSON column), find all routes (table row = route) that go through it.
4. Another sample query we require - For a defined polygon (GeoJSON column), find all other polygons (other rows) which match it (congruent).
5. General question: As far as I know, HBase is not suitable for analytics. When is it suitable to use it as an analytics engine with Spark? How would this perform compared to HDFS?
The presentation below from ESRI seems to address the HDFS vs HBase question, but it is not quite clear to me:
http://proceedings.esri.com/library/userconf/fed15/papers/fed_142.pdf
Thanks!
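To make sample query 3 concrete, here is a minimal pure-Python sketch of the predicate "route goes through polygon". It is an illustration only, not the ESRI or GeoMesa API: the function names are hypothetical, coordinates are treated as planar (x, y) pairs, and the polygon is assumed to be a single outer ring with no holes. In a real deployment the equivalent test would come from the spatial library in use.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Cross product of (b - a) and (c - a); its sign gives the turn direction."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _in_box(a: Point, b: Point, p: Point) -> bool:
    """True if p lies inside the bounding box of segment a-b."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segment p1-p2 and segment q1-q2 share at least one point."""
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    # Proper crossing: the endpoints of each segment lie on opposite sides.
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True
    # Touching or collinear cases.
    return ((d1 == 0 and _in_box(q1, q2, p1)) or (d2 == 0 and _in_box(q1, q2, p2))
            or (d3 == 0 and _in_box(p1, p2, q1)) or (d4 == 0 and _in_box(p1, p2, q2)))


def point_in_polygon(pt: Point, ring: List[Point]) -> bool:
    """Ray-casting test; works whether or not the ring repeats its first point."""
    x, y = pt
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def route_crosses_polygon(route: List[Point], ring: List[Point]) -> bool:
    """True if any route vertex is inside the ring or any route segment hits an edge."""
    if any(point_in_polygon(p, ring) for p in route):
        return True
    edges = list(zip(ring, ring[1:] + ring[:1]))
    return any(segments_intersect(a, b, c, d)
               for a, b in zip(route, route[1:])
               for c, d in edges)


# Hypothetical data: a 4x4 square and three routes.
square = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
through = [(-1, 2), (5, 2)]   # enters and leaves the square
outside = [(5, 5), (6, 6)]    # never touches it
within = [(1, 1), (2, 2)]     # lies entirely inside
```

Applied per row, a filter like `route_crosses_polygon(route, polygon)` is the shape of the query; whether it runs as a Hive UDF over HDFS files or as an indexed scan in HBase is the part the question above is asking about.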
Labels: Apache HBase
08-12-2019
08:06 AM
I did not find a specific note about compatibility and support.
The closest thing was under "Integrating MIT Kerberos and Active Directory", which addresses only Windows 2008, 2012 and 2016:
https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_sg_hadoop_security_active_directory_integrate.html
".. Get and verify the list of encryption types set with this command:
On Microsoft Windows 2008, 2012, or 2016:.."
Bottom line - Can we use Active Directory on Windows 2013, or is there a compatibility matrix?
(The organization prefers not to use a local KDC with a cross-realm trust)
Thanks
Labels: Cloudera Manager
02-28-2019
06:52 AM
Labels: Apache Kafka
02-27-2019
02:44 AM
Hi,
We are now integrating our CDH with Active Directory, and it seems that the default "user:group" definitions in CDH (hive:hive, hdfs:hdfs, flume:flume, etc.), which use an identical user and group name, cannot be created as such in Active Directory (two objects with the same name cannot exist within the same container).
What is the recommended best practice here? Do we need to have the Hadoop principals defined only locally (OS level) and sync with SSSD?
Thanks
Labels: Apache Sentry, Kerberos
07-02-2018
08:27 AM
Hi,
We are designing a detection system, which has two main parts:
1. Key-based queries:
- Get the last 20 activities for a specified key
- We expect several thousand queries per second, but want something that can scale to much more if required for large clients.
- Could be HBase or Kudu
2. Ad-hoc queries:
- Ad-hoc analytics
- Should serve about 20 concurrent users (say, up to 100 for large clients)
- Could be HDFS Parquet or Kudu
We want to use a single storage engine for both, and Kudu seems great, if it can handle queries at a high rate.
Is Kudu a good fit for this kind of system, which usually uses a NoSQL engine such as HBase or Cassandra?
What is the limit for Kudu in terms of queries per second? (Of course, this depends on cluster specs, partitioning, etc., and we can take that into account, but a rough estimate of scalability would help.)
A link to something official or a recent benchmark would also be appreciated.
Thanks
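To clarify the access pattern behind the key-based queries, here is a small in-memory sketch. It is a stand-in for any store that keeps rows sorted by key, not HBase or Kudu client code. The class and function names are hypothetical, and the composite key of (entity key, reversed timestamp) is an assumption about how the table could be laid out so that the newest rows for a key sort first.

```python
import bisect
from typing import Dict, List, Tuple

MAX_TS = 2**63 - 1  # reversing against the largest signed 64-bit value


def row_key(entity: str, ts_ms: int) -> Tuple[str, int]:
    """Composite key: newer timestamps produce smaller second components."""
    return (entity, MAX_TS - ts_ms)


class SortedStore:
    """Toy sorted key-value store (hypothetical; illustration only)."""

    def __init__(self) -> None:
        self._keys: List[Tuple[str, int]] = []
        self._values: Dict[Tuple[str, int], str] = {}

    def put(self, entity: str, ts_ms: int, activity: str) -> None:
        key = row_key(entity, ts_ms)
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep keys in sorted order
        self._values[key] = activity

    def last_n(self, entity: str, n: int = 20) -> List[str]:
        """Short forward scan from the first row of `entity`, newest first."""
        start = bisect.bisect_left(self._keys, (entity, 0))
        result = []
        for key in self._keys[start:start + n]:
            if key[0] != entity:  # ran past this entity's rows
                break
            result.append(self._values[key])
        return result


# Hypothetical data: 30 activities for one key, 1 for another.
store = SortedStore()
for ts in range(1, 31):
    store.put("user-a", ts, f"activity-{ts}")
store.put("user-b", 5, "activity-b")

latest = store.last_n("user-a")
```

With this layout, "last 20 for a key" is one seek plus a scan of at most 20 adjacent rows, so the cost per query does not grow with the total history stored for the key. The sketch says nothing about the queries-per-second limit itself.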
Labels: Apache HBase, Apache Kudu