Member since 05-09-2016
280 Posts
58 Kudos Received
31 Solutions
My Accepted Solutions
Views | Posted |
---|---|
3744 | 03-28-2018 02:12 PM |
3022 | 01-09-2018 09:05 PM |
1649 | 12-13-2016 05:07 AM |
5022 | 12-12-2016 02:57 AM |
4307 | 12-08-2016 07:08 PM |
03-28-2018 02:12 PM
Hi @Scott Shaw, thanks for that answer. I will take a look. I realized this can be achieved with Atlas: HiveHook picks up the metadata changes and sends them to the ATLAS_HOOK topic in Kafka. I am comparing two options for consuming this JSON message from the topic:
1) Connect NiFi to the topic, filter the JSON, and use the PutEmail processor for notification.
2) Write a custom Java Kafka consumer that does the same thing.
Please let me know what you think.
03-27-2018 05:37 PM
Thank you so much, @Constantin Stanca. This was very much needed. Can you please let me know which NiFi version is compatible with HDP 2.6? I already have an HDP cluster installed on Google Cloud and cannot install a separate HDF cluster. Is there a standalone NiFi jar that can work in an HDP cluster?
03-26-2018 07:12 PM
1 Kudo
Hi everyone,
I am using HDP 2.6 and I want to track Hive table metadata changes in real time. I have HiveHook enabled and I can see Kafka JSON messages in the ATLAS_HOOK and ATLAS_ENTITIES topics.
Atlas is also able to consume these entity updates. I am looking for the best way to get entity update information in real time.
1) Is there a way to create a notification server (like SMTP) to which Atlas will send these updates?
2) Or do I have to create a custom Kafka consumer that reads the JSON directly from the ATLAS_HOOK or ATLAS_ENTITIES topics?
P.S. I do not want to read everything from the Kafka topic. There are thousands of tables, but I want metadata changes for specific tables only.
Please let me know how to set up the offsets for particular databases/tables only. Thanks.
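On the offsets part of the question: Kafka tracks offsets per topic partition and consumer group, so there is no offset setting that selects particular databases or tables. The filtering has to happen in the consumer. Below is a minimal Python 3 sketch of option 2, not a definitive implementation. It assumes the notification is JSON that carries `qualifiedName` strings of the form `db.table@cluster` or `db.table.column@cluster`, and it uses the kafka-python client. `WATCHED_TABLES`, the group id `hive-schema-watch`, and the function names are hypothetical.

```python
import json

# Hypothetical watch list of "database.table" names to alert on.
WATCHED_TABLES = {"sales.orders", "dsc.table1"}


def qualified_names(node):
    """Yield every string stored under a "qualifiedName" key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "qualifiedName" and isinstance(value, str):
                yield value
            else:
                yield from qualified_names(value)
    elif isinstance(node, list):
        for item in node:
            yield from qualified_names(item)


def touched_tables(raw_message, watched=WATCHED_TABLES):
    """Return the watched tables referenced by one JSON notification."""
    payload = json.loads(raw_message)
    hits = set()
    for name in qualified_names(payload):
        # Assumed format: db.table@cluster or db.table.column@cluster.
        parts = name.split("@", 1)[0].split(".")
        if len(parts) >= 2:
            table = ".".join(parts[:2]).lower()
            if table in watched:
                hits.add(table)
    return hits


def consume(bootstrap_servers, topic="ATLAS_HOOK"):
    """Print an alert for each message that touches a watched table."""
    from kafka import KafkaConsumer  # kafka-python, imported only when needed

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=bootstrap_servers,
        group_id="hive-schema-watch",  # hypothetical consumer group
        auto_offset_reset="latest",    # start from new messages only
    )
    for record in consumer:
        tables = touched_tables(record.value.decode("utf-8"))
        if tables:
            print("metadata change:", sorted(tables))
```

The search for `qualifiedName` is recursive on purpose, so the filter does not depend on the exact message layout. The `print` call is where an email or other alert would go.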
Labels: Apache Atlas, Apache Hive, Apache Kafka
03-19-2018 08:39 PM
1 Kudo
Hi guys, I am using HDP version 2.6.1.40-4 on our dev servers. The Hive version is 1.2.1. We use Hive tables as the source for our framework, in which we read different columns from different tables and then run Spark jobs to process them. We maintain a config table in Hive in which we specify which columns we want from each source table. If someone renames a column or adds new columns in a source table, we have to update this config table manually. Please share ideas on ways to monitor in real time what is happening in the Hive metastore, and on the most suitable push notification mechanism to alert us. Thanks in advance.
Labels: Apache Hive
01-09-2018 09:05 PM
Solved it. The values for the RowKey were missing, as the error pointed out:
org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.initRowKey(HBaseTableCatalog.scala:141)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.<init>(HBaseTableCatalog.scala:152)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:209)
I created the Row object so that it included all DataFrame columns, and then it worked.
01-08-2018 11:01 PM
Hi guys, I am using Spark 1.6.3 and HBase 1.1.2 on HDP 2.6. I have to use Spark 1.6 and cannot move to Spark 2. The connector jar is shc-1.0.0-1.6-s_2.10.jar. I am writing to an HBase table from a PySpark DataFrame:
cat = json.dumps({"table":{"namespace":"dsc", "name":"table1", "tableCoder":"PrimitiveType"},"rowkey":"key","columns": {"individual_id":{"cf":"rowkey", "col":"key", "type":"string"}, "model_id":{"cf":"cf1", "col":"model_id", "type":"string"}, "individual_id":{"cf":"cf1", "col":"individual_id", "type":"string"}, "individual_id_proxy":{"cf":"cf1", "col":"individual_id_proxy", "type":"string"}}})
df.write.option("catalog",cat).format("org.apache.spark.sql.execution.datasources.hbase").save()
The error is: An error occurred while calling o202.save.
: java.lang.UnsupportedOperationException: empty.tail
at scala.collection.TraversableLike$class.tail(TraversableLike.scala:445)
at scala.collection.mutable.ArraySeq.scala$collection$IndexedSeqOptimized$super$tail(ArraySeq.scala:45)
at scala.collection.IndexedSeqOptimized$class.tail(IndexedSeqOptimized.scala:123)
at scala.collection.mutable.ArraySeq.tail(ArraySeq.scala:45)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.initRowKey(HBaseTableCatalog.scala:141)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.<init>(HBaseTableCatalog.scala:152)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:209)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.<init>(HBaseRelation.scala:163)
at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:58)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Please let me know if anyone has come across this.
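One detail in the catalog above may be worth checking. The `columns` dict literal uses the key `individual_id` twice, and in a Python dict literal the last value for a repeated key wins. That means the `"cf":"rowkey"` mapping is already gone before `json.dumps` runs, which would leave the connector with no row key column. This is only a likely contributor to the `empty.tail` failure in `initRowKey`, not a confirmed diagnosis. The sketch below reproduces the effect with plain Python; `rowkey_columns` and the `fixed` variant are hypothetical names for illustration.

```python
import json

# Catalog as written in the question. The key "individual_id" appears twice
# in the "columns" dict literal, so Python keeps only the last value (cf1).
cat = json.dumps({
    "table": {"namespace": "dsc", "name": "table1", "tableCoder": "PrimitiveType"},
    "rowkey": "key",
    "columns": {
        "individual_id": {"cf": "rowkey", "col": "key", "type": "string"},
        "model_id": {"cf": "cf1", "col": "model_id", "type": "string"},
        "individual_id": {"cf": "cf1", "col": "individual_id", "type": "string"},
        "individual_id_proxy": {"cf": "cf1", "col": "individual_id_proxy", "type": "string"},
    },
})


def rowkey_columns(catalog_json):
    """Return the DataFrame columns mapped to the "rowkey" column family."""
    columns = json.loads(catalog_json)["columns"]
    return sorted(name for name, spec in columns.items() if spec["cf"] == "rowkey")


print(rowkey_columns(cat))  # prints [] because no rowkey mapping survives

# Hypothetical fix: map individual_id once, as the row key only.
fixed = json.loads(cat)
fixed["columns"]["individual_id"] = {"cf": "rowkey", "col": "key", "type": "string"}
fixed_cat = json.dumps(fixed)

print(rowkey_columns(fixed_cat))  # prints ['individual_id']
```

Running a check like `rowkey_columns` before `df.write` is a cheap way to catch a catalog with no row key mapping before the connector does.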
Labels: Apache HBase, Apache Spark
10-19-2017 08:40 PM
Thanks @Timothy Spann for your answer. These links are really helpful. I used Python for Spark MLlib, so I will use the same for H2O as well.
10-19-2017 04:26 PM
Hi experts, I am curious about the differences between Spark MLlib/ML and H2O in terms of algorithm implementation, performance, and usability. Which one is better for which kinds of use cases? Thanks a lot in advance.
Labels: Apache Spark
04-22-2017 10:25 PM
Got it working. Whenever I started my Hive shell, I was getting this warning:
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
So I installed Tez (version 0.8.5) and changed Hive's execution engine to Tez. Now all Hive queries that involve a MapReduce job run. My Hive version is 2.1.1, which I guess does not work with MapReduce.
As for the regex, thanks a lot @Ed Berezitsky, those regexes worked.
04-22-2017 07:19 PM
I just saw that any Hive query that involves a MapReduce job throws the same exception.