Member since: 05-09-2016
Posts: 280
Kudos Received: 58
Solutions: 31
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3523 | 03-28-2018 02:12 PM |
| | 2956 | 01-09-2018 09:05 PM |
| | 1552 | 12-13-2016 05:07 AM |
| | 4801 | 12-12-2016 02:57 AM |
| | 4036 | 12-08-2016 07:08 PM |
03-28-2018
02:12 PM
Hi @Scott Shaw, thanks for that answer. I will take a look. I realized this can be achieved with Atlas: the metadata changes are picked up by the HiveHook, which sends them to the ATLAS_HOOK Kafka topic. I am comparing two options for consuming the JSON messages from that topic: 1) connect NiFi to filter the JSON and use the PutEmail processor for notification, or 2) write a custom Java Kafka consumer that does the same thing. Please let me know what you think.
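For a concrete feel of option 2, here is a minimal sketch (in Python rather than Java, for brevity), assuming the kafka-python client; the broker address, watch list, and SMTP details are placeholders, not values from this thread:

```python
# Sketch of option 2: consume ATLAS_HOOK and email when a watched table
# appears in a message. Assumes the kafka-python package; the broker
# address, watch list, and SMTP settings below are placeholders.
import smtplib
from email.mime.text import MIMEText

from kafka import KafkaConsumer

WATCHED_TABLES = {"db1.table1", "db1.table2"}  # hypothetical watch list

def notify(body):
    """Send a plain-text alert; NiFi's PutEmail plays the same role."""
    msg = MIMEText(body)
    msg["Subject"] = "Hive metadata change"
    msg["From"] = "alerts@example.com"       # placeholder addresses
    msg["To"] = "team@example.com"
    with smtplib.SMTP("smtp.example.com") as smtp:
        smtp.send_message(msg)

consumer = KafkaConsumer(
    "ATLAS_HOOK",
    bootstrap_servers="broker-host:6667",    # placeholder broker
    group_id="metadata-alerts",
    value_deserializer=lambda m: m.decode("utf-8"),
)

for record in consumer:
    # Crude filter: alert if any watched table name appears in the raw JSON.
    if any(t in record.value for t in WATCHED_TABLES):
        notify(record.value)
```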
03-27-2018
05:37 PM
Thank you so much, @Constantin Stanca. This was very much needed. Could you please let me know which NiFi version is compatible with HDP 2.6? I already have an HDP cluster installed on Google Cloud and cannot install a separate HDF cluster. Is there a standalone jar of NiFi that can run inside the HDP cluster?
03-26-2018
07:12 PM
1 Kudo
Hi everyone,
I am using HDP 2.6 and I want to track Hive table metadata changes in real time. I have the HiveHook enabled and I can see Kafka JSON messages in the ATLAS_HOOK and ATLAS_ENTITIES topics.
Also, Atlas is able to consume these entity updates. I am looking for the best way to get entity-update information in real time. 1) Is there a way to set up a notification server (like SMTP) to which Atlas will send these updates? 2) Or do I have to create a custom Kafka consumer that reads the JSON directly from the ATLAS_HOOK or ATLAS_ENTITIES topics?
P.S. I do not want to read everything from the Kafka topic. There are thousands of tables, but I want metadata changes for specific tables only.
Please let me know how to set up the offsets for particular databases/tables only. Thanks.
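A note on the P.S.: as far as I can tell, Kafka offsets are per-partition rather than per-table, so selecting specific tables has to happen in the consumer after the messages arrive. A minimal, schema-agnostic filter sketch in Python; the watch list is hypothetical, and the db.table@cluster form of qualifiedName is an assumption based on Atlas's Hive model:

```python
# Kafka offsets are per-partition, not per-table, so selecting specific
# tables has to happen client-side. This sketch avoids assuming a fixed
# Atlas message schema by searching the parsed JSON recursively for
# qualifiedName values; the db.table@cluster form of hive_table
# qualified names and the watch list below are assumptions.
import json

WATCHED = {"mydb.orders", "mydb.customers"}  # hypothetical watch list

def qualified_names(node):
    """Recursively collect qualifiedName string values from parsed JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "qualifiedName" and isinstance(value, str):
                yield value
            else:
                yield from qualified_names(value)
    elif isinstance(node, list):
        for item in node:
            yield from qualified_names(item)

def is_watched(raw_message):
    """True if a hook message touches any table on the watch list."""
    try:
        payload = json.loads(raw_message)
    except ValueError:
        return False
    return any(n.split("@")[0] in WATCHED for n in qualified_names(payload))
```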
Labels:
- Apache Atlas
- Apache Hive
- Apache Kafka
03-19-2018
08:39 PM
1 Kudo
Hi guys, I am using HDP version 2.6.1.40-4 on our dev servers; the Hive version is 1.2.1. We use Hive tables as the source to our framework: we read different columns from different tables and then run Spark jobs to do the processing. We maintain a config table in Hive that specifies which columns we want from each source table. If someone renames a column or adds new columns in a source table, we have to update this config table manually. Please share ideas on ways to monitor/track in real time what is happening in the Hive metastore, and on the most suitable push-notification mechanism to alert us in any form. Thanks in advance.
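Until a push mechanism is in place, one fallback is to poll: on a schedule, compare each source table's current columns against the config table and alert on drift. A minimal PySpark sketch, assuming a config table with hypothetical source_table and column_name columns:

```python
# Polling fallback: detect column drift between Hive source tables and
# the config table. Database/table/column names here are hypothetical.
# Spark 1.6 style (HiveContext); Spark 2+ would use SparkSession instead.
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="schema-drift-check")
sqlContext = HiveContext(sc)

# Expected columns per source table, as recorded in the config table.
rows = sqlContext.sql(
    "SELECT source_table, column_name FROM framework.config_table"
).collect()

expected = {}
for row in rows:
    expected.setdefault(row.source_table, set()).add(row.column_name)

for table, wanted in expected.items():
    actual = set(sqlContext.table(table).columns)
    missing = wanted - actual   # columns the config expects but the table lost
    added = actual - wanted     # columns newly added to the source table
    if missing or added:
        # Wire an alert (email, chat webhook, ...) in here.
        print("Schema drift in %s: missing=%s, new=%s" % (table, missing, added))
```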
Labels:
- Apache Hive
01-09-2018
09:05 PM
Solved it. The dataframe was missing values for the row key, as pointed out by the error:
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.initRowKey(HBaseTableCatalog.scala:141)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.<init>(HBaseTableCatalog.scala:152)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:209)
I created a Row object that included all dataframe columns, and then it worked.
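A note for anyone else hitting empty.tail here: in the catalog in the question below, the Python dict literal defines the key "individual_id" twice, so the second (cf1) entry silently overwrites the rowkey mapping before json.dumps even runs, leaving no column mapped to the row key. A sketch of a catalog with unique keys (names follow the thread, but treat this as an illustration, not the author's exact fix):

```python
# A catalog sketch where every dict key is unique and exactly one entry
# maps a dataframe column to the "rowkey" column family. Names follow
# the thread; treat this as an illustration, not the author's exact fix.
import json

cat = json.dumps({
    "table": {"namespace": "dsc", "name": "table1",
              "tableCoder": "PrimitiveType"},
    "rowkey": "key",
    "columns": {
        # Dataframe column "individual_id" supplies the row key. It must
        # appear only once in this dict and must never be null in the data.
        "individual_id": {"cf": "rowkey", "col": "key", "type": "string"},
        "model_id": {"cf": "cf1", "col": "model_id", "type": "string"},
        "individual_id_proxy": {"cf": "cf1", "col": "individual_id_proxy",
                                "type": "string"},
    },
})

# df must contain every dataframe column named above before the write:
# df.write.option("catalog", cat) \
#     .format("org.apache.spark.sql.execution.datasources.hbase").save()
```

If the row-key value also needs to be stored in cf1, duplicate it into a second dataframe column under a distinct dict key instead of reusing "individual_id".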
01-08-2018
11:01 PM
Hi Guys, I am using Spark 1.6.3 and HBase 1.1.2 on HDP 2.6. I have to use Spark 1.6 and cannot move to Spark 2. The connector jar is shc-1.0.0-1.6-s_2.10.jar. I am writing to an HBase table from the PySpark dataframe:

cat = json.dumps({
    "table": {"namespace": "dsc", "name": "table1", "tableCoder": "PrimitiveType"},
    "rowkey": "key",
    "columns": {
        "individual_id": {"cf": "rowkey", "col": "key", "type": "string"},
        "model_id": {"cf": "cf1", "col": "model_id", "type": "string"},
        "individual_id": {"cf": "cf1", "col": "individual_id", "type": "string"},
        "individual_id_proxy": {"cf": "cf1", "col": "individual_id_proxy", "type": "string"},
    },
})
df.write.option("catalog", cat).format("org.apache.spark.sql.execution.datasources.hbase").save()
The error is: An error occurred while calling o202.save.
: java.lang.UnsupportedOperationException: empty.tail
at scala.collection.TraversableLike$class.tail(TraversableLike.scala:445)
at scala.collection.mutable.ArraySeq.scala$collection$IndexedSeqOptimized$super$tail(ArraySeq.scala:45)
at scala.collection.IndexedSeqOptimized$class.tail(IndexedSeqOptimized.scala:123)
at scala.collection.mutable.ArraySeq.tail(ArraySeq.scala:45)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.initRowKey(HBaseTableCatalog.scala:141)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.<init>(HBaseTableCatalog.scala:152)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:209)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.<init>(HBaseRelation.scala:163)
at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:58)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)

Please let me know if anyone has come across this.
Labels:
- Apache HBase
- Apache Spark
10-19-2017
08:40 PM
Thanks @Timothy Spann for your answer. These links are really helpful. I used Python for Spark MLlib, so I will use the same for H2O as well.
10-19-2017
04:26 PM
Hi experts, I am just curious about the differences between Spark MLlib/ML and H2O in terms of algorithm implementations, performance, and usability, and about which one is better for which kinds of use cases. Thanks a lot in advance.
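For a concrete feel of the usability difference, here is roughly what training a gradient-boosted classifier looks like in each; a sketch in which the column names and the input frames (spark_df, pandas_df) are placeholder assumptions:

```python
# Rough usability comparison: training a gradient-boosted classifier on
# the same data in Spark ML vs H2O. Column names and the input frames
# (spark_df, pandas_df) are placeholder assumptions.

# Spark ML: features must first be assembled into a single vector column.
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import GBTClassifier

assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
spark_model = GBTClassifier(labelCol="label").fit(assembler.transform(spark_df))

# H2O: columns are referenced by name directly on an H2OFrame.
import h2o
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()  # starts or connects to a local H2O cluster
h2o_df = h2o.H2OFrame(pandas_df)
h2o_df["label"] = h2o_df["label"].asfactor()  # classification needs a factor label
h2o_model = H2OGradientBoostingEstimator()
h2o_model.train(x=["f1", "f2", "f3"], y="label", training_frame=h2o_df)
```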
Labels:
- Apache Spark
04-22-2017
10:25 PM
Got it working. Whenever I started my Hive shell, I was getting this warning: "Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases." So I installed Tez (version 0.8.5) and changed Hive's execution engine to Tez (hive.execution.engine=tez). Now all Hive queries that involve a MapReduce job are running. My Hive version is 2.1.1, which I guess does not work with MapReduce. As for the regexes, thanks a lot @Ed Berezitsky; those regexes worked.
04-22-2017
07:19 PM
Just noticed that any Hive query that involves a MapReduce job throws the same exception.