Member since: 03-31-2018
Posts: 27
Kudos Received: 2
Solutions: 0
07-03-2018
03:17 PM
Hi Community, I'm running a basic Spark job which reads from an HBase table. I can see the job completes without any error, but the output contains only empty rows. I will appreciate any help. Below is my code:

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

    object objectName {
      def catalog = s"""{
        |"table":{"namespace":"namespaceName", "name":"tableName"},
        |"rowkey":"rowKeyAttribute",
        |"columns":{
        |"Key":{"cf":"rowkey", "col":"rowKeyAttribute", "type":"string"},
        |"col1":{"cf":"cfName", "col":"col1", "type":"bigint"},
        |"col2":{"cf":"cfName", "col":"col2", "type":"string"}
        |}
        |}""".stripMargin

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("dummyApplication")
          .getOrCreate()
        val sqlContext = spark.sqlContext
        import sqlContext.implicits._

        def withCatalog(cat: String): DataFrame = {
          sqlContext
            .read
            .options(Map(HBaseTableCatalog.tableCatalog -> cat))
            .format("org.apache.spark.sql.execution.datasources.hbase")
            .load()
        }

        // read the table through the catalog; this is where the empty rows show up
        val df = withCatalog(catalog)
        df.show()
        spark.stop()
      }
    }
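In case it helps to narrow this down, here is a debugging variant I can run (a sketch against the same withCatalog helper; the all-string types are deliberate, to bypass type decoding). If values show up here but were blank with the typed catalog, the stored bytes probably do not match the declared types, e.g. numbers written as plain strings via the hbase shell but declared bigint:

    def debugCatalog = s"""{
      |"table":{"namespace":"namespaceName", "name":"tableName"},
      |"rowkey":"rowKeyAttribute",
      |"columns":{
      |"Key":{"cf":"rowkey", "col":"rowKeyAttribute", "type":"string"},
      |"col1":{"cf":"cfName", "col":"col1", "type":"string"},
      |"col2":{"cf":"cfName", "col":"col2", "type":"string"}
      |}
      |}""".stripMargin

    val debugDf = withCatalog(debugCatalog)
    debugDf.printSchema()               // confirms the catalog parsed as intended
    debugDf.show(10, truncate = false)  // raw values, no bigint conversion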
Labels:
- Apache HBase
- Apache Spark
07-03-2018
02:58 PM
@Sandeep Nemuri thanks for the help; I surely will do so.
07-03-2018
12:37 PM
Hi @Sandeep Nemuri, thanks for answering. I did as mentioned above and my code now runs fine in client deploy mode, but I ran into another issue: the data displayed is blank. However, I can see the data at the hbase shell prompt. I have already checked the rowkey, table name, and namespace name. What do you think is missing?

    val df = withCatalog(catalog)
    df.show()
    df.select($"col0").show(10)
    spark.stop()
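In the meantime I am checking whether the rows themselves are missing or only the column values, along these lines (a small sketch; df is the DataFrame returned by withCatalog above):

    df.printSchema()     // verify the catalog mapped the columns as expected
    println(df.count())  // a non-zero count with blank columns points at a cf/qualifier or type mismatch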
07-02-2018
07:56 AM
@karthik nedunchezhiyan I have attached hbase-site.xml; I hope it helps.
07-02-2018
02:36 AM
Hi Community, I'm new to Spark and have been struggling to execute a Spark job which connects to an HBase table. In the YARN GUI I can see the Spark job reach the RUNNING state, but then it fails with the error below:

    18/07/01 21:08:20 INFO ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x6a8bcb640x0, quorum=localhost:2181, baseZNode=/hbase
    18/07/01 21:08:20 INFO ClientCnxn: Opening socket connection to server localhost.localdomain/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
    18/07/01 21:08:20 WARN ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
    java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
    18/07/01 21:08:20 WARN RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid

ZooKeeper is up and running. Below is my code:
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

    object foo {
      // Create table catalog
      def catalog = s"""{
        |"table":{"namespace":"foo", "name":"bar"},
        |"rowkey":"key",
        |"columns":{
        |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
        |"col1":{"cf":"data", "col":"id", "type":"bigint"}
        |}
        |}""".stripMargin

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("foo")
          .getOrCreate()
        val sqlContext = spark.sqlContext
        import sqlContext.implicits._

        def withCatalog(cat: String): DataFrame = {
          sqlContext
            .read
            .options(Map(HBaseTableCatalog.tableCatalog -> cat))
            .format("org.apache.spark.sql.execution.datasources.hbase")
            .load()
        }

        // Read from HBase table
        val df = withCatalog(catalog)
        df.show()
        df.filter($"col0" === "1528801346000_200550232_2955")
          .select($"col0", $"col1").show()
        spark.stop()
      }
    }
I'll really appreciate any help on this. I couldn't find any convincing answer on Stack Overflow either.
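My current suspicion: the client is falling back to localhost:2181, which as far as I understand usually means hbase-site.xml is not on the application's classpath. A quick check from the driver (a sketch; it assumes hbase-client is on the classpath):

    import org.apache.hadoop.hbase.HBaseConfiguration

    val hbaseConf = HBaseConfiguration.create()
    // prints "localhost" when hbase-site.xml was not picked up,
    // and the real ZooKeeper quorum when it was
    println(hbaseConf.get("hbase.zookeeper.quorum"))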
Labels:
- Apache HBase
- Apache Spark
05-11-2018
11:32 AM
1 Kudo
@Shu you are awesome, thanks for the help. It does resolve my problem. I wanted to build the rowkey from a combination of JSON attributes; to achieve this I used the UpdateAttribute processor and declared the key there as a combination of resource id, metric id, and timestamp. Then I included this rowkey in AttributesToJSON.
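For anyone finding this later, a minimal sketch of that UpdateAttribute step (the property and attribute names are illustrative, and it assumes the JSON fields were first promoted to flow file attributes, e.g. with EvaluateJsonPath):

    rowkey = ${resource_id}_${measurement_key}_${measurement_timestamp}

PutHBaseJson can then reference ${rowkey} as its row identifier.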
05-10-2018
11:32 PM
1 Kudo
I have flattened a JSON using Jolt; since the input was a JSON array, the transform produced lists for the keys. But to use the PutHBaseJson processor I need scalar values to define the rowkey. Is there a way to repeat the whole JSON for each array element, keeping one value at a time, so that uniqueness is maintained?
Below are my input, transformation, and output.

Input

{
"resource": {
"id": "1234",
"name": "Resourse_Name"
},
"data": [
{
"measurement": {
"key": "5678",
"value": "status"
},
"timestamp": 1517784040000,
"value": 1
},
{
"measurement": {
"key": "91011",
"value": "location"
},
"timestamp": 1519984070000,
"value": 0
}
]
}
Transformation

[
{
"operation": "shift",
"spec": {
"resource": {
"id": "resource_id",
"name": "resource_name"
},
//
// Turn each entry of "data" into a flat object with
// prefixed keys like "measurement_key"
"data": {
"*": {
// the "&" in "rating-&" means go up the tree 0 levels,
// grab what is ther and subtitute it in
"measurement": {
"*": "measurement_&"
},
"timestamp": "measurement_timestamp",
"value": "value"
}
}
}
}
]
Output

{
"resource_id" : "1234",
"resource_name" : "Resourse_Name",
"measurement_key" : [ "5678", "91011" ],
"measurement_value" : [ "status", "location" ],
"measurement_timestamp" : [ 1517784040000, 1519984070000 ],
"value" : [ 1, 0 ]
}
Expected output

[
{
"resource_id" : "1234",
"resource_name" : "Resourse_Name",
"measurement_key" : "5678",
"measurement_value" : "status",
"measurement_timestamp" : 1517784040000,
"value" : 1
},
{
"resource_id" : "1234",
"resource_name" : "Resourse_Name",
"measurement_key" : "91011",
"measurement_value" : "location",
"measurement_timestamp" : 1519984070000,
"value" : 0
}
]
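For reference, one shift spec that should produce the expected array directly (an untested sketch; the "@(2,...)" lookups assume exactly the input structure above, and the resulting JSON array could then be split into individual flow files with SplitJson):

[
{
"operation": "shift",
"spec": {
"data": {
"*": {
"@(2,resource.id)": "[&1].resource_id",
"@(2,resource.name)": "[&1].resource_name",
"measurement": {
"key": "[&2].measurement_key",
"value": "[&2].measurement_value"
},
"timestamp": "[&1].measurement_timestamp",
"value": "[&1].value"
}
}
}
}
]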
Labels:
- Apache Hadoop
- Apache HBase
05-10-2018
11:34 AM
Thanks a lot ... really appreciate it. I was struggling to get this done; you saved me a lot of time.
05-08-2018
07:54 PM
Hi,
I'm struggling to put JSON into HBase because the file I'm receiving is as below. I'm able to parse individual JSON objects, but I have no idea how to iterate over the file. I tried SplitText, SplitContent, and SplitRecord with no luck. Any help will be very much appreciated.

{
"version": "1",
"source": {
"type": "internal",
"id": "123"
}
}
{
"version": "1",
"source": {
"type": "external",
"id": "456"
}
}
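To illustrate what I mean by iterating: outside NiFi the same content can be read object by object, because a streaming JSON parser accepts concatenated root-level objects. A standalone Jackson sketch (names are illustrative), just to show the format is sequentially parseable:

    import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}

    object ConcatenatedJsonDemo {
      def main(args: Array[String]): Unit = {
        val input =
          """{ "version": "1", "source": { "type": "internal", "id": "123" } }
            |{ "version": "1", "source": { "type": "external", "id": "456" } }""".stripMargin
        val mapper = new ObjectMapper()
        // MappingIterator yields one root-level JSON object at a time
        val it = mapper.readerFor(classOf[JsonNode]).readValues[JsonNode](input)
        while (it.hasNext) {
          println(it.next()) // each object would become its own flow file
        }
      }
    }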
Labels:
- Apache NiFi
04-23-2018
01:17 PM
Hello Community, I'm new to NiFi and sometimes find it really annoying when I delete or move some part of my data flow and can't get it back. I googled it, hoping there would be some way to undo in NiFi, but found out that a Jira has been open since 2015. Is this correct? Or is there a way to undo?
Labels:
- Apache NiFi