Member since: 03-31-2018
Posts: 27
Kudos Received: 2
Solutions: 0
07-03-2018
03:17 PM
Hi Community, I'm running a basic Spark job which reads from an HBase table. I can see the job completes without any error, but the output contains only empty rows. I will appreciate any help. Below is my code:

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

    object objectName {
      def catalog = s"""{
        |"table":{"namespace":"namespaceName", "name":"tableName"},
        |"rowkey":"rowKeyAttribute",
        |"columns":{
        |"Key":{"cf":"rowkey", "col":"rowKeyAttribute", "type":"string"},
        |"col1":{"cf":"cfName", "col":"col1", "type":"bigint"},
        |"col2":{"cf":"cfName", "col":"col2", "type":"string"}
        |}
        |}""".stripMargin

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("dummyApplication")
          .getOrCreate()
        val sqlContext = spark.sqlContext
        import sqlContext.implicits._

        def withCatalog(cat: String): DataFrame = {
          sqlContext
            .read
            .options(Map(HBaseTableCatalog.tableCatalog -> cat))
            .format("org.apache.spark.sql.execution.datasources.hbase")
            .load()
        }

        // read the table through the catalog; this is where the empty rows show up
        val df = withCatalog(catalog)
        df.show()
        spark.stop()
      }
    }
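In case it helps to narrow this down, here is a debugging variant I can run (a sketch against the same withCatalog helper; the all-string types are deliberate, to bypass type decoding). If values show up here but were blank with the typed catalog, the stored bytes probably do not match the declared types, e.g. numbers written as plain strings via the hbase shell but declared bigint:

    def debugCatalog = s"""{
      |"table":{"namespace":"namespaceName", "name":"tableName"},
      |"rowkey":"rowKeyAttribute",
      |"columns":{
      |"Key":{"cf":"rowkey", "col":"rowKeyAttribute", "type":"string"},
      |"col1":{"cf":"cfName", "col":"col1", "type":"string"},
      |"col2":{"cf":"cfName", "col":"col2", "type":"string"}
      |}
      |}""".stripMargin

    val debugDf = withCatalog(debugCatalog)
    debugDf.printSchema()               // confirms the catalog parsed as intended
    debugDf.show(10, truncate = false)  // raw values, no bigint conversion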
Labels:
- Apache HBase
- Apache Spark
07-03-2018
02:58 PM
@Sandeep Nemuri thanks for the help; I surely will do so.
07-03-2018
12:37 PM
Hi @Sandeep Nemuri, thanks for answering. I did as mentioned above and my code now runs fine in client deploy mode, but I ran into another issue: the data displayed is blank. However, I can see the data at the hbase shell prompt. I have already checked the rowkey, table name, and namespace name. What do you think is missing?

    val df = withCatalog(catalog)
    df.show()
    df.select($"col0").show(10)
    spark.stop()
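In the meantime I am checking whether the rows themselves are missing or only the column values, along these lines (a small sketch; df is the DataFrame returned by withCatalog above):

    df.printSchema()     // verify the catalog mapped the columns as expected
    println(df.count())  // a non-zero count with blank columns points at a cf/qualifier or type mismatch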
07-02-2018
07:56 AM
@karthik nedunchezhiyan I have attached hbase-site.xml; I hope it helps.
07-02-2018
02:36 AM
Hi Community, I'm new to Spark and have been struggling to execute a Spark job which connects to an HBase table. In the YARN GUI I can see the Spark job reach the RUNNING state, but then it fails with the error below:

    18/07/01 21:08:20 INFO ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x6a8bcb640x0, quorum=localhost:2181, baseZNode=/hbase
    18/07/01 21:08:20 INFO ClientCnxn: Opening socket connection to server localhost.localdomain/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
    18/07/01 21:08:20 WARN ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
    java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
    18/07/01 21:08:20 WARN RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid

ZooKeeper is up and running. Below is my code:
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

    object foo {
      // Create table catalog
      def catalog = s"""{
        |"table":{"namespace":"foo", "name":"bar"},
        |"rowkey":"key",
        |"columns":{
        |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
        |"col1":{"cf":"data", "col":"id", "type":"bigint"}
        |}
        |}""".stripMargin

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("foo")
          .getOrCreate()
        val sqlContext = spark.sqlContext
        import sqlContext.implicits._

        def withCatalog(cat: String): DataFrame = {
          sqlContext
            .read
            .options(Map(HBaseTableCatalog.tableCatalog -> cat))
            .format("org.apache.spark.sql.execution.datasources.hbase")
            .load()
        }

        // Read from HBase table
        val df = withCatalog(catalog)
        df.show()
        df.filter($"col0" === "1528801346000_200550232_2955")
          .select($"col0", $"col1").show()
        spark.stop()
      }
    }
I'll really appreciate any help on this. I couldn't find any convincing answer on Stack Overflow either.
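My current suspicion: the client is falling back to localhost:2181, which as far as I understand usually means hbase-site.xml is not on the application's classpath. A quick check from the driver (a sketch; it assumes hbase-client is on the classpath):

    import org.apache.hadoop.hbase.HBaseConfiguration

    val hbaseConf = HBaseConfiguration.create()
    // prints "localhost" when hbase-site.xml was not picked up,
    // and the real ZooKeeper quorum when it was
    println(hbaseConf.get("hbase.zookeeper.quorum"))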
Labels:
- Apache HBase
- Apache Spark
05-11-2018
11:32 AM
1 Kudo
@Shu you are awesome, thanks for the help. It does resolve my problem. I wanted to build the rowkey from a combination of JSON attributes; to achieve this I used the UpdateAttribute processor and declared the key there as a combination of resource id, metric id, and timestamp. Then I included this rowkey in AttributesToJSON.
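For anyone finding this later, a minimal sketch of that UpdateAttribute step (the property and attribute names are illustrative, and it assumes the JSON fields were first promoted to flow file attributes, e.g. with EvaluateJsonPath):

    rowkey = ${resource_id}_${measurement_key}_${measurement_timestamp}

PutHBaseJson can then reference ${rowkey} as its row identifier.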
05-10-2018
11:32 PM
1 Kudo
I have flattened a JSON using Jolt; since the input was a JSON array, the transform produced lists for the keys. But to use the PutHBaseJson processor I need scalar values to define the rowkey. Is there a way to repeat the whole JSON for each array element, keeping one value at a time, so that uniqueness is maintained?
Below are my input, transformation, and output.

Input

{
"resource": {
"id": "1234",
"name": "Resourse_Name"
},
"data": [
{
"measurement": {
"key": "5678",
"value": "status"
},
"timestamp": 1517784040000,
"value": 1
},
{
"measurement": {
"key": "91011",
"value": "location"
},
"timestamp": 1519984070000,
"value": 0
}
]
}
Transformation

[
{
"operation": "shift",
"spec": {
"resource": {
"id": "resource_id",
"name": "resource_name"
},
//
// Turn each entry of "data" into a flat object with
// prefixed keys like "measurement_key"
"data": {
"*": {
// the "&" in "rating-&" means go up the tree 0 levels,
// grab what is ther and subtitute it in
"measurement": {
"*": "measurement_&"
},
"timestamp": "measurement_timestamp",
"value": "value"
}
}
}
}
]
Output

{
"resource_id" : "1234",
"resource_name" : "Resourse_Name",
"measurement_key" : [ "5678", "91011" ],
"measurement_value" : [ "status", "location" ],
"measurement_timestamp" : [ 1517784040000, 1519984070000 ],
"value" : [ 1, 0 ]
}
Expected output

[
{
"resource_id" : "1234",
"resource_name" : "Resourse_Name",
"measurement_key" : "5678",
"measurement_value" : "status",
"measurement_timestamp" : 1517784040000,
"value" : 1
},
{
"resource_id" : "1234",
"resource_name" : "Resourse_Name",
"measurement_key" : "91011",
"measurement_value" : "location",
"measurement_timestamp" : 1519984070000,
"value" : 0
}
]
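For reference, one shift spec that should produce the expected array directly (an untested sketch; the "@(2,...)" lookups assume exactly the input structure above, and the resulting JSON array could then be split into individual flow files with SplitJson):

[
{
"operation": "shift",
"spec": {
"data": {
"*": {
"@(2,resource.id)": "[&1].resource_id",
"@(2,resource.name)": "[&1].resource_name",
"measurement": {
"key": "[&2].measurement_key",
"value": "[&2].measurement_value"
},
"timestamp": "[&1].measurement_timestamp",
"value": "[&1].value"
}
}
}
}
]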
Labels:
- Apache Hadoop
- Apache HBase
05-10-2018
11:34 AM
Thanks a lot ... really appreciate it. I was struggling to get this done; you saved me a lot of time.
05-08-2018
07:54 PM
Hi,
I'm struggling to put JSON into HBase because the file I'm receiving is as below. I'm able to parse individual JSON objects, but I have no idea how to iterate over the file. I tried SplitText, SplitContent, and SplitRecord with no luck. Any help will be very much appreciated.

{
"version": "1",
"source": {
"type": "internal",
"id": "123"
}
}
{
"version": "1",
"source": {
"type": "external",
"id": "456"
}
}
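To illustrate what I mean by iterating: outside NiFi the same content can be read object by object, because a streaming JSON parser accepts concatenated root-level objects. A standalone Jackson sketch (names are illustrative), just to show the format is sequentially parseable:

    import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}

    object ConcatenatedJsonDemo {
      def main(args: Array[String]): Unit = {
        val input =
          """{ "version": "1", "source": { "type": "internal", "id": "123" } }
            |{ "version": "1", "source": { "type": "external", "id": "456" } }""".stripMargin
        val mapper = new ObjectMapper()
        // MappingIterator yields one root-level JSON object at a time
        val it = mapper.readerFor(classOf[JsonNode]).readValues[JsonNode](input)
        while (it.hasNext) {
          println(it.next()) // each object would become its own flow file
        }
      }
    }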
Labels:
- Apache NiFi
04-23-2018
01:17 PM
Hello Community, I'm new to NiFi and sometimes find it really annoying when I delete or move some part of my data flow and can't get it back. I googled it, hoping there would be some way to undo in NiFi, but found out that a Jira has been open since 2015. Is this correct? Or is there a way to undo?
Labels:
- Apache NiFi