Member since: 03-31-2018
Posts: 27
Kudos Received: 2
Solutions: 0
10-29-2018
08:35 AM
Hello @John O'Hara, is this problem resolved? I'm having the same issue: I have the RPC timeout set to 90000, but the error still reports 60000.
... View more
10-28-2018
04:11 PM
Hi @Shu, I wanted to keep that as a last resort, as it would impact everyone else's work, which is not nice. Isn't there any way to override those properties at all?
... View more
10-26-2018
01:47 PM
While reading from HBase I'm getting the error below:
Caused by: java.io.IOException: Call to .local/:60020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=15, waitTime=59998, operationTimeout=59997 expired.
at org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1263)
at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1231)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:218)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:292)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32831)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:219)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:63)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:211)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:396)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:370)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:136)
... 4 more
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=15, waitTime=59998, operationTimeout=59997 expired.
at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:70)
at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1205)
However, I'm using the configuration below in code to avoid the timeout; somehow these properties are not being overridden:
hbaseConf.set("hbase.rpc.timeout", "1800000")
hbaseConf.set("hbase.client.scanner.timeout.period", "1800000")
Wondering if I'm missing something. Can anyone please help?
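As a minimal sketch (not the original poster's full code): these properties generally need to be set on the Configuration before the HBase connection or scanner is created, since a connection copies its configuration at creation time. HBaseConfiguration and ConnectionFactory are the standard hbase-client classes; everything else here is illustrative.
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.ConnectionFactory

// Apply the timeout overrides before any connection or scanner exists;
// values set afterwards are not picked up by an already-created connection.
val hbaseConf = HBaseConfiguration.create()
hbaseConf.set("hbase.rpc.timeout", "1800000")
hbaseConf.set("hbase.client.scanner.timeout.period", "1800000")

// Connections built from this configuration should honour the overrides.
val connection = ConnectionFactory.createConnection(hbaseConf)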
... View more
Labels:
- Apache HBase
- Apache Spark
10-18-2018
04:21 PM
Hi @wsalazar, thanks for the nice explanation. I was wondering how you do it with composite keys? We have spent some time exploring the Phoenix encoder. With it, data insertion works well, but reading with a range scan is somehow very slow. It seems only the first part of the composite key is used as a filter and the rest of the key is not taken into account.
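For reference, a hedged sketch of one way a composite row key can be declared in an SHC-style catalog; the table, field names, lengths, and types here are hypothetical, and fixed-length key parts typically need an explicit length so the key can be decomposed again for scans.
// Hypothetical catalog: the row key is the concatenation of key1 and key2,
// both mapped to the connector's special "rowkey" column family.
def compositeCatalog = s"""{
|"table":{"namespace":"ns", "name":"tbl"},
|"rowkey":"key1:key2",
|"columns":{
|"key1":{"cf":"rowkey", "col":"key1", "type":"string", "length":"10"},
|"key2":{"cf":"rowkey", "col":"key2", "type":"string", "length":"13"},
|"col1":{"cf":"data", "col":"col1", "type":"string"}
|}
|}""".stripMargin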
... View more
10-12-2018
10:05 AM
I think if you add hive-site.xml to your spark-submit command it should work. You can add it with --files /usr/hdp/current/spark2-client/conf/hive-site.xml
... View more
08-24-2018
02:56 PM
Thanks for the reply, @Matt Burgess. After reading several blogs I came to realize that I do indeed have to go with record-based processors. I'm trying to use the PutHBaseRecord processor but can't get the composite row key I used in my earlier flow. I'll post a new question with the correct tags and a sample.
... View more
08-22-2018
05:34 PM
Hi, I'm consuming data from Kafka, parsing the JSON, and inserting it into HBase. Since my data is nested JSON I have to use SplitJson twice; however, I've observed that SplitJson is slowing down my entire flow and causing too much data to back up in the queues. We have a 2-node cluster and I have set the number of concurrent tasks for each of the processors to 5. How can I improve its performance? Is using a JOLT transformation instead of SplitJson a better idea?
... View more
07-18-2018
11:02 PM
Well, I have exactly the same error. I have checked the permissions on my script and it has execute permission. @Vijay Kumar J, could you elaborate on how it was resolved?
... View more
07-03-2018
10:13 PM
@Felix Albani
I just found that if I specify the table as "table":{"name":"namespace:tablename"} in the catalog, it works. Thanks for your time.
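For clarity, a hedged sketch of that catalog form in context, reusing the column definitions from the earlier catalog in this thread (the namespace and table names are placeholders):
// Putting the namespace into "name" as "namespace:tablename", instead of
// relying on the separate "namespace" field, made the read return data.
def catalog = s"""{
|"table":{"name":"namespaceName:tableName"},
|"rowkey":"key",
|"columns":{
|"col0":{"cf":"rowkey", "col":"key", "type":"string"},
|"col1":{"cf":"data", "col":"id", "type":"bigint"}
|}
|}""".stripMargin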
... View more
07-03-2018
09:51 PM
@Felix Albani You know what, I tried a table in the default namespace and I'm able to view the data. It seems to work for tables without a namespace.
... View more
07-03-2018
09:22 PM
@Felix Albani I really wanted to try this too, but these libraries are not deployed on the cluster; instead, I create a dependencies jar and then use it with spark-submit.
... View more
07-03-2018
09:03 PM
Hi @Felix Albani I have checked these things already.
... View more
07-03-2018
07:15 PM
Hi @Felix Albani, thanks for your response. Please accept my sincere apologies; I somehow missed including that part of the code. I have updated it now. This is the output I see (please note that I have changed the number of columns in the code above, hence the difference): +----+----+----+----+----+----+----+----+----+
|col4|col7|col1|col3|col6|col0|col8|col2|col5|
+----+----+----+----+----+----+----+----+----+
+----+----+----+----+----+----+----+----+----+
18/07/03 16:16:27 INFO CodeGenerator: Code generated in 10.60842 ms
18/07/03 16:16:27 INFO CodeGenerator: Code generated in 8.990531 ms
+----+
|col0|
+----+
+----+
... View more
07-03-2018
03:17 PM
Hi Community, I'm running a basic Spark job which reads from an HBase table. I can see the job completing without any error, but in the output I get only empty rows. I will appreciate any help. Below is my code:
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

object objectName {

  // Catalog describing how HBase columns map to DataFrame columns
  def catalog = s"""{
    |"table":{"namespace":"namespaceName", "name":"tableName"},
    |"rowkey":"rowKeyAttribute",
    |"columns":{
    |"Key":{"cf":"rowkey", "col":"rowKeyAttribute", "type":"string"},
    |"col1":{"cf":"cfName", "col":"col1", "type":"bigint"},
    |"col2":{"cf":"cfName", "col":"col2", "type":"string"}
    |}
    |}""".stripMargin

  def main(args: Array[String]) {
    val spark = SparkSession.builder()
      .appName("dummyApplication")
      .getOrCreate()
    val sc = spark.sparkContext
    val sqlContext = spark.sqlContext
    import sqlContext.implicits._

    // Build a DataFrame backed by the HBase table described in the catalog
    def withCatalog(cat: String): DataFrame = {
      sqlContext
        .read
        .options(Map(HBaseTableCatalog.tableCatalog -> cat))
        .format("org.apache.spark.sql.execution.datasources.hbase")
        .load()
    }
  }
}
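For completeness, a short sketch of exercising the catalog inside main (this mirrors the read/show snippet that appears elsewhere in this post history; col1 is one of the columns declared in the catalog above):
// Load the DataFrame via the catalog and display a few rows
val df = withCatalog(catalog)
df.show()
df.select($"col1").show(10)
spark.stop()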
... View more
Labels:
- Apache HBase
- Apache Spark
07-03-2018
02:58 PM
@Sandeep Nemuri thanks for the help, surely will do so.
... View more
07-03-2018
12:37 PM
Hi @Sandeep Nemuri, thanks for answering. I did as mentioned above and now my code runs fine with deploy-mode client, but I ran into another issue where the data displayed is blank. However, I can see the data from the hbase shell. I have already checked the row key, table name, and namespace name. What do you think is missing?
val df = withCatalog(catalog)
df.show
df.select($"col0").show(10)
spark.stop()
... View more
07-02-2018
07:56 AM
@karthik nedunchezhiyan I have attached hbase-site.xml, I hope it helps.
... View more
07-02-2018
02:36 AM
Hi Community, I'm new to Spark and have been struggling to run a Spark job which connects to an HBase table. In the YARN GUI I can see the Spark job reach the Running state, but then it fails with the error below:
18/07/01 21:08:20 INFO ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x6a8bcb640x0, quorum=localhost:2181, baseZNode=/hbase
18/07/01 21:08:20 INFO ClientCnxn: Opening socket connection to server localhost.localdomain/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
18/07/01 21:08:20 WARN ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
18/07/01 21:08:20 WARN RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
ZooKeeper is up and running. Below is my code:
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// Create table catalog
object foo {
  def catalog = s"""{
    |"table":{"namespace":"foo", "name":"bar"},
    |"rowkey":"key",
    |"columns":{
    |"col0":{"col":"key", "type":"string"},
    |"col1":{"cf":"data", "col":"id", "type":"bigint"}
    |}
    |}""".stripMargin

  def main(args: Array[String]) {
    val spark = SparkSession.builder()
      .appName("foo")
      .getOrCreate()
    val sc = spark.sparkContext
    val sqlContext = spark.sqlContext
    import sqlContext.implicits._

    // Build a DataFrame backed by the HBase table described in the catalog
    def withCatalog(cat: String): DataFrame = {
      sqlContext
        .read
        .options(Map(HBaseTableCatalog.tableCatalog -> cat))
        .format("org.apache.spark.sql.execution.datasources.hbase")
        .load()
    }

    // Read from HBase table
    val df = withCatalog(catalog)
    df.show
    df.filter($"col0" === "1528801346000_200550232_2955")
      .select($"col0", $"col1").show
    spark.stop()
  }
}
I'll really appreciate any help on this. I couldn't find a convincing answer on Stack Overflow either.
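A hedged note, not the confirmed fix from this thread: the log shows the client falling back to quorum=localhost:2181, which usually means hbase-site.xml is not on the driver/executor classpath. One option is to ship it with the job (for example via --files on spark-submit); another is to set the ZooKeeper quorum explicitly before any connection is created. The host names below are placeholders.
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.ConnectionFactory

// Point the HBase client at the real ZooKeeper ensemble instead of the
// localhost default it uses when hbase-site.xml is missing.
val hbaseConf = HBaseConfiguration.create()
hbaseConf.set("hbase.zookeeper.quorum", "zk-host1,zk-host2,zk-host3")
hbaseConf.set("hbase.zookeeper.property.clientPort", "2181")
val connection = ConnectionFactory.createConnection(hbaseConf)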
... View more
Labels:
- Apache HBase
- Apache Spark
05-11-2018
11:32 AM
1 Kudo
@Shu you are awesome, thanks for the help. It does resolve my problem. I wanted to build the row key from a combination of JSON attributes; to achieve that, I used the UpdateAttribute processor and declared the key there as a combination of resource id, metric id, and timestamp, then included this row key in AttributesToJSON.
... View more
05-10-2018
11:32 PM
1 Kudo
I have flattened a JSON document using JOLT; since the input was a JSON array, the result contains a list for each key. But to use the PutHBaseJson processor I need scalar values to define the row key. Is there a way to repeat the whole JSON for each array element, keeping one value at a time, so that uniqueness is maintained?
Below are my input, transformation, and output. Input {
"resource": {
"id": "1234",
"name": "Resourse_Name"
},
"data": [
{
"measurement": {
"key": "5678",
"value": "status"
},
"timestamp": 1517784040000,
"value": 1
},
{
"measurement": {
"key": "91011",
"value": "location"
},
"timestamp": 1519984070000,
"value": 0
}
]
}
Transformation [
{
"operation": "shift",
"spec": {
"resource": {
"id": "resource_id",
"name": "resource_name"
},
//
// Turn each element of the data array into prefixed fields
// like "measurement_key" : "5678"
"data": {
"*": {
// the "&" in "rating-&" means go up the tree 0 levels,
// grab what is ther and subtitute it in
"measurement": {
"*": "measurement_&"
},
"timestamp": "measurement_timestamp",
"value": "value"
}
}
}
}
]
Output {
"resource_id" : "1234",
"resource_name" : "Resourse_Name",
"measurement_key" : [ "5678", "91011" ],
"measurement_value" : [ "status", "location" ],
"measurement_timestamp" : [ 1517784040000, 1519984070000 ],
"value" : [ 1, 0 ]
}
Expected output {
"resource_id" : "1234",
"resource_name" : "Resourse_Name",
"measurement_key" : "5678",
"measurement_value" : "status",
"measurement_timestamp" : 1517784040000, ,
"value" : 1
},
{
"resource_id" : "1234",
"resource_name" : "Resourse_Name",
"measurement_key" : "91011" ,
"measurement_value" : "location" ,
"measurement_timestamp" : 1519984070000 ,
"value" : 0
}
... View more
Labels:
- Apache Hadoop
- Apache HBase
05-10-2018
11:34 AM
Thanks a lot, I really appreciate it. I was struggling to get this done; you saved me a lot of time.
... View more
05-08-2018
07:54 PM
Hi,
I'm struggling to put JSON into HBase because the file I'm receiving is as below. I'm able to parse individual JSON objects, but I have no idea how to iterate over the file. I tried SplitText, SplitContent, and SplitRecord with no luck. Any help will be very much appreciated. {
"version": "1",
"source": {
"type": "internal",
"id": "123"
}
}
{
"version": "1",
"source": {
"type": "external",
"id": "456"
}
}
... View more
Labels:
- Apache NiFi
04-23-2018
01:17 PM
Hello Community, I'm new to NiFi and sometimes find it really annoying when I delete or move some part of my data flow and can't get it back. I googled it, hoping there would be some way to undo in NiFi, but found that a Jira for this has been open since 2015. Is this correct? Or is there a way to undo?
... View more
Labels:
- Apache NiFi
04-17-2018
11:11 AM
Thanks for the answer, Herald; your understanding of my question is correct. All I'm trying to do is get the format correct for my Hive table column names. I mean, I just put the formatted file on HDFS and read it using an external Hive table.
... View more
04-17-2018
09:38 AM
I have a requirement where I need to create a Hive table that uses the same data multiple times. To do that I have multiple ReplaceText steps followed by a MergeContent. The file content is as below:
abc123 active true sometext
I want to read the file into the Hive table as:
abc123;active;true;sometext;123;active;c1;true;
To achieve this I'm using ReplaceText, where first I produce (regex replace):
abc123;active;true;sometext;
then
123;active;
and finally
c1;true;
But I'm not able to merge the content horizontally. Is there a way to do it, or maybe an easier way to achieve the same result? Any help will be appreciated. Thanks.
... View more