Member since: 02-18-2017
Posts: 10
Kudos Received: 0
Solutions: 0
05-12-2017
06:20 AM
Based on the columns in a Spark DataFrame, I need to do a lookup against another, very large HBase table. Is there an efficient way to perform this lookup operation on a Spark DataFrame?
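Roughly what I am doing today, using the SHC connector (a sketch; the catalog, table, and column names are placeholders):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

val spark = SparkSession.builder().appName("HBaseLookup").getOrCreate()
import spark.implicits._

// Placeholder SHC catalog: "id" is mapped to the HBase row key
val catalog = s"""{
  |"table":{"namespace":"default", "name":"lookup_table", "tableCoder":"PrimitiveType"},
  |"rowkey":"key",
  |"columns":{
  |"id":{"cf":"rowkey", "col":"key", "type":"string"},
  |"detail":{"cf":"person", "col":"detail", "type":"string"}
  |}
  |}""".stripMargin

// Expose the HBase table as a DataFrame through the connector
val hbaseDF = spark.read
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()

// myDF stands in for the existing DataFrame; join on the key column to do the lookup
val myDF = Seq(("1", "a"), ("2", "b")).toDF("id", "value")
val enriched = myDF.join(hbaseDF, Seq("id"), "left_outer")
enriched.show()

My concern is that a plain join like this appears to read far more of the HBase table than the handful of keys I actually need, hence the question about a more efficient lookup.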
05-12-2017
06:17 AM
Using a Spark accumulable collection, can we accumulate data streamed in across different Kafka events?
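Roughly what I have been trying (a sketch; the broker, topic, group id, and batch interval are placeholders), with a CollectionAccumulator, the Spark 2.x replacement for the old Accumulable, filled inside foreachRDD:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.kafka.common.serialization.StringDeserializer

val conf = new SparkConf().setAppName("AccumulateEvents")
val ssc = new StreamingContext(conf, Seconds(10))

// The accumulator lives on the driver and keeps growing across micro-batches
val eventsAcc = ssc.sparkContext.collectionAccumulator[String]("events")

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9092",   // placeholder
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "demo-group"               // placeholder
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("myTopic"), kafkaParams))

stream.foreachRDD { rdd =>
  // add() runs in the executor tasks; the values are merged back on the driver
  rdd.foreach(record => eventsAcc.add(record.value()))
  println(s"Records accumulated so far: ${eventsAcc.value.size}")
}

ssc.start()
ssc.awaitTermination()

My doubt is whether driver-side accumulation like this is really meant to hold the streamed records themselves across events, or only counters and metrics.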
05-12-2017
06:15 AM
Why does an RDD appear to be faster than a DataFrame for certain operations, such as filter?
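For context, this is the shape of the comparison I am making (a sketch; the sample data and SparkSession setup are just for illustration):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("FilterComparison").getOrCreate()
import spark.implicits._

case class Person(name: String, age: Int)
val people = Seq(Person("a", 25), Person("b", 40), Person("c", 35))

// RDD filter: a plain Scala lambda over JVM objects, no optimizer involved
val rddCount = spark.sparkContext.parallelize(people).filter(_.age > 30).count()

// DataFrame filter: a Column expression that goes through Catalyst and code generation
val dfCount = people.toDF().filter($"age" > 30).count()

println(s"RDD count: $rddCount, DataFrame count: $dfCount")

My understanding is that the RDD version just runs the lambda over JVM objects while the DataFrame version goes through the optimizer, so I would have expected the DataFrame to be at least as fast, yet I am seeing the opposite on some operations.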
05-05-2017
07:02 AM
In Spark Streaming, how can I load data received in different events into a collection? Let's say on event 1, 1,000 records are streamed, and on event 2 another 1,000 records are streamed. At the end of event 2 I want the data from both events (1,000 + 1,000). I tried a Spark accumulable collection to accumulate the data streamed in the different events, but it did not work. Please help.
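A sketch of the alternative I have been considering, using updateStateByKey to carry records across batches instead of an accumulable collection (it assumes an existing StreamingContext named ssc, a DStream[String] named lines, and a placeholder checkpoint path):

// assumes an existing StreamingContext `ssc` and a DStream[String] called `lines`
ssc.checkpoint("/tmp/streaming-checkpoint")   // stateful operations need a checkpoint directory

// Append each batch's records to the state accumulated so far
def accumulate(newRecords: Seq[String], state: Option[Seq[String]]): Option[Seq[String]] =
  Some(state.getOrElse(Seq.empty) ++ newRecords)

// Key everything by a constant so all records end up in one growing collection
val allSoFar = lines.map(r => ("all", r)).updateStateByKey(accumulate)

// After event 2 this should report 2000 (1000 from event 1 + 1000 from event 2)
allSoFar.map { case (_, records) => records.size }.print()

I would still like to know whether this, or something else entirely, is the intended way to get at the combined data from both events.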
04-25-2017
10:14 AM
I want to improve the performance of HBase read operations. Is there any option available in the Spark HBase connector to scale up HBase reads?
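I am not sure whether the connector itself exposes a knob for this, so as a point of comparison I have been testing the plain TableInputFormat path, where scan caching can be raised; a rough sketch (the table name and caching value are placeholders):

import org.apache.spark.sql.SparkSession
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

val spark = SparkSession.builder().appName("HBaseScanCaching").getOrCreate()

val hbaseConf = HBaseConfiguration.create()
hbaseConf.set(TableInputFormat.INPUT_TABLE, "person")      // placeholder table
// Fetch more rows per RPC so each scan makes fewer round trips
hbaseConf.set(TableInputFormat.SCAN_CACHEDROWS, "1000")

val hbaseRDD = spark.sparkContext.newAPIHadoopRDD(
  hbaseConf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])

println(hbaseRDD.count())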
04-25-2017
06:48 AM
I need a solution for the following scenario: say 3 million historical records are stored in HBase. During streaming, say 10k records are pulled, and for those 10k records I need to fetch their matching records from HBase based on the key, and the operation should complete in less than half a minute. We are using the Spark HBase connector.
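To frame the question, this is what I am experimenting with: batching the lookups per partition as HBase multi-gets through the plain client API, instead of scanning through the connector (a sketch; the table, column family, qualifier, and sample keys are placeholders):

import scala.collection.JavaConverters._
import org.apache.spark.sql.SparkSession
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get}
import org.apache.hadoop.hbase.util.Bytes

val spark = SparkSession.builder().appName("HBaseMultiGet").getOrCreate()

// Placeholder for the ~10k row keys pulled in the current micro-batch
val keysRDD = spark.sparkContext.parallelize(Seq("row1", "row2"))

val enriched = keysRDD.mapPartitions { keys =>
  val conf = HBaseConfiguration.create()
  val connection = ConnectionFactory.createConnection(conf)     // one connection per partition
  val table = connection.getTable(TableName.valueOf("person"))  // placeholder table name

  // One multi-get round trip per partition instead of one Get per record
  val gets = keys.map(k => new Get(Bytes.toBytes(k))).toList
  val results = table.get(gets.asJava)

  table.close()
  connection.close()

  results.iterator.filter(!_.isEmpty).map { r =>
    val key = Bytes.toString(r.getRow)
    val detail = Bytes.toString(r.getValue(Bytes.toBytes("person"), Bytes.toBytes("detail")))
    (key, detail)
  }
}

enriched.collect().foreach(println)

What I would like to know is whether the connector can do something equivalent for key-based lookups, or whether this kind of client-side batching is the expected approach.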
03-13-2017
04:00 PM
val cat = s"""{
  |"table":{"namespace":"myTable", "name":"person", "tableCoder":"PrimitiveType"},
  |"rowkey":"ROW",
  |"columns":{
  |"col0":{"cf":"person", "col":"detail", "type":"string"}
  |}
  |}""".stripMargin

Here ("col0":{"cf":"person", "col":"detail", "type":"string"}) you were missing the rowkey details. For example, see here how the id column is mapped to the HBase row key:

"id":{"cf":"rowkey", "col":"key", "type":"string"}
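Putting it together, the catalog needs a column entry whose cf is "rowkey" and whose col matches the name declared under "rowkey" ("ROW" in your case). A corrected sketch (adjust the namespace, table, and column names to yours):

val cat = s"""{
  |"table":{"namespace":"myTable", "name":"person", "tableCoder":"PrimitiveType"},
  |"rowkey":"ROW",
  |"columns":{
  |"id":{"cf":"rowkey", "col":"ROW", "type":"string"},
  |"col0":{"cf":"person", "col":"detail", "type":"string"}
  |}
  |}""".stripMargin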
02-23-2017
11:54 AM
I tried to follow the same steps mentioned above to override the log4j properties, but it is still not working. My problem is that I am running a Kafka streaming job in YARN cluster mode, and when I check the logs in the web UI after an hour, they have grown very large. I would like to know the steps to write the logs to the local file system or HDFS, so that I can view them from a Unix terminal instead of the web UI.
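For reference, this is the kind of setup I have been attempting (a sketch; the file path, appender name, and sizes are placeholders): a RollingFileAppender in a custom log4j.properties that writes into the YARN container log directory, shipped with --files and pointed at through the driver/executor extraJavaOptions.

# Custom log4j.properties shipped with the job: roll files instead of growing one huge log
log4j.rootLogger=INFO, rolling
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
# Write into the YARN container log directory so the files are picked up by log aggregation
log4j.appender.rolling.File=${spark.yarn.app.container.log.dir}/spark.log
log4j.appender.rolling.MaxFileSize=50MB
log4j.appender.rolling.MaxBackupIndex=5
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n

spark-submit --master yarn --deploy-mode cluster \
  --files /path/to/log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  ... (rest of the job arguments)

With YARN log aggregation enabled, I would then expect to read the logs from the terminal with yarn logs -applicationId <application id> rather than through the web UI, but so far my override does not seem to take effect.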