Member since: 12-14-2015
Posts: 27
Kudos Received: 22
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 18016 | 03-17-2016 08:39 AM
08-11-2016
01:29 PM
SQL keywords and unquoted identifiers are case-insensitive, and the parser normally folds them back to upper case; double quotes (") just tell it not to. Maybe you can create another view over your current view that exposes the columns under case-insensitive (unquoted) names. You may want to check the exact syntax for your dialect; just for your reference: create view view2(col1 integer, col2 integer) as select "col1","col2" from view1;
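As a rough sketch of the idea (the view and column names just mirror the example above and are placeholders), the wrapper view exposes unquoted names over the quoted, case-sensitive ones:

```sql
-- Sketch: view1 was defined with quoted, case-sensitive columns "col1" and "col2".
CREATE VIEW view2 (col1 INTEGER, col2 INTEGER) AS
  SELECT "col1", "col2" FROM view1;

-- The wrapper's columns are unquoted, so any casing resolves to the same identifier.
SELECT col1, COL2 FROM view2;
```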
02-22-2017
03:36 PM
Please do not piggy-back on others' questions that have already been answered. Ask your own question.
06-21-2016
01:25 PM
Hive with MR over HBase snapshots would be a viable solution: take a snapshot in HBase, then use Hive to read directly from the snapshot's underlying HFiles instead of going through the region servers. A sketch of that flow is below.
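A rough sketch, assuming the table is already mapped into Hive via the HBase storage handler; the snapshot name, restore directory, and Hive table name here are placeholders:

```
# HBase shell: take the snapshot
snapshot 'ns1:table1', 'table1_snap'
```

```sql
-- Hive: have the job read the snapshot's HFiles directly
SET hive.hbase.snapshot.name=table1_snap;
SET hive.hbase.snapshot.restoredir=/tmp/table1_snap_restore;
SELECT count(*) FROM my_hbase_backed_table;
```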
05-10-2016
11:02 AM
Contrary to popular belief, Spark is not in-memory only.

a) Simple read, no shuffle (no joins, ...): for the initial reads, Spark, like MapReduce, reads the data as a stream and processes it as it comes along. In other words, unless there is a reason to, Spark will NOT materialize the full RDDs in memory (you can tell it to, however, if you want to cache a small dataset). An RDD is resilient because Spark knows how to recreate it (re-read a block from HDFS, for example), not because it is stored in memory in different locations (that can be done too, though). So if you filter out most of your data, or do an efficient aggregation that aggregates on the map side, you will never have the full table in memory.

b) Shuffle: this is done very similarly to MapReduce; the map outputs are written to disk and read by the reducers over HTTP. However, Spark relies on an aggressive filesystem buffer strategy on Linux, so if the OS has memory available the data is often not actually written to physical disk.

c) After shuffle: RDDs after a shuffle are normally cached by the engine (otherwise a failed node or lost RDD would require a complete re-run of the job); however, as abdelkrim mentions, Spark can spill these to disk unless you overrule that.

d) Spark Streaming: this is a bit different. Spark Streaming expects all data to fit in memory unless you override the settings.
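A small Scala sketch of points (a) and (c); the input path is hypothetical. Nothing is materialized until an action runs, and an explicit persist can be allowed to spill to disk instead of staying memory-only:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(new SparkConf().setAppName("spill-sketch"))

// Lazy read + filter: the full file is never materialized in memory,
// only the records that survive the filter flow through.
val lines  = sc.textFile("hdfs:///data/big_table")   // hypothetical path
val errors = lines.filter(_.contains("ERROR"))

// Opt-in caching; MEMORY_AND_DISK lets Spark spill partitions to disk
// when memory runs short instead of keeping everything resident.
errors.persist(StorageLevel.MEMORY_AND_DISK)

println(errors.count())  // the first action triggers the read and populates the cache
```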
08-25-2016
02:10 AM
2 Kudos
@Michel Sumbul When you talk about encryption in HBase, you encrypt the HFiles and the WAL. You cannot encrypt only some columns and not others: when you encrypt the HFile, all of its cells are encrypted. Please check the following link on how to implement this: https://hbase.apache.org/book.html#hbase.encryption.server
You can also create an HDFS-level encryption zone for the /hbase directory, and your data will be encrypted. Please check the following link: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_hdfs_admin_tools/content/hbase-with-hdfs-encr.html
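As a rough sketch of the server-side setup described in that first link (the keystore path, password, and table/family names below are placeholders):

```xml
<!-- hbase-site.xml: point HBase at a keystore holding the master key (values are placeholders) -->
<property>
  <name>hbase.crypto.keyprovider</name>
  <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
</property>
<property>
  <name>hbase.crypto.keyprovider.parameters</name>
  <value>jceks:///etc/hbase/conf/hbase.jks?password=changeit</value>
</property>
```

Encryption is then switched on per column family, which encrypts every cell in that family's HFiles:

```
# HBase shell (table and family names are placeholders)
create 'mytable', {NAME => 'cf', ENCRYPTION => 'AES'}
```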
03-11-2018
08:32 PM
For me, adding the line below to spark-defaults.conf helped, based on the packages installed on my test cluster:
spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native/:/usr/hdp/current/share/lzo/0.6.0/lib/native/Linux-amd64-64/
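If the driver also needs the native LZO libraries (for example when it reads LZO files itself), the driver-side counterpart is usually set alongside it; the paths below just mirror the line above and depend on which HDP/LZO packages are actually installed:

```
# spark-defaults.conf (paths depend on the installed HDP and LZO versions)
spark.driver.extraLibraryPath    /usr/hdp/current/hadoop-client/lib/native/:/usr/hdp/current/share/lzo/0.6.0/lib/native/Linux-amd64-64/
spark.executor.extraLibraryPath  /usr/hdp/current/hadoop-client/lib/native/:/usr/hdp/current/share/lzo/0.6.0/lib/native/Linux-amd64-64/
```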
09-22-2016
04:21 AM
@Artem Ervits @Neeraj Sabharwal - I am trying to leverage size-based throttling but keep getting a ThrottlingException when I start HBase, even when there is hardly any data in it. I am sure this is some misconfiguration on my end, but I cannot seem to find it. Any input would be appreciated. To add: there seems to be some correlation between the number of pre-splits and the throttling size limit, because the error shows up only when the number of pre-splits is higher.
Details: HBase version: 1.1.2, number of region servers: 4, number of regions: 116, region server heap memory: 2 GB
Quotas set:
TABLE => ns1:table1 TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 10G/sec, SCOPE => MACHINE
TABLE => ns2:table2 TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 10G/sec, SCOPE => MACHINE
Region server stack trace (notice below that the error is about the read size limit being exceeded, yet the size of the scan is only 28 (bytes?)):
2016-09-17 22:35:40,674 DEBUG [B.defaultRpcServer.handler=55,queue=1,port=58526] quotas.RegionServerQuotaManager: Throttling exception for user=root table=ns1:table1 numWrites=0 numReads=0 numScans=1: read size limit exceeded - wait 0.00sec
2016-09-17 22:35:40,676 DEBUG [B.defaultRpcServer.handler=55,queue=1,port=58526] ipc.RpcServer: B.defaultRpcServer.handler=55,queue=1,port=58526: callId: 52 service: ClientService methodName: Scan size: 28 connection: 10.65.141.170:42806
org.apache.hadoop.hbase.quotas.ThrottlingException: read size limit exceeded - wait 0.00sec
at org.apache.hadoop.hbase.quotas.ThrottlingException.throwThrottlingException(ThrottlingException.java:107)
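For reference, a throttle like the ones listed above is set and cleared from the HBase shell roughly as follows (the limit becomes size-based because of the 'G/sec' unit; table name and limit mirror the listing above). Temporarily removing it is a quick way to confirm the quota is really what triggers the exception:

```
# HBase shell
set_quota TYPE => THROTTLE, TABLE => 'ns1:table1', LIMIT => '10G/sec'
list_quotas
set_quota TYPE => THROTTLE, TABLE => 'ns1:table1', LIMIT => NONE   # drop the throttle
```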
02-02-2016
03:22 PM
@Michel Sumbul Has this been resolved? Can you post your solution or accept the best answer?
01-25-2016
01:41 PM
@Michel Sumbul I was able to reach 840k/sec reads on AWS, CentOS 7, XFS filesystem, 9 nodes, 12 7200 RPM drives, in non-MapReduce mode. A write-only test on the same hardware resulted in 185k/sec. For a mixed workload, I got 148k/sec writes and 270k/sec reads. This is with Snappy compression on.