Member since: 12-14-2015
Posts: 27
Kudos Received: 22
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 18016 | 03-17-2016 08:39 AM
08-11-2016
01:29 PM
SQL keywords and unquoted identifiers are case-insensitive, and the parser normally folds them back to upper case; double quotes (") just tell it not to. Maybe you can create another view over your current view that exposes the columns under case-insensitive (unquoted) names. You may want to check the exact syntax for your dialect; just for your reference: create view view2(col1 integer, col2 integer) as select "col1","col2" from view1;
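As a rough sketch of the idea (the view and column names just mirror the example above and are placeholders), the wrapper view exposes unquoted names over the quoted, case-sensitive ones:

```sql
-- Sketch: view1 was defined with quoted, case-sensitive columns "col1" and "col2".
CREATE VIEW view2 (col1 INTEGER, col2 INTEGER) AS
  SELECT "col1", "col2" FROM view1;

-- The wrapper's columns are unquoted, so any casing resolves to the same identifier.
SELECT col1, COL2 FROM view2;
```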
02-22-2017
03:36 PM
Please do not piggy-back on others' questions that have already been answered. Ask your own question.
06-21-2016
01:25 PM
Hive with MR over HBase snapshots would be a viable solution: take a snapshot in HBase, then use Hive to read directly from the snapshot's underlying HFiles instead of going through the region servers. A sketch of that flow is below.
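A rough sketch, assuming the table is already mapped into Hive via the HBase storage handler; the snapshot name, restore directory, and Hive table name here are placeholders:

```
# HBase shell: take the snapshot
snapshot 'ns1:table1', 'table1_snap'
```

```sql
-- Hive: have the job read the snapshot's HFiles directly
SET hive.hbase.snapshot.name=table1_snap;
SET hive.hbase.snapshot.restoredir=/tmp/table1_snap_restore;
SELECT count(*) FROM my_hbase_backed_table;
```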
05-10-2016
11:02 AM
Contrary to popular belief, Spark is not in-memory only.

a) Simple read, no shuffle (no joins, ...): for the initial reads, Spark, like MapReduce, reads the data as a stream and processes it as it comes along. In other words, unless there is a reason to, Spark will NOT materialize the full RDDs in memory (you can tell it to, however, if you want to cache a small dataset). An RDD is resilient because Spark knows how to recreate it (re-read a block from HDFS, for example), not because it is stored in memory in different locations (that can be done too, though). So if you filter out most of your data, or do an efficient aggregation that aggregates on the map side, you will never have the full table in memory.

b) Shuffle: this is done very similarly to MapReduce; the map outputs are written to disk and read by the reducers over HTTP. However, Spark relies on an aggressive filesystem buffer strategy on Linux, so if the OS has memory available the data is often not actually written to physical disk.

c) After shuffle: RDDs after a shuffle are normally cached by the engine (otherwise a failed node or lost RDD would require a complete re-run of the job); however, as abdelkrim mentions, Spark can spill these to disk unless you overrule that.

d) Spark Streaming: this is a bit different. Spark Streaming expects all data to fit in memory unless you override the settings.
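A small Scala sketch of points (a) and (c); the input path is hypothetical. Nothing is materialized until an action runs, and an explicit persist can be allowed to spill to disk instead of staying memory-only:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(new SparkConf().setAppName("spill-sketch"))

// Lazy read + filter: the full file is never materialized in memory,
// only the records that survive the filter flow through.
val lines  = sc.textFile("hdfs:///data/big_table")   // hypothetical path
val errors = lines.filter(_.contains("ERROR"))

// Opt-in caching; MEMORY_AND_DISK lets Spark spill partitions to disk
// when memory runs short instead of keeping everything resident.
errors.persist(StorageLevel.MEMORY_AND_DISK)

println(errors.count())  // the first action triggers the read and populates the cache
```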
08-25-2016
02:10 AM
2 Kudos
@Michel Sumbul When you talk about encryption in HBase, you encrypt the HFiles and the WAL. You cannot encrypt only some columns and not others: when you encrypt the HFile, all of its cells are encrypted. Please check the following link on how to implement this: https://hbase.apache.org/book.html#hbase.encryption.server
You can also create an HDFS-level encryption zone for the /hbase directory, and your data will be encrypted. Please check the following link: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_hdfs_admin_tools/content/hbase-with-hdfs-encr.html
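As a rough sketch of the server-side setup described in that first link (the keystore path, password, and table/family names below are placeholders):

```xml
<!-- hbase-site.xml: point HBase at a keystore holding the master key (values are placeholders) -->
<property>
  <name>hbase.crypto.keyprovider</name>
  <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
</property>
<property>
  <name>hbase.crypto.keyprovider.parameters</name>
  <value>jceks:///etc/hbase/conf/hbase.jks?password=changeit</value>
</property>
```

Encryption is then switched on per column family, which encrypts every cell in that family's HFiles:

```
# HBase shell (table and family names are placeholders)
create 'mytable', {NAME => 'cf', ENCRYPTION => 'AES'}
```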
03-11-2018
08:32 PM
For me, adding the line below to spark-defaults.conf helped, based on the packages installed on my test cluster:
spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native/:/usr/hdp/current/share/lzo/0.6.0/lib/native/Linux-amd64-64/
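If the driver also needs the native LZO libraries (for example when it reads LZO files itself), the driver-side counterpart is usually set alongside it; the paths below just mirror the line above and depend on which HDP/LZO packages are actually installed:

```
# spark-defaults.conf (paths depend on the installed HDP and LZO versions)
spark.driver.extraLibraryPath    /usr/hdp/current/hadoop-client/lib/native/:/usr/hdp/current/share/lzo/0.6.0/lib/native/Linux-amd64-64/
spark.executor.extraLibraryPath  /usr/hdp/current/hadoop-client/lib/native/:/usr/hdp/current/share/lzo/0.6.0/lib/native/Linux-amd64-64/
```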
09-22-2016
04:21 AM
@Artem Ervits @Neeraj Sabharwal - I am trying to leverage size-based throttling but keep getting a ThrottlingException when I start HBase, even when there is hardly any data in it. I am sure this is some misconfiguration on my end, but I cannot seem to find it. Any input would be appreciated. To add: there seems to be some correlation between the number of pre-splits and the throttling size limit, because the error shows up only when the number of pre-splits is higher.
Details: HBase version: 1.1.2, number of region servers: 4, number of regions: 116, region server heap memory: 2 GB
Quotas set:
TABLE => ns1:table1 TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 10G/sec, SCOPE => MACHINE
TABLE => ns2:table2 TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 10G/sec, SCOPE => MACHINE
Region server stack trace (notice below that the error is about the read size limit being exceeded, yet the size of the scan is only 28 (bytes?)):
2016-09-17 22:35:40,674 DEBUG [B.defaultRpcServer.handler=55,queue=1,port=58526] quotas.RegionServerQuotaManager: Throttling exception for user=root table=ns1:table1 numWrites=0 numReads=0 numScans=1: read size limit exceeded - wait 0.00sec
2016-09-17 22:35:40,676 DEBUG [B.defaultRpcServer.handler=55,queue=1,port=58526] ipc.RpcServer: B.defaultRpcServer.handler=55,queue=1,port=58526: callId: 52 service: ClientService methodName: Scan size: 28 connection: 10.65.141.170:42806
org.apache.hadoop.hbase.quotas.ThrottlingException: read size limit exceeded - wait 0.00sec
at org.apache.hadoop.hbase.quotas.ThrottlingException.throwThrottlingException(ThrottlingException.java:107)
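For reference, a throttle like the ones listed above is set and cleared from the HBase shell roughly as follows (the limit becomes size-based because of the 'G/sec' unit; table name and limit mirror the listing above). Temporarily removing it is a quick way to confirm the quota is really what triggers the exception:

```
# HBase shell
set_quota TYPE => THROTTLE, TABLE => 'ns1:table1', LIMIT => '10G/sec'
list_quotas
set_quota TYPE => THROTTLE, TABLE => 'ns1:table1', LIMIT => NONE   # drop the throttle
```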
02-02-2016
03:22 PM
@Michel Sumbul Has this been resolved? Can you post your solution or accept the best answer?
01-25-2016
01:41 PM
@Michel Sumbul I was able to reach 840k/sec reads on AWS, CentOS 7, XFS filesystem, 9 nodes, 12 7200 RPM drives, in non-MapReduce mode. A write-only test on the same hardware resulted in 185k/sec. For a mixed workload, I got 148k/sec writes and 270k/sec reads. This is with Snappy compression on.