About mqureshi

mqureshi · ‎10-12-2016

@ARUN Please see the following link. This issue has been answered before. https://community.hortonworks.com/questions/11779/hbase-master-shutting-down-with-zookeeper-delete-f.html

mqureshi · ‎10-11-2016

@Sunile Manjee Integration between Spark and HBase relies on HBaseContext, which provides the HBase configuration to the Spark executors. So, to answer which protocol is used: it is plain RPC. Please check the following link for more details: https://hbase.apache.org/book.html#spark Here is the GitHub link to the HBase Spark module: https://github.com/apache/hbase/tree/master/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark

mqureshi · ‎10-11-2016

@mohamed sabri marnaoui Is it hanging or just waiting in the queue to run?

mqureshi · ‎10-10-2016

@Vikram Rathod Well, that should be easy. You use Apache Ranger to create the different organization groups and set authorization permissions. At the HDFS level, you can create directories like /region/US, /region/UK, /region/APAC and then respective subdirectories to separate the data. Each of these directories and their subdirectories can have more granular permissions set through Ranger, and you can configure the cluster with Atlas for auditing and lineage information. You can also use HDFS storage quotas if you want, but it appears that you don't need them to start with. As for resource distribution, use YARN.
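The directory layout described above can be sketched as a small Python helper that only builds the command lines and does not run them. The `hdfs dfs -mkdir -p` and `hdfs dfsadmin -setSpaceQuota` commands are standard HDFS CLI calls; the helper name `hdfs_layout_commands` and the 10t quota are illustrative assumptions, not values from the question.

```python
# Region names taken from the example paths above.
REGIONS = ["US", "UK", "APAC"]


def hdfs_layout_commands(regions, quota=None):
    """Build (but do not run) HDFS commands for one directory per region.

    quota is an optional space quota such as "10t" (hypothetical value);
    when it is None, no quota command is generated.
    """
    commands = []
    for region in regions:
        path = f"/region/{region}"
        # Create the region directory, including missing parents.
        commands.append(f"hdfs dfs -mkdir -p {path}")
        if quota is not None:
            # Optional: cap the space used under this directory.
            commands.append(f"hdfs dfsadmin -setSpaceQuota {quota} {path}")
    return commands


for line in hdfs_layout_commands(REGIONS, quota="10t"):
    print(line)
```

Permissions on these paths would still be managed through Ranger policies rather than through this script.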

mqureshi · ‎10-10-2016

@Vikram Rathod Are you saying you will have just one cluster to serve all these regions? Your question has almost no details. Can you please share your requirements? Please remember that one cluster does not span more than one data center. If you will have one cluster for all regions, then you still just size it based on your volume and SLAs and set the right expectations for users. For example, if your only cluster is in the US, then users in the UK and APAC should expect slower response times due to network latency. I don't think that affects cluster size. Please provide more details so we can help answer your question.

mqureshi · ‎10-08-2016

@Cruz DSouza Let's start with the following. Check the user permissions in your MySQL. Log in to the MySQL shell and look at the entries in the user table for user "hive":

SELECT User, Host FROM mysql.user WHERE User = 'hive';

Or check the permissions for user hive and which hosts it can log in from:

SHOW GRANTS FOR 'hive'@'%';

If you don't see permissions for user hive, especially for the host you are logging in from, you can run the following. Note that this statement gives hive permission to connect from any client host. You might not need that, in which case customize it according to your requirements:

GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';

mqureshi · ‎10-06-2016

Negative. If you check the JIRAs, they are unresolved. We don't ship unresolved issues in our product. So your only option right now is to download the patch and apply it to your installation. That will affect your support, if you have it, because you would be applying a non-Hortonworks patch. I would suggest that you simply distcp the file and then compress it. The patch only saves you a step; it doesn't save you any time or give better performance.

mqureshi · ‎10-06-2016

@Tran Quyet Thang According to the Hive documentation, "filepath can refer to a file (in which case Hive will move the file into the table) or it can be a directory (in which case Hive will move all the files within that directory into the table)." I don't think wildcards are allowed in the path. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables

mqureshi · ‎10-05-2016

@Vaibhav Kumar Two things here.
1. I don't understand your use of "a.id = b.id where b.id is null". When b.id is null, a.id and b.id will never be equal. However, it's your query and you probably know more about it, so you can ignore this comment if it doesn't apply.
2. I think you need to use the ROW_NUMBER function and then select the third row. This link describes usage of ROW_NUMBER() for SQL Server 2005, but it works the same way in Hive.
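As a rough illustration of the ROW_NUMBER approach, here is a minimal sketch that runs the query against an in-memory SQLite database from Python. SQLite (3.25 or newer) stands in for Hive here, and the table `demo_table`, its columns, and the helper `nth_row` are hypothetical names made up for the example. The inner SELECT with ROW_NUMBER() OVER (ORDER BY ...) should carry over to Hive largely unchanged.

```python
import sqlite3


def nth_row(conn, n):
    """Return the n-th row (1-based) of demo_table ordered by id, or None."""
    query = """
        SELECT id, name
        FROM (
            SELECT id, name,
                   ROW_NUMBER() OVER (ORDER BY id) AS rn
            FROM demo_table
        ) ranked
        WHERE rn = ?
    """
    return conn.execute(query, (n,)).fetchone()


# Hypothetical sample data, inserted out of order on purpose.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_table (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO demo_table VALUES (?, ?)",
    [(40, "d"), (10, "a"), (30, "c"), (20, "b")],
)

third = nth_row(conn, 3)
print(third)  # (30, 'c')
```

The ORDER BY inside the OVER clause is what defines "third"; without an explicit ordering, the row numbering is not guaranteed to be stable.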

mqureshi · ‎10-05-2016

@Arun Reddy This feature is still not available in Hadoop by default: distcp doesn't compress data, although you can add a patch. The following JIRA gives you all the details, including the patch you would need to download: https://issues.apache.org/jira/browse/HADOOP-8065 The newer JIRA is https://issues.apache.org/jira/browse/HADOOP-13114 (use this one if you decide to apply the patch).

Last Visited	‎10-31-2017 03:17 AM

Member Since	‎06-07-2016 09:05 AM
Posts	923
Kudos received	310

Cloudera Community

Re: YARN recommended configuration

Re: How to resolve for NULL values when they are c...

Re: Why is spark has better speed than Hadoop

Re: Is it possible to assign Hadoop queues to Hado...

Re: Kafka NiFi HDF Installation

Re: hbase region server going down

Re: what protocol is used for the new spark hbase ...

Re: Spark job stage cancelled because SparkContext...

Re: Architecture Design for different regions clie...

Re: Architecture Design for different regions clie...

Re: Hive Meta store - mysql connection issue org.a...

Re: Distcp compression not working

Re: spark HiveThriftServer2 sql AnalysisException:...

Re: Select nth row in hive

Re: Distcp compression not working