Member since: 06-07-2016
Posts: 923
Kudos Received: 322
Solutions: 115
My Accepted Solutions
Title | Views | Posted
---|---|---
| 3324 | 10-18-2017 10:19 PM
| 3674 | 10-18-2017 09:51 PM
| 13377 | 09-21-2017 01:35 PM
| 1367 | 08-04-2017 02:00 PM
| 1797 | 07-31-2017 03:02 PM
08-01-2016
05:34 PM
@sankar rao What's the username? Does this old user have a directory under /user/<username>? Also, can this user run queries outside of Hue, using beeline?
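If beeline isn't handy on that machine, the same path can be exercised from any JVM with the HiveServer2 JDBC driver (beeline itself is a JDBC client). A minimal connectivity sketch, assuming a hypothetical host/port and an unsecured cluster:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveConnectivityCheck {
    public static void main(String[] args) throws Exception {
        // Older hive-jdbc versions need the driver registered explicitly.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Hypothetical HiveServer2 host and port; substitute your own.
        String url = "jdbc:hive2://hs2-host.example.com:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "someuser", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW DATABASES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```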
08-01-2016
03:56 PM
1 Kudo
@Christopher Amatulli You can see a certified reference architecture for HDP here. This document shows the distribution of services across different machines; see page 5. https://hortonworks.com/wp-content/uploads/2013/10/4AA5-9017ENW.pdf
08-01-2016
03:24 PM
@Christopher Amatulli Hadoop was created to work with locally attached storage; the whole idea is bringing compute to the storage. This gives you failure redundancy, parallel processing on local data, and reliability, since a disk or node failure simply kicks off an automatic mechanism to re-replicate the lost data. For best performance, your data should be local to where your compute is and where your job is running, so Spark should read data from a local partition rather than from remote storage.

That being said, for cost reasons companies might put old data in low-cost storage like S3 and then run their jobs against that remote storage. This works, but with the expectation that it will be slow compared to reading data from local disk. So it depends on your requirements. Are you going to have a PB of data? That might be a good reason to use remote low-cost storage like S3 to save money. Depending on your requirements, you may keep storage separate, but that is not how you would usually go about it.

Also, Hive doesn't need to be separate. It runs on your compute nodes and reads data that is in your HDFS. It is not a database in the sense of having its own storage: you store data in HDFS (whether on local or remote storage), create Hive tables on that data, and run your queries, as in the sketch below. As you can imagine, it is more efficient if the data is local. Hope this helps. Please feel free to comment if you have additional questions.
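To make the "tables on top of files" point concrete, here is a minimal sketch over the HiveServer2 JDBC driver; the host, table names, paths, and bucket are all hypothetical. The same CREATE EXTERNAL TABLE statement works whether LOCATION points at HDFS or at S3; only the read performance differs.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ExternalTableExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical HiveServer2 endpoint.
        String url = "jdbc:hive2://hs2-host.example.com:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement()) {
            // Table over data already sitting in HDFS; Hive keeps no copy of its own.
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS events_local ("
                    + " id BIGINT, payload STRING)"
                    + " ROW FORMAT DELIMITED FIELDS TERMINATED BY ','"
                    + " LOCATION 'hdfs:///data/events'");
            // Same idea over remote storage; expect slower scans.
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS events_archive ("
                    + " id BIGINT, payload STRING)"
                    + " ROW FORMAT DELIMITED FIELDS TERMINATED BY ','"
                    + " LOCATION 's3a://my-bucket/archive/events'");
        }
    }
}
```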
08-01-2016
01:47 PM
@sankar rao Is your Beeswax remote from the cluster? It would be better if it's on one of the edge nodes. You are running out of memory. I'm not sure what the cause is, but it could be low memory on your machine. How much memory do you have for Beeswax?
08-01-2016
06:34 AM
Hi, I am running a job against a secured Spark cluster. I have a valid keytab, and proxy user settings for this user are defined in core-site.xml. When I run the job, I get the following error. Any idea?

16/08/01 01:06:55 INFO yarn.Client: Attempting to login to the Kerberos using principal: <principal@REALM.COM> and keytab: <path to keytab file>
16/08/01 01:06:55 INFO client.RMProxy: Connecting to ResourceManager at <host>/<IP>:8032
16/08/01 01:06:55 INFO yarn.Client: Requesting a new application from cluster with 4 NodeManagers
16/08/01 01:06:55 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (3072 MB per container)
16/08/01 01:06:55 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/08/01 01:06:55 INFO yarn.Client: Setting up container launch context for our AM
16/08/01 01:06:55 INFO yarn.Client: Setting up the launch environment for our AM container
16/08/01 01:06:56 INFO yarn.Client: Credentials file set to: credentials-227f50ae-ab28-4b37-823d-15b3d723185a
16/08/01 01:06:56 INFO yarn.YarnSparkHadoopUtil: getting token for namenode: hdfs://<host>:8020/user/<proxyuser>/.sparkStaging/application_1469977170124_0004
16/08/01 01:06:56 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 119 for <proxyuser> on 10.0.0.10:8020
16/08/01 01:06:56 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.hadoop.security.AccessControlException: <proxyuser> tries to renew a token with renewer <kerberos principal>
    at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:484)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7503)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:549)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:673)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:984)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
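For anyone reproducing this, the keytab-plus-impersonation flow can be sanity-checked outside of Spark with Hadoop's UserGroupInformation API. A minimal sketch, with the principal, keytab path, and proxied user as placeholders; impersonation requires matching hadoop.proxyuser.* entries in core-site.xml:

```java
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up core-site.xml from the classpath
        UserGroupInformation.setConfiguration(conf);

        // Log in as the real (keytab) principal.
        UserGroupInformation realUser = UserGroupInformation
                .loginUserFromKeytabAndReturnUGI("principal@REALM.COM", "/path/to/user.keytab");

        // Impersonate the proxied user on top of the real login.
        UserGroupInformation proxyUser = UserGroupInformation.createProxyUser("proxyuser", realUser);

        proxyUser.doAs((PrivilegedExceptionAction<Void>) () -> {
            // Any HDFS access in here runs as the proxied user.
            FileSystem fs = FileSystem.get(conf);
            System.out.println(fs.exists(new Path("/user/proxyuser")));
            return null;
        });
    }
}
```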
Labels:
- Apache Spark
08-01-2016
06:26 AM
@ripunjay godhani Is this a sandbox, just for testing and experimenting? If yes, then it's fine. For anything else, no. This is not recommended.
08-01-2016
03:22 AM
1 Kudo
@Samie WALA I am assuming your PC is remote to the cluster? You are working from home, the cluster is your work cluster, and you are connected over a VPN that runs on your home internet connection. When you log in to the shell, that shell is running on the same machine as HBase, is that right? As opposed to the shell, your Java application is running on your home PC?

When you run your query in the shell, it doesn't have to stream the result over the network; the result stays right there and is displayed right away. The shell is highly optimized, has very little overhead, and tends to be the fastest. Your application running on your PC has to go over the network to make a request, which seems to be pretty slow in this case. You didn't mention how big the result being streamed over the network to your PC is; if it's big, network issues become more pronounced.

You have not shared your code, but there could be some room for optimization there too, for example in how the scan is configured (see the sketch below). One way to check your code, if possible, is to run it on an edge node or some machine on the same network and see the difference.
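As one illustration of client-side tuning, scanner caching controls how many rows each RPC round trip fetches, which matters a lot over a high-latency VPN link. A minimal sketch against the HBase 1.x client API; the table name, column family, qualifier, and caching value are hypothetical starting points to experiment with:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanCachingExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("my_table"))) {
            Scan scan = new Scan();
            scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual"));
            scan.setCaching(500); // fetch 500 rows per RPC instead of the default
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result result : scanner) {
                    System.out.println(Bytes.toString(result.getRow()));
                }
            }
        }
    }
}
```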
07-31-2016
03:26 PM
@Saurabh Kumar One way to fix this particular error is to download the tez.tar.gz file (the version you need) and put it in /hdp/apps/2.3.4.0-3485/tez/. See this link for instructions: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_upgrading_hdp_manually/content/start-tez-22.html. I wonder what else you'll run into, but I think this is a reasonable way to get past this particular error.
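If you'd rather script the upload than use the CLI, the same copy can be done with Hadoop's FileSystem API. A minimal sketch, assuming tez.tar.gz was downloaded to /tmp on a machine that has the cluster client configs on its classpath:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadTezTarball {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up core-site.xml / hdfs-site.xml
        try (FileSystem fs = FileSystem.get(conf)) {
            Path src = new Path("/tmp/tez.tar.gz"); // local download location (assumption)
            Path dst = new Path("/hdp/apps/2.3.4.0-3485/tez/tez.tar.gz");
            fs.copyFromLocalFile(src, dst);
            System.out.println("Uploaded: " + fs.exists(dst));
        }
    }
}
```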
07-30-2016
07:57 PM
Is this a secure cluster? You need to increase ulimits for the root user. Check this link.
07-30-2016
03:34 PM
2 Kudos
@sujitha sanku
I would write something like the following: Hortonworks has a broad partner ecosystem with over 1,700 partners (confirm the number with Ajay Singh; it's probably more now) across ISVs, SIs, and resellers. Hortonworks understands that our customers' success requires us to make sure that their Hadoop deployment integrates easily with existing technologies in their data center. We have strong partnerships with hundreds of ISVs, including Tableau, QlikView, Cognos, and Zoomdata, as well as database vendors like Teradata, Oracle, and SAP. Fast connectors are supported to exchange data with all leading databases. We also have close partnerships with all leading and boutique SIs, so if you are looking for consulting help with your project implementations, you'll be able to leverage your existing relationships with your SI partner, who can provide resources to make sure you are successful with your Hortonworks deployments and projects.