Member since: 06-07-2016
Posts: 923
Kudos Received: 322
Solutions: 115
My Accepted Solutions
Title | Views | Posted
---|---|---
| 3324 | 10-18-2017 10:19 PM
| 3674 | 10-18-2017 09:51 PM
| 13377 | 09-21-2017 01:35 PM
| 1367 | 08-04-2017 02:00 PM
| 1797 | 07-31-2017 03:02 PM
08-01-2016
05:34 PM
@sankar rao What's the username? Does this old user have a directory under /user/<username>? Also, can this user run queries outside of Hue, using beeline?
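If beeline isn't handy on that machine, the same path can be exercised from any JVM with the HiveServer2 JDBC driver (beeline itself is a JDBC client). A minimal connectivity sketch, assuming a hypothetical host/port and an unsecured cluster:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveConnectivityCheck {
    public static void main(String[] args) throws Exception {
        // Older hive-jdbc versions need the driver registered explicitly.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Hypothetical HiveServer2 host and port; substitute your own.
        String url = "jdbc:hive2://hs2-host.example.com:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "someuser", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW DATABASES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```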
08-01-2016
03:56 PM
1 Kudo
@Christopher Amatulli You can see a certified reference architecture for HDP here. This document shows the distribution of services across different machines; see page 5. https://hortonworks.com/wp-content/uploads/2013/10/4AA5-9017ENW.pdf
08-01-2016
03:24 PM
@Christopher Amatulli Hadoop was created to work with locally attached storage; the whole idea is bringing compute to the storage. This gives you failure redundancy, parallel processing on local data, and reliability, since a disk or node failure simply kicks off an automatic mechanism to re-replicate the lost data. For best performance, your data should be local to where your compute is and where your job is running, so Spark should read data from a local partition rather than from remote storage.

That being said, for cost reasons companies might put old data in low-cost storage like S3 and then run their jobs against that remote storage. This works, but with the expectation that it will be slow compared to reading data from local disk. So it depends on your requirements. Are you going to have a PB of data? That might be a good reason to use remote low-cost storage like S3 to save money. Depending on your requirements, you may keep storage separate, but that is not how you would usually go about it.

Also, Hive doesn't need to be separate. It runs on your compute nodes and reads data that is in your HDFS. It is not a database in the sense of having its own storage: you store data in HDFS (whether on local or remote storage), create Hive tables on that data, and run your queries, as in the sketch below. As you can imagine, it is more efficient if the data is local. Hope this helps. Please feel free to comment if you have additional questions.
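To make the "tables on top of files" point concrete, here is a minimal sketch over the HiveServer2 JDBC driver; the host, table names, paths, and bucket are all hypothetical. The same CREATE EXTERNAL TABLE statement works whether LOCATION points at HDFS or at S3; only the read performance differs.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ExternalTableExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical HiveServer2 endpoint.
        String url = "jdbc:hive2://hs2-host.example.com:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement()) {
            // Table over data already sitting in HDFS; Hive keeps no copy of its own.
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS events_local ("
                    + " id BIGINT, payload STRING)"
                    + " ROW FORMAT DELIMITED FIELDS TERMINATED BY ','"
                    + " LOCATION 'hdfs:///data/events'");
            // Same idea over remote storage; expect slower scans.
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS events_archive ("
                    + " id BIGINT, payload STRING)"
                    + " ROW FORMAT DELIMITED FIELDS TERMINATED BY ','"
                    + " LOCATION 's3a://my-bucket/archive/events'");
        }
    }
}
```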
08-01-2016
01:47 PM
@sankar rao Is your Beeswax remote from the cluster? It would be better if it's on one of the edge nodes. You are running out of memory. I'm not sure what the cause is, but it could be low memory on your machine. How much memory do you have for Beeswax?
08-01-2016
06:34 AM
Hi, I am running a job against a secured Spark cluster. I have a valid keytab, and proxy user settings for this user are defined in core-site.xml. When I run the job, I get the following error. Any idea?

16/08/01 01:06:55 INFO yarn.Client: Attempting to login to the Kerberos using principal: <principal@REALM.COM> and keytab: <path to keytab file>
16/08/01 01:06:55 INFO client.RMProxy: Connecting to ResourceManager at <host>/<IP>:8032
16/08/01 01:06:55 INFO yarn.Client: Requesting a new application from cluster with 4 NodeManagers
16/08/01 01:06:55 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (3072 MB per container)
16/08/01 01:06:55 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/08/01 01:06:55 INFO yarn.Client: Setting up container launch context for our AM
16/08/01 01:06:55 INFO yarn.Client: Setting up the launch environment for our AM container
16/08/01 01:06:56 INFO yarn.Client: Credentials file set to: credentials-227f50ae-ab28-4b37-823d-15b3d723185a
16/08/01 01:06:56 INFO yarn.YarnSparkHadoopUtil: getting token for namenode: hdfs://<host>:8020/user/<proxyuser>/.sparkStaging/application_1469977170124_0004
16/08/01 01:06:56 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 119 for <proxyuser> on 10.0.0.10:8020
16/08/01 01:06:56 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.hadoop.security.AccessControlException: <proxyuser> tries to renew a token with renewer <kerberos principal>
    at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:484)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7503)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:549)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:673)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:984)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
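For anyone reproducing this, the keytab-plus-impersonation flow can be sanity-checked outside of Spark with Hadoop's UserGroupInformation API. A minimal sketch, with the principal, keytab path, and proxied user as placeholders; impersonation requires matching hadoop.proxyuser.* entries in core-site.xml:

```java
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up core-site.xml from the classpath
        UserGroupInformation.setConfiguration(conf);

        // Log in as the real (keytab) principal.
        UserGroupInformation realUser = UserGroupInformation
                .loginUserFromKeytabAndReturnUGI("principal@REALM.COM", "/path/to/user.keytab");

        // Impersonate the proxied user on top of the real login.
        UserGroupInformation proxyUser = UserGroupInformation.createProxyUser("proxyuser", realUser);

        proxyUser.doAs((PrivilegedExceptionAction<Void>) () -> {
            // Any HDFS access in here runs as the proxied user.
            FileSystem fs = FileSystem.get(conf);
            System.out.println(fs.exists(new Path("/user/proxyuser")));
            return null;
        });
    }
}
```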
Labels:
- Apache Spark
08-01-2016
06:26 AM
@ripunjay godhani Is this a sandbox, just for testing and experimenting? If yes, then it's fine. For anything else, no. This is not recommended.
08-01-2016
03:22 AM
1 Kudo
@Samie WALA I am assuming your PC is remote to the cluster? You are working from home, the cluster is your work cluster, and you are connected over a VPN that runs on your home internet connection. When you log in to the shell, that shell is running on the same machine as HBase, is that right? As opposed to the shell, your Java application is running on your home PC?

When you run your query in the shell, it doesn't have to stream the result over the network; the result stays right there and is displayed right away. The shell is highly optimized, has very little overhead, and tends to be the fastest. Your application running on your PC has to go over the network to make a request, which seems to be pretty slow in this case. You didn't mention how big the result being streamed over the network to your PC is; if it's big, network issues become more pronounced.

You have not shared your code, but there could be some room for optimization there too, for example in how the scan is configured (see the sketch below). One way to check your code, if possible, is to run it on an edge node or some machine on the same network and see the difference.
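As one illustration of client-side tuning, scanner caching controls how many rows each RPC round trip fetches, which matters a lot over a high-latency VPN link. A minimal sketch against the HBase 1.x client API; the table name, column family, qualifier, and caching value are hypothetical starting points to experiment with:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanCachingExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("my_table"))) {
            Scan scan = new Scan();
            scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual"));
            scan.setCaching(500); // fetch 500 rows per RPC instead of the default
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result result : scanner) {
                    System.out.println(Bytes.toString(result.getRow()));
                }
            }
        }
    }
}
```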
07-31-2016
03:26 PM
@Saurabh Kumar One way to fix this particular error is to download the tez.tar.gz file (the version you need) and put it in /hdp/apps/2.3.4.0-3485/tez/. See this link for instructions: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_upgrading_hdp_manually/content/start-tez-22.html. I wonder what else you'll run into, but I think this is a reasonable way to get past this particular error.
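If you'd rather script the upload than use the CLI, the same copy can be done with Hadoop's FileSystem API. A minimal sketch, assuming tez.tar.gz was downloaded to /tmp on a machine that has the cluster client configs on its classpath:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadTezTarball {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up core-site.xml / hdfs-site.xml
        try (FileSystem fs = FileSystem.get(conf)) {
            Path src = new Path("/tmp/tez.tar.gz"); // local download location (assumption)
            Path dst = new Path("/hdp/apps/2.3.4.0-3485/tez/tez.tar.gz");
            fs.copyFromLocalFile(src, dst);
            System.out.println("Uploaded: " + fs.exists(dst));
        }
    }
}
```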
07-30-2016
07:57 PM
Is this a secure cluster? You need to increase ulimits for the root user. Check this link.
07-30-2016
03:34 PM
2 Kudos
@sujitha sanku
I would write something like the following: Hortonworks has a broad partner ecosystem with over 1,700 partners (confirm the number with Ajay Singh; it's probably more now) across ISVs, SIs, and resellers. Hortonworks understands that our customers' success requires us to make sure that their Hadoop deployment integrates easily with existing technologies in their data center. We have strong partnerships with hundreds of ISVs, including Tableau, QlikView, Cognos, and Zoomdata, as well as database vendors like Teradata, Oracle, and SAP. Fast connectors are supported to exchange data with all leading databases. We also have close partnerships with all leading and boutique SIs, so if you are looking for consulting help with your project implementations, you'll be able to leverage your existing relationships with your SI partner, who can provide resources to make sure you are successful with your Hortonworks deployments and projects.