Member since
05-30-2018
1322
Posts
715
Kudos Received
148
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4021 | 08-20-2018 08:26 PM |
| | 1927 | 08-15-2018 01:59 PM |
| | 2357 | 08-13-2018 02:20 PM |
| | 4069 | 07-23-2018 04:37 PM |
| | 4986 | 07-19-2018 12:52 PM |
03-29-2016
03:27 AM
You asked why Data Science teams claim that they cannot do most of their work on the cluster with R. My point is that this is because R is mainly a client-side studio, not that different from Eclipse (but with more tools). The article I suggested points out the hoops you have to jump through to run R across a cluster. SparkR does not really address this yet, since SparkR is simply an R front end that turns the instruction sets into RDDs and then executes them as Spark jobs on the cluster. SparkR does not actually use any of the R packages to execute logic. Take a look at the SparkR page (https://spark.apache.org/docs/1.6.0/sparkr.html): it mainly talks about creating data frames using R syntax. The section on machine learning covers Gaussian and Binomial GLMs; that's it, that is SparkR at this point. If the requirements of your project can be satisfied with these techniques, then great: you can now do your work on the cluster. If not, you will need to learn Spark and Scala. Until Spark has all of the functions and algorithms that R is capable of, SparkR will not completely solve the problem. That is why data scientists who do not have a strong dev background continue to sample data to make it fit on their workstations, so that they can keep using all of the packages that R provides.
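To make that narrow ML surface concrete, here is a minimal sketch against the Spark 1.6 SparkR API linked above. It assumes a SparkR shell where `sqlContext` is already defined, and uses the built-in `iris` dataset purely for illustration:

```r
# Spark 1.6 SparkR sketch: the only ML entry point is glm() on a Spark
# DataFrame, limited to the "gaussian" and "binomial" families.
df <- createDataFrame(sqlContext, iris)

# Gaussian GLM (linear regression), one of the two supported families.
model <- glm(Sepal_Length ~ Sepal_Width + Species, data = df, family = "gaussian")
summary(model)
```

Anything beyond these two GLM families (clustering, trees, the broader CRAN ecosystem) is not available through this API, which is the gap described above.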
03-18-2016
07:41 PM
I gave you a couple of choices in your other thread https://community.hortonworks.com/questions/23666/resynchronize-the-hbase-data-betweentwo-clusters.html
03-18-2016
08:35 PM
2 Kudos
The SyncTool is not in HDP releases yet, but we are tracking it so we can bring the tool into a released version.
03-16-2016
07:06 AM
3 Kudos
In general, ZooKeeper doesn't actually require huge drives, because it only stores metadata for the services it coordinates. I have seen customers using 100 GB to 250 GB partitions for the ZooKeeper data directory and logs, which is fine for many cluster deployments. In addition, the administrator needs to configure an automatic purging policy for the snapshot and transaction log directories, so that they don't end up filling all the local storage. Please refer to the doc below for more info. http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html
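As a sketch, the purging policy mentioned above is controlled by two settings in zoo.cfg, documented in the admin guide linked above (the values here are illustrative, not recommendations):

```
# zoo.cfg: keep the 3 most recent snapshots (and their transaction logs)
# and run the purge task every 24 hours
autopurge.snapRetainCount=3
autopurge.purgeInterval=24
```

Setting autopurge.purgeInterval to 0 (the default) disables automatic purging entirely, which is how disks end up filling.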
03-15-2016
06:33 PM
1 Kudo
I have installed only ZooKeeper.
03-31-2016
09:12 PM
2 Kudos
Ambari doesn't support that yet. We have a Jira for Ambari 3.0.0 (https://issues.apache.org/jira/browse/AMBARI-14714). It will allow you to have multiple instances of the same service, potentially at different stack versions, e.g., Spark 1.6.1, 1.7.0, etc.
03-14-2016
05:49 PM
@Sunile Manjee To start with https://community.hortonworks.com/questions/2408/ranger-implementation-hive-impersonation-false.html
03-14-2016
06:51 PM
@Neeraj Sabharwal I am not sure I completely follow. The SQL is being run from the Phoenix command line. That being the case, isn't it the client, and shouldn't it use epoch? If not, how do I validate this?
05-05-2016
06:06 PM
1 Kudo
We had similar issues with the Hive interpreter while trying to run aggregations and group by columns:

1. The Hive interpreter cannot be declared directly in the notebook by using %hive. The interpreter must already be set to Hive.
2. The first line in the editor must be blank and the Hive QL statement must start on the second line, otherwise a NullPointerException will be thrown after you submit the job.

This threw us off. Somehow we started the statement on the second line and it executed without errors. Then, when we went back and put the statement on the first line, it failed again. We moved the statement back to the second line, with the first line blank, and it executed without any errors. Strange.
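For illustration, the layout from the second point looks like this in the Zeppelin editor (my_table and col1 are placeholder names, and the first line is intentionally left blank; this assumes a notebook whose default interpreter is already set to Hive):

```sql

select col1, count(*) from my_table group by col1
```

The only difference between the failing and working paragraphs is that blank first line.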
06-09-2016
12:17 PM
4 Kudos
I'm getting the same error in the HDP 2.4 sandbox: if I use %hive on Zeppelin, aggregate functions do not work.

%hive
select count(*) from health_table
java.lang.NullPointerException
at org.apache.zeppelin.hive.HiveInterpreter.getConnection(HiveInterpreter.java:184)
at org.apache.zeppelin.hive.HiveInterpreter.getStatement(HiveInterpreter.java:204)
at org.apache.zeppelin.hive.HiveInterpreter.executeSql(HiveInterpreter.java:233)
at org.apache.zeppelin.hive.HiveInterpreter.interpret(HiveInterpreter.java:328)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:295)
at org.apache.zeppelin.scheduler.Job.run(Job.java:171)
at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

This issue was resolved when I used %sql instead. I know your issue is not related to the HDP 2.4 sandbox, but maybe this comment will help someone using %hive on the HDP 2.4 sandbox.
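For reference, the working variant of the same query simply binds the paragraph to the %sql interpreter instead of %hive:

```sql
%sql
select count(*) from health_table
```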