Member since: 10-01-2015
Posts: 3933
Kudos Received: 1150
Solutions: 374
My Accepted Solutions

Title | Views | Posted |
---|---|---|
 | 3362 | 05-03-2017 05:13 PM |
 | 2791 | 05-02-2017 08:38 AM |
 | 3067 | 05-02-2017 08:13 AM |
 | 3002 | 04-10-2017 10:51 PM |
 | 1510 | 03-28-2017 02:27 AM |
02-28-2017
04:36 PM
2 Kudos
@Adnan Alvee Use the ORC format with HCatalog integration in Pig; take a look at my article: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-2.html
02-28-2017
12:32 PM
Have you tried the following?

// Build a Hadoop Configuration from the running SparkContext's SparkConf,
// then obtain the HDFS FileSystem handle from it
import org.apache.hadoop.fs._
import org.apache.spark.deploy.SparkHadoopUtil
import java.net.URI

val hdfs_conf = SparkHadoopUtil.get.newConfiguration(sc.getConf)
val hdfs = FileSystem.get(hdfs_conf)
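As a quick sanity check, here's a minimal usage sketch on top of that handle (the path below is just a placeholder, not from the original question):

// Hypothetical usage: probe a placeholder path and list its contents
val path = new Path("/tmp")                              // placeholder; point at your own directory
println(hdfs.exists(path))                               // true if the path exists in HDFS
hdfs.listStatus(path).foreach(s => println(s.getPath))   // print each entry under the path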
02-28-2017
03:05 AM
Here's the latest Ambari doc covering the same setting: http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-security/content/optional_ambari_web_inactivity_timeout.html
02-28-2017
02:58 AM
1 Kudo
Please see the Ambari Security Guide section on the web inactivity timeout: https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.2.0/bk_Ambari_Security_Guide/content/_optional_ambari_web_inactivity_timeout.html
02-27-2017
09:36 PM
1 Kudo
@Mehrdad Niasari For an example of running Python 2 and Python 3, please see my articles: https://community.hortonworks.com/articles/82967/apache-ambari-workflow-designer-view-for-apache-oo.html and https://community.hortonworks.com/articles/82988/apache-ambari-workflow-designer-view-for-apache-oo-1.html

These cover a new workflow editing tool called Workflow Manager, but the same steps apply to writing pure XML workflows. The requirement here is that all Python libraries must be available on every NodeManager. If you're on a Kerberized cluster, Oozie will proxy the user's permissions to the user executing the Oozie process, so paying attention to permissions across the whole workflow lifecycle is also important.

For good measure, here's my article on the shell action alone: https://community.hortonworks.com/articles/82964/getting-started-with-apache-ambari-workflow-design.html Let me know if you run into any problems.
02-27-2017
03:07 PM
@Avijeet Dash
There are a few options. Here's a great article by one of our engineers on diagnosing Zeppelin: https://community.hortonworks.com/articles/70658/how-to-diagnose-zeppelin.html To see it in action, here's a short article that demonstrates launching Zeppelin in remote debug mode: http://lresende.blogspot.com/2016/08/launching-apache-zeppelin-in-debug-mode.html
02-27-2017
03:01 PM
@Param NC You need to build your application with the hadoop-client dependency in your pom.xml or sbt build, with scope set to provided (<scope>provided</scope>). See http://spark.apache.org/docs/1.6.2/submitting-applications.html, "Bundling Your Application's Dependencies":

If your code depends on other projects, you will need to package them alongside your application in order to distribute the code to a Spark cluster. To do this, create an assembly jar (or "uber" jar) containing your code and its dependencies. Both sbt and Maven have assembly plugins. When creating assembly jars, list Spark and Hadoop as provided dependencies; these need not be bundled since they are provided by the cluster manager at runtime. Once you have an assembled jar, you can call the bin/spark-submit script, passing your jar. For Python, you can use the --py-files argument of spark-submit to add .py, .zip or .egg files to be distributed with your application. If you depend on multiple Python files, we recommend packaging them into a .zip or .egg.

More info here: http://spark.apache.org/docs/1.6.2/running-on-yarn.html

Here's a sample pom.xml definition for hadoop-client:

<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
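      <!-- HDP-specific Hadoop build number; match the Hadoop version deployed on your cluster -->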
<version>2.7.1.2.3.0.0-2557</version>
<scope>provided</scope>
<type>jar</type>
</dependency>
</dependencies>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.7</maven.compiler.source>
<maven.compiler.target>1.7</maven.compiler.target>
</properties>
<repositories>
<repository>
<id>HDPReleases</id>
<name>HDP Releases</name>
<url>http://repo.hortonworks.com/content/repositories/public</url>
<layout>default</layout>
<releases>
<enabled>true</enabled>
<updatePolicy>always</updatePolicy>
<checksumPolicy>warn</checksumPolicy>
</releases>
<snapshots>
<enabled>false</enabled>
<updatePolicy>never</updatePolicy>
<checksumPolicy>fail</checksumPolicy>
</snapshots>
</repository>
<repository>
<id>HDPJetty</id>
<name>Hadoop Jetty</name>
<url>http://repo.hortonworks.com/content/repositories/jetty-hadoop/</url>
<layout>default</layout>
<releases>
<enabled>true</enabled>
<updatePolicy>always</updatePolicy>
<checksumPolicy>warn</checksumPolicy>
</releases>
<snapshots>
<enabled>false</enabled>
<updatePolicy>never</updatePolicy>
<checksumPolicy>fail</checksumPolicy>
</snapshots>
</repository>
<repository>
<snapshots>
<enabled>false</enabled>
</snapshots>
<id>central</id>
<name>bintray</name>
<url>http://jcenter.bintray.com</url>
</repository>
</repositories>
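Since the question mentions sbt as well, here's a minimal build.sbt sketch along the same lines (project name and versions are placeholders; adjust to your cluster):

// Minimal build.sbt: mark Spark and Hadoop as "provided" so they are not bundled into your jar
name := "my-spark-app"                  // placeholder project name
version := "0.1"                        // placeholder version
scalaVersion := "2.10.6"                // Scala line used by Spark 1.6.x builds

resolvers += "HDP Releases" at "http://repo.hortonworks.com/content/repositories/public"

libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"    % "1.6.2"              % "provided",
  "org.apache.hadoop" %  "hadoop-client" % "2.7.1.2.3.0.0-2557" % "provided"
)

Build with sbt package (or the sbt-assembly plugin for an uber jar) and pass the resulting jar to bin/spark-submit.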
02-27-2017
02:08 PM
1 Kudo
Use a CREATE EXTERNAL TABLE statement on top of your data. The main things to remember are the EXTERNAL keyword and the LOCATION clause in the syntax below; the rest depends on your file type and delimiter. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_dataintegration/content/moving_data_from_hdfs_to_hive_external_table_method.html

CREATE EXTERNAL TABLE IF NOT EXISTS Cars(
Name STRING,
Miles_per_Gallon INT,
Cylinders INT,
Displacement INT,
Horsepower INT,
Weight_in_lbs INT,
Acceleration DECIMAL,
Year DATE,
Origin CHAR(1))
COMMENT 'Data about cars from a public database'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/<username>/visdata';
02-27-2017
01:15 PM
It supports both Spark (Scala) and PySpark, so you're not missing out on anything.
02-27-2017
01:14 PM
3 Kudos
For Zeppelin in HDP 2.5 we introduced a new interpreter called Livy, and it has its own way of managing dependencies. Please look here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_zeppelin-component-guide/content/zepp-with-spark.html