Member since: 05-20-2016
Posts: 155
Kudos Received: 220
Solutions: 30
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5937 | 03-23-2018 04:54 AM |
| | 2154 | 10-05-2017 02:34 PM |
| | 1140 | 10-03-2017 02:02 PM |
| | 7734 | 08-23-2017 06:33 AM |
| | 2469 | 07-27-2017 10:20 AM |
07-27-2017
10:20 AM
6 Kudos
You can get it with the API below. The Spark history server component name is SPARK_JOBHISTORYSERVER:

curl -XGET -u admin:admin https://hostname:8443/api/v1/clusters/<clusterName>/components
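If you only need that one component, you should also be able to query its sub-resource directly. A sketch (not from the original answer), assuming the Spark service is registered in Ambari under the name SPARK:

curl -XGET -u admin:admin "https://hostname:8443/api/v1/clusters/<clusterName>/services/SPARK/components/SPARK_JOBHISTORYSERVER"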
06-30-2017
09:57 AM
4 Kudos
While this article provides a mechanism to set up Spark with HiveContext, there are some limitations when using Spark with HiveContext. For example, Hive supports writing query results to HDFS using "INSERT OVERWRITE DIRECTORY", i.e.:

INSERT OVERWRITE DIRECTORY 'hdfs://cl1/tmp/query'
SELECT * FROM REGION

The above command writes the result of the query to HDFS. However, if the same query is passed to Spark with HiveContext, it will fail, since "INSERT OVERWRITE DIRECTORY" is not a supported feature in Spark. This is tracked via this jira. If the same needs to be achieved via Spark, it can be done using the Spark CSV library (required in the case of Spark1). Below is a code snippet showing how to achieve this:

// Run the query through HiveContext and write the result to HDFS as delimited text
DataFrame df = hiveContext.sql("SELECT * FROM REGION");
df.write()
    .format("com.databricks.spark.csv")
    .option("delimiter", "\u0001")
    .save("hdfs://cl1/tmp/query");

The above code saves the result in HDFS under the directory /tmp/query. Note the delimiter used; it is the same delimiter Hive currently uses by default. Also, the dependency below needs to be added to pom.xml:

<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-csv_2.10</artifactId>
    <version>1.5.0</version>
</dependency>
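As an aside that is not part of the original answer: on Spark 2.x the external spark-csv package is no longer needed, because CSV is a built-in data source. A minimal sketch of the equivalent write, assuming Spark 2.x with Hive support enabled and the same path and ^A delimiter as above:

// Sketch only: Spark 2.x equivalent of the spark-csv write above.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class RegionExport {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("RegionExport")
                .enableHiveSupport()   // needed so spark.sql() can see Hive tables
                .getOrCreate();
        Dataset<Row> df = spark.sql("SELECT * FROM REGION");
        df.write()
          .option("sep", "\u0001")    // same ^A field delimiter Hive uses by default
          .csv("hdfs://cl1/tmp/query");
        spark.stop();
    }
}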
06-27-2017
12:57 PM
4 Kudos
@Bhushan Rokade Yes, beeline expects the HQL file to be on the local file system.
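For illustration (the hostname and file path here are placeholders, not from the original thread), a typical invocation reads the script from the local path passed to -f:

beeline -u "jdbc:hive2://<hs2-host>:10000/default" -f /tmp/sample.hql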
05-22-2017
07:04 AM
2 Kudos
Hello @Rishabh Oberoi Please refer to the link below:
http://docs.hortonworks.com/HDPDocuments/Ambari-2.5.0.3/bk_ambari-security/content/ch_configuring_amb_hdp_for_kerberos.html
05-19-2017
11:02 AM
2 Kudos
@Saumidh Mhatre Yes, this should be possible by specifying multiple actions in a single workflow.xml:

<?xml version="1.0" ?>
<workflow-app name="sample" xmlns="uri:oozie:workflow:0.5">
<global>
<job-tracker>${RESOURCE_MANAGER}</job-tracker>
<name-node>${NAME_NODE}</name-node>
<configuration>
<property>
<name>mapreduce.job.queuename</name>
<value>${DEFAULT_QUEUE}</value>
</property>
</configuration>
</global>
<start to="query1"/>
<action name="query1">
<java>
<prepare>
<delete path="/tmp/query1"/>
</prepare>
<job-xml>${APP_PATH}/lib/hbase-site.xml</job-xml>
<main-class>com.org.test.TestSample</main-class>
<java-opts>-cp $CLASSPATH:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/etc/hbase/conf</java-opts>
<archive>${APP_PATH}/lib/test-sample.jar</archive>
</java>
<ok to="metrics"/>
<error to="kill"/>
</action>
<action name="metrics">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${RESOURCE_MANAGER}</job-tracker>
<name-node>${NAME_NODE}</name-node>
<exec>oozie_hook.py</exec>
<file>${APP_PATH}/shell/oozie_hook.py#oozie_hook.py</file>
</shell>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Workflow action failed, killing workflow</message>
</kill>
<end name="end"/>
</workflow-app>
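Not part of the original answer, but for completeness: once the workflow is deployed under ${APP_PATH}, it can be submitted with the standard Oozie CLI (the Oozie URL and job.properties file below are placeholders):

oozie job -oozie http://<oozie-host>:11000/oozie -config job.properties -run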
04-28-2017
08:08 AM
2 Kudos
@Satish Anjaneyappa See if the link below helps!
https://community.hortonworks.com/questions/79661/how-to-deletedrop-a-partition-of-an-external-table.html
04-27-2017
11:23 AM
@Ishvari Dhimmar Please use the links below for reference:
https://www.tutorialspoint.com/apache_oozie/index.htm
https://oozie.apache.org/docs/4.2.0/DG_Hive2ActionExtension.html
https://oozie.apache.org/docs/4.0.0/DG_SqoopActionExtension.html
https://oozie.apache.org/docs/3.2.0-incubating/WorkflowFunctionalSpec.html
04-26-2017
08:29 AM
1 Kudo
@Ishvari Dhimmar Have you evaluated Oozie? I believe you would need to run these repeatedly at some interval. Oozie provides support for all the components mentioned above, i.e. Pig, Hive, and Sqoop, and each can be defined as a separate action in an Oozie workflow. You do not need to create a separate MR job (using the .NET SDK) if you go this route.
04-06-2017
04:41 AM
1 Kudo
Yes, it is a must for Kerberos authentication, but why the HiveServer2 principal, and why not some other flag like authType=kerberos? The reason I ask is: why are we expecting the "client" making the JDBC connection to know the HS2 principal name (which HS2 is already aware of, since it also resides in the secure cluster and connects to the KDC to log in with that same principal)?
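For context, this is roughly what the connection being discussed looks like from the client side. A minimal sketch only, assuming the client has already obtained a Kerberos TGT via kinit and that the host, port, and realm below are placeholders:

// Sketch: Hive JDBC connection to a Kerberized HiveServer2.
// The ";principal=" part tells the driver which service principal HS2 runs as,
// so it can request the correct Kerberos service ticket; the client itself
// authenticates with its own TGT.
import java.sql.Connection;
import java.sql.DriverManager;

public class Hs2KerberosClient {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://<hs2-host>:10000/default;principal=hive/_HOST@EXAMPLE.COM";
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}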