Member since: 05-20-2016
Posts: 155
Kudos Received: 220
Solutions: 30
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5937 | 03-23-2018 04:54 AM |
| | 2154 | 10-05-2017 02:34 PM |
| | 1140 | 10-03-2017 02:02 PM |
| | 7734 | 08-23-2017 06:33 AM |
| | 2469 | 07-27-2017 10:20 AM |
07-27-2017
10:20 AM
6 Kudos
You can get it with the API below. The Spark history server component name is SPARK_JOBHISTORYSERVER:

curl -XGET -u admin:admin https://hostname:8443/api/v1/clusters/<clusterName>/components
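If you only need that one component, you should also be able to query its sub-resource directly. A sketch (not from the original answer), assuming the Spark service is registered in Ambari under the name SPARK:

curl -XGET -u admin:admin "https://hostname:8443/api/v1/clusters/<clusterName>/services/SPARK/components/SPARK_JOBHISTORYSERVER"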
06-30-2017
09:57 AM
4 Kudos
While this article provides a mechanism to set up Spark with HiveContext, there are some limitations when using Spark with HiveContext. For example, Hive supports writing query results to HDFS using "INSERT OVERWRITE DIRECTORY", i.e.:

INSERT OVERWRITE DIRECTORY 'hdfs://cl1/tmp/query'
SELECT * FROM REGION

The above command writes the result of the query to HDFS. However, if the same query is passed to Spark with HiveContext, it will fail, since "INSERT OVERWRITE DIRECTORY" is not a supported feature in Spark. This is tracked via this jira. If the same needs to be achieved via Spark, it can be done using the Spark CSV library (required in the case of Spark1). Below is a code snippet showing how to achieve this:

// Run the query through HiveContext and write the result to HDFS as delimited text
DataFrame df = hiveContext.sql("SELECT * FROM REGION");
df.write()
    .format("com.databricks.spark.csv")
    .option("delimiter", "\u0001")
    .save("hdfs://cl1/tmp/query");

The above code saves the result in HDFS under the directory /tmp/query. Note the delimiter used; it is the same delimiter Hive currently uses by default. Also, the dependency below needs to be added to pom.xml:

<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-csv_2.10</artifactId>
    <version>1.5.0</version>
</dependency>
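As an aside that is not part of the original answer: on Spark 2.x the external spark-csv package is no longer needed, because CSV is a built-in data source. A minimal sketch of the equivalent write, assuming Spark 2.x with Hive support enabled and the same path and ^A delimiter as above:

// Sketch only: Spark 2.x equivalent of the spark-csv write above.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class RegionExport {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("RegionExport")
                .enableHiveSupport()   // needed so spark.sql() can see Hive tables
                .getOrCreate();
        Dataset<Row> df = spark.sql("SELECT * FROM REGION");
        df.write()
          .option("sep", "\u0001")    // same ^A field delimiter Hive uses by default
          .csv("hdfs://cl1/tmp/query");
        spark.stop();
    }
}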
06-27-2017
12:57 PM
4 Kudos
@Bhushan Rokade Yes, beeline expects the HQL file to be on the local file system.
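For illustration (the hostname and file path here are placeholders, not from the original thread), a typical invocation reads the script from the local path passed to -f:

beeline -u "jdbc:hive2://<hs2-host>:10000/default" -f /tmp/sample.hql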
05-22-2017
07:04 AM
2 Kudos
Hello @Rishabh Oberoi Please refer to the link below:
http://docs.hortonworks.com/HDPDocuments/Ambari-2.5.0.3/bk_ambari-security/content/ch_configuring_amb_hdp_for_kerberos.html
05-19-2017
11:02 AM
2 Kudos
@Saumidh Mhatre Yes, this should be possible by specifying multiple actions in a single workflow.xml:

<?xml version="1.0" ?>
<workflow-app name="sample" xmlns="uri:oozie:workflow:0.5">
<global>
<job-tracker>${RESOURCE_MANAGER}</job-tracker>
<name-node>${NAME_NODE}</name-node>
<configuration>
<property>
<name>mapreduce.job.queuename</name>
<value>${DEFAULT_QUEUE}</value>
</property>
</configuration>
</global>
<start to="query1"/>
<action name="query1">
<java>
<prepare>
<delete path="/tmp/query1"/>
</prepare>
<job-xml>${APP_PATH}/lib/hbase-site.xml</job-xml>
<main-class>com.org.test.TestSample</main-class>
<java-opts>-cp $CLASSPATH:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/etc/hbase/conf</java-opts>
<archive>${APP_PATH}/lib/test-sample.jar</archive>
</java>
<ok to="metrics"/>
<error to="kill"/>
</action>
<action name="metrics">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${RESOURCE_MANAGER}</job-tracker>
<name-node>${NAME_NODE}</name-node>
<exec>oozie_hook.py</exec>
<file>${APP_PATH}/shell/oozie_hook.py#oozie_hook.py</file>
</shell>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Workflow action failed, killing workflow</message>
</kill>
<end name="end"/>
</workflow-app>
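Not part of the original answer, but for completeness: once the workflow is deployed under ${APP_PATH}, it can be submitted with the standard Oozie CLI (the Oozie URL and job.properties file below are placeholders):

oozie job -oozie http://<oozie-host>:11000/oozie -config job.properties -run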
04-28-2017
08:08 AM
2 Kudos
@Satish Anjaneyappa See if the link below helps!
https://community.hortonworks.com/questions/79661/how-to-deletedrop-a-partition-of-an-external-table.html
04-27-2017
11:23 AM
@Ishvari Dhimmar Please use the links below for reference:
https://www.tutorialspoint.com/apache_oozie/index.htm
https://oozie.apache.org/docs/4.2.0/DG_Hive2ActionExtension.html
https://oozie.apache.org/docs/4.0.0/DG_SqoopActionExtension.html
https://oozie.apache.org/docs/3.2.0-incubating/WorkflowFunctionalSpec.html
04-26-2017
08:29 AM
1 Kudo
@Ishvari Dhimmar Have you evaluated Oozie? I believe you would need to run these repeatedly at some interval. Oozie provides support for all the components mentioned above, i.e. Pig, Hive, and Sqoop, and each can be defined as a separate action in an Oozie workflow. You do not need to create a separate MR job (using the .NET SDK) if you go this route.
04-06-2017
04:41 AM
1 Kudo
Yes, it is a must for Kerberos authentication, but why the HiveServer2 principal, and why not some other flag like authType=kerberos? The reason I ask is: why are we expecting the "client" making the JDBC connection to know the HS2 principal name (which HS2 is already aware of, since it also resides in the secure cluster and connects to the KDC to log in with that same principal)?
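For context, this is roughly what the connection being discussed looks like from the client side. A minimal sketch only, assuming the client has already obtained a Kerberos TGT via kinit and that the host, port, and realm below are placeholders:

// Sketch: Hive JDBC connection to a Kerberized HiveServer2.
// The ";principal=" part tells the driver which service principal HS2 runs as,
// so it can request the correct Kerberos service ticket; the client itself
// authenticates with its own TGT.
import java.sql.Connection;
import java.sql.DriverManager;

public class Hs2KerberosClient {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://<hs2-host>:10000/default;principal=hive/_HOST@EXAMPLE.COM";
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}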