Member since: 05-31-2016
Posts: 89
Kudos Received: 14
Solutions: 8
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 4075 | 03-10-2017 07:05 AM |
|  | 5960 | 03-07-2017 09:58 AM |
|  | 3527 | 06-30-2016 12:13 PM |
|  | 5800 | 05-20-2016 09:15 AM |
|  | 27285 | 05-17-2016 02:09 PM |
07-26-2016 05:17 PM
Thank you. I wasn't aware of rsync. Could you also give me a brief intro to generating a hash from Hive tables? Any link to a tutorial?
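For later readers, one common approach: dump each table's key column in a deterministic order and hash the stream, then run the same command on both clusters and compare the digests. A minimal sketch, with `mytable` and `id` as placeholder names:

```bash
# Export the keys in a fixed order and hash the stream; identical
# digests on both clusters imply identical key sets.
# "mytable" and "id" are placeholder names.
hive -e "SELECT id FROM mytable ORDER BY id;" | md5sum
```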
07-26-2016 03:40 PM
I have a requirement to compare two Hive tables and get the unique records from each. The tables live in different clusters (Staging and Prod), so I can't use join operations. I query both tables with Beeline, store the key-column values in two separate files, and then run some Unix commands to compare the files and extract the unique records.

Here is the caveat: it works perfectly with a small data set in the test cluster, but when implemented in Staging and Prod the job hangs and I cannot get a result. Each table has about 50 GB of data. When I run the application from a shell script it takes a very long time; the PuTTY screen freezes after a while, and although the job finishes in the background, I still can't see the output in the file. I have also set the buffer size to 8 GB.

Can I use Spark to solve my problem? If yes, how can I do that? Or is there some other way to fix the problem I am facing?
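In case it helps later readers: Spark can express this diff cleanly, but a single Spark application talks to one cluster's metastore, so the Prod keys would first need to be brought somewhere the Staging cluster can read (for example with DistCp). A rough sketch under that assumption, using Spark 2.x APIs (on Spark 1.x you would use HiveContext and the spark-csv package instead); every path, database, and column name below is a placeholder:

```python
from pyspark.sql import SparkSession

# A sketch, not a drop-in solution; all names are placeholders.
spark = (SparkSession.builder
         .appName("table-diff")
         .enableHiveSupport()
         .getOrCreate())

# Key column of the local (Staging) Hive table.
staging_keys = spark.table("staging_db.mytable").select("id")

# Prod keys previously exported to files this cluster can read.
prod_keys = spark.read.csv("/tmp/prod_keys").toDF("id")

# subtract() gives the difference in each direction, distributed
# across the cluster instead of through one shell pipeline.
only_in_staging = staging_keys.subtract(prod_keys)
only_in_prod = prod_keys.subtract(staging_keys)

only_in_staging.write.csv("/tmp/only_in_staging")
only_in_prod.write.csv("/tmp/only_in_prod")
```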
Labels:
- Apache Hive
- Apache Spark
07-26-2016 08:54 AM
No, @Sindhu, I have not.
07-26-2016 07:08 AM
I am executing the query `SHOW TABLE STATS sampletable;` in Hive (1.1.0) and it throws the error below.

hive> show table stats sampletable;
MismatchedTokenException(26!=105)
at org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
at org.apache.hadoop.hive.ql.parse.HiveParser.showStatement(HiveParser.java:21074)
at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2439)
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1586)
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1062)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:201)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1119)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1167)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1055)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:757)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 1:11 mismatched input 'stats' expecting EXTENDED near 'table' in show statement

I am using CDH 5.5.2. I have checked Cloudera's documentation on this particular Hive statement, and it looks like I am executing the right query; here is the link to it. I am not sure what the MismatchedTokenException is. Any help on this, please?
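For anyone hitting the same error: `SHOW TABLE STATS` is Impala syntax, not HiveQL, which is why Hive's parser only expects `EXTENDED` after `SHOW TABLE` and raises the MismatchedTokenException. The closest Hive-side equivalent is roughly:

```sql
-- Compute the statistics first (Hive does not gather them here automatically).
ANALYZE TABLE sampletable COMPUTE STATISTICS;
-- Then read numRows, totalSize, etc. under "Table Parameters".
DESCRIBE FORMATTED sampletable;
```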
Labels:
- Apache Hive
06-30-2016 12:13 PM
Finally, I got the answer from Stack Overflow and wanted to propagate it here. There doesn't seem to be a straight way to get the last table name, but the answer works with a single line of shell script wrapping the Hive query:

last_table=$(hive -e "show tables 'test_temp_table*';" | sort -r | head -n1)
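One caveat worth flagging: `sort -r` sorts lexically, so once the numeric suffix reaches double digits, `test_temp_table_9` sorts after `test_temp_table_10` and the wrong table wins. If your `sort` is GNU coreutils, version sort is safer:

```bash
# -V compares the numeric suffix naturally (..._9 < ..._10).
last_table=$(hive -e "show tables 'test_temp_table*';" | sort -V | tail -n1)
```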
06-30-2016 06:15 AM
I have a list of Hive tables and want to select the last one to run a query against. Here is what I use to list the similarly named tables:

show tables 'test_temp_table*';
It displays the result below:

test_temp_table_1
test_temp_table_2
test_temp_table_3
test_temp_table_4
test_temp_table_5
test_temp_table_6
I need to run a query on test_temp_table_6. I could do this from a shell script by writing the output to a temp file and reading the last value from it, but is there a simpler way, using a Hive query, to get the table with the highest number at the end?
Labels:
- Apache Hive
05-27-2016 11:08 AM
1 Kudo
I am getting the error below while validating a workflow:

Error: E0701: XML schema error, /d/app/workflow.xml, org.xml.sax.SAXParseException; lineNumber: 49; columnNumber: 11; cvc-complex-type.2.3: Element 'shell' cannot have character [children], because the type's content type is element-only.

Here is the workflow.xml:

<workflow-app name="FILLED_WF" xmlns="uri:oozie:workflow:0.4">
<start to="read_cutoff"/>
<action name="read_cutoff">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>${cutoff_script}</exec>
<argument>${trigger_location}/${trigger_file}</argument>
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<file>${path}/${cutoff_script}#${cutoff_script}</file>
<capture-output/>
</shell>
<ok to="remove_trigger_flag_file_processing"/>
<error to="sendFailureEmail"/>
</action>
<action name="remove_trigger_flag_file_processing">
<fs>
<name-node>${nameNode}</name-node>
<delete path='${trigger_location}/${trigger_file}'/>
</fs>
<ok to="cutoff_values_table" />
<error to="sendFailureEmail" />
</action>
<action name="cutoff_values_table">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>${scriptName}</exec>
<argument>-t</argument>z
<argument>${hive_table}</argument>
<argument>-v</argument>
<argument>${prime_version}</argument>
<argument>-n</argument>
<argument>${hive_namespace}</argument>
<argument>-r</argument>
<argument>${report_flag}</argument>
<argument>-m</argument>
<argument>${memory}</argument>
<argument>-c</argument>
<argument>${wf:actionData('read_cutoff')['cutoff']}</argument>
<argument>-S</argument>
<argument>${deploymentPath}/data-warehouse</argument>
<argument>-f</argument>
<argument>FALSE</argument>
<argument>-l</argument>
<argument>${full_cutoff_list}</argument>
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<file>${scriptPath}/generateCutoffValues.sh#${scriptName}</file>
</shell>
<ok to="generateReports" />
<error to="sendFailureEmail" />
</action>
<fork name="generateReports">
<path start="generateCutoffReports"/>
<path start="generateCutoffCountryReports"/>
</fork>
<action name="generateCutoffReports">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>${cutoffScript}</exec>
<argument>${hive_namespace}</argument>
<argument>TRUE</argument>
<argument>${wf:actionData('read_cutoff')['cutoff']}</argument>
<argument>${prime_version}</argument>
<argument>${hive_table}</argument>
<argument>${deploymentPath}/data-warehouse</argument>
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<file>${scriptPath}/${cutoffScript}#${cutoffScript}</file>
</shell>
<ok to="joining" />
<error to="sendFailureEmail" />
</action>
<action name="generateCutoffCountryReports">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>${cutoffCountryScript}</exec>
<argument>${hive_namespace}</argument>
<argument>TRUE</argument>
<argument>${wf:actionData('read_cutoff')['cutoff']}</argument>
<argument>${prime_version}</argument>
<argument>${hive_table}</argument>
<argument>${deploymentPath}/data-warehouse</argument>
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<file>${scriptPath}/${cutoffCountryScript}#${cutoffCountryScript}</file>
</shell>
<ok to="joining" />
<error to="sendFailureEmail" />
</action>
<join name="joining" to="sendSuccessEmail"/>
<action name="sendSuccessEmail">
<email xmlns="uri:oozie:email-action:0.1">
<to>${failureEmailToAddress}</to>
<subject>Successfully created Filled reports :${wf:actionData('filled_elements_cutoff_report')['${wf:actionData('read_cutoff')['cutoff']}']}</subject>
<body>
Filled Element Cutoff reports created at /data/93-reporting/aspect.
</body>
</email>
<ok to="end"/>
<error to="fail"/>
</action>
<action name="sendFailureEmail">
<email xmlns="uri:oozie:email-action:0.1">
<to>${failureEmailToAddress}</to>
<subject>Unable to Run reports :${wf:actionData('filled_report')['cutoff_value']}</subject>
<body>
The workflow ${wf:name()} with id ${wf:id()} failed [${wf:errorMessage(wf:lastErrorNode())}].
</body>
</email>
<ok to="fail"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Script failed, error
message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name='end' />
</workflow-app>

Any help is much appreciated.
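If it helps anyone hitting the same E0701: the schema error says the 'shell' element cannot have character children, which means stray text sits directly inside a <shell> block. In the cutoff_values_table action above there is a lone z after one of the closing </argument> tags (plausibly the lineNumber 49 the SAXParseException reports). Deleting that character should let the workflow validate:

```xml
<!-- before: stray "z" text inside the <shell> element -->
<argument>-t</argument>z
<!-- after -->
<argument>-t</argument>
```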
Labels:
- Apache Oozie
05-24-2016 04:11 PM
My Hive queries are taking a long time. We have around 2 million records, and every query I submit means a long wait for the result. I was looking for an alternative, and Spark came to mind first. I have gone through some Hortonworks links that illustrate querying a Hive table using Spark SQL, but they were quite generic. Here is my requirement: I already have Hive tables created and need to query them using Spark SQL. How best can I do that? I would also like to create new Hive tables using Spark SQL. Would such a table be the same as a Hive table or different? If different, in what ways? Would I still be able to query tables created by Spark SQL using Hive or Beeline?
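For reference, a minimal sketch of querying existing Hive tables from Spark 1.x (the usual pairing with CDH at the time) via HiveContext; `mydb.mytable` and `mydb.new_table` are placeholder names. Tables created this way are registered in the same Hive metastore, so as long as a Hive-readable format is used, Hive and Beeline can query them like any other table:

```python
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="hive-via-spark")
sqlContext = HiveContext(sc)  # talks to the same Hive metastore

# Query an existing Hive table (placeholder names).
df = sqlContext.sql("SELECT * FROM mydb.mytable LIMIT 10")
df.show()

# Create a new table; it lands in the metastore, so Hive and
# Beeline can query it as well.
sqlContext.sql("CREATE TABLE mydb.new_table AS SELECT * FROM mydb.mytable")
```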
Labels:
- Apache Hive
- Apache Spark
05-23-2016 11:14 AM
I heard about Zeppelin in the past and now I wish to use it. I would like to visualize my Hive data in Zeppelin. I am using CDH 🙂, but I can install and configure Zeppelin myself. I just want to know the basic steps to pull Hive tables into Zeppelin.
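In case a pointer helps future readers: once Zeppelin's Hive interpreter (or the JDBC interpreter on newer releases) is pointed at your HiveServer2 URL, a notebook paragraph can query tables directly, and Zeppelin renders the result with built-in chart options. A minimal sketch; the interpreter prefix and all table/column names are placeholders:

```
%hive
-- "mydb.mytable" and "category" are placeholder names; the result
-- grid below the paragraph can be switched to bar/pie/line charts.
SELECT category, COUNT(*) AS cnt
FROM mydb.mytable
GROUP BY category
```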
Labels:
- Apache Hive
- Apache Zeppelin