Member since: 05-31-2016
Posts: 89
Kudos Received: 14
Solutions: 8
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 4075 | 03-10-2017 07:05 AM |
|  | 5960 | 03-07-2017 09:58 AM |
|  | 3527 | 06-30-2016 12:13 PM |
|  | 5800 | 05-20-2016 09:15 AM |
|  | 27285 | 05-17-2016 02:09 PM |
07-26-2016 05:17 PM
Thank you. I wasn't aware of rsync. Could you also give me a brief intro to generating a hash from Hive tables? Any link to a tutorial?
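For later readers, one common approach: dump each table's key column in a deterministic order and hash the stream, then run the same command on both clusters and compare the digests. A minimal sketch, with `mytable` and `id` as placeholder names:

```bash
# Export the keys in a fixed order and hash the stream; identical
# digests on both clusters imply identical key sets.
# "mytable" and "id" are placeholder names.
hive -e "SELECT id FROM mytable ORDER BY id;" | md5sum
```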
07-26-2016 03:40 PM
I have a requirement to compare two Hive tables and get the unique records from each. The tables live in different clusters (Staging and Prod), so I can't use join operations. I query both tables with Beeline, store the key-column values in two separate files, and then run some Unix commands to compare the files and extract the unique records.

Here is the caveat: it works perfectly with a small data set in the test cluster, but when implemented in Staging and Prod the job hangs and I cannot get a result. Each table has about 50 GB of data. When I run the application from a shell script it takes a very long time; the PuTTY screen freezes after a while, and although the job finishes in the background, I still can't see the output in the file. I have also set the buffer size to 8 GB.

Can I use Spark to solve my problem? If yes, how can I do that? Or is there some other way to fix the problem I am facing?
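In case it helps later readers: Spark can express this diff cleanly, but a single Spark application talks to one cluster's metastore, so the Prod keys would first need to be brought somewhere the Staging cluster can read (for example with DistCp). A rough sketch under that assumption, using Spark 2.x APIs (on Spark 1.x you would use HiveContext and the spark-csv package instead); every path, database, and column name below is a placeholder:

```python
from pyspark.sql import SparkSession

# A sketch, not a drop-in solution; all names are placeholders.
spark = (SparkSession.builder
         .appName("table-diff")
         .enableHiveSupport()
         .getOrCreate())

# Key column of the local (Staging) Hive table.
staging_keys = spark.table("staging_db.mytable").select("id")

# Prod keys previously exported to files this cluster can read.
prod_keys = spark.read.csv("/tmp/prod_keys").toDF("id")

# subtract() gives the difference in each direction, distributed
# across the cluster instead of through one shell pipeline.
only_in_staging = staging_keys.subtract(prod_keys)
only_in_prod = prod_keys.subtract(staging_keys)

only_in_staging.write.csv("/tmp/only_in_staging")
only_in_prod.write.csv("/tmp/only_in_prod")
```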
Labels:
- Apache Hive
- Apache Spark
07-26-2016 08:54 AM
No, @Sindhu, I have not.
07-26-2016 07:08 AM
I am executing the query `SHOW TABLE STATS sampletable;` in Hive (1.1.0) and it throws the error below.

hive> show table stats sampletable;
MismatchedTokenException(26!=105)
at org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
at org.apache.hadoop.hive.ql.parse.HiveParser.showStatement(HiveParser.java:21074)
at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2439)
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1586)
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1062)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:201)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1119)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1167)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1055)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:757)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 1:11 mismatched input 'stats' expecting EXTENDED near 'table' in show statement

I am using CDH 5.5.2. I have checked Cloudera's documentation on this particular Hive statement, and it looks like I am executing the right query; here is the link to it. I am not sure what the MismatchedTokenException is. Any help on this, please?
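For anyone hitting the same error: `SHOW TABLE STATS` is Impala syntax, not HiveQL, which is why Hive's parser only expects `EXTENDED` after `SHOW TABLE` and raises the MismatchedTokenException. The closest Hive-side equivalent is roughly:

```sql
-- Compute the statistics first (Hive does not gather them here automatically).
ANALYZE TABLE sampletable COMPUTE STATISTICS;
-- Then read numRows, totalSize, etc. under "Table Parameters".
DESCRIBE FORMATTED sampletable;
```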
Labels:
- Apache Hive
06-30-2016 12:13 PM
Finally, I got the answer from Stack Overflow and wanted to propagate it here. There doesn't seem to be a straight way to get the last table name, but the answer works with a single line of shell script wrapping the Hive query:

last_table=$(hive -e "show tables 'test_temp_table*';" | sort -r | head -n1)
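One caveat worth flagging: `sort -r` sorts lexically, so once the numeric suffix reaches double digits, `test_temp_table_9` sorts after `test_temp_table_10` and the wrong table wins. If your `sort` is GNU coreutils, version sort is safer:

```bash
# -V compares the numeric suffix naturally (..._9 < ..._10).
last_table=$(hive -e "show tables 'test_temp_table*';" | sort -V | tail -n1)
```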
06-30-2016 06:15 AM
I have a list of Hive tables and want to select the last one to run a query against. Here is what I use to list the similarly named tables:

show tables 'test_temp_table*';
It displays the result below:

test_temp_table_1
test_temp_table_2
test_temp_table_3
test_temp_table_4
test_temp_table_5
test_temp_table_6
I need to run a query on test_temp_table_6. I could do this from a shell script by writing the output to a temp file and reading the last value from it, but is there a simpler way, using a Hive query, to get the table with the highest number at the end?
Labels:
- Apache Hive
05-27-2016 11:08 AM
1 Kudo
I am getting the error below while validating a workflow:

Error: E0701: XML schema error, /d/app/workflow.xml, org.xml.sax.SAXParseException; lineNumber: 49; columnNumber: 11; cvc-complex-type.2.3: Element 'shell' cannot have character [children], because the type's content type is element-only.

Here is the workflow.xml:

<workflow-app name="FILLED_WF" xmlns="uri:oozie:workflow:0.4">
<start to="read_cutoff"/>
<action name="read_cutoff">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>${cutoff_script}</exec>
<argument>${trigger_location}/${trigger_file}</argument>
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<file>${path}/${cutoff_script}#${cutoff_script}</file>
<capture-output/>
</shell>
<ok to="remove_trigger_flag_file_processing"/>
<error to="sendFailureEmail"/>
</action>
<action name="remove_trigger_flag_file_processing">
<fs>
<name-node>${nameNode}</name-node>
<delete path='${trigger_location}/${trigger_file}'/>
</fs>
<ok to="cutoff_values_table" />
<error to="sendFailureEmail" />
</action>
<action name="cutoff_values_table">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>${scriptName}</exec>
<argument>-t</argument>z
<argument>${hive_table}</argument>
<argument>-v</argument>
<argument>${prime_version}</argument>
<argument>-n</argument>
<argument>${hive_namespace}</argument>
<argument>-r</argument>
<argument>${report_flag}</argument>
<argument>-m</argument>
<argument>${memory}</argument>
<argument>-c</argument>
<argument>${wf:actionData('read_cutoff')['cutoff']}</argument>
<argument>-S</argument>
<argument>${deploymentPath}/data-warehouse</argument>
<argument>-f</argument>
<argument>FALSE</argument>
<argument>-l</argument>
<argument>${full_cutoff_list}</argument>
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<file>${scriptPath}/generateCutoffValues.sh#${scriptName}</file>
</shell>
<ok to="generateReports" />
<error to="sendFailureEmail" />
</action>
<fork name="generateReports">
<path start="generateCutoffReports"/>
<path start="generateCutoffCountryReports"/>
</fork>
<action name="generateCutoffReports">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>${cutoffScript}</exec>
<argument>${hive_namespace}</argument>
<argument>TRUE</argument>
<argument>${wf:actionData('read_cutoff')['cutoff']}</argument>
<argument>${prime_version}</argument>
<argument>${hive_table}</argument>
<argument>${deploymentPath}/data-warehouse</argument>
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<file>${scriptPath}/${cutoffScript}#${cutoffScript}</file>
</shell>
<ok to="joining" />
<error to="sendFailureEmail" />
</action>
<action name="generateCutoffCountryReports">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>${cutoffCountryScript}</exec>
<argument>${hive_namespace}</argument>
<argument>TRUE</argument>
<argument>${wf:actionData('read_cutoff')['cutoff']}</argument>
<argument>${prime_version}</argument>
<argument>${hive_table}</argument>
<argument>${deploymentPath}/data-warehouse</argument>
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<file>${scriptPath}/${cutoffCountryScript}#${cutoffCountryScript}</file>
</shell>
<ok to="joining" />
<error to="sendFailureEmail" />
</action>
<join name="joining" to="sendSuccessEmail"/>
<action name="sendSuccessEmail">
<email xmlns="uri:oozie:email-action:0.1">
<to>${failureEmailToAddress}</to>
<subject>Successfully created Filled reports :${wf:actionData('filled_elements_cutoff_report')['${wf:actionData('read_cutoff')['cutoff']}']}</subject>
<body>
Filled Element Cutoff reports created at /data/93-reporting/aspect.
</body>
</email>
<ok to="end"/>
<error to="fail"/>
</action>
<action name="sendFailureEmail">
<email xmlns="uri:oozie:email-action:0.1">
<to>${failureEmailToAddress}</to>
<subject>Unable to Run reports :${wf:actionData('filled_report')['cutoff_value']}</subject>
<body>
The workflow ${wf:name()} with id ${wf:id()} failed [${wf:errorMessage(wf:lastErrorNode())}].
</body>
</email>
<ok to="fail"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Script failed, error
message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name='end' />
</workflow-app>

Any help is much appreciated.
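If it helps anyone hitting the same E0701: the schema error says the 'shell' element cannot have character children, which means stray text sits directly inside a <shell> block. In the cutoff_values_table action above there is a lone z after one of the closing </argument> tags (plausibly the lineNumber 49 the SAXParseException reports). Deleting that character should let the workflow validate:

```xml
<!-- before: stray "z" text inside the <shell> element -->
<argument>-t</argument>z
<!-- after -->
<argument>-t</argument>
```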
Labels:
- Apache Oozie
05-24-2016 04:11 PM
My Hive queries are taking a long time. We have around 2 million records, and every query I submit means a long wait for the result. I was looking for an alternative, and Spark came to mind first. I have gone through some Hortonworks links that illustrate querying a Hive table using Spark SQL, but they were quite generic. Here is my requirement: I already have Hive tables created and need to query them using Spark SQL. How best can I do that? I would also like to create new Hive tables using Spark SQL. Would such a table be the same as a Hive table or different? If different, in what ways? Would I still be able to query tables created by Spark SQL using Hive or Beeline?
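For reference, a minimal sketch of querying existing Hive tables from Spark 1.x (the usual pairing with CDH at the time) via HiveContext; `mydb.mytable` and `mydb.new_table` are placeholder names. Tables created this way are registered in the same Hive metastore, so as long as a Hive-readable format is used, Hive and Beeline can query them like any other table:

```python
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="hive-via-spark")
sqlContext = HiveContext(sc)  # talks to the same Hive metastore

# Query an existing Hive table (placeholder names).
df = sqlContext.sql("SELECT * FROM mydb.mytable LIMIT 10")
df.show()

# Create a new table; it lands in the metastore, so Hive and
# Beeline can query it as well.
sqlContext.sql("CREATE TABLE mydb.new_table AS SELECT * FROM mydb.mytable")
```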
Labels:
- Apache Hive
- Apache Spark
05-23-2016 11:14 AM
I heard about Zeppelin in the past and now I wish to use it. I would like to visualize my Hive data in Zeppelin. I am using CDH 🙂, but I can install and configure Zeppelin myself. I just want to know the basic steps to pull Hive tables into Zeppelin.
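In case a pointer helps future readers: once Zeppelin's Hive interpreter (or the JDBC interpreter on newer releases) is pointed at your HiveServer2 URL, a notebook paragraph can query tables directly, and Zeppelin renders the result with built-in chart options. A minimal sketch; the interpreter prefix and all table/column names are placeholders:

```
%hive
-- "mydb.mytable" and "category" are placeholder names; the result
-- grid below the paragraph can be switched to bar/pie/line charts.
SELECT category, COUNT(*) AS cnt
FROM mydb.mytable
GROUP BY category
```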
Labels:
- Apache Hive
- Apache Zeppelin