Member since: 05-31-2016
Posts: 89
Kudos Received: 14
Solutions: 8
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2551 | 03-10-2017 07:05 AM |
| | 3955 | 03-07-2017 09:58 AM |
| | 2156 | 06-30-2016 12:13 PM |
| | 3857 | 05-20-2016 09:15 AM |
| | 20374 | 05-17-2016 02:09 PM |
06-20-2017
06:13 PM
Perfect, saved my life! For Cloudera users like me: if you are using the QuickStart VM, issue the below commands to get it working.
cd <zeppelin_dir>
./bin/zeppelin-daemon.sh stop
sudo chown -R cloudera:cloudera webapps
./bin/zeppelin-daemon.sh start
05-29-2017
11:58 AM
If I create a plain target table, would I be able to alter the table later? What I mean is: create a plain table and then alter it to add the buckets? If yes, how?
05-29-2017
07:42 AM
I am trying to create a bucketed table from another table using a CTAS select, but it fails. Here is my query:
create table tmp CLUSTERED BY (key) INTO 256 BUCKETS as select * from link_table limit 10;
And I get the below error:
FAILED: SemanticException [Error 10068]: CREATE-TABLE-AS-SELECT does not support partitioning in the target table
link_table is already bucketed, and I have also set the property to enforce bucketing. I am not sure if bucketing is supported with CTAS. Is there a way I can get this working?
Labels:
- Apache Hive
03-10-2017
07:05 AM
"Issue Fixed" I talked with my DevOps later and found that the classpath for Java was not set in few datanodes in the Cluster. This was stopping the shell action to invoke the JVM at those datanodes. After fixing the Classpath, the job ran successfully
03-09-2017
03:36 PM
That is a carriage return from the log that crept in while copying the content. I have created the JAVA_PATH accordingly.
03-09-2017
02:52 PM
I got the below error after making the changes you mentioned:
/opt/cloudera/parcels/CDH-5.5.2-1.cdh5.5.2.p0.4/bin/../lib/hadoop-hdfs/bin/hdfs: line 309: /usr/java/jdk1.7.0_67/bin/java: No such file or directory
/opt/cloudera/parcels/CDH-5.5.2-1.cdh5.5.2.p0.4/bin/../lib/hadoop-hdfs/bin/hdfs: line 309: exec: /usr/java/jdk1.7.0_67/bin/java: cannot execute: No such file or directory
./test.sh: line 30: /usr/java/jdk1.7.0_67/bin/java: No such file or directory
03-09-2017
01:52 PM
I am running a Java program from a shell script through Oozie and I get the below error:
java: command not found
When I run the shell script from the edge node I do not find any issues; the Java class runs without any error and I get the desired output. However, it is the Oozie job that fails to run the java command. All other actions in Oozie are executed properly, but when it encounters the java line, it throws the aforesaid error. I understand that all the nodes in the Hadoop cluster will have Java installed, so why do I get this error? Below is the java command that I have in my shell script
...
java -cp $LOCAL_DIR/libs/integration-tools.jar com.audit.reporting.GenerateExcelReport $LOCAL_DIR/input.txt $LOCAL_DIR/
... Please provide your thoughts.
Labels:
- Apache Oozie
03-07-2017
09:58 AM
After tireless research on the internet, I was able to work out the solution to the issue.
I added a configuration to make the Hive job use the metastore server, and it worked.
Here is what I did to the Hive action. ....
<hive xmlns='uri:oozie:hive-action:0.2'>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://10.155.1.63:9083</value>
</property>
</configuration>
<script>${dir}/gsrlQery.hql</script>
<param>OutputDir=${jobOutput}</param>
</hive>
.... Note: replace the Hive metastore IP accordingly if you are trying to fix a similar problem. To get the metastore details, check the hive-site.xml file located in the /etc/hive/conf directory. Credit: MapR
03-07-2017
08:33 AM
I get an error while running an Oozie workflow with Hive queries. Here is the workflow:
<workflow-app xmlns='uri:oozie:workflow:0.5' name='reporting_W_errorAuditHiveQueryExe'>
<start to="hive_report_fork"/>
<fork name="hive_report_fork">
<path start="hiveGSRLfile"/>
<path start="hiveNGSRLfile"/>
<path start="hiveNGsrlRAfile"/>
</fork>
<action name="hiveGSRLfile">
<hive xmlns='uri:oozie:hive-action:0.2'>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<script>${dir}/gsrlQery.hql</script>
<param>OutputDir=${jobOutput}</param>
</hive>
<ok to="joining"/>
<error to="joining"/>
</action>
<action name="hiveNGSRLfile">
<hive xmlns='uri:oozie:hive-action:0.2'>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<script>${dir}/nongsrlQuery.hql</script>
<param>OutputDir=${jobOutput}</param>
</hive>
<ok to="joining"/>
<error to="joining"/>
</action>
<action name="hiveNGsrlRAfile">
<hive xmlns='uri:oozie:hive-action:0.2'>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<script>${dir}/nongsrlRAQuery.hql</script>
<param>OutputDir=${jobOutput}</param>
</hive>
<ok to="joining"/>
<error to="joining"/>
</action>
<join name= "joining" to="Success"/>
<action name="Success">
<email xmlns="uri:oozie:email-action:0.1">
<to>${failureEmailToAddress}</to>
<subject>Success</subject>
<body>
The workflow ${wf:name()} with id ${wf:id()} failed
[${wf:errorMessage(wf:lastErrorNode())}].
</body>
</email>
<ok to="end" />
<error to="fail" />
</action>
<action name="failure">
<email xmlns="uri:oozie:email-action:0.1">
<to>${failureEmailToAddress}</to>
<subject>Failure</subject>
<body>
The workflow ${wf:name()} with id ${wf:id()} failed
[${wf:errorMessage(wf:lastErrorNode())}].
</body>
</email>
<ok to="end" />
<error to="fail" />
</action>
<kill name="fail">
<message>Workflow failed</message>
</kill>
<end name="end"/>
</workflow-app>
And here is the Oozie properties file:
oozie.wf.application.path=${deploymentPath}/workflows/errorAuditHiveQueryExe.xml
deploymentPath=/user/amin/deploy_178
jobTracker=localhost:8032
nameNode=hdfs://nameservice1
dir=${deploymentPath}/data-warehouse/temp
failureEmailToAddress=amin@dnb.com
jobOutput=${dir}
oozie.use.system.libpath=true
Here is the error I get:
FAILED: SemanticException [Error 10072]: Database does not exist: testnamespace
Intercepting System.exit(10072)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10072]
However, the namespace exists and I can query the tables inside it. What could be wrong here? Please help.
Labels:
- Apache Oozie
02-10-2017
01:07 PM
I have an HBase table that has data in JSON format. I have a peculiar requirement where I need to add a new attribute to the existing ones. Here is a column that I have queried from the HBase table:
COLUMN CELL
c:601 timestamp=1, value={"value": "000000005295", "id": 601, "name": "ID", "type": "ATOMIC", "hash": "4998c929c977272828456cd83aa0fc19635e674a61a67c96e6ccd226d3296db7", "attrs": {"timestamp": "2016-11-30T091927Z", "code": "805", "uuid": "c6d82240-87fd-469a-86a7-e8b1e5d1da4b", "source": "online", "appVersion": "1.14.40.7", "cutoff": "1", "exportedAt": "20161020T144804Z"}, "globalAttrs": {"source": "online", "timestamp": 1480497567000, "exportedAt": 1476974884000, "countryCode": 805, "appVersion": "1.14.40.7", "uuid": "c6d82240-87fd-469a-86a7-e8b1e5d1da4b", "index": null, "itemId": null, "publish": true}}
For each column we have attributes denoted as attrs. Currently there are seven attributes, and I want to add a new attribute called delete with the value yes. After altering the column it should look like the one below.
COLUMN CELL
c:601 timestamp=1, value={"value": "000000005295", "id": 601, "name": "ID", "type": "ATOMIC", "hash": "4998c929c977272828456cd83aa0fc19635e674a61a67c96e6ccd226d3296db7", "attrs": {"timestamp": "2016-11-30T091927Z", "code": "805", "uuid": "c6d82240-87fd-469a-86a7-e8b1e5d1da4b", "source": "online", "appVersion": "1.14.40.7", "cutoff": "1", "exportedAt": "20161020T144804Z", "delete": "yes"}, "globalAttrs": {"source": "online", "timestamp": 1480497567000, "exportedAt": 1476974884000, "countryCode": 805, "appVersion": "1.14.40.7", "uuid": "c6d82240-87fd-469a-86a7-e8b1e5d1da4b", "index": null, "itemId": null, "publish": true}}
Tags:
- Data Processing
- HBase
Labels:
- Apache HBase
02-07-2017
06:51 AM
I am not sure about Hue, but from the terminal it can be fixed by exporting the correct Oozie server. Use this command to export the Oozie URL:
export OOZIE_URL=http://someip:11000/oozie
To get this Oozie URL, use Hue to connect to your cluster and navigate to Workflows, where you can find a tab called Oozie. Inside it you should see Gauges, where a lot of properties will be listed. Look for the property oozie.servers.
09-13-2016
06:57 PM
Thanks for your reply; however, I wanted to run it on the cluster directly and not in local mode.
09-12-2016
06:38 PM
1 Kudo
I am using Eclipse to build Spark applications, and every time I need to export the jar and run it from the shell to test the application. I am using the CDH 5.5.2 QuickStart VM. Eclipse is installed on my Windows host, where I create Spark applications that are exported as a jar file from Eclipse, copied over to the Linux guest, and then run with spark-submit. This is sometimes very annoying: if you miss something in your program and the build still succeeds, the application fails to execute, and I need to fix the code, export the jar again, and so on. I am wondering if there is a much simpler way to run the job right from Eclipse (please note that I don't want to run Spark in local mode), where the input file will be in HDFS. Is this a better way of doing it? What are the industry standards for developing, testing, and deploying Spark applications in production?
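For context, one approach that is sometimes tried is pointing the SparkConf at the cluster directly from the IDE; whether this actually works depends heavily on the environment (the cluster's core-site.xml, hdfs-site.xml and yarn-site.xml must be on the classpath and the Spark version must match the cluster), so treat the following only as a rough sketch with illustrative names and paths, not a guaranteed setup.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCountFromIde {
  def main(args: Array[String]): Unit = {
    // yarn-client mode keeps the driver in the IDE process and runs executors on the cluster;
    // it needs the cluster's HDFS/YARN client configuration files on the classpath.
    val conf = new SparkConf()
      .setAppName("WordCountFromIde")
      .setMaster("yarn-client")
    val sc = new SparkContext(conf)

    // Input is read from HDFS, not the local filesystem (path is illustrative)
    val counts = sc.textFile("/user/cloudera/data/input.txt")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.take(10).foreach(println)
    sc.stop()
  }
}
```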
Labels:
- Apache Spark
09-08-2016
11:05 AM
1 Kudo
We have HBase tables where the data is in binary Avro format. To query the HBase tables easily, we create Hive tables every time and then query them, which is a tedious process: the tables take a long time to create, and ad-hoc tasks go for a toss. Since Phoenix or Drill could be a good alternative to Hive, a question arose in me: do they support the Avro format? Would Phoenix or Drill work in my case?
Labels:
- Apache Phoenix
08-08-2016
09:42 AM
Hi Amit, I am using 1.6.0, which is installed in the QuickStart VM from CDH 5.5.7.
08-05-2016
06:49 PM
Great, that fixes the problem, but another arises.
scala> sqlContext.createDataFrame(sc.textFile("/user/cloudera/data/fruit_fixedwidth.txt").map { x => getRow(x) }, schema)
<console>:31: error: package schema is not a value
sqlContext.createDataFrame(sc.textFile("/user/cloudera/data/fruit_fixedwidth.txt").map { x => getRow(x) }, schema)
^
I am really getting excited now. What is the schema all about in this context?
08-05-2016
05:33 PM
Thanks Arun, however I have a problem while creating the getRow function. Not sure what exactly it refers to. Here is the error:
<console>:26: error: not found: type Row
def getRow(x : String) : Row={
^
<console>:32: error: not found: value Row
Row.fromSeq(columnArray)
08-04-2016
04:51 PM
1 Kudo
I have a fixed-length file (a sample is shown below) and I want to read this file using the DataFrames API in Spark (1.6.0).
56 apple TRUE 0.56
45 pear FALSE1.34
34 raspberry TRUE 2.43
34 plum TRUE 1.31
53 cherry TRUE 1.4
23 orange FALSE2.34
56 persimmon FALSE23.2
The fixed widths of the columns are 3, 10, 5 and 4. Please suggest your opinion.
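For reference, a minimal sketch along the lines discussed elsewhere in this thread, assuming it runs in spark-shell on Spark 1.6 (where sc and sqlContext are predefined); the getRow helper and the column names are illustrative assumptions.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructField, StructType, StringType}

// Column widths from the question: 3, 10, 5, 4
val widths = Array(3, 10, 5, 4)
val names = Array("id", "fruit", "flag", "price") // illustrative column names

// Slice one fixed-width line into a Row of trimmed string columns
def getRow(line: String): Row = {
  val padded = line.padTo(widths.sum, ' ')
  val starts = widths.scanLeft(0)(_ + _)
  val cols = starts.zip(widths).map { case (start, w) => padded.substring(start, start + w).trim }
  Row.fromSeq(cols)
}

// Read everything as strings first; columns can be cast afterwards if needed
val schema = StructType(names.map(n => StructField(n, StringType, nullable = true)))

val df = sqlContext.createDataFrame(
  sc.textFile("/user/cloudera/data/fruit_fixedwidth.txt").map(x => getRow(x)),
  schema)
df.show()
```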
Labels:
- Apache Spark
07-27-2016
05:24 AM
Thanks for the reply. Let me try this; I will come back with an update.
07-27-2016
05:04 AM
Hi @Sindhu, thanks for your follow-up. I was able to get the stats using the below query. Thank you again for your effort.
analyze table sampletable partition (year) compute statistics noscan;
07-26-2016
05:17 PM
Thank you. I am not familiar with rsync. Could you also give me a brief intro on generating the hash from Hive tables? Any link to a tutorial?
07-26-2016
03:40 PM
I have a request where I need to compare two Hive tables and get the unique records from each table. The tables are in different clusters (Staging and Prod). I am using Beeline to query both tables, storing the key column values into two different files (one per table), and then running some Unix commands to compare the files to get the unique records. Here is the caveat: it works perfectly with a small set of data in the testing cluster, but when implemented in Staging and Prod, the job hangs and I am not able to get the result. Both tables have 50 GB of data. When I run the application using a shell script it takes a very long time, and the PuTTY screen freezes after a while, but the job finishes in the background. Still, I can't see the output in the file. I have also set the buffer size to 8 GB. I can't use join operations because I have to compare two tables that are in two different clusters. Can I use Spark to solve my problem? If yes, how can I do that? Or is there some other way to fix the problem I face now?
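For reference, a minimal sketch of how Spark could take over the comparison step, assuming (as described above) that the key columns of the two tables have already been extracted to files reachable from one cluster's HDFS and that this runs in spark-shell; the paths are illustrative.

```scala
// Key files extracted from the two tables (paths are illustrative)
val stagingKeys = sc.textFile("/tmp/compare/staging_keys.txt").map(_.trim).distinct()
val prodKeys    = sc.textFile("/tmp/compare/prod_keys.txt").map(_.trim).distinct()

// Keys present in only one of the two tables
val onlyInStaging = stagingKeys.subtract(prodKeys)
val onlyInProd    = prodKeys.subtract(stagingKeys)

onlyInStaging.saveAsTextFile("/tmp/compare/only_in_staging")
onlyInProd.saveAsTextFile("/tmp/compare/only_in_prod")
```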
Labels:
- Apache Hive
- Apache Spark
07-26-2016
08:54 AM
No, @Sindhu, I have not.
07-26-2016
07:08 AM
I am executing the query `SHOW TABLE STATS sampletable;` in Hive (1.1.0) and it throws the below error.
hive> show table stats sampletable;
MismatchedTokenException(26!=105)
at org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
at org.apache.hadoop.hive.ql.parse.HiveParser.showStatement(HiveParser.java:21074)
at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2439)
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1586)
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1062)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:201)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1119)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1167)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1055)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:757)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 1:11 mismatched input 'stats' expecting EXTENDED near 'table' in show statement
I am using CDH 5.5.2. I have checked Cloudera's documentation on this particular Hive statement, and it looks like I am executing the right query. Here is the link to it. Not sure what the MismatchedTokenException is. Any help on this, please?
Labels:
- Apache Hive
06-30-2016
12:13 PM
Finally, I got the answer from Stack Overflow and wanted to propagate it here. There doesn't seem to be a direct way to get the last table name; however, the answer works using just a single line of shell script that wraps the Hive query. Here it is:
last_table=$(hive -e "show tables 'test_temp_table*';" | sort -r | head -n1)
06-30-2016
06:15 AM
I have a list of Hive tables and want to select the last table for performing some query. Here is what I use to get the list of similar Hive tables:
show tables 'test_temp_table*';
It displays the below result:
test_temp_table_1
test_temp_table_2
test_temp_table_3
test_temp_table_4
test_temp_table_5
test_temp_table_6
I need to run some query on test_temp_table_6. I can do this with a shell script by writing the output to a temp file and reading the last value from it, but is there a simpler way, using a Hive query, to get the last table, i.e. the one with the maximum number at the end?
Tags:
- Data Processing
- Hive
Labels:
- Apache Hive
05-27-2016
11:08 AM
1 Kudo
I am getting the below error while validating a workflow.
Error: E0701: XML schema error, /d/app/workflow.xml, org.xml.sax.SAXParseException; lineNumber: 49; columnNumber: 11; cvc-complex-type.2.3: Element 'shell' cannot have character [children], because the type's content type is element-only.
Here is the workflow.xml:
<workflow-app name="FILLED_WF" xmlns="uri:oozie:workflow:0.4">
<start to="read_cutoff"/>
<action name="read_cutoff">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>${cutoff_script}</exec>
<argument>${trigger_location}/${trigger_file}</argument>
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<file>${path}/${cutoff_script}#${cutoff_script}</file>
<capture-output/>
</shell>
<ok to="remove_trigger_flag_file_processing"/>
<error to="sendFailureEmail"/>
</action>
<action name="remove_trigger_flag_file_processing">
<fs>
<name-node>${nameNode}</name-node>
<delete path='${trigger_location}/${trigger_file}'/>
</fs>
<ok to="cutoff_values_table" />
<error to="sendFailureEmail" />
</action>
<action name="cutoff_values_table">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>${scriptName}</exec>
<argument>-t</argument>z
<argument>${hive_table}</argument>
<argument>-v</argument>
<argument>${prime_version}</argument>
<argument>-n</argument>
<argument>${hive_namespace}</argument>
<argument>-r</argument>
<argument>${report_flag}</argument>
<argument>-m</argument>
<argument>${memory}</argument>
<argument>-c</argument>
<argument>${wf:actionData('read_cutoff')['cutoff']}</argument>
<argument>-S</argument>
<argument>${deploymentPath}/data-warehouse</argument>
<argument>-f</argument>
<argument>FALSE</argument>
<argument>-l</argument>
<argument>${full_cutoff_list}</argument>
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<file>${scriptPath}/generateCutoffValues.sh#${scriptName}</file>
</shell>
<ok to="generateReports" />
<error to="sendFailureEmail" />
</action>
<fork name="generateReports">
<path start="generateCutoffReports"/>
<path start="generateCutoffCountryReports"/>
</fork>
<action name="generateCutoffReports">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>${cutoffScript}</exec>
<argument>${hive_namespace}</argument>
<argument>TRUE</argument>
<argument>${wf:actionData('read_cutoff')['cutoff']}</argument>
<argument>${prime_version}</argument>
<argument>${hive_table}</argument>
<argument>${deploymentPath}/data-warehouse</argument>
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<file>${scriptPath}/${cutoffScript}#${cutoffScript}</file>
</shell>
<ok to="joining" />
<error to="sendFailureEmail" />
</action>
<action name="generateCutoffCountryReports">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>${cutoffCountryScript}</exec>
<argument>${hive_namespace}</argument>
<argument>TRUE</argument>
<argument>${wf:actionData('read_cutoff')['cutoff']}</argument>
<argument>${prime_version}</argument>
<argument>${hive_table}</argument>
<argument>${deploymentPath}/data-warehouse</argument>
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<file>${scriptPath}/${cutoffCountryScript}#${cutoffCountryScript}</file>
</shell>
<ok to="joining" />
<error to="sendFailureEmail" />
</action>
<join name="joining" to="sendSuccessEmail"/>
<action name="sendSuccessEmail">
<email xmlns="uri:oozie:email-action:0.1">
<to>${failureEmailToAddress}</to>
<subject>Successfully created Filled reports :${wf:actionData('filled_elements_cutoff_report')['${wf:actionData('read_cutoff')['cutoff']}']}</subject>
<body>
Filled Element Cutoff reports created at /data/93-reporting/aspect.
</body>
</email>
<ok to="end"/>
<error to="fail"/>
</action>
<action name="sendFailureEmail">
<email xmlns="uri:oozie:email-action:0.1">
<to>${failureEmailToAddress}</to>
<subject>Unable to Run reports :${wf:actionData('filled_report')['cutoff_value']}</subject>
<body>
The workflow ${wf:name()} with id ${wf:id()} failed [${wf:errorMessage(wf:lastErrorNode())}].
</body>
</email>
<ok to="fail"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Script failed, error
message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name='end' />
</workflow-app>
Any help is much appreciated.
Tags:
- Data Processing
- Oozie
Labels:
- Apache Oozie
05-24-2016
04:11 PM
My Hive queries are taking a long time to return data. We have around 2 million records against which I run the queries, and I get the results only after a long wait. I was looking for an alternative, and Spark came to mind first. I went through some Hortonworks links that illustrate querying Hive tables using Spark SQL (SSQL), but they were quite generic. Here is my requirement: I have Hive tables already created and I need to query them using SSQL. How best can I do that? I would also like to create new Hive tables using SSQL. Would such a table be the same as a Hive table, or different? If different, in what ways? Would I still be able to query the tables created by SSQL using Hive or Beeline?
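For reference, a minimal sketch of querying existing Hive tables (and creating a new one) from Spark 1.x with HiveContext, assuming this runs in spark-shell on a node whose classpath includes hive-site.xml; the table names below are illustrative.

```scala
import org.apache.spark.sql.hive.HiveContext

// HiveContext picks up hive-site.xml, so it talks to the same metastore as Hive/Beeline
val hiveContext = new HiveContext(sc)

// Query an existing Hive table
val df = hiveContext.sql("SELECT * FROM testnamespace.sampletable LIMIT 10")
df.show()

// A table created through the metastore this way remains a normal Hive table,
// so it can still be queried from Hive or Beeline afterwards
hiveContext.sql(
  "CREATE TABLE testnamespace.sampletable_copy AS SELECT * FROM testnamespace.sampletable")
```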
Labels:
- Apache Hive
- Apache Spark