Member since 05-31-2016
89 Posts
14 Kudos Received
8 Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 4075 | 03-10-2017 07:05 AM
 | 5960 | 03-07-2017 09:58 AM
 | 3527 | 06-30-2016 12:13 PM
 | 5800 | 05-20-2016 09:15 AM
 | 27285 | 05-17-2016 02:09 PM
02-07-2017
06:51 AM
I am not sure about Hue, but from the terminal it can be fixed by exporting the correct Oozie server URL. Use this command:

export OOZIE_URL=http://someip:11000/oozie

To find this Oozie URL, use Hue to connect to your cluster and navigate to Workflows, where you will find a tab called Oozie. Inside it you should see gauges listing a number of properties; look for the property oozie.servers.
09-13-2016
06:57 PM
Thanks for your reply; however, I wanted to run it on a cluster directly, not in local mode.
09-12-2016
06:38 PM
1 Kudo
I am using Eclipse to build Spark applications, and every time I need to export the jar and run it from the shell to test the application. I am using a VM running the CDH 5.5.2 QuickStart image. Eclipse is installed on my Windows host; I create a Spark application there, export it as a jar file from Eclipse, copy it over to the Linux guest, and run it using spark-submit. This is very annoying at times, because if something is wrong in the program but the build succeeds, the application fails at runtime, and I have to fix the code, export the jar again, run it, and so on. I am wondering if there is a simpler way to run the job right from Eclipse (note that I don't want to run Spark in local mode), where the input file is in HDFS. Is there a better way of doing this? What are the industry standards for developing, testing, and deploying Spark applications in production?
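For context, a minimal sketch of the kind of application being exported, assuming a hypothetical word-count job and placeholder HDFS paths; the master is deliberately not hard-coded, so the same jar can be submitted to the cluster with spark-submit:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical word-count job; input and output paths are placeholders.
object WordCountApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCountApp")
    val sc = new SparkContext(conf)
    sc.textFile("hdfs:///user/cloudera/input.txt")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1L))
      .reduceByKey(_ + _)
      .saveAsTextFile("hdfs:///user/cloudera/output")
    sc.stop()
  }
}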
Labels:
- Apache Spark
09-08-2016
11:05 AM
1 Kudo
We have HBase tables where the data is in binary Avro format. To query the HBase tables easily, every time we create Hive tables and then query them, which is a tedious process: table creation takes a long time, and ad-hoc tasks go for a toss. Since Phoenix or Drill could be a good alternative to Hive, a question arose: do they support the Avro format? Would Phoenix or Drill work in my case?
Labels:
- Apache Phoenix
08-08-2016
09:42 AM
Hi Amit, I am using 1.6.0, which is installed in the QuickStart VM from CDH 5.5.7.
08-05-2016
06:49 PM
Great, that fixes the problem, but another one arises:

scala> sqlContext.createDataFrame(sc.textFile("/user/cloudera/data/fruit_fixedwidth.txt").map { x => getRow(x) }, schema)
<console>:31: error: package schema is not a value
       sqlContext.createDataFrame(sc.textFile("/user/cloudera/data/fruit_fixedwidth.txt").map { x => getRow(x) }, schema)
                                                                                                                   ^

I am really getting excited now. What is schema all about in this context?
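A note for later readers: this error usually means no value named schema is defined in the shell, so Scala resolves the name to a package instead. A minimal sketch of such a schema for the four fixed-width columns, with hypothetical column names:

import org.apache.spark.sql.types.{StructType, StructField, StringType}

// Hypothetical column names; every field is read as a string first.
val schema = StructType(Seq("id", "fruit", "fresh", "price")
  .map(name => StructField(name, StringType, nullable = true)))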
08-05-2016
05:33 PM
Thanks Arun, however I have a problem while creating the getRow function; I am not sure what exactly it refers to. Here is the error:

<console>:26: error: not found: type Row
       def getRow(x : String) : Row={
                                ^
<console>:32: error: not found: value Row
       Row.fromSeq(columnArray)
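A note for later readers: both errors typically mean that Row has not been imported into the shell. A minimal sketch of a getRow helper, assuming the column widths 3, 10, 5, and 4 from the original question:

import org.apache.spark.sql.Row

// Slice a fixed-width line into trimmed columns of width 3, 10, 5, 4.
def getRow(line: String): Row = {
  val offsets = Seq(3, 10, 5, 4).scanLeft(0)(_ + _)  // 0, 3, 13, 18, 22
  Row.fromSeq(offsets.zip(offsets.tail).map {
    case (start, end) => line.slice(start, end).trim
  })
}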
08-04-2016
04:51 PM
1 Kudo
I have a fixed-length file (a sample is shown below) and I want to read this file using the DataFrames API in Spark (1.6.0).

56 apple     TRUE 0.56
45 pear      FALSE1.34
34 raspberry TRUE 2.43
34 plum      TRUE 1.31
53 cherry    TRUE 1.4
23 orange    FALSE2.34
56 persimmon FALSE23.2

The fixed widths of the columns are 3, 10, 5, and 4. Please suggest an approach.
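One possible approach, sketched for spark-shell on Spark 1.6: slice each line at the given widths into a Row and apply a StructType schema. The column names are hypothetical, and the HDFS path matches the one used in the replies above:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType}

val offsets = Seq(3, 10, 5, 4).scanLeft(0)(_ + _)

// Hypothetical column names; every field is read as a string first.
val schema = StructType(Seq("id", "fruit", "fresh", "price")
  .map(name => StructField(name, StringType, nullable = true)))

val rows = sc.textFile("/user/cloudera/data/fruit_fixedwidth.txt").map { line =>
  Row.fromSeq(offsets.zip(offsets.tail).map {
    case (start, end) => line.slice(start, end).trim
  })
}

val df = sqlContext.createDataFrame(rows, schema)
df.show()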
Labels:
- Apache Spark
07-27-2016
05:24 AM
Thanks for the reply. Let me try this and I will come back with an update.
07-27-2016
05:04 AM
Hi @Sindhu, thanks for your follow-up. I was able to get the stats using the query below. Thank you again for your effort.

analyze table sampletable partition (year) compute statistics noscan;