Member since: 10-04-2016
Posts: 243
Kudos Received: 281
Solutions: 43
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1173 | 01-16-2018 03:38 PM
 | 6141 | 11-13-2017 05:45 PM
 | 3035 | 11-13-2017 12:30 AM
 | 1520 | 10-27-2017 03:58 AM
 | 28431 | 10-19-2017 03:17 AM
07-05-2017
05:57 PM
1 Kudo
I am using the Sandbox with HDP 2.4 (Spark 1.6). When I enter the following in spark-shell, I get an error:

import org.apache.spark.streaming.twitter._

Error message: Object twitter is not a member of org.apache.spark.streaming
Labels:
- Apache Spark
07-05-2017
03:49 PM
Alright, I will go through this use case and try to replicate the issue.
07-05-2017
03:11 PM
@Hugo Felix It appears you are using a Spark version lower than 2.1. In your code you have the following line:

val sc = new SparkConf().setMaster("local[2]").setAppName("tweets").set("spark.cleaner.ttl", "2000")

spark.cleaner.ttl triggers a cleanup after the duration you specify ("2000" seconds here). From the official Spark 1.6 documentation:

spark.cleaner.ttl - Duration (seconds) of how long Spark will remember any metadata (stages generated, tasks generated, etc.). Periodic cleanups will ensure that metadata older than this duration will be forgotten. This is useful for running Spark for many hours / days (for example, running 24/7 in case of Spark Streaming applications). Note that any RDD that persists in memory for more than this duration will be cleared as well. Default is infinite.

In your case, it is quite possible that the cleanup is being triggered even before your job finishes. Increase the value and try again. Refer to this JIRA for an existing discussion and more insight.
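As a minimal sketch (assuming the rest of your streaming job stays the same), you could raise the TTL well above the expected job runtime, or simply drop the .set call to fall back to the default (infinite):

```scala
import org.apache.spark.SparkConf

// Same configuration as in your line above, but with a much larger TTL.
// The 86400 value is just an illustrative choice: 24 hours instead of 2000 seconds.
val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("tweets")
  .set("spark.cleaner.ttl", "86400")
```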
06-30-2017
08:20 PM
2 Kudos
@Viswa In Tez, the following types of data movement can take place between two vertices; each is represented by an Edge in the DAG:

- BROADCAST: Output on this edge produced by any source task is available to all destination tasks.
- CUSTOM: Custom routing defined by the user.
- ONE_TO_ONE: Output on this edge produced by the i-th source task is available to the i-th destination task.
- SCATTER_GATHER: The i-th output on this edge produced by all source tasks is available to the same destination task.

To answer your question:

- SIMPLE_EDGE refers to the data movement type SCATTER_GATHER (example: SHUFFLE JOIN).
- BROADCAST_EDGE refers to the data movement type BROADCAST (example: MAP JOIN).

I drew the above inference from createEdgeProperty() in the source code; see the sketch below for how these types appear when an edge is declared.
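For illustration only, here is a rough sketch using the Tez DAG API. This is not Hive's createEdgeProperty() itself; the EdgeProperty and DataMovementType classes are from the Tez API, while the specific output/input descriptor class names are assumptions chosen for the example.

```scala
import org.apache.tez.dag.api.{EdgeProperty, InputDescriptor, OutputDescriptor}
import org.apache.tez.dag.api.EdgeProperty.{DataMovementType, DataSourceType, SchedulingType}

// SCATTER_GATHER: what Hive labels a SIMPLE_EDGE (e.g. a shuffle join)
val shuffleEdge = EdgeProperty.create(
  DataMovementType.SCATTER_GATHER,
  DataSourceType.PERSISTED,
  SchedulingType.SEQUENTIAL,
  OutputDescriptor.create("org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput"),
  InputDescriptor.create("org.apache.tez.runtime.library.input.OrderedGroupedKVInput"))

// BROADCAST: what Hive labels a BROADCAST_EDGE (e.g. a map join)
val broadcastEdge = EdgeProperty.create(
  DataMovementType.BROADCAST,
  DataSourceType.PERSISTED,
  SchedulingType.SEQUENTIAL,
  OutputDescriptor.create("org.apache.tez.runtime.library.output.UnorderedKVOutput"),
  InputDescriptor.create("org.apache.tez.runtime.library.input.UnorderedKVInput"))
```

Hope this helps.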
06-30-2017
03:42 PM
1 Kudo
Yes, you will still need to call the createArrayOf function. ps.setArray expects a value of type java.sql.Array, which is different from Long[].
Please consider accepting the answer if this has helped.
06-29-2017
06:30 PM
3 Kudos
@Bin Ye Since ed.getUSER_ROUTES() returns long[], you must first convert it to an array of objects (Long[]) using ArrayUtils from the apache-commons-lang jar. Add it to your Java project with Maven, or import the jar directly (download the jar from here).

import java.sql.Array;
import org.apache.commons.lang.ArrayUtils;

long[] vals = {1234, 9876, 77878}; // example of the user-routes long[] being returned
Long[] obj = ArrayUtils.toObject(vals);

Now use the connection object's createArrayOf() method as shown below:

Array arrayOfUserRoutes = connection.createArrayOf("long", obj);
ps.setArray(3, arrayOfUserRoutes);
06-29-2017
05:59 PM
Can you please share the error you are getting?
06-29-2017
02:47 PM
2 Kudos
@npandey Your Spark job is failing due to a LinkageError. This usually happens when there is a conflict between the Jersey RuntimeDelegate in the YARN client libs and the copy in Spark's assembly jar: at runtime, YARN calls into ATS code that needs a different version of a class and cannot find it, because the version in Spark and the version in YARN conflict. To resolve this, set the property below using HiveContext:

val hc = new org.apache.spark.sql.hive.HiveContext(sc)
hc.setConf("yarn.timeline-service.enabled", "false")
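An alternative sketch with the same intent (names here are placeholders, so adjust to your job): disable the timeline-service client before the SparkContext is created, using Spark's spark.hadoop.* passthrough for Hadoop configuration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// "my-app" is a placeholder application name; the property below is the same
// YARN setting as in the HiveContext fix, passed through to the Hadoop Configuration.
val conf = new SparkConf()
  .setAppName("my-app")
  .set("spark.hadoop.yarn.timeline-service.enabled", "false")
val sc = new SparkContext(conf)
```

As always, if this answer helps you, please consider accepting it.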
06-29-2017
02:42 PM
What versions of Spark, Hive, and YARN are you using?
06-29-2017
02:33 PM
3 Kudos
In Hive, if you do not specify the database name in your query, it refers to the default database; the name of the default database is 'default'. So the query in the URL you shared:

hiveContext.sql("create table yahoo_orc_table (date STRING, open_price FLOAT, high_price FLOAT, low_price FLOAT, close_price FLOAT, volume INT, adj_price FLOAT) stored as orc")

will create yahoo_orc_table under the default database. If you want to create it in a specific database, say 'hardikdatabase', then you must specify databasename.tablename (hardikdatabase.yahoo_orc_table), as shown below:

hiveContext.sql("create table hardikdatabase.yahoo_orc_table (date STRING, open_price FLOAT, high_price FLOAT, low_price FLOAT, close_price FLOAT, volume INT, adj_price FLOAT) stored as orc")

The same rule applies when you want to read data from Hive: you must specify the database in the same way unless it is the default database.
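For example, a minimal read-back sketch reusing the table above (illustrative only; the database and table names are the ones from this example):

```scala
// Without the 'hardikdatabase.' prefix, Hive would look for yahoo_orc_table
// in the default database.
val df = hiveContext.sql("select * from hardikdatabase.yahoo_orc_table")
df.show()
```

As always, if this answer helps you, please consider accepting it.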