Member since: 10-04-2016
Posts: 243
Kudos Received: 281
Solutions: 43
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1173 | 01-16-2018 03:38 PM
 | 6141 | 11-13-2017 05:45 PM
 | 3035 | 11-13-2017 12:30 AM
 | 1520 | 10-27-2017 03:58 AM
 | 28431 | 10-19-2017 03:17 AM
07-05-2017
05:57 PM
1 Kudo
I am using the Sandbox with HDP 2.4 (Spark 1.6). When I enter the following in spark-shell, I get an error:

import org.apache.spark.streaming.twitter._

Error message: Object twitter is not a member of org.apache.spark.streaming
Labels:
- Apache Spark
07-05-2017
03:49 PM
Alright, I will go through this use case and try to replicate the issue.
07-05-2017
03:11 PM
@Hugo Felix It appears you are using a Spark version lower than 2.1. In your code you have the following line:

val sc = new SparkConf().setMaster("local[2]").setAppName("tweets").set("spark.cleaner.ttl", "2000")

spark.cleaner.ttl triggers a cleanup after the duration you specify ("2000" seconds here). From the official Spark 1.6 documentation:

spark.cleaner.ttl - Duration (seconds) of how long Spark will remember any metadata (stages generated, tasks generated, etc.). Periodic cleanups will ensure that metadata older than this duration will be forgotten. This is useful for running Spark for many hours / days (for example, running 24/7 in case of Spark Streaming applications). Note that any RDD that persists in memory for more than this duration will be cleared as well. Default is infinite.

In your case, it is quite possible that the cleanup is being triggered even before your job finishes. Increase the value and try again. Refer to this JIRA for an existing discussion and more insight.
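As a minimal sketch (assuming the rest of your streaming job stays the same), you could raise the TTL well above the expected job runtime, or simply drop the .set call to fall back to the default (infinite):

```scala
import org.apache.spark.SparkConf

// Same configuration as in your line above, but with a much larger TTL.
// The 86400 value is just an illustrative choice: 24 hours instead of 2000 seconds.
val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("tweets")
  .set("spark.cleaner.ttl", "86400")
```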
06-30-2017
08:20 PM
2 Kudos
@Viswa In Tez, the following types of data movement can take place between two vertices; each is represented by an Edge in the DAG:

- BROADCAST: Output on this edge produced by any source task is available to all destination tasks.
- CUSTOM: Custom routing defined by the user.
- ONE_TO_ONE: Output on this edge produced by the i-th source task is available to the i-th destination task.
- SCATTER_GATHER: The i-th output on this edge produced by all source tasks is available to the same destination task.

To answer your question:

- SIMPLE_EDGE refers to the data movement type SCATTER_GATHER (example: SHUFFLE JOIN).
- BROADCAST_EDGE refers to the data movement type BROADCAST (example: MAP JOIN).

I drew the above inference from createEdgeProperty() in the source code; see the sketch below for how these types appear when an edge is declared.
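For illustration only, here is a rough sketch using the Tez DAG API. This is not Hive's createEdgeProperty() itself; the EdgeProperty and DataMovementType classes are from the Tez API, while the specific output/input descriptor class names are assumptions chosen for the example.

```scala
import org.apache.tez.dag.api.{EdgeProperty, InputDescriptor, OutputDescriptor}
import org.apache.tez.dag.api.EdgeProperty.{DataMovementType, DataSourceType, SchedulingType}

// SCATTER_GATHER: what Hive labels a SIMPLE_EDGE (e.g. a shuffle join)
val shuffleEdge = EdgeProperty.create(
  DataMovementType.SCATTER_GATHER,
  DataSourceType.PERSISTED,
  SchedulingType.SEQUENTIAL,
  OutputDescriptor.create("org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput"),
  InputDescriptor.create("org.apache.tez.runtime.library.input.OrderedGroupedKVInput"))

// BROADCAST: what Hive labels a BROADCAST_EDGE (e.g. a map join)
val broadcastEdge = EdgeProperty.create(
  DataMovementType.BROADCAST,
  DataSourceType.PERSISTED,
  SchedulingType.SEQUENTIAL,
  OutputDescriptor.create("org.apache.tez.runtime.library.output.UnorderedKVOutput"),
  InputDescriptor.create("org.apache.tez.runtime.library.input.UnorderedKVInput"))
```

Hope this helps.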
06-30-2017
03:42 PM
1 Kudo
Yes, you will still need to call the createArrayOf function. ps.setArray expects a value of type java.sql.Array, which is different from Long[].
Please consider accepting the answer if this has helped.
06-29-2017
06:30 PM
3 Kudos
@Bin Ye Since ed.getUSER_ROUTES() returns long[], you must first convert it to an array of objects (Long[]) using ArrayUtils from the apache-commons-lang jar. Add it to your Java project with Maven, or import the jar directly (download the jar from here).

import java.sql.Array;
import org.apache.commons.lang.ArrayUtils;

long[] vals = {1234, 9876, 77878}; // example of the user-routes long[] being returned
Long[] obj = ArrayUtils.toObject(vals);

Now use the connection object's createArrayOf() method as shown below:

Array arrayOfUserRoutes = connection.createArrayOf("long", obj);
ps.setArray(3, arrayOfUserRoutes);
06-29-2017
05:59 PM
Can you please share the error you are getting?
06-29-2017
02:47 PM
2 Kudos
@npandey Your Spark job is failing due to a LinkageError. This usually happens when there is a conflict between the Jersey RuntimeDelegate in the YARN client libs and the copy in Spark's assembly jar: at runtime, YARN calls into ATS code that needs a different version of a class and cannot find it, because the version in Spark and the version in YARN conflict. To resolve this, set the property below using HiveContext:

val hc = new org.apache.spark.sql.hive.HiveContext(sc)
hc.setConf("yarn.timeline-service.enabled", "false")
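An alternative sketch with the same intent (names here are placeholders, so adjust to your job): disable the timeline-service client before the SparkContext is created, using Spark's spark.hadoop.* passthrough for Hadoop configuration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// "my-app" is a placeholder application name; the property below is the same
// YARN setting as in the HiveContext fix, passed through to the Hadoop Configuration.
val conf = new SparkConf()
  .setAppName("my-app")
  .set("spark.hadoop.yarn.timeline-service.enabled", "false")
val sc = new SparkContext(conf)
```

As always, if this answer helps you, please consider accepting it.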
06-29-2017
02:42 PM
What versions of Spark, Hive, and YARN are you using?
06-29-2017
02:33 PM
3 Kudos
In Hive, if you do not specify the database name in your query, it refers to the default database; the name of the default database is 'default'. So the query in the URL you shared:

hiveContext.sql("create table yahoo_orc_table (date STRING, open_price FLOAT, high_price FLOAT, low_price FLOAT, close_price FLOAT, volume INT, adj_price FLOAT) stored as orc")

will create yahoo_orc_table under the default database. If you want to create it in a specific database, say 'hardikdatabase', then you must specify databasename.tablename (hardikdatabase.yahoo_orc_table), as shown below:

hiveContext.sql("create table hardikdatabase.yahoo_orc_table (date STRING, open_price FLOAT, high_price FLOAT, low_price FLOAT, close_price FLOAT, volume INT, adj_price FLOAT) stored as orc")

The same rule applies when you want to read data from Hive: you must specify the database in the same way unless it is the default database.
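For example, a minimal read-back sketch reusing the table above (illustrative only; the database and table names are the ones from this example):

```scala
// Without the 'hardikdatabase.' prefix, Hive would look for yahoo_orc_table
// in the default database.
val df = hiveContext.sql("select * from hardikdatabase.yahoo_orc_table")
df.show()
```

As always, if this answer helps you, please consider accepting it.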