Member since: 02-02-2016
Posts: 583
Kudos Received: 518
Solutions: 98
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 3273 | 09-16-2016 11:56 AM |
 | 1375 | 09-13-2016 08:47 PM |
 | 5482 | 09-06-2016 11:00 AM |
 | 3181 | 08-05-2016 11:51 AM |
 | 5261 | 08-03-2016 02:58 PM |
06-08-2016
09:02 PM
2 Kudos
AFAIK, we don't support this kind of installation with HDP. As for log locations: yes, the log directory for each component can be changed in its configuration.
06-08-2016
04:38 PM
2 Kudos
@Smart Solutions I have seen application_* files inside the /spark-history directory, but I don't know where you got ".a5555e556-3301-433e-44de-23311665ed" from. Can you check the contents of this file/directory? Also, what happens if you move this file/directory somewhere else and restart the Spark History Server?
06-08-2016
03:49 PM
2 Kudos
@Banana Joe It may be related to the resources allocated to the VM; please see the doc for the resource requirements: http://hortonworks.com/wp-content/uploads/2016/02/Import_on_Vbox_3_1_2016.pdf RAM: "At least 8 GB of RAM (the more, the better). If you wish to enable services such as Ambari, HBase, Storm, Kafka, or Spark, please ensure you have at least 10 GB of physical RAM in order to run the VM using 8 GB."
06-08-2016
03:43 PM
5 Kudos
@Hamza FRIOUA The best option would be to use the MongoDB Hadoop connector with Hive external tables, but you need to build that jar manually or use a prebuilt one: https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage

```sql
CREATE TABLE individuals
(
  id INT,
  name STRING,
  age INT,
  work STRUCT<title:STRING, hours:INT>
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","work.title":"job.position"}')
TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/test.persons');
```
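Once the table exists, it can be queried like any other Hive table. A minimal sketch of querying it from Spark, assuming Spark 1.6 with Hive support and the mongo-hadoop jars on the classpath (the query itself is hypothetical):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object QueryMongoBackedTable {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("mongo-hive-demo"))
    val hive = new HiveContext(sc)
    // Rows are read through the storage handler, straight from the
    // MongoDB collection that backs the 'individuals' table.
    hive.sql("SELECT name, work.title FROM individuals WHERE age > 30").show()
    sc.stop()
  }
}
```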
06-08-2016
03:14 PM
@Benjamin Leonhardi How does --files differ from SparkContext.addFile(), apart from the way we use them?
06-08-2016
02:54 PM
"Took 65944ms to send a batch of 1 edits (205 bytes) to remote journal 192.168.1.47:8485" (2016-06-02). Taking ~66 seconds to ship a 205-byte edit batch means you either have a serious network problem between the nodes, or the underlying disk writes on the JournalNode are very slow.
06-08-2016
02:34 PM
@clukasik I don't see any performance issue with running it in yarn-client mode; however, per the initial info, they need something like a distributed cache in Spark, which they can achieve through SparkContext.addFile().
06-08-2016
01:54 PM
@Eric Periard Technically, two NameNodes can't be in the same HA state (one must be active and the other standby); if that's happening, you either have configuration issues or you're hitting a bug.
06-08-2016
01:47 PM
@akeezhadath Kindly use the API below to cache the file on all the nodes: SparkContext.addFile(). From the Spark API docs: "Add a file to be downloaded with this Spark job on every node. The path passed can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs, use SparkFiles.get(fileName) to find its download location. A directory can be given if the recursive option is set to true. Currently directories are only supported for Hadoop-supported filesystems."
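A minimal sketch of the pattern, assuming Spark 1.6; the lookup file path is hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

object AddFileDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("addfile-demo"))
    // Ship the file to every node; "hdfs:///data/lookup.txt" is a placeholder path.
    sc.addFile("hdfs:///data/lookup.txt")
    val counts = sc.parallelize(1 to 4).map { _ =>
      // On each executor, resolve the local copy by file name.
      val localPath = SparkFiles.get("lookup.txt")
      scala.io.Source.fromFile(localPath).getLines().size
    }.collect()
    println(counts.mkString(","))
    sc.stop()
  }
}
```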
06-08-2016
01:30 PM
4 Kudos
@akeezhadath You can place the file on HDFS and access it through "hdfs:///path/file".
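For example, a minimal sketch assuming Spark with HDFS configured; "hdfs:///path/file" is the placeholder path from above:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ReadFromHdfs {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hdfs-read-demo"))
    // Every executor reads the file directly from HDFS, so no
    // per-node file distribution (addFile/--files) is needed here.
    val lines = sc.textFile("hdfs:///path/file")
    println(lines.count())
    sc.stop()
  }
}
```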