Member since: 06-23-2016
Posts: 136
Kudos Received: 8
Solutions: 8
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2703 | 11-24-2017 08:17 PM |
| | 3192 | 07-28-2017 06:40 AM |
| | 1233 | 07-05-2017 04:32 PM |
| | 1381 | 05-11-2017 03:07 PM |
| | 5519 | 02-08-2017 02:49 PM |
07-04-2017 01:27 PM
Can someone explain what I need to do to get the Stanford CoreNLP wrapper for Apache Spark working in Zeppelin/Spark, please? I have done this:

```
%spark.dep
z.reset() // clean up previously added artifacts and repositories
// add artifact recursively
z.load("databricks:spark-corenlp:0.2.0-s_2.10")
```

and this:

```scala
import com.databricks.spark.corenlp.functions._

val dfLemmas = filteredDF.withColumn("lemmas", lemmas('noURL)).select("racist", "filtered", "noURL", "lemmas")
dfLemmas.show(20, false)
```

but I get this:

```
<console>:42: error: not found: value lemmas
       val dfLemmas = filteredDF.withColumn("lemmas", lemmas('noURL)).select("racist", "filtered", "noURL", "lemmas")
```

Do I have to download the files and build them or something? If so, how do I do that? Or is there an easier way? TIA!!!!
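For reference: the spark-corenlp README names its lemmatisation function `lemma` (singular), not `lemmas`, which would explain the `not found: value lemmas` error. A minimal sketch of the kind of call I'm after, assuming that API and a toy DataFrame in place of my `filteredDF` (the README also notes the Stanford CoreNLP models jar has to be on the classpath):

```scala
import com.databricks.spark.corenlp.functions._ // from the z.load artifact above

// Toy stand-in for my filteredDF; in a Zeppelin %spark paragraph the
// SQL implicits (toDF, 'symbol columns) are already in scope.
val toyDF = Seq(
  (1, "Stanford University is located in California"),
  (2, "The cats were running quickly")
).toDF("id", "noURL")

// `lemma` (not `lemmas`) is the function the wrapper's README documents.
val dfLemmas = toyDF.withColumn("lemmas", lemma('noURL))
dfLemmas.show(20, false)
```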
Labels:
- Apache Spark
- Apache Zeppelin
06-27-2017 05:40 AM
I created the folder under /user/admin, for which I have permissions. Makes sense, I suppose.
06-27-2017 05:26 AM
Thanks Jay, but I was trying to add the folder, not write into it. I was adding it to the root folder /.
06-26-2017 02:15 PM
Hi. So I log in to Ambari as admin, then I try to add a folder in Files View, and get:

```
Permission denied: user=admin, access=WRITE, inode="/ml-in-a-nutshell":hdfs:hdfs:drwxr-xr-x
```

I am accessing Ambari via Chrome under another user, 'ed'. So should I be logging into Ambari as hdfs? Or maybe changing admin's permissions? But if I log in as hdfs, will it see my existing cluster, and what would the password be? This user thing is quite confusing.
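For reference, a sketch of the usual superuser workaround (an assumption on my part; it needs shell access on a cluster node, and `admin` is the Ambari user from the error above):

```bash
# Create the folder as the HDFS superuser, then hand ownership to the
# Ambari 'admin' user so Files View can write into it.
sudo -u hdfs hdfs dfs -mkdir /ml-in-a-nutshell
sudo -u hdfs hdfs dfs -chown admin:hdfs /ml-in-a-nutshell
```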
06-26-2017 01:43 PM
Yes it did, thanks!
06-21-2017 03:09 PM
Hi, I tried moving my data to a different directory (/data/hdfs/data) by adding the new directory to the DataNode data dir in the HDP configs and then copying the data over, but I get this error:

```
2017-06-21 15:29:53,432 ERROR impl.FsDatasetImpl (FsDatasetImpl.java:activateVolume(398)) - Found duplicated storage UUID: DS-011fd6ee-105d-4c21-ba03-8f43bc75f0b2 in /data/hdfs/data/current/VERSION.
2017-06-21 15:29:53,432 ERROR datanode.DataNode (BPServiceActor.java:run(772)) - Initialization failed for Block pool <registering> (Datanode Uuid 18224fd5-7fbe-4700-b22b-64352741f4a7) service to master.royble.co.uk/192.168.1.1:8020. Exiting.
java.io.IOException: Found duplicated storage UUID: DS-011fd6ee-105d-4c21-ba03-8f43bc75f0b2 in /data/hdfs/data/current/VERSION.
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.activateVolume(FsDatasetImpl.java:399)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.addVolume(FsDatasetImpl.java:425)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.<init>(FsDatasetImpl.java:329)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1556)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1504)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:319)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:269)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:760)
    at java.lang.Thread.run(Thread.java:745)
```

Should I just delete the original copied data files (/hadoop/hdfs/data)? TIA!!
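For reference, the storage UUID the error complains about lives in each data directory's VERSION file, so the clash can be confirmed by comparing the old and new directories (paths as in my configs; a byte-for-byte copy keeps the same storageID):

```bash
# Both directories are now listed in dfs.datanode.data.dir, and the copy
# carried the storageID across, so the DataNode sees the same ID twice.
cat /hadoop/hdfs/data/current/VERSION
cat /data/hdfs/data/current/VERSION
```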
06-07-2017 09:22 AM
Actually, presumably Hive can't find the 112 version, hence the error. I updated to 112, but the error is still there 😞
06-07-2017 09:18 AM
Thanks, very helpful! The Java on the Hive side is 112, as opposed to my 111. Does that sound like it would be a problem?
06-07-2017 07:32 AM
I am trying to use Hive View 2.0 on my cluster (HDP 2.6.0.3, Ambari 2.5.0.3), but I get a java.lang.NullPointerException error, and as per this question I have tried everything except checking the Java versions. Can anyone tell me how to check the Java used by HDP/Hive? And how do I upgrade my Java to match? TIA! My Java:

```
$ java -version
openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
```
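For reference, two ways I could presumably check which JVM the cluster side uses (an assumption; exact paths vary by install):

```bash
# The JDK Ambari was configured with (and hands to stack services):
grep java.home /etc/ambari-server/conf/ambari.properties

# The JVM a running HiveServer2 process was actually launched from
# (the [h] keeps grep from matching its own command line):
ps -ef | grep -i [h]iveserver2
```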