Member since: 02-08-2016
Posts: 36
Kudos Received: 18
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 760 | 12-14-2017 03:09 PM
 | 1701 | 08-03-2016 02:49 PM
 | 2788 | 07-26-2016 10:52 AM
 | 1993 | 03-07-2016 12:47 PM
12-14-2017
03:09 PM
Thank you @Matt Andruff for your reply. I resolved the issue. I had another .jar in the /lib directory containing the same code but with a different file name. I'm not sure how it affected the execution of the job, but after removing it everything works fine, for now at least.
12-13-2017
02:38 PM
Hi, I have a problem running a jar through an Oozie shell action in a Kerberized cluster. My jar contains the following code for authentication:
Configuration conf = new Configuration();
conf.set("hadoop.security.authentication","kerberos");
UserGroupInformation.setConfiguration(conf);
try {
UserGroupInformation.loginUserFromKeytab(principal, keytabPath);
} catch (IOException e) {
e.printStackTrace();
}
My workflow.xml is as follows:
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${resourceManager}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>hadoop</exec>
<argument>jar</argument>
<argument>jarfile</argument>
<argument>x.x.x.x.UnzipFile</argument>
<argument>keytab</argument>
<argument>${kerberosPrincipal}</argument>
<argument>${nameNode}</argument>
<argument>${zipFilePath}</argument>
<argument>${unzippingDir}</argument>
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<file>${workdir}/lib/[keytabFileName]#keytab</file>
<file>${workdir}/lib/[JarFileName]#jarfile</file>
</shell>
The jar file and the keytab are located in HDFS, in the /lib subdirectory of the directory containing the workflow.xml. The problem is that across identical runs of the Oozie workflow I sometimes get this error:
java.io.IOException: Incomplete HDFS URI, no host: hdfs://[name_node_URI]:8020keytab
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:154)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2795)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2829)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
at x.x.x.x.CompressedFilesUtilities.unzip(CompressedFilesUtilities.java:54)
at x.x.x.x.UnzipFile.main(UnzipFile.java:13)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
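For what it's worth, the quoted error shows the NameNode URI and the keytab argument joined without a separator (hdfs://...:8020keytab), which usually means a path was built by plain string concatenation. The sketch below is only a hedged illustration, not the actual code from CompressedFilesUtilities; the nameNode and zipFilePath values are hypothetical:
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class QualifiedPathSketch {
    public static void main(String[] args) throws Exception {
        String nameNode = "hdfs://vm1.local:8020";     // hypothetical NameNode URI
        String zipFilePath = "/user/root/archive.zip"; // hypothetical zip path argument

        Configuration conf = new Configuration();
        // Bind the FileSystem to the NameNode URI explicitly instead of relying on fs.defaultFS
        FileSystem fs = FileSystem.get(URI.create(nameNode), conf);
        // makeQualified adds the scheme and authority, so no manual string concatenation is needed
        Path zip = fs.makeQualified(new Path(zipFilePath));
        System.out.println(zip); // hdfs://vm1.local:8020/user/root/archive.zip
    }
}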
Labels:
- Apache Oozie
08-03-2016
02:49 PM
Okay, I found a workaround: I added -Duser.timezone=GMT, which changes the JVM timezone. The final flume-ng command is as follows:
flume-ng agent --conf-file spool1.properties --name agent1 --conf $FLUME_HOME/conf -Duser.timezone=GMT
The directory needed by the Oozie coordinator is now being created.
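As an illustration only (this is not Flume's own escaping code), the small Java sketch below shows why forcing the JVM timezone changes the %Y/%m/%d/%H directory generated for the same event timestamp:
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class EscapeSequenceSketch {
    public static void main(String[] args) {
        Date eventTime = new Date(); // one and the same event timestamp
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy/MM/dd/HH");

        fmt.setTimeZone(TimeZone.getTimeZone("Europe/Paris")); // cluster default, CEST in summer
        System.out.println("/flume/" + fmt.format(eventTime));

        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));          // what -Duser.timezone=GMT forces
        System.out.println("/flume/" + fmt.format(eventTime)); // hour is two lower during CEST
    }
}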
08-03-2016
08:43 AM
Hi all, I've created an Oozie coordinator with a synchronous dataset. The cluster time is set to CEST (GMT+2). I'm using Flume to collect data and create a directory in HDFS in this format: /flume/%Y/%m/%d/%H
coordinator.properties:
nameNode=hdfs://vm1.local:8020
jobTracker=vm1.local:8050
queueName=default
exampleDir=${nameNode}/user/root/oozie-wait
oozie.use.system.libpath = true
start=2016-08-03T08:01Z
end=2016-08-03T12:06Z
workflowAppUri=${exampleDir}/app
oozie.coord.application.path=${exampleDir}/app
coordinator.xml:
<coordinator-app name="every-hour-waitForData" frequency="${coord:hours(1)}" start="${start}" end="${end}" timezone="UTC"
xmlns="uri:oozie:coordinator:0.1">
<datasets>
<dataset name="ratings" frequency="${coord:hours(1)}" initial-instance="${start}" timezone="Europe/Paris">
<uri-template>hdfs://vm1.local:8020/user/root/flume/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
</dataset>
</datasets>
<input-events>
<data-in name="coordInput1" dataset="ratings">
<instance>${coord:current(0)}</instance>
</data-in>
</input-events>
<action>
<workflow>
<app-path>${workflowAppUri}</app-path>
<configuration>
<property>
<name>wfInput</name>
<value>${coord:dataIn('coordInput1')}</value>
</property>
<property>
<name>jobTracker</name>
<value>${jobTracker}</value>
</property>
<property>
<name>nameNode</name>
<value>${nameNode}</value>
</property>
<property>
<name>queueName</name>
<value>${queueName}</value>
</property>
</configuration>
</workflow>
</action>
</coordinator-app>
When running this example, Flume creates the directory /user/root/flume/2016/08/03/10/ but the coordinator is waiting for /user/root/flume/2016/08/03/08. Does anyone know how to make Flume create the directory in UTC, or how to make the coordinator read the correct directory? Thanks.
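To make the two-hour gap concrete, here is a small sketch (assuming Java 8's java.time is available) that converts the coordinator's nominal instance time to the cluster's CEST zone; it shows why Flume writes .../10 while the coordinator waits for .../08:
import java.time.Instant;
import java.time.ZoneId;

public class CoordinatorOffsetCheck {
    public static void main(String[] args) {
        // Nominal instance time of the coordinator, taken from the properties above
        Instant nominal = Instant.parse("2016-08-03T08:01:00Z");

        int utcHour   = nominal.atZone(ZoneId.of("UTC")).getHour();          // 8  -> .../2016/08/03/08
        int localHour = nominal.atZone(ZoneId.of("Europe/Paris")).getHour(); // 10 -> .../2016/08/03/10

        System.out.println("Coordinator expects hour " + utcHour + ", Flume writes hour " + localHour);
    }
}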
Labels:
- Apache Flume
- Apache Oozie
07-27-2016
09:17 AM
Thank you @Michael M and @Alexander Bij for your valuable help.
07-26-2016
10:52 AM
Problem solved, I changed the channel type from file to memory:
agent1.channels.channel2.type = memory
Answers about how to make it work with a file channel are still welcome.
07-26-2016
09:28 AM
Hi, I'm using Flume to collect data from a spool directory. My configuration is as follows:
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel2
agent1.sources.source1.channels = channel2
agent1.sinks.sink1.channel = channel2
agent1.sources.source1.type = spooldir
agent1.sources.source1.basenameHeader = true
agent1.sources.source1.spoolDir = /root/flume_example/spooldir
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/root/flume
agent1.sinks.sink1.hdfs.filePrefix = %{basename}
agent1.sinks.sink1.hdfs.fileSuffix = .csv
agent1.sinks.sink1.hdfs.idleTimeout = 5
agent1.sinks.sink1.hdfs.rollSize = 0
agent1.sinks.sink1.hdfs.rollCount = 100000
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.channels.channel2.type = file
When placing a 43 MB file in the spool directory, Flume starts writing files into the HDFS directory /user/root/flume:
-rw-r--r-- 3 root hdfs 7.9 M 2016-07-26 11:10 /user/root/flume/filename.csv.1469524239209.csv
-rw-r--r-- 3 root hdfs 7.6 M 2016-07-26 11:11 /user/root/flume/filename.csv.1469524239210.csv
But a java.lang.OutOfMemoryError: Java heap space error is raised:
ERROR channel.ChannelProcessor: Error while writing to required channel: FileChannel channel2 { dataDirs: [/root/.flume/file-channel/data] }
java.lang.OutOfMemoryError: Java heap space
at java.util.HashMap.resize(HashMap.java:703)
at java.util.HashMap.putVal(HashMap.java:662)
at java.util.HashMap.put(HashMap.java:611)
at org.apache.flume.channel.file.EventQueueBackingStoreFile.put(EventQueueBackingStoreFile.java:338)
at org.apache.flume.channel.file.FlumeEventQueue.set(FlumeEventQueue.java:287)
at org.apache.flume.channel.file.FlumeEventQueue.add(FlumeEventQueue.java:317)
at org.apache.flume.channel.file.FlumeEventQueue.addTail(FlumeEventQueue.java:211)
at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doCommit(FileChannel.java:553)
at org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151)
at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:192)
at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:235)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/07/26 11:10:59 ERROR source.SpoolDirectorySource: FATAL: Spool Directory source source1: { spoolDir: /root/flume_example/spooldir }: Uncaught exception in SpoolDirectorySource thread. Restart or reconfigure Flume to continue processing.
java.lang.OutOfMemoryError: Java heap space
at java.util.HashMap.resize(HashMap.java:703)
at java.util.HashMap.putVal(HashMap.java:662)
at java.util.HashMap.put(HashMap.java:611)
at org.apache.flume.channel.file.EventQueueBackingStoreFile.put(EventQueueBackingStoreFile.java:338)
at org.apache.flume.channel.file.FlumeEventQueue.set(FlumeEventQueue.java:287)
at org.apache.flume.channel.file.FlumeEventQueue.add(FlumeEventQueue.java:317)
at org.apache.flume.channel.file.FlumeEventQueue.addTail(FlumeEventQueue.java:211)
at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doCommit(FileChannel.java:553)
at org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151)
at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:192)
at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:235)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Any idea how I can fix this issue? Thanks.
Labels:
- Apache Flume
07-20-2016
11:06 AM
Okay, I installed the NodeManager on the 3 remaining nodes and now all the nodes are active.
07-20-2016
10:41 AM
Hi, I have a cluster with 4 nodes (NameNode: 8 GB RAM, 3 DataNodes with 4 GB RAM). In the ResourceManager UI I'm seeing only one active node. Is this normal? Thanks.
Labels:
- Cloudera Manager
05-18-2016
03:17 PM
1 Kudo
Hi, I want to update an HBase table using Pig. I have an empty column family and I want to add columns to it from a Pig object. I've done some research, but in vain... Does this feature exist or not? Thanks
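I don't know whether Pig's HBaseStorage covers this case, but for reference, here is a minimal sketch of the equivalent operation through the plain HBase Java client API (the table, family, and qualifier names are hypothetical):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class AddColumnsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("mytable"))) { // hypothetical table
            Put put = new Put(Bytes.toBytes("row1"));
            // New qualifiers under the existing (empty) column family are created on write
            put.addColumn(Bytes.toBytes("emptyFamily"), Bytes.toBytes("newColumn"), Bytes.toBytes("value"));
            table.put(put);
        }
    }
}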
Labels:
- Apache HBase
- Apache Pig
04-18-2016
01:12 PM
Hi @Joy
The solution you proposed worked; I'm now getting a new error, not related to a jar dependency. To make this fix permanent, should I add that line to hadoop-env?
04-18-2016
11:10 AM
1 Kudo
Hi,
I'm trying to connect from a Java application to HBase like this:
Configuration config = HBaseConfiguration.create();
config.set("hbase.zookeeper.quorum", "localhost");
config.set("hbase.zookeeper.property.clientPort", "2181");
config.set("zookeeper.znode.parent", "/hbase-unsecure");
config.set("hbase.client.retries.number", Integer.toString(0));
config.set("zookeeper.session.timeout", Integer.toString(60000));
config.set("zookeeper.recovery.retry", Integer.toString(0));
Connection conn = ConnectionFactory.createConnection(config);
TableName TABLE_NAME = TableName.valueOf("weblog");
Table table = conn.getTable(TABLE_NAME);
Result r = table.get(new Get(Bytes.toBytes("row1")));
System.out.println(r);
I built the app into a JAR, but when running it on the cluster with:
hadoop jar hbaseConnect-0.0.1-SNAPSHOT.jar com.packagename.hbaseConnect.HbaseConnect
I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
at com.DigiMerket.hbaseConnect.HbaseConnect.main(HbaseConnect.java:23)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
I tried to add HBASE_CLASSPATH to HADOOP_CLASSPATH in hadoop-env as this post suggests, but I get the same error.
Labels:
- Apache HBase
04-08-2016
12:41 PM
@Benjamin Leonhardi Thank you for your answer. Do you have an idea about the best way to access the cluster (HDFS, HBase, ...) and retrieve data easily?
04-08-2016
09:44 AM
Hi all, I'd like to develop a front end to run various algorithms I developed and visualise the results. I need my front end to be customised, which is why I didn't use Hue. To achieve this, I thought about developing a RESTful API using Java Jersey and Hive JDBC, to be called from AngularJS. Is this a good choice, or are there other alternatives (suggestions are welcome)? Does Hive JDBC support concurrency and simultaneous queries?
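For context, this is roughly what each Jersey resource method would do with Hive JDBC; the HiveServer2 host, port, and table are hypothetical. My understanding is that a single JDBC Connection shouldn't be shared across concurrent requests, so I'd open one per request (or use a connection pool):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Hypothetical HiveServer2 URL: jdbc:hive2://<host>:<port>/<database>
        String url = "jdbc:hive2://vm1.local:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "root", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM some_table LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}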
Labels:
- Apache Hive
03-07-2016
12:49 PM
@Neeraj Sabharwal Even in 0.11 it still exists... see my answer below.
03-07-2016
12:47 PM
Resolved this... Instead of using a path like this: new Path("/testdata/points"), you have to use the fully qualified path of the directory in your cluster: new Path("hdfs://vm1.local:8020/user/root/testdata/points")
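A small sketch of what was going on, assuming the vm1.local:8020 NameNode from the job: an unqualified path is resolved against whatever fs.defaultFS the running JVM happens to have (inside the failing task it fell back to the local filesystem, as the RawLocalFileSystem frames in the trace below show), while a fully qualified hdfs:// path always points at the cluster:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class PathQualificationSketch {
    public static void main(String[] args) throws Exception {
        // fs.defaultFS comes from whatever configuration is on the classpath of the running JVM
        Configuration conf = new Configuration();

        Path unqualified = new Path("/testdata/points");
        Path qualified   = new Path("hdfs://vm1.local:8020/user/root/testdata/points");

        // With no cluster config on the classpath this resolves to file:/testdata/points,
        // which is why the job ended up looking on the local filesystem.
        System.out.println(unqualified.getFileSystem(conf).makeQualified(unqualified));

        // The fully qualified path is unambiguous regardless of fs.defaultFS.
        System.out.println(qualified.getFileSystem(conf).makeQualified(qualified));
    }
}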
03-04-2016
10:58 AM
2 Kudos
I'm trying to use Mahout for a clustering job; I've been struggling with it and Maven for a week now... My code works fine in Eclipse on the local machine, but when I build it into a jar and send it to the cluster I get some errors, reading from HDFS I guess.
First I created the directory /user/root/testdata:
hadoop fs -mkdir /user/root/testdata
Then I put the downloaded file synthetic_control.data into it:
hadoop fs -put synthetic_control.data /user/root/testdata/
Finally I ran the example using:
- the Mahout examples jar from Mahout 0.9 downloaded from the website:
hadoop jar mahout-examples-1.0-SNAPSHOT-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
- and the mahout-examples-0.9.0.2.3.4.0-3485-job.jar file found in the Mahout directory on the node:
hadoop jar /usr/hdp/2.3.4.0-3485/mahout/mahout-examples-0.9.0.2.3.4.0-3485-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
In both cases I get this error:
WARNING: Use "yarn jar" to launch YARN applications.
16/03/04 11:57:03 INFO kmeans.Job: Running with default arguments
16/03/04 11:57:05 INFO common.HadoopUtil: Deleting output
16/03/04 11:57:05 INFO kmeans.Job: Preparing Input
16/03/04 11:57:05 INFO impl.TimelineClientImpl: Timeline service address: http://vm2.local:8188/ws/v1/timeline/
16/03/04 11:57:05 INFO client.RMProxy: Connecting to ResourceManager at vm1.local/10.10.10.1:8050
16/03/04 11:57:06 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/03/04 11:57:07 INFO input.FileInputFormat: Total input paths to process : 1
16/03/04 11:57:07 INFO mapreduce.JobSubmitter: number of splits:1
16/03/04 11:57:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1456915204500_0029
16/03/04 11:57:07 INFO impl.YarnClientImpl: Submitted application application_1456915204500_0029
16/03/04 11:57:07 INFO mapreduce.Job: The url to track the job: http://vm1.local:8088/proxy/application_1456915204500_0029/
16/03/04 11:57:07 INFO mapreduce.Job: Running job: job_1456915204500_0029
16/03/04 11:57:14 INFO mapreduce.Job: Job job_1456915204500_0029 running in uber mode : false
16/03/04 11:57:14 INFO mapreduce.Job: map 0% reduce 0%
16/03/04 11:57:20 INFO mapreduce.Job: map 100% reduce 0%
16/03/04 11:57:20 INFO mapreduce.Job: Job job_1456915204500_0029 completed successfully
16/03/04 11:57:20 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=129757
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=288502
HDFS: Number of bytes written=335470
HDFS: Number of read operations=5
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=3457
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=3457
Total vcore-seconds taken by all map tasks=3457
Total megabyte-seconds taken by all map tasks=3539968
Map-Reduce Framework
Map input records=600
Map output records=600
Input split bytes=128
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=76
CPU time spent (ms)=590
Physical memory (bytes) snapshot=113729536
Virtual memory (bytes) snapshot=2723696640
Total committed heap usage (bytes)=62324736
File Input Format Counters
Bytes Read=288374
File Output Format Counters
Bytes Written=335470
16/03/04 11:57:20 INFO kmeans.Job: Running random seed to get initial clusters
16/03/04 11:57:20 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
16/03/04 11:57:20 INFO compress.CodecPool: Got brand-new compressor [.deflate]
16/03/04 11:57:21 INFO kmeans.RandomSeedGenerator: Wrote 6 Klusters to output/random-seeds/part-randomSeed
16/03/04 11:57:21 INFO kmeans.Job: Running KMeans with k = 6
16/03/04 11:57:21 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/random-seeds/part-randomSeed Out: output
16/03/04 11:57:21 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10
16/03/04 11:57:21 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
16/03/04 11:57:21 INFO impl.TimelineClientImpl: Timeline service address: http://vm2.local:8188/ws/v1/timeline/
16/03/04 11:57:21 INFO client.RMProxy: Connecting to ResourceManager at vm1.local/10.10.10.1:8050
16/03/04 11:57:21 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/03/04 11:57:22 INFO input.FileInputFormat: Total input paths to process : 1
16/03/04 11:57:22 INFO mapreduce.JobSubmitter: number of splits:1
16/03/04 11:57:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1456915204500_0030
16/03/04 11:57:22 INFO impl.YarnClientImpl: Submitted application application_1456915204500_0030
16/03/04 11:57:22 INFO mapreduce.Job: The url to track the job: http://vm1.local:8088/proxy/application_1456915204500_0030/
16/03/04 11:57:22 INFO mapreduce.Job: Running job: job_1456915204500_0030
16/03/04 11:57:33 INFO mapreduce.Job: Job job_1456915204500_0030 running in uber mode : false
16/03/04 11:57:33 INFO mapreduce.Job: map 0% reduce 0%
16/03/04 11:57:37 INFO mapreduce.Job: Task Id : attempt_1456915204500_0030_m_000000_0, Status : FAILED
Error: java.lang.IllegalStateException: output/clusters-0
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.FileNotFoundException: File output/clusters-0 does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:429)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:574)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:70)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:76)
... 10 more
16/03/04 11:57:42 INFO mapreduce.Job: Task Id : attempt_1456915204500_0030_m_000000_1, Status : FAILED
Error: java.lang.IllegalStateException: output/clusters-0
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.FileNotFoundException: File output/clusters-0 does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:429)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:574)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:70)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:76)
... 10 more
16/03/04 11:57:46 INFO mapreduce.Job: Task Id : attempt_1456915204500_0030_m_000000_2, Status : FAILED
Error: java.lang.IllegalStateException: output/clusters-0
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.FileNotFoundException: File output/clusters-0 does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:429)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:574)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:70)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:76)
... 10 more
16/03/04 11:57:52 INFO mapreduce.Job: map 100% reduce 100%
16/03/04 11:57:53 INFO mapreduce.Job: Job job_1456915204500_0030 failed with state FAILED due to: Task failed task_1456915204500_0030_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
16/03/04 11:57:53 INFO mapreduce.Job: Counters: 13
Job Counters
Failed map tasks=4
Killed reduce tasks=1
Launched map tasks=4
Other local map tasks=3
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=11687
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=11687
Total time spent by all reduce tasks (ms)=0
Total vcore-seconds taken by all map tasks=11687
Total vcore-seconds taken by all reduce tasks=0
Total megabyte-seconds taken by all map tasks=11967488
Total megabyte-seconds taken by all reduce tasks=0
Exception in thread "main" java.lang.InterruptedException: Cluster Iteration 1 failed processing output/clusters-1
at org.apache.mahout.clustering.iterator.ClusterIterator.iterateMR(ClusterIterator.java:183)
at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:224)
at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:147)
at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.run(Job.java:135)
at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:60)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
I guess it's a version problem ... Thanks.
Labels:
03-04-2016
10:42 AM
That solved the problem, but the map job got stuck, and even after killing it the YARN container still existed; I had to kill it manually. I'll be back to this shortly.
03-01-2016
10:10 AM
2 Kudos
I'm following this tutorial: http://hortonworks.com/blog/using-r-and-other-non-java-languages-in-mapreduce-and-hive/ I put cities.txt in /user/root/ and wrote the R script as follows:
#!/usr/bin/env Rscript
f <- file("stdin")
open(f)
state_data = read.table(f)
summary(state_data)
Then I ran the command:
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming-2.7.1.2.3.4.0-3485.jar -input /user/root/cities.txt -output /user/root/streamer -mapper /bin/cat -reducer script.R -numReduceTasks 2 -file script.R
The map phase reaches 100% and the reduce phase shows this error:
16/03/01 11:06:30 INFO mapreduce.Job: map 100% reduce 50%
16/03/01 11:06:34 INFO mapreduce.Job: Task Id : attempt_1456773989186_0009_r_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Does anyone have any idea, or has anyone encountered this before? Thanks.
Labels:
- Apache Hadoop
03-01-2016
01:01 AM
When testing the example I got this error 😕:
16/03/01 01:57:29 INFO mapreduce.Job: Task Id : attempt_1456773989186_0006_r_000001_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
03-01-2016
12:59 AM
1 Kudo
@Neeraj Sabharwal Thank you for your answer
03-01-2016
12:57 AM
@Artem Ervits thanks 🙂
02-29-2016
10:18 PM
3 Kudos
Hi all, I was following this tutorial: http://hortonworks.com/blog/using-r-and-other-non-java-languages-in-mapreduce-and-hive/ and I couldn't find hadoop-streamingxxxx.jar. I'm using a cluster with HDP 2.3.4.0-3485. Does anyone know where to find it or how to add it? Thanks 🙂
Tags:
- hadoop
- Hadoop Core
Labels:
- Apache Hadoop
02-29-2016
08:26 AM
1 Kudo
Thanks all, I used this method to set up the hostname and it worked! My cluster is running now 🙂
02-24-2016
12:24 PM
1 Kudo
I added these lines to /etc/hosts on every VM:
10.10.10.1 VM1.local VM1
10.10.10.2 VM2.local VM2
10.10.10.3 VM3.local VM3
10.10.10.4 VM4.local VM4
Now I'm able to use "ping VMx" from every VM, and ssh user@VMx works too. hostname -f returns VMx.local; is this fine?
02-24-2016
11:55 AM
1 Kudo
Hi all, I have 4 VMs on a VPS with internet access through a bounce host (rebond) but not visible from the internet. My problem is that I couldn't set the FQDN for each VM. I added this line to /etc/hosts on every VM: "10.10.10.x VMx.local VMx", where x is the number of the VM. Does anyone have an idea about how to set the FQDN? And if not, is it possible to use IPs instead of FQDNs? I have this config:
Public bounce host IP: 151.xx.xx.xx, reached over ssh.
VM | eth0 | eth1
---|---|---
VM1 | 192.168.1.10 | 10.10.10.1
VM2 | 192.168.1.11 | 10.10.10.2
VM3 | 192.168.1.12 | 10.10.10.3
VM4 | 192.168.1.13 | 10.10.10.4
Labels:
- Apache Ambari
02-14-2016
09:19 PM
1 Kudo
@Neeraj Sabharwal Do you recommend using Ambari or manual installation? In the case of Ambari, I went through the documentation and didn't find where I can allocate space for logs. Thanks.
02-13-2016
02:33 PM
1 Kudo
@Neeraj Sabharwal Thank you for your answer. This is a development and test environment 🙂
02-13-2016
02:07 PM
2 Kudos
Hi all, I'm new to Hadoop and I'm currently working on a project using HDP. I have an OVH server with the following config:
- 4 CPUs x Intel(R) Xeon(R) CPU E3-1231 v3 @ 3.40GHz
- RAM : 32 GB
- Storage : 2 TB, with ESXi installed.
My question is about the best partitioning scheme and the number of nodes. Is 4 nodes with 1 CPU, 8 GB of RAM and 500 GB HDD each a good setup, at least for development (1 NameNode and 3 DataNodes)? I'm working with data from a mid-sized retailer. Thanks.
Labels: