Member since: 02-08-2016
Posts: 36
Kudos Received: 18
Solutions: 4
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 277 | 12-14-2017 03:09 PM |
 | 1109 | 08-03-2016 02:49 PM |
 | 1602 | 07-26-2016 10:52 AM |
 | 992 | 03-07-2016 12:47 PM |
04-02-2019
03:03 PM
Hello, I'm trying to launch a Sqoop export action in Oozie. The job exits with the error shown below. The action is defined as follows:

<sqoop xmlns="uri:oozie:sqoop-action:0.3">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<arg>export</arg>
<arg>--connect</arg>
<arg>${NZ_JDBC_url}</arg>
<arg>--table</arg>
<arg>${NZ_table_name}</arg>
<arg>--username</arg>
<arg>${NZ_username}</arg>
<arg>--password-file</arg>
<arg>${passwordfile_path}</arg>
<arg>--hcatalog-database</arg>
<arg>${hcat_db}</arg>
<arg>--hcatalog-table</arg>
<arg>${hcat_table}</arg>
<arg>--batch</arg>
<file>${passwordfile_path}#passwordfile</file>
</sqoop>

The error:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], main() threw exception, org/apache/hadoop/hive/metastore/IMetaStoreClient
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/metastore/IMetaStoreClient
at org.apache.sqoop.tool.BaseSqoopTool.validateHCatalogOptions(BaseSqoopTool.java:1706)

I also tested with the Hive example provided in the Oozie sources and it works fine. I noticed in the launch_container.sh logs that the number of loaded jars is not the same: the Hive action loads more than 203 jars, while the Sqoop action loads only 36, and none of them is a Hive jar. I guess it's a sharelib problem, but I'm not sure how to fix it.
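If it does turn out to be a sharelib problem, one direction that is often suggested is to let the Sqoop action pull in the Hive/HCatalog sharelibs as well. A minimal sketch in job.properties, assuming the sharelib directories on this cluster are actually named sqoop, hive and hcatalog:

# job.properties (sketch; oozie.action.sharelib.for.<action> is a standard Oozie property,
# the sharelib names listed here are an assumption about this cluster)
oozie.use.system.libpath=true
oozie.action.sharelib.for.sqoop=sqoop,hive,hcatalog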
12-14-2017
03:09 PM
Thank you @Matt Andruff for your reply. I resolved the issue: I had another .jar in the /lib directory containing the same code but under a different file name. I'm not sure how it affected the execution of the job, but after removing it everything works fine, for now at least.
12-13-2017
02:38 PM
Hi, I have a problem running a jar from an Oozie shell action on a Kerberized cluster. My jar has the following code for authentication:

Configuration conf = new Configuration();
conf.set("hadoop.security.authentication","kerberos");
UserGroupInformation.setConfiguration(conf);
try {
UserGroupInformation.loginUserFromKeytab(principal, keytabPath);
} catch (IOException e) {
e.printStackTrace();
}

My workflow.xml is as follows:

<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${resourceManager}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>hadoop</exec>
<argument>jar</argument>
<argument>jarfile</argument>
<argument>x.x.x.x.UnzipFile</argument>
<argument>keytab</argument>
<argument>${kerberosPrincipal}</argument>
<argument>${nameNode}</argument>
<argument>${zipFilePath}</argument>
<argument>${unzippingDir}</argument>
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<file>${workdir}/lib/[keytabFileName]#keytab</file>
<file>${workdir}/lib/[JarFileName]#jarfile</file>
</shell>

The jar file and the keytab are located in HDFS in the /lib directory of the directory where the .xml file is located. The problem is that, across otherwise identical runs of the Oozie workflow, I sometimes get this error:

java.io.IOException: Incomplete HDFS URI, no host: hdfs://[name_bode_URI]:8020keytab
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:154)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2795)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2829)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
at x.x.x.x.CompressedFilesUtilities.unzip(CompressedFilesUtilities.java:54)
at x.x.x.x.UnzipFile.main(UnzipFile.java:13)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
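For what it's worth, the "8020keytab" in the URI suggests the bare symlink name is being glued onto the NameNode URI somewhere without a separator. Below is only a hypothetical sketch of the launcher class, not the actual UnzipFile/CompressedFilesUtilities code; the argument order is taken from the <argument> list in the workflow above, and everything else is an assumption. The keytab symlink created by <file>...#keytab</file> is a local file in the container's working directory, so it is passed to loginUserFromKeytab as-is, and HDFS paths are built with Path(parent, child) instead of string concatenation.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class UnzipFileSketch {
    public static void main(String[] args) throws IOException {
        // argument order as in the workflow: keytab, principal, nameNode, zipFilePath
        String keytab = args[0];      // "keytab": local symlink in the container working dir
        String principal = args[1];
        String nameNode = args[2];    // e.g. hdfs://namenode-host:8020
        String zipFilePath = args[3];

        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        // loginUserFromKeytab expects a local filesystem path, not an HDFS URI
        UserGroupInformation.loginUserFromKeytab(principal, keytab);

        // Path(parent, child) can never produce a URI like hdfs://host:8020keytab,
        // which is what a missing "/" in plain string concatenation would do
        Path zip = new Path(new Path(nameNode), zipFilePath);
        FileSystem fs = zip.getFileSystem(conf);
        System.out.println("zip exists: " + fs.exists(zip));
    }
}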
08-03-2016
02:49 PM
Okay, I found a workaround: I added -Duser.timezone=GMT, which changes the JVM timezone. The final flume-ng command is as follows:

flume-ng agent --conf-file spool1.properties --name agent1 --conf $FLUME_HOME/conf -Duser.timezone=GMT

The directory needed by the Oozie coordinator is now being created.
08-03-2016
08:54 AM
Hi all, I've created an Oozie coordinator with a synchronous dataset using the default done-flag.
According to the Oozie documentation: "done-flag: The done file for the data set. If done-flag is not specified, then Oozie configures Hadoop to create a _SUCCESS file in the output directory."
I'm using Flume to collect data into a directory in this format: /flume/%Y/%m/%d/%H
The problem is that the _SUCCESS file is never created.
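One possible workaround, based on the same documentation (a sketch only, not yet confirmed on this cluster): an empty done-flag makes the coordinator consider a dataset instance available as soon as the directory itself exists, so no _SUCCESS file is required:

<dataset name="ratings" frequency="${coord:hours(1)}" initial-instance="${start}" timezone="UTC">
    <uri-template>hdfs://vm1.local:8020/user/root/flume/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
    <done-flag></done-flag>
</dataset>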
08-03-2016
08:43 AM
Hi all, I've created an Oozie coordinator with a synchronous dataset. The time on the cluster is set to CEST (GMT+2). I'm using Flume to collect data and create a directory in HDFS in this format: /flume/%Y/%m/%d/%H

coordinator.properties:

nameNode=hdfs://vm1.local:8020
jobTracker=vm1.local:8050
queueName=default
exampleDir=${nameNode}/user/root/oozie-wait
oozie.use.system.libpath = true
start=2016-08-03T08:01Z
end=2016-08-03T12:06Z
workflowAppUri=${exampleDir}/app
oozie.coord.application.path=${exampleDir}/app

coordinator.xml:

<coordinator-app name="every-hour-waitForData" frequency="${coord:hours(1)}" start="${start}" end="${end}" timezone="UTC"
xmlns="uri:oozie:coordinator:0.1">
<datasets>
<dataset name="ratings" frequency="${coord:hours(1)}" initial-instance="${start}" timezone="Europe/Paris">
<uri-template>hdfs://vm1.local:8020/user/root/flume/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
</dataset>
</datasets>
<input-events>
<data-in name="coordInput1" dataset="ratings">
<instance>${coord:current(0)}</instance>
</data-in>
</input-events>
<action>
<workflow>
<app-path>${workflowAppUri}</app-path>
<configuration>
<property>
<name>wfInput</name>
<value>${coord:dataIn('coordInput1')}</value>
</property>
<property>
<name>jobTracker</name>
<value>${jobTracker}</value>
</property>
<property>
<name>nameNode</name>
<value>${nameNode}</value>
</property>
<property>
<name>queueName</name>
<value>${queueName}</value>
</property>
</configuration>
</workflow>
</action>
</coordinator-app>
When running this example, Flume creates the directory /user/root/flume/2016/08/03/10/ but the coordinator is waiting for /user/root/flume/2016/08/03/08. Does anyone know how to make Flume create the directory in UTC, or how to make the coordinator read the correct directory? Thanks.
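On the Flume side, one option that may be worth trying (a sketch; I have not verified it on this cluster) is the HDFS sink's timezone setting, which controls how the escape sequences in hdfs.path are resolved, assuming the sink is the one named sink1 in my other post:

agent1.sinks.sink1.hdfs.timeZone = UTC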
07-27-2016
09:17 AM
Thank you @Michael M and @Alexander Bij for your valuable help.
07-26-2016
10:52 AM
Problem solved: I changed the channel type from file to memory:

agent1.channels.channel2.type = memory

Answers about how to make it work with a file channel are still welcome.
07-26-2016
09:28 AM
Hi, I'm using Flume to collect data from a Spool Directory. My configuration is as follows: agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel2
agent1.sources.source1.channels = channel2
agent1.sinks.sink1.channel = channel2
agent1.sources.source1.type = spooldir
agent1.sources.source1.basenameHeader = true
agent1.sources.source1.spoolDir = /root/flume_example/spooldir
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/root/flume
agent1.sinks.sink1.hdfs.filePrefix = %{basename}
agent1.sinks.sink1.hdfs.fileSuffix = .csv
agent1.sinks.sink1.hdfs.idleTimeout = 5
agent1.sinks.sink1.hdfs.rollSize = 0
agent1.sinks.sink1.hdfs.rollCount = 100000
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.channels.channel2.type = file

When placing a 43 MB file in the spool directory, Flume starts writing files into the HDFS directory /user/root/flume:

-rw-r--r-- 3 root hdfs 7.9 M 2016-07-26 11:10 /user/root/flume/filename.csv.1469524239209.csv
-rw-r--r-- 3 root hdfs 7.6 M 2016-07-26 11:11 /user/root/flume/filename.csv.1469524239210.csv

But then a java.lang.OutOfMemoryError: Java heap space is raised:

ERROR channel.ChannelProcessor: Error while writing to required channel: FileChannel channel2 { dataDirs: [/root/.flume/file-channel/data] }
java.lang.OutOfMemoryError: Java heap space
at java.util.HashMap.resize(HashMap.java:703)
at java.util.HashMap.putVal(HashMap.java:662)
at java.util.HashMap.put(HashMap.java:611)
at org.apache.flume.channel.file.EventQueueBackingStoreFile.put(EventQueueBackingStoreFile.java:338)
at org.apache.flume.channel.file.FlumeEventQueue.set(FlumeEventQueue.java:287)
at org.apache.flume.channel.file.FlumeEventQueue.add(FlumeEventQueue.java:317)
at org.apache.flume.channel.file.FlumeEventQueue.addTail(FlumeEventQueue.java:211)
at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doCommit(FileChannel.java:553)
at org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151)
at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:192)
at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:235)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/07/26 11:10:59 ERROR source.SpoolDirectorySource: FATAL: Spool Directory source source1: { spoolDir: /root/flume_example/spooldir }: Uncaught exception in SpoolDirectorySource thread. Restart or reconfigure Flume to continue processing.
java.lang.OutOfMemoryError: Java heap space
at java.util.HashMap.resize(HashMap.java:703)
at java.util.HashMap.putVal(HashMap.java:662)
at java.util.HashMap.put(HashMap.java:611)
at org.apache.flume.channel.file.EventQueueBackingStoreFile.put(EventQueueBackingStoreFile.java:338)
at org.apache.flume.channel.file.FlumeEventQueue.set(FlumeEventQueue.java:287)
at org.apache.flume.channel.file.FlumeEventQueue.add(FlumeEventQueue.java:317)
at org.apache.flume.channel.file.FlumeEventQueue.addTail(FlumeEventQueue.java:211)
at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doCommit(FileChannel.java:553)
at org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151)
at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:192)
at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:235)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Any idea how I can fix this issue? Thanks.
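For anyone who wants to keep the file channel, the usual suggestions are to give the agent more heap and to bound the channel explicitly; this is a sketch only, with illustrative values and assumed directories:

# conf/flume-env.sh
export JAVA_OPTS="-Xms512m -Xmx2048m"

# agent properties (bounded file channel with explicit checkpoint/data dirs)
agent1.channels.channel2.capacity = 1000000
agent1.channels.channel2.transactionCapacity = 10000
agent1.channels.channel2.checkpointDir = /var/flume/checkpoint
agent1.channels.channel2.dataDirs = /var/flume/data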
07-20-2016
11:06 AM
Okay, I installed the NodeManager on the 3 remaining nodes and now all the nodes are active.
07-20-2016
10:41 AM
Hi, I have a cluster with 4 nodes (a NameNode with 8 GB RAM and 3 DataNodes with 4 GB RAM each). In the ResourceManager UI I'm seeing only one active node. Is this normal? Thanks.
05-18-2016
03:17 PM
1 Kudo
Hi, I want to update an HBase table using Pig. I have an empty column family and I want to add columns to it from a Pig relation. I've done some research, but in vain... Does this feature exist or not? Thanks
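To be concrete, what I'm hoping to do would look roughly like this (a sketch with made-up file, table and column names, assuming cf is the empty column family and the first field of the relation is used as the row key):

new_cols = LOAD '/user/root/new_columns.csv' USING PigStorage(',')
           AS (rowkey:chararray, col1:chararray, col2:chararray);
STORE new_cols INTO 'hbase://my_table'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:col1 cf:col2');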
04-26-2016
08:22 AM
Thanks @Matt Foley for your answer. The issue here is that none of my nodes has a public IP address; I have an intermediate machine that I SSH into, and from there I SSH again into the cluster. My cluster is not directly accessible.
04-25-2016
05:17 PM
The nodes don't have a public IP address. The command returned:

tcp 0 0 10.10.10.1:50070 0.0.0.0:* LISTEN 6754/java

The IP address 10.10.10.1 is visible only from within the cluster. If I use the other IP, 192.168.1.10, I get curl: (7) couldn't connect to host even when I'm on a node... the only way is to use 10.10.10.1.
04-25-2016
04:23 PM
Hi, I have a 4-node cluster with one public IP (the nodes have access to the internet and each has a unique IP). I usually use a double SSH tunnel to access nodes and services by port forwarding. My question is how I can use WebHDFS from my local machine; the problem is how to specify the IP address of the vm1.local node. From inside the cluster I use this command:

curl -i "http://vm1.local:50070/webhdfs/v1/user/root/?op=LISTSTATUS"

(or 10.10.10.1 in place of vm1.local)

My configuration is as follows. Public IP: 151.xx.xx.xx

VM | eth0 | eth1 |
---|---|---|
VM1 | 192.168.1.10 | 10.10.10.1 |
VM2 | 192.168.1.11 | 10.10.10.2 |
VM3 | 192.168.1.12 | 10.10.10.3 |
VM4 | 192.168.1.13 | 10.10.10.4 |

Thanks,
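What I'm trying to end up with is roughly this (a sketch; the user name is a placeholder and it assumes the public IP can reach 10.10.10.1, possibly by chaining through the intermediate machine):

# forward a local port through the public machine to the NameNode's WebHDFS port
ssh -L 50070:10.10.10.1:50070 user@151.xx.xx.xx
# then, from the local machine
curl -i "http://localhost:50070/webhdfs/v1/user/root/?op=LISTSTATUS"

Operations that redirect to DataNodes (OPEN, CREATE) would need additional forwards for the DataNode ports, since the redirect URLs point at cluster-internal hostnames.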
04-18-2016
01:12 PM
Hi @Joy
The solution you proposed worked; I'm now getting a new error that is not related to a jar dependency. To make this fix permanent, should I add that line to hadoop-env?
04-18-2016
11:10 AM
1 Kudo
Hi,
I'm trying to connect from a Java application to HBase like this:
Configuration config = HBaseConfiguration.create();
config.set("hbase.zookeeper.quorum", "localhost");
config.set("hbase.zookeeper.property.clientPort", "2181");
config.set("zookeeper.znode.parent", "/hbase-unsecure");
config.set("hbase.client.retries.number", Integer.toString(0));
config.set("zookeeper.session.timeout", Integer.toString(60000));
config.set("zookeeper.recovery.retry", Integer.toString(0));
Connection conn = ConnectionFactory.createConnection(config);
TableName TABLE_NAME = TableName.valueOf("weblog");
Table table = conn.getTable(TABLE_NAME);
Result r = table.get(new Get(Bytes.toBytes("row1")));
System.out.println(r);
I built the app into a JAR, but when running it on the cluster with:
hadoop jar hbaseConnect-0.0.1-SNAPSHOT.jar com.packagename.hbaseConnect.HbaseConnect
I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
at com.DigiMerket.hbaseConnect.HbaseConnect.main(HbaseConnect.java:23)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
I tried adding HBASE_CLASSPATH to HADOOP_CLASSPATH in hadoop-env as this post suggests, but I get the same error.
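One commonly suggested variant of that (a sketch; it assumes the hbase command is on the PATH of the node where the job is launched) is to put the output of hbase classpath on the Hadoop classpath just for this run:

# "hbase classpath" prints the full client classpath of the local HBase installation
export HADOOP_CLASSPATH=$(hbase classpath):$HADOOP_CLASSPATH
hadoop jar hbaseConnect-0.0.1-SNAPSHOT.jar com.packagename.hbaseConnect.HbaseConnect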
04-11-2016
08:32 AM
1 Kudo
Hi all, I would like to develop a custom web application to visualise results from HDP. Is there any component that allows me to access data (from HBase, HDFS) in real time (or near real time)?
04-08-2016
12:41 PM
@Benjamin Leonhardi Thank you for your answer. Do you have an idea about the best way to access the cluster (HDFS, HBase, ...) and retrieve data easily?
04-08-2016
09:44 AM
Hi all, I'd like to develop a front end to run the various algorithms I have developed and to visualise the results. I need my front end to be customised, which is why I didn't use Hue. To achieve this I thought about developing a RESTful API using Java Jersey and Hive JDBC, to be called from AngularJS. Is this a good choice, or are there other alternatives (suggestions are welcome)? Does Hive JDBC support concurrency and simultaneous queries?
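For context, the Hive JDBC part I have in mind is the standard HiveServer2 driver usage, roughly as below (a sketch; the host, port, credentials and table name are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuerySketch {
    public static void main(String[] args) throws Exception {
        // standard HiveServer2 JDBC driver and URL format
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver-host:10000/default", "user", "");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM some_table LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}

Each Jersey request would typically use its own Connection (or a pool), since a single JDBC connection should not be shared by concurrent queries.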
03-07-2016
12:49 PM
@Neeraj Sabharwal Even in 0.11 it still exists... see my answer below.
03-07-2016
12:47 PM
Resolved this... Instead of using a path like this: new Path("/testdata/points") you have to use the fully qualified path of the directory in your cluster: new Path("hdfs://vm1.local:8020/user/root/testdata/points")
03-04-2016
10:58 AM
2 Kudos
I'm trying to use Mahout to run a clustering job; I've been struggling with it and with Maven for a week now... My code works fine in Eclipse on my local machine, but when I build it into a jar and send it to the cluster I get some errors, reading from HDFS I guess. First I created the directory /user/root/testdata:

hadoop fs -mkdir /user/root/testdata

then put the downloaded file synthetic_control.data into it:

hadoop fs -put synthetic_control.data /user/root/testdata/

Finally I ran the example using:
- the Mahout examples jar from Mahout 0.9 downloaded from the website: hadoop jar mahout-examples-1.0-SNAPSHOT-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
- the mahout-examples-0.9.0.2.3.4.0-3485-job.jar file found in the Mahout directory on the node: hadoop jar /usr/hdp/2.3.4.0-3485/mahout/mahout-examples-0.9.0.2.3.4.0-3485-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

In both cases I get this error:

WARNING: Use "yarn jar" to launch YARN applications.
16/03/04 11:57:03 INFO kmeans.Job: Running with default arguments
16/03/04 11:57:05 INFO common.HadoopUtil: Deleting output
16/03/04 11:57:05 INFO kmeans.Job: Preparing Input
16/03/04 11:57:05 INFO impl.TimelineClientImpl: Timeline service address: http://vm2.local:8188/ws/v1/timeline/
16/03/04 11:57:05 INFO client.RMProxy: Connecting to ResourceManager at vm1.local/10.10.10.1:8050
16/03/04 11:57:06 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/03/04 11:57:07 INFO input.FileInputFormat: Total input paths to process : 1
16/03/04 11:57:07 INFO mapreduce.JobSubmitter: number of splits:1
16/03/04 11:57:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1456915204500_0029
16/03/04 11:57:07 INFO impl.YarnClientImpl: Submitted application application_1456915204500_0029
16/03/04 11:57:07 INFO mapreduce.Job: The url to track the job: http://vm1.local:8088/proxy/application_1456915204500_0029/
16/03/04 11:57:07 INFO mapreduce.Job: Running job: job_1456915204500_0029
16/03/04 11:57:14 INFO mapreduce.Job: Job job_1456915204500_0029 running in uber mode : false
16/03/04 11:57:14 INFO mapreduce.Job: map 0% reduce 0%
16/03/04 11:57:20 INFO mapreduce.Job: map 100% reduce 0%
16/03/04 11:57:20 INFO mapreduce.Job: Job job_1456915204500_0029 completed successfully
16/03/04 11:57:20 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=129757
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=288502
HDFS: Number of bytes written=335470
HDFS: Number of read operations=5
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=3457
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=3457
Total vcore-seconds taken by all map tasks=3457
Total megabyte-seconds taken by all map tasks=3539968
Map-Reduce Framework
Map input records=600
Map output records=600
Input split bytes=128
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=76
CPU time spent (ms)=590
Physical memory (bytes) snapshot=113729536
Virtual memory (bytes) snapshot=2723696640
Total committed heap usage (bytes)=62324736
File Input Format Counters
Bytes Read=288374
File Output Format Counters
Bytes Written=335470
16/03/04 11:57:20 INFO kmeans.Job: Running random seed to get initial clusters
16/03/04 11:57:20 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
16/03/04 11:57:20 INFO compress.CodecPool: Got brand-new compressor [.deflate]
16/03/04 11:57:21 INFO kmeans.RandomSeedGenerator: Wrote 6 Klusters to output/random-seeds/part-randomSeed
16/03/04 11:57:21 INFO kmeans.Job: Running KMeans with k = 6
16/03/04 11:57:21 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/random-seeds/part-randomSeed Out: output
16/03/04 11:57:21 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10
16/03/04 11:57:21 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
16/03/04 11:57:21 INFO impl.TimelineClientImpl: Timeline service address: http://vm2.local:8188/ws/v1/timeline/
16/03/04 11:57:21 INFO client.RMProxy: Connecting to ResourceManager at vm1.local/10.10.10.1:8050
16/03/04 11:57:21 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/03/04 11:57:22 INFO input.FileInputFormat: Total input paths to process : 1
16/03/04 11:57:22 INFO mapreduce.JobSubmitter: number of splits:1
16/03/04 11:57:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1456915204500_0030
16/03/04 11:57:22 INFO impl.YarnClientImpl: Submitted application application_1456915204500_0030
16/03/04 11:57:22 INFO mapreduce.Job: The url to track the job: http://vm1.local:8088/proxy/application_1456915204500_0030/
16/03/04 11:57:22 INFO mapreduce.Job: Running job: job_1456915204500_0030
16/03/04 11:57:33 INFO mapreduce.Job: Job job_1456915204500_0030 running in uber mode : false
16/03/04 11:57:33 INFO mapreduce.Job: map 0% reduce 0%
16/03/04 11:57:37 INFO mapreduce.Job: Task Id : attempt_1456915204500_0030_m_000000_0, Status : FAILED
Error: java.lang.IllegalStateException: output/clusters-0
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.FileNotFoundException: File output/clusters-0 does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:429)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:574)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:70)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:76)
... 10 more
16/03/04 11:57:42 INFO mapreduce.Job: Task Id : attempt_1456915204500_0030_m_000000_1, Status : FAILED
Error: java.lang.IllegalStateException: output/clusters-0
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.FileNotFoundException: File output/clusters-0 does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:429)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:574)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:70)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:76)
... 10 more
16/03/04 11:57:46 INFO mapreduce.Job: Task Id : attempt_1456915204500_0030_m_000000_2, Status : FAILED
Error: java.lang.IllegalStateException: output/clusters-0
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.FileNotFoundException: File output/clusters-0 does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:429)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:574)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:70)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:76)
... 10 more
16/03/04 11:57:52 INFO mapreduce.Job: map 100% reduce 100%
16/03/04 11:57:53 INFO mapreduce.Job: Job job_1456915204500_0030 failed with state FAILED due to: Task failed task_1456915204500_0030_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
16/03/04 11:57:53 INFO mapreduce.Job: Counters: 13
Job Counters
Failed map tasks=4
Killed reduce tasks=1
Launched map tasks=4
Other local map tasks=3
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=11687
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=11687
Total time spent by all reduce tasks (ms)=0
Total vcore-seconds taken by all map tasks=11687
Total vcore-seconds taken by all reduce tasks=0
Total megabyte-seconds taken by all map tasks=11967488
Total megabyte-seconds taken by all reduce tasks=0
Exception in thread "main" java.lang.InterruptedException: Cluster Iteration 1 failed processing output/clusters-1
at org.apache.mahout.clustering.iterator.ClusterIterator.iterateMR(ClusterIterator.java:183)
at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:224)
at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:147)
at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.run(Job.java:135)
at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:60)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
I guess it's a version problem ... Thanks.
03-04-2016
10:42 AM
That solved the problem, but the map job got stuck, and even after killing it the YARN container still existed; I had to kill it manually. I'll be back to this shortly.
03-01-2016
10:10 AM
2 Kudos
I'm following this tutorial: http://hortonworks.com/blog/using-r-and-other-non-java-languages-in-mapreduce-and-hive/ I put cities.txt in /user/root/ and the R script is as follows:

#!/usr/bin/env Rscript
f <- file("stdin")
open(f)
state_data = read.table(f)
summary(state_data)

Then I run the command:

hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming-2.7.1.2.3.4.0-3485.jar -input /user/root/cities.txt -output /user/root/streamer -mapper /bin/cat -reducer script.R -numReduceTasks 2 -file script.R
The map phase reaches 100%, but the reduce phase fails with this error:

16/03/01 11:06:30 INFO mapreduce.Job: map 100% reduce 50%
16/03/01 11:06:34 INFO mapreduce.Job: Task Id : attempt_1456773989186_0009_r_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Does anyone have an idea, or has anyone encountered this before? Thanks.
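These are the kinds of checks usually suggested first for a PipeMapRed "subprocess failed with code 1" (a sketch; the paths and file names are the ones used above):

# Rscript must be on the PATH of every worker node, not just the edge node
which Rscript
# the script must be executable before being shipped with -file
chmod +x script.R
# running it locally against a sample of the input surfaces R errors outside Hadoop
head -n 100 cities.txt | ./script.R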
03-01-2016
01:01 AM
When testing the example I got this error 😕 : 16/03/01 01:57:29 INFO mapreduce.Job: Task Id : attempt_1456773989186_0006_r_000001_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
03-01-2016
12:59 AM
1 Kudo
@Neeraj Sabharwal Thank you for your answer
03-01-2016
12:57 AM
@Artem Ervits thanks 🙂
02-29-2016
10:18 PM
3 Kudos
Hi all, I was following this tutorial: http://hortonworks.com/blog/using-r-and-other-non-java-languages-in-mapreduce-and-hive/ and I couldn't find hadoop-streamingxxxx.jar. I'm using a cluster with HDP 2.3.4.0-3485. Does anyone know where to find it or how to add it? Thanks 🙂
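For reference, on an HDP node the streaming jar can usually be located like this (a sketch; the exact version suffix varies):

find /usr/hdp -name 'hadoop-streaming*.jar'
# on this release it ends up under the mapreduce client directory, e.g.
ls /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming-*.jar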
- Tags:
- hadoop
- Hadoop Core