Member since: 02-08-2016
Posts: 36
Kudos Received: 18
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1434 | 12-14-2017 03:09 PM |
| | 2386 | 08-03-2016 02:49 PM |
| | 4413 | 07-26-2016 10:52 AM |
| | 3777 | 03-07-2016 12:47 PM |
05-18-2016 03:17 PM
1 Kudo
Hi, I want to update an HBase table using Pig. I have an empty column family and I want to add columns to it from a Pig relation. I've done some research, but in vain... Does this feature exist or not? Thanks
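For reference: Pig can write to HBase through its org.apache.pig.backend.hadoop.hbase.HBaseStorage store function, and HBase has no fixed column schema inside a family, so columns appear on first write. A minimal Java-client sketch of such a write (the table name "weblog" and the family/qualifier/value here are illustrative assumptions, not taken from this post):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class AddColumnSketch {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(config);
             Table table = conn.getTable(TableName.valueOf("weblog"))) {
            // The column "cf:newcol" is created by this Put itself;
            // only the family "cf" must already exist on the table.
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("newcol"),
                          Bytes.toBytes("value"));
            table.put(put);
        }
    }
}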
Labels:
- Apache HBase
- Apache Pig
04-18-2016 01:12 PM
Hi @Joy
The solution you proposed worked; I'm now getting a new error that is not related to a jar dependency. To make this fix permanent, should I add that line to hadoop-env?
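For reference, the usual way to make such a classpath change permanent is a line in hadoop-env.sh on each node; one common form (a sketch, assuming the hbase launcher script is on the PATH of the user running hadoop) is:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$(hbase classpath)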
04-18-2016 11:10 AM
1 Kudo
Hi,
I'm trying to connect from a Java application to HBase like this:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Point the client at the ZooKeeper quorum and the unsecured HDP znode.
Configuration config = HBaseConfiguration.create();
config.set("hbase.zookeeper.quorum", "localhost");
config.set("hbase.zookeeper.property.clientPort", "2181");
config.set("zookeeper.znode.parent", "/hbase-unsecure");
config.set("hbase.client.retries.number", Integer.toString(0));
config.set("zookeeper.session.timeout", Integer.toString(60000));
config.set("zookeeper.recovery.retry", Integer.toString(0));
Connection conn = ConnectionFactory.createConnection(config);
TableName TABLE_NAME = TableName.valueOf("weblog");
Table table = conn.getTable(TABLE_NAME);
Result r = table.get(new Get(Bytes.toBytes("row1")));
System.out.println(r);
I built the app into a JAR, but when I run it on the cluster with:
hadoop jar hbaseConnect-0.0.1-SNAPSHOT.jar com.packagename.hbaseConnect.HbaseConnect
I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
at com.DigiMerket.hbaseConnect.HbaseConnect.main(HbaseConnect.java:23)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
I tried adding HBASE_CLASSPATH to HADOOP_CLASSPATH in hadoop-env, as this post suggests, but I get the same error.
Labels:
- Apache HBase
04-08-2016 12:41 PM
@Benjamin Leonhardi Thank you for your answer. Do you have an idea of the best way to access the cluster (HDFS, HBase, ...) and retrieve data easily?
04-08-2016 09:44 AM
Hi all, I'd like to develop a front end to run the various algorithms I've developed and to visualise the results. I need my front end to be personalised, which is why I didn't use Hue. To achieve this, I thought about developing a RESTful API using Java Jersey and Hive JDBC, to be called from AngularJS. Is this a good choice, or do I have other alternatives (suggestions are welcome)? Does Hive JDBC support concurrency and simultaneous queries?
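For context, a minimal HiveServer2 query over JDBC from Java looks roughly like this (a sketch: the driver class and port 10000 are Hive defaults, while the host, credentials, and table name are illustrative assumptions):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuerySketch {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver (hive-jdbc must be on the classpath).
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://vm1.local:10000/default", "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM weblog LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
On the concurrency question, the usual JDBC pattern applies: a single Hive connection is not meant to be shared across threads, so a Jersey resource would typically open one connection per request or draw from a connection pool.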
Labels:
- Apache Hive
03-07-2016 12:49 PM
@Neeraj Sabharwal Even in 0.11 it still exists... see my answer below.
03-07-2016 12:47 PM
Resolved this... Instead of using a bare path like new Path("/testdata/points"), which ends up resolved against the local filesystem, you have to put the fully qualified path of the directory in your cluster: new Path("hdfs://vm1.local:8020/user/root/testdata/points")
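Equivalently, the path can stay unqualified if the client Configuration knows the default filesystem; a sketch (reusing the NameNode address above, everything else illustrative):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPathSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Without this (or core-site.xml on the classpath), paths fall back
        // to the local filesystem -- the RawLocalFileSystem seen in such traces.
        conf.set("fs.defaultFS", "hdfs://vm1.local:8020");
        FileSystem fs = FileSystem.get(conf);
        Path points = new Path("/user/root/testdata/points"); // resolved on HDFS
        System.out.println(fs.exists(points));
    }
}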
03-04-2016 10:58 AM
2 Kudos
I'm trying to use Mahout for a clustering job, and I've been struggling with it and Maven for a week now... My code works fine in Eclipse on my local machine, but when I build it into a JAR and send it to the cluster I get some errors, reading from HDFS I guess. First I created the directory:
hadoop fs -mkdir /user/root/testdata
then put the downloaded file synthetic_control.data into it:
hadoop fs -put synthetic_control.data /user/root/testdata/
Finally, I ran the example, both with the Mahout examples jar from Mahout 0.9 downloaded from the website:
hadoop jar mahout-examples-1.0-SNAPSHOT-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
and with the mahout-examples-0.9.0.2.3.4.0-3485-job.jar file found in the Mahout directory on the node:
hadoop jar /usr/hdp/2.3.4.0-3485/mahout/mahout-examples-0.9.0.2.3.4.0-3485-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
In both cases I get this error:
WARNING: Use "yarn jar" to launch YARN applications.
16/03/04 11:57:03 INFO kmeans.Job: Running with default arguments
16/03/04 11:57:05 INFO common.HadoopUtil: Deleting output
16/03/04 11:57:05 INFO kmeans.Job: Preparing Input
16/03/04 11:57:05 INFO impl.TimelineClientImpl: Timeline service address: http://vm2.local:8188/ws/v1/timeline/
16/03/04 11:57:05 INFO client.RMProxy: Connecting to ResourceManager at vm1.local/10.10.10.1:8050
16/03/04 11:57:06 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/03/04 11:57:07 INFO input.FileInputFormat: Total input paths to process : 1
16/03/04 11:57:07 INFO mapreduce.JobSubmitter: number of splits:1
16/03/04 11:57:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1456915204500_0029
16/03/04 11:57:07 INFO impl.YarnClientImpl: Submitted application application_1456915204500_0029
16/03/04 11:57:07 INFO mapreduce.Job: The url to track the job: http://vm1.local:8088/proxy/application_1456915204500_0029/
16/03/04 11:57:07 INFO mapreduce.Job: Running job: job_1456915204500_0029
16/03/04 11:57:14 INFO mapreduce.Job: Job job_1456915204500_0029 running in uber mode : false
16/03/04 11:57:14 INFO mapreduce.Job: map 0% reduce 0%
16/03/04 11:57:20 INFO mapreduce.Job: map 100% reduce 0%
16/03/04 11:57:20 INFO mapreduce.Job: Job job_1456915204500_0029 completed successfully
16/03/04 11:57:20 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=129757
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=288502
HDFS: Number of bytes written=335470
HDFS: Number of read operations=5
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=3457
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=3457
Total vcore-seconds taken by all map tasks=3457
Total megabyte-seconds taken by all map tasks=3539968
Map-Reduce Framework
Map input records=600
Map output records=600
Input split bytes=128
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=76
CPU time spent (ms)=590
Physical memory (bytes) snapshot=113729536
Virtual memory (bytes) snapshot=2723696640
Total committed heap usage (bytes)=62324736
File Input Format Counters
Bytes Read=288374
File Output Format Counters
Bytes Written=335470
16/03/04 11:57:20 INFO kmeans.Job: Running random seed to get initial clusters
16/03/04 11:57:20 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
16/03/04 11:57:20 INFO compress.CodecPool: Got brand-new compressor [.deflate]
16/03/04 11:57:21 INFO kmeans.RandomSeedGenerator: Wrote 6 Klusters to output/random-seeds/part-randomSeed
16/03/04 11:57:21 INFO kmeans.Job: Running KMeans with k = 6
16/03/04 11:57:21 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/random-seeds/part-randomSeed Out: output
16/03/04 11:57:21 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10
16/03/04 11:57:21 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
16/03/04 11:57:21 INFO impl.TimelineClientImpl: Timeline service address: http://vm2.local:8188/ws/v1/timeline/
16/03/04 11:57:21 INFO client.RMProxy: Connecting to ResourceManager at vm1.local/10.10.10.1:8050
16/03/04 11:57:21 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/03/04 11:57:22 INFO input.FileInputFormat: Total input paths to process : 1
16/03/04 11:57:22 INFO mapreduce.JobSubmitter: number of splits:1
16/03/04 11:57:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1456915204500_0030
16/03/04 11:57:22 INFO impl.YarnClientImpl: Submitted application application_1456915204500_0030
16/03/04 11:57:22 INFO mapreduce.Job: The url to track the job: http://vm1.local:8088/proxy/application_1456915204500_0030/
16/03/04 11:57:22 INFO mapreduce.Job: Running job: job_1456915204500_0030
16/03/04 11:57:33 INFO mapreduce.Job: Job job_1456915204500_0030 running in uber mode : false
16/03/04 11:57:33 INFO mapreduce.Job: map 0% reduce 0%
16/03/04 11:57:37 INFO mapreduce.Job: Task Id : attempt_1456915204500_0030_m_000000_0, Status : FAILED
Error: java.lang.IllegalStateException: output/clusters-0
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.FileNotFoundException: File output/clusters-0 does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:429)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:574)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:70)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:76)
... 10 more
16/03/04 11:57:42 INFO mapreduce.Job: Task Id : attempt_1456915204500_0030_m_000000_1, Status : FAILED
Error: java.lang.IllegalStateException: output/clusters-0
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.FileNotFoundException: File output/clusters-0 does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:429)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:574)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:70)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:76)
... 10 more
16/03/04 11:57:46 INFO mapreduce.Job: Task Id : attempt_1456915204500_0030_m_000000_2, Status : FAILED
Error: java.lang.IllegalStateException: output/clusters-0
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.FileNotFoundException: File output/clusters-0 does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:429)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:574)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:70)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:76)
... 10 more
16/03/04 11:57:52 INFO mapreduce.Job: map 100% reduce 100%
16/03/04 11:57:53 INFO mapreduce.Job: Job job_1456915204500_0030 failed with state FAILED due to: Task failed task_1456915204500_0030_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
16/03/04 11:57:53 INFO mapreduce.Job: Counters: 13
Job Counters
Failed map tasks=4
Killed reduce tasks=1
Launched map tasks=4
Other local map tasks=3
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=11687
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=11687
Total time spent by all reduce tasks (ms)=0
Total vcore-seconds taken by all map tasks=11687
Total vcore-seconds taken by all reduce tasks=0
Total megabyte-seconds taken by all map tasks=11967488
Total megabyte-seconds taken by all reduce tasks=0
Exception in thread "main" java.lang.InterruptedException: Cluster Iteration 1 failed processing output/clusters-1
at org.apache.mahout.clustering.iterator.ClusterIterator.iterateMR(ClusterIterator.java:183)
at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:224)
at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:147)
at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.run(Job.java:135)
at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:60)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
I guess it's a version problem ... Thanks.
03-04-2016 10:42 AM
That solved the problem, but the map job got stuck, and even after killing it the YARN container still existed; I had to kill it manually. I'll be back to this shortly.
03-01-2016 10:10 AM
2 Kudos
I'm following this tutorial: http://hortonworks.com/blog/using-r-and-other-non-java-languages-in-mapreduce-and-hive/ I put cities.txt in /user/root/, and the R script is as follows:
#!/usr/bin/env Rscript
f <- file("stdin")
open(f)
state_data = read.table(f)
summary(state_data)
I then run the command:
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming-2.7.1.2.3.4.0-3485.jar -input /user/root/cities.txt -output /user/root/streamer -mapper /bin/cat -reducer script.R -numReduceTasks 2 -file script.R
The map phase reaches 100%, and then the reduce phase shows this error:
16/03/01 11:06:30 INFO mapreduce.Job: map 100% reduce 50%
16/03/01 11:06:34 INFO mapreduce.Job: Task Id : attempt_1456773989186_0009_r_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Does anyone have any idea, or has anyone encountered this before? Thanks.
Labels:
- Apache Hadoop