Member since: 02-08-2016
Posts: 36
Kudos Received: 18
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1434 | 12-14-2017 03:09 PM |
| | 2386 | 08-03-2016 02:49 PM |
| | 4413 | 07-26-2016 10:52 AM |
| | 3777 | 03-07-2016 12:47 PM |
05-18-2016 03:17 PM
1 Kudo
Hi, I want to update an HBase table using Pig. I have an empty column family and I want to add columns to it from a Pig relation. I've done some research, but in vain... Does this feature exist or not? Thanks
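For reference: Pig can write to HBase through its org.apache.pig.backend.hadoop.hbase.HBaseStorage store function, and HBase has no fixed column schema inside a family, so columns appear on first write. A minimal Java-client sketch of such a write (the table name "weblog" and the family/qualifier/value here are illustrative assumptions, not taken from this post):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class AddColumnSketch {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(config);
             Table table = conn.getTable(TableName.valueOf("weblog"))) {
            // The column "cf:newcol" is created by this Put itself;
            // only the family "cf" must already exist on the table.
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("newcol"),
                          Bytes.toBytes("value"));
            table.put(put);
        }
    }
}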
Labels:
- Apache HBase
- Apache Pig
04-18-2016 01:12 PM
Hi @Joy
The solution you proposed worked; I'm now getting a new error that is not related to a jar dependency. To make this fix permanent, should I add that line to hadoop-env?
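For reference, the usual way to make such a classpath change permanent is a line in hadoop-env.sh on each node; one common form (a sketch, assuming the hbase launcher script is on the PATH of the user running hadoop) is:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$(hbase classpath)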
04-18-2016 11:10 AM
1 Kudo
Hi,
I'm trying to connect from a Java application to HBase like this:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Point the client at the ZooKeeper quorum and the unsecured HDP znode.
Configuration config = HBaseConfiguration.create();
config.set("hbase.zookeeper.quorum", "localhost");
config.set("hbase.zookeeper.property.clientPort", "2181");
config.set("zookeeper.znode.parent", "/hbase-unsecure");
config.set("hbase.client.retries.number", Integer.toString(0));
config.set("zookeeper.session.timeout", Integer.toString(60000));
config.set("zookeeper.recovery.retry", Integer.toString(0));
Connection conn = ConnectionFactory.createConnection(config);
TableName TABLE_NAME = TableName.valueOf("weblog");
Table table = conn.getTable(TABLE_NAME);
Result r = table.get(new Get(Bytes.toBytes("row1")));
System.out.println(r);
I built the app into a JAR, but when I run it on the cluster with:
hadoop jar hbaseConnect-0.0.1-SNAPSHOT.jar com.packagename.hbaseConnect.HbaseConnect
I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
at com.DigiMerket.hbaseConnect.HbaseConnect.main(HbaseConnect.java:23)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
I tried adding HBASE_CLASSPATH to HADOOP_CLASSPATH in hadoop-env, as this post suggests, but I get the same error.
Labels:
- Apache HBase
04-08-2016 12:41 PM
@Benjamin Leonhardi Thank you for your answer. Do you have an idea of the best way to access the cluster (HDFS, HBase, ...) and retrieve data easily?
04-08-2016 09:44 AM
Hi all, I'd like to develop a front end to run the various algorithms I've developed and to visualise the results. I need my front end to be personalised, which is why I didn't use Hue. To achieve this, I thought about developing a RESTful API using Java Jersey and Hive JDBC, to be called from AngularJS. Is this a good choice, or do I have other alternatives (suggestions are welcome)? Does Hive JDBC support concurrency and simultaneous queries?
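For context, a minimal HiveServer2 query over JDBC from Java looks roughly like this (a sketch: the driver class and port 10000 are Hive defaults, while the host, credentials, and table name are illustrative assumptions):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuerySketch {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver (hive-jdbc must be on the classpath).
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://vm1.local:10000/default", "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM weblog LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
On the concurrency question, the usual JDBC pattern applies: a single Hive connection is not meant to be shared across threads, so a Jersey resource would typically open one connection per request or draw from a connection pool.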
Labels:
- Apache Hive
03-07-2016 12:49 PM
@Neeraj Sabharwal Even in 0.11 it still exists... see my answer below.
03-07-2016 12:47 PM
Resolved this... Instead of using a bare path like new Path("/testdata/points"), which ends up resolved against the local filesystem, you have to put the fully qualified path of the directory in your cluster: new Path("hdfs://vm1.local:8020/user/root/testdata/points")
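Equivalently, the path can stay unqualified if the client Configuration knows the default filesystem; a sketch (reusing the NameNode address above, everything else illustrative):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPathSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Without this (or core-site.xml on the classpath), paths fall back
        // to the local filesystem -- the RawLocalFileSystem seen in such traces.
        conf.set("fs.defaultFS", "hdfs://vm1.local:8020");
        FileSystem fs = FileSystem.get(conf);
        Path points = new Path("/user/root/testdata/points"); // resolved on HDFS
        System.out.println(fs.exists(points));
    }
}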
03-04-2016 10:58 AM
2 Kudos
I'm trying to use Mahout for a clustering job, and I've been struggling with it and Maven for a week now... My code works fine in Eclipse on my local machine, but when I build it into a JAR and send it to the cluster I get some errors, reading from HDFS I guess. First I created the directory:
hadoop fs -mkdir /user/root/testdata
then put the downloaded file synthetic_control.data into it:
hadoop fs -put synthetic_control.data /user/root/testdata/
Finally, I ran the example, both with the Mahout examples jar from Mahout 0.9 downloaded from the website:
hadoop jar mahout-examples-1.0-SNAPSHOT-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
and with the mahout-examples-0.9.0.2.3.4.0-3485-job.jar file found in the Mahout directory on the node:
hadoop jar /usr/hdp/2.3.4.0-3485/mahout/mahout-examples-0.9.0.2.3.4.0-3485-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
In both cases I get this error:
WARNING: Use "yarn jar" to launch YARN applications.
16/03/04 11:57:03 INFO kmeans.Job: Running with default arguments
16/03/04 11:57:05 INFO common.HadoopUtil: Deleting output
16/03/04 11:57:05 INFO kmeans.Job: Preparing Input
16/03/04 11:57:05 INFO impl.TimelineClientImpl: Timeline service address: http://vm2.local:8188/ws/v1/timeline/
16/03/04 11:57:05 INFO client.RMProxy: Connecting to ResourceManager at vm1.local/10.10.10.1:8050
16/03/04 11:57:06 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/03/04 11:57:07 INFO input.FileInputFormat: Total input paths to process : 1
16/03/04 11:57:07 INFO mapreduce.JobSubmitter: number of splits:1
16/03/04 11:57:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1456915204500_0029
16/03/04 11:57:07 INFO impl.YarnClientImpl: Submitted application application_1456915204500_0029
16/03/04 11:57:07 INFO mapreduce.Job: The url to track the job: http://vm1.local:8088/proxy/application_1456915204500_0029/
16/03/04 11:57:07 INFO mapreduce.Job: Running job: job_1456915204500_0029
16/03/04 11:57:14 INFO mapreduce.Job: Job job_1456915204500_0029 running in uber mode : false
16/03/04 11:57:14 INFO mapreduce.Job: map 0% reduce 0%
16/03/04 11:57:20 INFO mapreduce.Job: map 100% reduce 0%
16/03/04 11:57:20 INFO mapreduce.Job: Job job_1456915204500_0029 completed successfully
16/03/04 11:57:20 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=129757
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=288502
HDFS: Number of bytes written=335470
HDFS: Number of read operations=5
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=3457
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=3457
Total vcore-seconds taken by all map tasks=3457
Total megabyte-seconds taken by all map tasks=3539968
Map-Reduce Framework
Map input records=600
Map output records=600
Input split bytes=128
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=76
CPU time spent (ms)=590
Physical memory (bytes) snapshot=113729536
Virtual memory (bytes) snapshot=2723696640
Total committed heap usage (bytes)=62324736
File Input Format Counters
Bytes Read=288374
File Output Format Counters
Bytes Written=335470
16/03/04 11:57:20 INFO kmeans.Job: Running random seed to get initial clusters
16/03/04 11:57:20 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
16/03/04 11:57:20 INFO compress.CodecPool: Got brand-new compressor [.deflate]
16/03/04 11:57:21 INFO kmeans.RandomSeedGenerator: Wrote 6 Klusters to output/random-seeds/part-randomSeed
16/03/04 11:57:21 INFO kmeans.Job: Running KMeans with k = 6
16/03/04 11:57:21 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/random-seeds/part-randomSeed Out: output
16/03/04 11:57:21 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10
16/03/04 11:57:21 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
16/03/04 11:57:21 INFO impl.TimelineClientImpl: Timeline service address: http://vm2.local:8188/ws/v1/timeline/
16/03/04 11:57:21 INFO client.RMProxy: Connecting to ResourceManager at vm1.local/10.10.10.1:8050
16/03/04 11:57:21 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/03/04 11:57:22 INFO input.FileInputFormat: Total input paths to process : 1
16/03/04 11:57:22 INFO mapreduce.JobSubmitter: number of splits:1
16/03/04 11:57:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1456915204500_0030
16/03/04 11:57:22 INFO impl.YarnClientImpl: Submitted application application_1456915204500_0030
16/03/04 11:57:22 INFO mapreduce.Job: The url to track the job: http://vm1.local:8088/proxy/application_1456915204500_0030/
16/03/04 11:57:22 INFO mapreduce.Job: Running job: job_1456915204500_0030
16/03/04 11:57:33 INFO mapreduce.Job: Job job_1456915204500_0030 running in uber mode : false
16/03/04 11:57:33 INFO mapreduce.Job: map 0% reduce 0%
16/03/04 11:57:37 INFO mapreduce.Job: Task Id : attempt_1456915204500_0030_m_000000_0, Status : FAILED
Error: java.lang.IllegalStateException: output/clusters-0
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.FileNotFoundException: File output/clusters-0 does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:429)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:574)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:70)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:76)
... 10 more
16/03/04 11:57:42 INFO mapreduce.Job: Task Id : attempt_1456915204500_0030_m_000000_1, Status : FAILED
Error: java.lang.IllegalStateException: output/clusters-0
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.FileNotFoundException: File output/clusters-0 does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:429)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:574)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:70)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:76)
... 10 more
16/03/04 11:57:46 INFO mapreduce.Job: Task Id : attempt_1456915204500_0030_m_000000_2, Status : FAILED
Error: java.lang.IllegalStateException: output/clusters-0
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.FileNotFoundException: File output/clusters-0 does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:429)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:574)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:70)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:76)
... 10 more
16/03/04 11:57:52 INFO mapreduce.Job: map 100% reduce 100%
16/03/04 11:57:53 INFO mapreduce.Job: Job job_1456915204500_0030 failed with state FAILED due to: Task failed task_1456915204500_0030_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
16/03/04 11:57:53 INFO mapreduce.Job: Counters: 13
Job Counters
Failed map tasks=4
Killed reduce tasks=1
Launched map tasks=4
Other local map tasks=3
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=11687
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=11687
Total time spent by all reduce tasks (ms)=0
Total vcore-seconds taken by all map tasks=11687
Total vcore-seconds taken by all reduce tasks=0
Total megabyte-seconds taken by all map tasks=11967488
Total megabyte-seconds taken by all reduce tasks=0
Exception in thread "main" java.lang.InterruptedException: Cluster Iteration 1 failed processing output/clusters-1
at org.apache.mahout.clustering.iterator.ClusterIterator.iterateMR(ClusterIterator.java:183)
at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:224)
at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:147)
at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.run(Job.java:135)
at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:60)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
I guess it's a version problem ... Thanks.
03-04-2016 10:42 AM
That solved the problem, but the map job got stuck, and even after killing it the YARN container still existed; I had to kill it manually. I'll be back to this shortly.
03-01-2016 10:10 AM
2 Kudos
I'm following this tutorial: http://hortonworks.com/blog/using-r-and-other-non-java-languages-in-mapreduce-and-hive/ I put cities.txt in /user/root/, and the R script is as follows:
#!/usr/bin/env Rscript
f <- file("stdin")
open(f)
state_data = read.table(f)
summary(state_data)
I then run the command:
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming-2.7.1.2.3.4.0-3485.jar -input /user/root/cities.txt -output /user/root/streamer -mapper /bin/cat -reducer script.R -numReduceTasks 2 -file script.R
The map phase reaches 100%, and then the reduce phase shows this error:
16/03/01 11:06:30 INFO mapreduce.Job: map 100% reduce 50%
16/03/01 11:06:34 INFO mapreduce.Job: Task Id : attempt_1456773989186_0009_r_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Does anyone have any idea, or has anyone encountered this before? Thanks.
Labels:
- Apache Hadoop