Member since
10-01-2015
3933
Posts
1150
Kudos Received
374
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3649 | 05-03-2017 05:13 PM |
| | 3004 | 05-02-2017 08:38 AM |
| | 3261 | 05-02-2017 08:13 AM |
| | 3215 | 04-10-2017 10:51 PM |
| | 1680 | 03-28-2017 02:27 AM |
03-18-2016
06:46 PM
1 Kudo
@Sunile Manjee this is the tool I was looking for to help you https://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/replication/regionserver/ReplicationSyncUp.html
03-18-2016
06:22 PM
you can also identify inconsistencies using this tool https://hbase.apache.org/book.html#_verifying_replicated_data
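For context, the tool behind that link is the VerifyReplication MapReduce job shipped with HBase. A command sketch, where `<peerId>` and `<tableName>` are placeholders for your replication peer id and table:

```
hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication <peerId> <tableName>
```

The job scans the source table and compares each row against the peer cluster, reporting matches and mismatches via GOODROWS/BADROWS counters.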
03-18-2016
06:21 PM
1 Kudo
what version of HBase is it? Here's a sync tool introduced in 1.2 https://issues.apache.org/jira/browse/HBASE-13639
03-18-2016
06:13 PM
@Benjamin Leonhardi it doesn't matter whether the table is text or ORC; percentage sampling with TABLESAMPLE is not working. @gopal is this a bug?

```
hive> SELECT * FROM medicare_part_b.medicare_part_b_2013_orc TABLESAMPLE(20 PERCENT);
FAILED: SemanticException 1:67 Percentage sampling is not supported in org.apache.hadoop.hive.ql.io.HiveInputFormat. Error encountered near token '20'
hive> SELECT * FROM medicare_part_b.medicare_part_b_2013_text TABLESAMPLE(20 PERCENT);
FAILED: SemanticException 1:68 Percentage sampling is not supported in org.apache.hadoop.hive.ql.io.HiveInputFormat. Error encountered near token '20'
hive> SELECT * FROM medicare_part_b.medicare_part_b_2013_raw TABLESAMPLE(20 PERCENT);
FAILED: SemanticException 1:67 Percentage sampling is not supported in org.apache.hadoop.hive.ql.io.HiveInputFormat. Error encountered near token '20'
```
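As the error says, percentage sampling is only supported with CombineHiveInputFormat, not the default HiveInputFormat. Two possible workarounds, sketched against the table names above (the bucket count in the second query is illustrative):

```sql
-- Option 1: switch the input format so PERCENT sampling is accepted
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
SELECT * FROM medicare_part_b.medicare_part_b_2013_orc TABLESAMPLE(20 PERCENT);

-- Option 2: bucket sampling works regardless of the input format;
-- ON rand() samples without requiring a bucketed table
SELECT * FROM medicare_part_b.medicare_part_b_2013_orc
TABLESAMPLE(BUCKET 1 OUT OF 5 ON rand());
```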
03-18-2016
04:21 PM
@sivasaravanakumar k I don't see hadoop included. Why not use Maven instead of including each jar manually? Take a look at my example https://github.com/dbist/HBaseNewApi.git, specifically pom.xml; that's all you need.
03-18-2016
02:34 PM
1 Kudo
@sivasaravanakumar k you need to include hbase-client and hadoop-client in your dependencies. What version of HBase and Hadoop are you using?
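A minimal pom.xml dependency snippet for those two artifacts; the version numbers below are assumptions, so adjust them to match your cluster:

```xml
<dependencies>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.1.2</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.1</version>
  </dependency>
</dependencies>
```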
03-18-2016
03:00 AM
excellent, please accept the best answer.
03-18-2016
02:06 AM
Here's a full script; piggybank.jar is in both the pig-client/lib and pig-client directories.

```
REGISTER /usr/hdp/current/pig-client/piggybank.jar;
A = LOAD 'data2' USING PigStorage() as (url, count);
fs -rm -R output;
STORE A INTO 'output' USING org.apache.pig.piggybank.storage.MultiStorage('output', '0');
```

My dataset is:

```
1
2
3
4
5
```

The output would be:

```
-rw-r--r--   3 root hdfs          3 2016-03-18 01:51 /user/root/output/1/1-0,000
Found 1 items
-rw-r--r--   3 root hdfs          3 2016-03-18 01:51 /user/root/output/2/2-0,000
Found 1 items
-rw-r--r--   3 root hdfs          3 2016-03-18 01:51 /user/root/output/3/3-0,000
Found 1 items
-rw-r--r--   3 root hdfs          3 2016-03-18 01:51 /user/root/output/4/4-0,000
Found 1 items
-rw-r--r--   3 root hdfs          3 2016-03-18 01:51 /user/root/output/5/5-0,000
-rw-r--r--   3 root hdfs          0 2016-03-18 01:51 /user/root/output/_SUCCESS
```

and each file contains one line:

```
[root@sandbox ~]# hdfs dfs -cat /user/root/output/5/5-0,000
5
```

In the case of @Rich Raposa's example, the output directory would look like this:

```
[root@sandbox ~]# hdfs dfs -ls output3
Found 6 items
-rw-r--r--   3 root hdfs          0 2016-03-18 01:59 output3/_SUCCESS
-rw-r--r--   3 root hdfs          3 2016-03-18 01:59 output3/part-v003-o000-r-00000
-rw-r--r--   3 root hdfs          3 2016-03-18 01:59 output3/part-v003-o000-r-00001
-rw-r--r--   3 root hdfs          3 2016-03-18 01:59 output3/part-v003-o000-r-00002
-rw-r--r--   3 root hdfs          3 2016-03-18 01:59 output3/part-v003-o000-r-00003
-rw-r--r--   3 root hdfs          3 2016-03-18 01:59 output3/part-v003-o000-r-00004
```

which means that with PARALLEL it creates multiple files within the same directory, while MultiStorage creates a separate directory and file per key. Additionally, with MultiStorage you can pass a compression codec (granted, only bz2 and gz, no snappy) and a delimiter. It's clunky and the documentation is not the best, but if you need that type of control, it's an option.
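For reference, MultiStorage's constructor takes the compression codec and field delimiter as its third and fourth arguments; a sketch of the compressed variant (the tab delimiter here is an assumption, not from the thread):

```
STORE A INTO 'output'
  USING org.apache.pig.piggybank.storage.MultiStorage('output', '0', 'bz2', '\\t');
```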
03-17-2016
05:33 PM
Yep, Java is the way to go. Try MapReduce with Java, it's not too bad: https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
03-17-2016
05:29 PM
@Marco Lanza you can take a look at Hadoop Pipes or Hadoop Streaming to leverage a language other than Java, but if you plan to learn MapReduce on the Hortonworks platform, I'd invest in Java. There's also http://www.cascading.org/, and there are a couple of higher-level languages like Apache Pig and Apache Hive that have a smaller learning curve. You can also look at Apache Spark, as that's where the Big Data industry is going, and it has multi-language support, including C#: http://research.microsoft.com/en-us/projects/spark-clr/. Hadoop Streaming reference: https://hadoop.apache.org/docs/current/hadoop-streaming/HadoopStreaming.html
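To give a feel for Hadoop Streaming, here is a minimal word-count mapper sketch in Python; the file name and the job wiring below are illustrative assumptions, not from this thread:

```python
#!/usr/bin/env python
# wc_mapper.py - a Hadoop Streaming mapper: reads text lines on stdin
# and emits tab-separated (word, 1) pairs on stdout.
import sys

def map_lines(lines):
    """Yield 'word\t1' for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word

if __name__ == "__main__":
    for pair in map_lines(sys.stdin):
        print(pair)
```

You would wire it into a job with something along the lines of `hadoop jar hadoop-streaming.jar -input in -output out -mapper wc_mapper.py -reducer aggregate` (exact jar path and options depend on your distribution).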