Member since: 12-14-2015
Posts: 27
Kudos Received: 22
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 11899 | 03-17-2016 08:39 AM |
06-09-2017
12:00 PM
Hi @Rob Ketcherside, Hortonworks support finally found a solution. We had to copy the hbase-site.xml file into the /etc/spark/conf folder on the node that submits the job. Only on that node. To me it's a bit of a "magic solution" that I don't understand, but it's working...
If you have an explanation why, I'm really interested :)
Michel
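A plausible explanation (an assumption on my part, not something confirmed in this thread): /etc/spark/conf is on the classpath of the node running spark-submit, so once hbase-site.xml sits there, HBaseConfiguration.create() picks up the secure settings (quorum, znode parent, security options) when the job is launched. A minimal sketch, run on the submitting node, to check where a given key is actually loaded from:

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CheckHBaseConf {
    public static void main(String[] args) {
        // HBaseConfiguration.create() only sees hbase-site.xml if it is on the classpath.
        Configuration conf = HBaseConfiguration.create();
        System.out.println("hbase.zookeeper.quorum = " + conf.get("hbase.zookeeper.quorum"));
        // Shows which resource (e.g. hbase-site.xml) the value came from, or null if unset.
        String[] sources = conf.getPropertySources("hbase.zookeeper.quorum");
        System.out.println("loaded from: " + (sources == null ? "(not set)" : Arrays.toString(sources)));
    }
}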
04-24-2017
08:54 AM
Hi Josh,
Should it also work when we use the function saveAsNewAPIHadoopDataset over an RDD of type JavaPairRDD<ImmutableBytesWritable, Put>? I tried with and without the doAs and I was not able to make it work. I don't get any errors, just nothing happens.
Any idea?
Thanks,
Michel
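For reference, the doAs mentioned above would presumably be wired up along these lines; this is only a sketch of the pattern, not code from the thread, and hbasePuts / jobConf stand for the RDD and job configuration built elsewhere (as in the 04-04-2017 post further down this page).

import java.io.IOException;
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.spark.api.java.JavaPairRDD;

public class SecureHBaseWrite {
    // Runs the HBase write as the Kerberos login user (the principal obtained via kinit).
    public static void writeAsLoginUser(final JavaPairRDD<ImmutableBytesWritable, Put> hbasePuts,
                                        final Configuration jobConf)
            throws IOException, InterruptedException {
        UserGroupInformation.getLoginUser().doAs(new PrivilegedExceptionAction<Void>() {
            @Override
            public Void run() {
                hbasePuts.saveAsNewAPIHadoopDataset(jobConf);
                return null;
            }
        });
    }
}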
04-12-2017
12:14 PM
No one has had this issue? Or knows how to handle it?
04-04-2017
12:23 PM
Hi, I have Java code that reads a file and inserts the data into an HBase table. Without Kerberos it works well.
I tried my code on a cluster with Isilon and Kerberos enabled. The part of the code that reads the file and counts the number of lines works well, but once it reaches the code that uses the function saveAsNewAPIHadoopDataset, nothing happens. The application seems to be stuck; no logs are printed in the shell or in any executor. Here is the code used:

public static void ingestProcodacFolderKerberosDev(String file, int numP, String table) {
    try {
        String zookeeper = "zk0001.bc,zk0002.bc,zk0003.bc";
        // HDFS client configuration
        Configuration confFS = new Configuration();
        confFS.addResource("/etc/hadoop/conf/core-site.xml");
        confFS.addResource("/etc/hadoop/conf/hdfs-site.xml");
        FileSystem dfs = FileSystem.get(confFS);
        SparkConf conf = new SparkConf().setAppName("TEST - ingest Procodac Dev Ker");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> procodac = sc.textFile(file, numP);
        // HBase configuration for the secure cluster
        Configuration config = null;
        try {
            config = HBaseConfiguration.create();
            config.set("hbase.zookeeper.quorum", zookeeper);
            config.set("zookeeper.znode.parent", "/hbase-secure");
            config.addResource("/user/hdp/current/hbase-client/config/core-site.xml");
            config.addResource("/user/hdp/current/hbase-client/config/hbase-site.xml");
            config.addResource("/user/hdp/current/hbase-client/config/hdfs-site.xml");
        } catch (Exception ce) {
            ce.printStackTrace();
        }
        // Keep only lines with at least 4 tab-separated fields
        JavaRDD<String> procodacFiltre = procodac.filter(new Function<String, Boolean>() {
            public Boolean call(String s) {
                return (s.split("\t").length >= 4);
            }
        });
        System.out.println("!!!!!!!!!!!! " + procodacFiltre.count());
        config.set(TableOutputFormat.OUTPUT_TABLE, table);
        Job newAPIJobConfiguration1 = Job.getInstance(config);
        newAPIJobConfiguration1.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, table);
        newAPIJobConfiguration1.setOutputFormatClass(org.apache.hadoop.hbase.mapreduce.TableOutputFormat.class);
        // Build one Put per line; row key = elements[0] + "_" + elements[2]
        JavaPairRDD<ImmutableBytesWritable, Put> hbasePuts = procodacFiltre.mapToPair(
            new PairFunction<String, ImmutableBytesWritable, Put>() {
                @Override
                public Tuple2<ImmutableBytesWritable, Put> call(String t) throws Exception {
                    String[] elements = t.split("\t");
                    String id;
                    String colFam;
                    id = elements[0] + "_" + elements[2];
                    if (t.contains("DOWNstream")) {
                        colFam = "downstream";
                    } else {
                        colFam = "upstream";
                    }
                    String[] fields = new String[4];
                    fields[0] = "phone";
                    fields[1] = "port";
                    fields[2] = "date";
                    fields[3] = "type";
                    Put put = new Put(Bytes.toBytes(id));
                    // Bound the loop by fields.length as well, to avoid an
                    // ArrayIndexOutOfBoundsException when a line has more than 4 fields.
                    for (int g = 0; g < Math.min(elements.length, fields.length); g++) {
                        if (fields[g] != null) {
                            put.add(Bytes.toBytes(colFam), Bytes.toBytes(fields[g]), Bytes.toBytes(elements[g]));
                        }
                    }
                    return new Tuple2<>(new ImmutableBytesWritable(id.getBytes()), put);
                }
            });
        // save to HBase
        hbasePuts.saveAsNewAPIHadoopDataset(newAPIJobConfiguration1.getConfiguration());
    } catch (IOException ex) {
        Logger.getLogger(procodac.class.getName()).log(Level.SEVERE, null, ex);
    }
}

Before starting the application I do a kinit, then the spark-submit. I know that the Kerberos authentication is working because the Spark job is able to read the file on HDFS and count the number of lines, as you can see in the following screenshot (second line: "!!!!!!!!! 4247031"). At the end of the screenshot you can see that it starts several tasks, but then nothing happens anywhere. For info, I'm using HDP 2.5 with Spark 1.6.2. Does anyone have an idea how to solve this? Thanks in advance for your help 🙂 Michel
Labels:
- Apache HBase
- Apache Spark
08-11-2016
01:18 PM
1 Kudo
Hello, I created a Phoenix view on an existing HBase table where all the column family names are lowercase. So, to be able to query it, I have to put a " before and after every column name, every time. It's a little bit annoying; do you have a workaround? 🙂 Thanks,
Michel
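For context, Phoenix upper-cases unquoted identifiers, which is why the lowercase HBase names have to be double-quoted. A minimal JDBC sketch (the ZooKeeper quorum, view and column names here are made up for illustration):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixQuotedQuery {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");
        // Hypothetical quorum and view/column names.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk0001.bc:2181:/hbase-secure");
             Statement stmt = conn.createStatement();
             // Unquoted identifiers are upper-cased by Phoenix, so lowercase
             // column family and qualifier names must be wrapped in double quotes.
             ResultSet rs = stmt.executeQuery(
                 "SELECT \"downstream\".\"phone\", \"downstream\".\"port\" FROM \"myview\" LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getString(2));
            }
        }
    }
}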
Labels:
- Apache HBase
- Apache Phoenix
08-09-2016
07:31 PM
OK, super, thanks. I didn't try it because the documentation didn't mention that the LIKE function works. 🙂
08-09-2016
07:14 PM
Hello, I have a Phoenix view on an existing HBase table. I can't find in the documentation how to select every line of a table where the column "name" matches a specific regex or a string function like contains, starts with, etc.
For example, in "traditional SQL": select name, phone from table1 where name like '%ab%';
or every line with a row key that starts with "blabla". Thanks in advance for pointing me to how to do this easily (I hope it's not necessary to use a UDF...). Michel
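As the follow-up above suggests, plain SQL LIKE does work in Phoenix. A minimal sketch; the table, column and row-key column names are hypothetical, and conn is assumed to be an already-open Phoenix JDBC connection:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixLikeQueries {
    static void printMatches(Connection conn) throws Exception {
        try (Statement stmt = conn.createStatement()) {
            // Substring match, as in "traditional SQL".
            ResultSet rs = stmt.executeQuery(
                "SELECT name, phone FROM table1 WHERE name LIKE '%ab%'");
            while (rs.next()) {
                System.out.println(rs.getString("name") + "\t" + rs.getString("phone"));
            }
            // Prefix match on the primary-key (row key) column of the view.
            rs = stmt.executeQuery(
                "SELECT name, phone FROM table1 WHERE rowkey_col LIKE 'blabla%'");
            while (rs.next()) {
                System.out.println(rs.getString("name"));
            }
        }
    }
}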
Labels:
- Apache Phoenix
06-21-2016
10:05 AM
Thanks for the info.
Then what would be your best advice to improve the performance of Hive over HBase?
06-21-2016
09:59 AM
Hi, I have Hive running over an HBase table. Is it worthwhile in this case to analyze the table for the CBO? Or is it only worthwhile when you have Hive with ORC files? Thanks in advance, Michel
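For reference, gathering the statistics the CBO relies on is done with the same ANALYZE statement as for native tables; a minimal sketch over Hive JDBC, assuming your Hive version supports statistics gathering on storage-handler (HBase-backed) tables, which is worth verifying first. The HiveServer2 host and table name are made up:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AnalyzeHiveTable {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Hypothetical HiveServer2 host and HBase-backed table name.
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://hs2host:10000/default");
             Statement stmt = conn.createStatement()) {
            // Table-level statistics (row count, etc.) used by the cost-based optimizer.
            stmt.execute("ANALYZE TABLE hbase_table1 COMPUTE STATISTICS");
            // Column-level statistics for all columns.
            stmt.execute("ANALYZE TABLE hbase_table1 COMPUTE STATISTICS FOR COLUMNS");
        }
    }
}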
Labels:
- Apache HBase
- Apache Hive
05-09-2016
07:40 PM
Hello, I have "stupid" question of a beginner in SparkSQL. 🙂 Just to correctly understand how it works: If I have a HBase table with 28Tera of data (90Billions of lines) and only a few tera of memory in the cluster, what's going to happen? The dataframe will be bigger that the available memory, so is it going to swap? It will crash? I would like to clearly understand the mecanisme that spark used to manage that. Also do you have any recommendation in term of infrastructure to handle this kind of database? Thanks in advance for all the info 🙂 Michel
Labels:
- Apache HBase
- Apache Spark
04-28-2016
02:30 PM
2 Kudos
Hi, I would like additional information about encryption in HBase. HBase works over HDFS, and HDFS supports encryption, so when the data is stored it is encrypted, right? I would also like to encrypt the content of a cell. So, cell-level encryption plus encryption of the HFiles (HDFS): are those features available? Can you point me to good documentation and examples on how to encrypt data in a cell? I would also like to rotate the encryption key every hour; any idea how to manage that in an "easy" way? 🙂 Thanks in advance, Michel
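One of the pieces asked about, HBase's own transparent encryption of HFiles, is enabled per column family; it still needs hbase.crypto.keyprovider and a key store configured on the cluster, which is not shown here, and the table/family names below are made up. A minimal admin-side sketch:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CreateEncryptedTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            HTableDescriptor table = new HTableDescriptor(TableName.valueOf("secure_table"));
            HColumnDescriptor cf = new HColumnDescriptor("d");
            // Ask HBase to encrypt this family's HFiles with AES (WAL encryption is
            // a separate cluster-level setting); the data key is wrapped by the
            // master key from the configured key provider.
            cf.setEncryptionType("AES");
            table.addFamily(cf);
            admin.createTable(table);
        }
    }
}

Encrypting the value inside a cell at the application level, and rotating that key every hour, would presumably have to be handled by the client before the Put with whatever key-management tooling the application uses; that part is not covered by the sketch above.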
03-17-2016
08:39 AM
1 Kudo
The problem is solved by making the following change in the Spark config: Thanks for the help, guys!
03-14-2016
10:31 AM
@Artem Ervits which property?
02-24-2016
08:59 AM
1 Kudo
@Neeraj Sabharwal Thanks for the reply. In my case that's not the solution, because when I run hadoop checknative -a
I see that the snappy lib is true, located at /usr/hdp/2.3.4.0-3485/hadoop/lib/native/libsnappy.so.1.
02-24-2016
08:52 AM
2 Kudos
@Artem Ervits I just ran the test with the example from the Definitive Guide and I still get exactly the same error:
Exception in thread "main" java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
Any idea?
02-23-2016
03:22 PM
1 Kudo
Hi Artem, Thanks for the fast reply. I don't really understand how it will work without:

compressedOutput.write(myLine.getBytes());
compressedOutput.write('\n');
} }
compressedOutput.flush();
compressedOutput.close();

How will it write to HDFS? Also, if I remove the first part, when will the configuration be used? Can you give me an example? I don't see how it works without the part that you mention :s Thanks in advance
02-23-2016
02:44 PM
2 Kudos
here's the piece of code:

Path outFile = new Path(destPathFolder.toString() + "/" + listFolder[i].getName() + "_" + listFiles[b].getName() + ".txt");
FSDataOutputStream fin = dfs.create(outFile);
Configuration conf = new Configuration();
conf.setBoolean("mapreduce.map.output.compress", true);
conf.set("mapreduce.map.output.compress.codec", "org.apache.hadoop.io.compress.SnappyCodec");
CompressionCodecFactory codecFactory = new CompressionCodecFactory(conf);
CompressionCodec codec = codecFactory.getCodecByName("SnappyCodec");
CompressionOutputStream compressedOutput = codec.createOutputStream(fin);
FileReader input = new FileReader(listFiles[b]);
BufferedReader bufRead = new BufferedReader(input);
String myLine = null;
while ((myLine = bufRead.readLine()) != null) {
    if (!myLine.isEmpty()) {
        compressedOutput.write(myLine.getBytes());
        compressedOutput.write('\n');
    }
}
compressedOutput.flush();
compressedOutput.close();
02-23-2016
02:37 PM
3 Kudos
Hi, I hope this is the right place to ask the following question 🙂 I'm trying to put a file into HDFS with Snappy compression. I wrote Java code for that, and when I try to run it on my cluster I get the following exception:

Exception in thread "main" java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
    at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:134)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150)
    at org.apache.hadoop.io.compress.CompressionCodec$Util.createOutputStreamWithCodecPool(CompressionCodec.java:131)
    at org.apache.hadoop.io.compress.SnappyCodec.createOutputStream(SnappyCodec.java:99)

Apparently the snappy library is not available... I checked on the OS with the command "rpm -qa | less | grep snappy" and snappy and snappy-devel are present. In the HDFS configuration (core-site.xml), org.apache.hadoop.io.compress.SnappyCodec is present in the io.compression.codecs field. Does anyone have an idea why it's not working? Thanks in advance
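One way to narrow this down (a diagnostic sketch, not something from the thread): the error means the client JVM did not load a libhadoop built with Snappy, which usually comes down to java.library.path not including the native directory when the standalone Java program runs. A small check:

import org.apache.hadoop.util.NativeCodeLoader;

public class CheckSnappyNative {
    public static void main(String[] args) {
        // Where this JVM looks for native libraries; the HDP native dir
        // (e.g. /usr/hdp/<version>/hadoop/lib/native) must be on this path,
        // typically via -Djava.library.path=... on the java command line.
        System.out.println("java.library.path = " + System.getProperty("java.library.path"));
        // True only if libhadoop.so was found and loaded by this JVM.
        System.out.println("libhadoop loaded:  " + NativeCodeLoader.isNativeCodeLoaded());
        if (NativeCodeLoader.isNativeCodeLoaded()) {
            // Native method, so only safe to call once libhadoop is actually loaded.
            System.out.println("snappy supported:  " + NativeCodeLoader.buildSupportsSnappy());
        }
    }
}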
Labels:
- Apache Hadoop
- HDFS
02-16-2016
09:49 AM
1 Kudo
Thanks for the fast reply guys! 🙂
02-12-2016
02:05 PM
3 Kudos
Hi, is there a way to specify how many resources a user will be able to use for his HBase queries? The objective is to be able to define a group A that can use 40% of the resources and a group B 60%; those two groups have to query the same HBase cluster. Something like what we can do with the YARN queue manager, but applied to every query launched by every user? Thanks for the info, Michel
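Not a percentage-based share like YARN queues, but HBase (1.1+) does have request quotas that throttle individual users, tables or namespaces; whether that approximates the 40/60 requirement well enough is debatable, the cluster must have hbase.quota.enabled=true, and the user names and rates below are made up. A minimal sketch:

import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.quotas.QuotaSettingsFactory;
import org.apache.hadoop.hbase.quotas.ThrottleType;

public class ThrottleUserRequests {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            // Cap the hypothetical user "groupA_user" at 400 requests per second...
            admin.setQuota(QuotaSettingsFactory.throttleUser(
                    "groupA_user", ThrottleType.REQUEST_NUMBER, 400, TimeUnit.SECONDS));
            // ...and "groupB_user" at 600, approximating a 40/60 split by request rate.
            admin.setQuota(QuotaSettingsFactory.throttleUser(
                    "groupB_user", ThrottleType.REQUEST_NUMBER, 600, TimeUnit.SECONDS));
        }
    }
}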
Labels:
- Apache HBase
- Cloudera Manager
02-01-2016
01:22 PM
OK, OK. Stupid question: how do I show the list of tables in Phoenix from Zeppelin? I tried: show table, show tables, list table, !table. None of them work; is that normal? Thanks 🙂
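One workaround (an assumption on my part, not from this thread): !tables is a sqlline command rather than SQL, so from a notebook or any JDBC client you can query Phoenix's SYSTEM.CATALOG instead. A minimal sketch with a made-up ZooKeeper quorum:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ListPhoenixTables {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");
        // Hypothetical quorum; the same query works from any Phoenix SQL client.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk0001.bc:2181:/hbase-secure");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT DISTINCT TABLE_SCHEM, TABLE_NAME FROM SYSTEM.CATALOG")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "." + rs.getString(2));
            }
        }
    }
}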
02-01-2016
01:03 PM
2 Kudos
Hi, I set up Zeppelin; %spark and %hive work perfectly, but something odd happens with %sql. For context, I have an HBase table and Hive is able to query it. When I run %sql show tables I can see all the table names, but when I run the following query:

%sql select * from table1

I get the following error:

MetaException(message:java.lang.ClassNotFoundException Class org.apache.hadoop.hive.hbase.HBaseSerDe not found)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:346)
    at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:288)
    at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:281)
    at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:631)
    at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:189)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1017)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1.apply(ClientWrapper.scala:202)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1.apply(ClientWrapper.scala:198)
    at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:156)
    at org.apache.spark.sql.hive.client.ClientWrapper.getTableOption(ClientWrapper.scala:198)
    at org.apache.spark.sql.hive.client.ClientInterface$class.getTable(ClientInterface.scala:112)
    at org.apache.spark.sql.hive.client.ClientWrapper.getTable(ClientWrapper.scala:61)
    at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:227)
    at ....................

Any idea? Thanks in advance, Michel
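A quick way to confirm the suspicion that the missing class simply is not on the Spark (%sql) interpreter's classpath, even though Hive itself has it, is to probe for it from a notebook paragraph or a small driver program. This is only a diagnostic sketch, not the fix from the thread:

public class CheckHBaseSerDeOnClasspath {
    public static void main(String[] args) {
        try {
            // HBaseSerDe is provided by the hive-hbase-handler jar.
            Class.forName("org.apache.hadoop.hive.hbase.HBaseSerDe");
            System.out.println("HBaseSerDe is on the classpath");
        } catch (ClassNotFoundException e) {
            System.out.println("HBaseSerDe is NOT on the classpath: the hive-hbase-handler jar is missing here");
        }
    }
}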
Labels:
- Apache HBase
- Apache Hive
- Apache Zeppelin
01-27-2016
03:29 PM
Hi, I'm new to NiFi. I would like to know how to download files from multiple servers in parallel (SFTP). The number of servers can change over time; the list of servers (hostnames) is stored in Hive. So my second question is: how do I feed the GetSFTP processor the result of the Hive query, which may contain several hostnames? I don't clearly see how to do that in the documentation; can anyone help me? Thanks in advance, Michel
Labels:
- Apache NiFi
01-24-2016
07:39 PM
Thanks guys for the fast reply! @nsabharwal: If I understand the slides correctly, I should expect a rise in CPU usage of between 5% and 60% depending on the compression algorithm. That can be really significant! @aervits: do you have some benchmarks or test results to get an idea? Many thanks guys! Michel
01-24-2016
02:31 PM
1 Kudo
Hi, I'm wondering how to get a good idea of the performance impact of using compression in HDFS. This question is important because, if I implement compression, should I plan on increasing the CPU capacity by 10%, 20%, 30% for the same performance? I know that I can gain performance because fewer IOPS will be needed, but what about CPU? I would also like to ask: what will be the impact on HBase performance? Many thanks in advance!
Labels:
- Apache Hadoop
- Apache HBase
12-14-2015
10:08 AM
1 Kudo
Hello, I would like to perform an upgrade from Java 1.7 to Java 1.8 on my cluster. I found the following documentation: http://docs.hortonworks.com/HDPDocuments/Ambari-2.... For the ambari-server it's very clear; the document says:

If you choose Oracle JDK 1.8 or Oracle JDK 1.7, the JDK you choose downloads and installs automatically on the Ambari Server host. This option requires that you have an internet connection. You must install this JDK on all hosts in the cluster to this same path.

Can someone explain to me clearly what I have to do, and how, on the other hosts? I understand that I have to install the same JDK manually on the other hosts, but what will the path be? Should I uninstall the previous version? Should I also change any configuration files on the other hosts? Many thanks in advance, Michel
Labels:
- Apache Ambari