Member since: 05-15-2017
Posts: 7
Kudos Received: 0
Solutions: 0
08-02-2017
10:13 AM
Hi Bob, the data in the RDD needs to be written through the Hive streaming API (which internally uses MapReduce), using the following classes: org.apache.hive.hcatalog.streaming.{HiveEndPoint, StreamingConnection, StrictJsonWriter, TransactionBatch}. So is it possible to invoke this from a Hive context spawned within the Spark job, instead of invoking it as a separate JVM? Thanks.
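For illustration, a minimal sketch of the pattern in question, assuming an unpartitioned ACID table and placeholder metastore/database/table names: each Spark partition opens its own streaming connection inside the executor JVM via foreachPartition, so no separate JVM is spawned.

import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import org.apache.hive.hcatalog.streaming.HiveEndPoint;
import org.apache.hive.hcatalog.streaming.StreamingConnection;
import org.apache.hive.hcatalog.streaming.StrictJsonWriter;
import org.apache.hive.hcatalog.streaming.TransactionBatch;
import org.apache.spark.api.java.JavaRDD;

public class RddHiveStreamingSketch {
    public static void writeJsonRdd(JavaRDD<String> jsonRdd) {
        jsonRdd.foreachPartition((Iterator<String> rows) -> {
            // Runs on the executor: one endpoint and connection per partition.
            // Metastore URI, database, and table are placeholders; the null
            // partition list assumes an unpartitioned table.
            HiveEndPoint endPoint = new HiveEndPoint(
                "thrift://metastore-host:9083", "default", "acid_table", null);
            StreamingConnection conn = endPoint.newConnection(true);
            StrictJsonWriter writer = new StrictJsonWriter(endPoint);
            TransactionBatch txnBatch = conn.fetchTransactionBatch(10, writer);
            try {
                // Write the whole partition as one transaction of the batch.
                txnBatch.beginNextTransaction();
                while (rows.hasNext()) {
                    txnBatch.write(rows.next().getBytes(StandardCharsets.UTF_8));
                }
                txnBatch.commit();
            } finally {
                txnBatch.close();
                conn.close();
            }
        });
    }
}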
07-31-2017
12:20 PM
Hi, I have a requirement to initiate another MapReduce job (using Hive on the same cluster) with information taken from each RDD. I need to know whether I can run a JVM (performing that Hive MapReduce job) inside a Spark job while processing each RDD. If this is possible, what is the procedure to achieve it? Any sample code or documentation would be helpful. I use the HDP sandbox with:
Hadoop version: 2.7.3.2.6.0.3-8
Spark version: 2.1.0.2.6.0.3-8
Hive version: 2.1.0.2.6.0.3-8
Thanks.
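As one alternative to spawning a child JVM, a minimal sketch of driving the Hive MapReduce job through HiveServer2's JDBC interface from within the Spark application. The host name, the job_audit table, and the row-count payload are all hypothetical placeholders, not drawn from any confirmed setup.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import org.apache.spark.api.java.JavaRDD;

public class HivePerRddSketch {
    public static void runHiveJobFor(JavaRDD<String> rdd) throws Exception {
        // Derive whatever the downstream Hive job needs from the RDD on the
        // driver; a plain row count is just a stand-in here.
        long rows = rdd.count();

        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://hiveserver2-host:10000/default");
             Statement stmt = conn.createStatement()) {
            // HiveServer2 compiles and runs this on the cluster as its own
            // MapReduce job; the Spark process never forks a new JVM.
            stmt.execute("INSERT INTO job_audit VALUES (" + rows + ")");
        }
    }
}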
Labels:
- Apache Hadoop
- Apache Hive
- Apache Spark
07-18-2017
02:45 PM
Hello, I am using secured Hive streaming in a cluster with Kerberos authentication. I use Hadoop 2.7.3.2.6.1.0-129 and Hive 1.2.1000.2.6.1.0-129. I am able to get the UGI authenticated with my Kerberos id and keytab at

ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI("user@XX.YYYY.NET", "/home/xx/user.keytab");

but later, when fetchTransactionBatch is called, a Kerberos IO failure occurs at

TransactionBatch txnBatch = secureConn.fetchTransactionBatch(numTrx, writer);

Code snippet:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.conf.HiveConf.ConfVars;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hive.hcatalog.streaming.*;

// hiveConf is a HiveConf built elsewhere for this cluster
HiveEndPoint hiveEP = new HiveEndPoint(hiveConf.getVar(ConfVars.METASTOREURIS), database, table, partArray);

// Point Hadoop security and the JVM at the Kerberos configuration
Configuration conf = new Configuration();
conf.set("hadoop.security.authentication", "kerberos");
System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
UserGroupInformation.setConfiguration(conf);

// Log in from the keytab and make this the process login user
UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI("user@XX.YYYY.NET", "/home/xx/user.keytab");
if (ugi == null) {
    return;
}
UserGroupInformation.setLoginUser(ugi);
boolean ifc = ugi.hasKerberosCredentials(); // sanity check: true after login

// Open a secure streaming connection and fetch a transaction batch
StreamingConnection secureConn = hiveEP.newConnection(true, hiveConf, ugi);
int numTrx = 2;
String[] fieldNames = {"number", "name"};
DelimitedInputWriter writer = new DelimitedInputWriter(fieldNames, ",", hiveEP);
TransactionBatch txnBatch = secureConn.fetchTransactionBatch(numTrx, writer);

Exception at: TransactionBatch txnBatch = secureConn.fetchTransactionBatch(numTrx, writer);

java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1884)
    at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.fetchTransactionBatch(HiveEndPoint.java:424)
    at ns.KClass.<init>(KClass.java:112)
    at ns.KClass.main(KClass.java:45)
Caused by: org.apache.hive.hcatalog.streaming.StreamingIOFailure: Failed creating RecordUpdaterS for hdfs://MAIN/hdfs/path/to/database.db/acid1 txnIds[113,114]
    at org.apache.hive.hcatalog.streaming.AbstractRecordWriter.newBatch(AbstractRecordWriter.java:193)
Caused by: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "lh.net/1.2.3.4"; destination host is: "yyy.net":8020

Can someone tell me how to solve this exception and get the transaction batch? Thanks.
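For reference, a minimal sketch of one possible workaround, assuming the same ugi, secureConn, numTrx, and writer as in the snippet above: wrapping the transaction-batch calls in ugi.doAs(...) so that the HDFS RecordUpdater is created under the Kerberos login context rather than the process default. This is an assumption suggested by the UndeclaredThrowableException surfacing from UserGroupInformation.doAs, not a confirmed fix.

import java.security.PrivilegedExceptionAction;

// Hypothetical workaround: fetch the batch inside doAs so the metastore
// and HDFS calls all run with the keytab-based login context.
TransactionBatch txnBatch = ugi.doAs(
    (PrivilegedExceptionAction<TransactionBatch>) () ->
        secureConn.fetchTransactionBatch(numTrx, writer));

// Writes and commits would likewise run inside doAs.
ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
    txnBatch.beginNextTransaction();
    txnBatch.write("1,alice".getBytes("UTF-8")); // sample delimited record
    txnBatch.commit();
    txnBatch.close();
    return null;
});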
Labels:
- Apache Hadoop
- Apache Hive
06-13-2017
01:52 PM
Earlier it was 2, and when I increase this the throughput improves. However, the standard Hive API can insert this many rows in a single transaction by itself, so why does it take more transactions in NiFi?
06-13-2017
09:18 AM
Hello, I am using NiFi's PutHiveStreaming processor with all the recommended settings, but I am getting very low performance: only a few hundred rows per second. I am using single Avro files with thousands of records for each insertion, written as 2 transactions. I also tried increasing the number of concurrent tasks, but that doesn't help either. Can someone shed some light on how to improve performance (to around a hundred thousand rows per second)?
Labels:
- Apache Hive
- Apache NiFi
05-16-2017
09:14 AM
I need to do Hive streaming on a huge volume of data, but I have only one NiFi instance installed outside the cluster, so I intended to call that NiFi Hive streaming from within a Spark job in cluster mode. If this is not feasible, is there any way I can improve the Hive streaming throughput using NiFi?