Member since: 01-03-2017
Posts: 181
Kudos Received: 44
Solutions: 24
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1872 | 12-02-2018 11:49 PM |
| | 2511 | 04-13-2018 06:41 AM |
| | 2075 | 04-06-2018 01:52 AM |
| | 2373 | 01-07-2018 09:04 PM |
| | 5733 | 12-20-2017 10:58 PM |
04-06-2018
01:52 AM
1 Kudo
Hi @Matt Andruff, What I learned while working with DML operations is that when we insert data, Hive creates a temp table under the database you are currently in, rather than the database you are trying to insert into. In your case, since you have not switched databases, the session defaults to the "default" database (or you may have selected the default database), hence it tries to create the table there. Rather than provisioning Ranger policies for access to the default database, run "use <database>" to switch to the target/scratch database first; this avoids additional roles. There is an open JIRA for the same issue you raised: https://issues.apache.org/jira/browse/HIVE-15317 Hope this helps !!
03-06-2018
01:50 AM
Hi @tomoya yoshida, the container is allocated with a bigger size, but I believe the mapper and reducer memory has not been increased to utilize the entire container memory you allocated. Could you please check the values of the two properties mapreduce.map.memory.mb and mapreduce.reduce.memory.mb? You could set them to the same value as the container size. There is a nice HCC article explaining how these allocations work here. Alternatively, if you don't have any more resources, you may use the MR engine instead of Tez (it is slower, but it will complete the tasks with less concurrent memory utilization). To change the engine, run "set hive.execution.engine=mr" for this query. Hope this helps !!
03-06-2018
01:27 AM
Hi @yogesh turkane, As far as I have come across, we can achieve this in two ways.
1. After the data load, or at scheduled intervals, run "ALTER TABLE <table_name> CONCATENATE" on the table through the SQL API. This merges all the small ORC files associated with that table. Please note that this is specific to ORC.
2. Use a DataFrame to load the data, repartition it, and write it back with overwrite in Spark. The code snippet would be:
    val tDf = hiveContext.table("table_name")
    tDf.repartition(<num_Files>).write.mode("overwrite").saveAsTable("targetDB.targetTable")
The second option works with any file type. Hope this helps !!
03-01-2018
04:00 AM
Hi @Manjunath Patel, The error "SparkListenerBus has already stopped!" is due to the program being interrupted without a proper shutdown of the context, which implies the program died before notifying all the other executors in the platform. This occurs if you handle errors by terminating the program with sys.exit, so that the context JVM dies without notifying the other agents. The best you can do is stop the context (sc.stop or spark.stop) gracefully before you terminate the JVM, so that it is easier to debug any other errors in the program. Over-committing resources (memory) without swap may also cause this, as the OS abruptly kills the JVM. Hope this helps !!
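The stop-before-exit pattern described above could look roughly like the sketch below. It deliberately does not use Spark: StoppableContext is a hypothetical stand-in for SparkContext/SparkSession (only the stop() call matters here), and the class and method names are made up for illustration. The idea is to return an exit code from the job wrapper and stop the context in a finally block, so the exit happens only after the shutdown.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class GracefulShutdownSketch {

    /** Hypothetical stand-in for SparkContext / SparkSession; only stop() matters here. */
    interface StoppableContext {
        void stop();
    }

    /** Runs the job, always stops the context, and returns an exit code instead of exiting. */
    static int runAndStop(StoppableContext context, Runnable job) {
        try {
            job.run();
            return 0;
        } catch (RuntimeException e) {
            System.err.println("Job failed: " + e.getMessage());
            return 1;
        } finally {
            // Reached on success and on failure, before the caller terminates the JVM.
            context.stop();
        }
    }

    public static void main(String[] args) {
        AtomicBoolean stopped = new AtomicBoolean(false);
        int exitCode = runAndStop(
                () -> stopped.set(true),
                () -> { throw new IllegalStateException("simulated failure"); });
        System.out.println("exit code: " + exitCode + ", context stopped: " + stopped.get());
        // In a real driver, System.exit(exitCode) would be the last statement,
        // after the context has already been stopped.
    }
}
```

With a real Spark job the same shape applies: catch the error, let the finally block call sc.stop or spark.stop, and only then exit with a non-zero status.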
02-26-2018
12:27 AM
1 Kudo
Hi @Abdou B., you need a keystore only if you configure two-way SSL with Kafka. As for your truststore, you can have a common truststore across all the services you are using (as long as the NiFi service user, which runs the NiFi service on Linux/Windows, has read access to that truststore). The best way to stay consistent is to have a common truststore and define your keys under different aliases to keep it organized. If you are using two-way SSL, you need to configure the keystore as well; even that can be configured as a common keystore. However, to keep the private keys secure you need to set the key password (along with the store password). This lets multiple teams use the same store without having access to (or use of) other teams' certs. Hope this helps !!
02-26-2018
12:06 AM
1 Kudo
Hi @Fernando Lopez Bello, I came across the same situation; after making the following changes I was able to connect through the proxy.
First location: under Settings --> System Settings --> HTTP Proxy, provide your proxy details.
Second location: under Build, Execution, Deployment --> Build Tools --> SBT, in the JVM section, provide the proxy details in VM Parameters:
    -Dhttp.proxyHost=***
    -Dhttp.proxyPort=***
    -Dhttp.proxyUser=***
    -Dhttp.proxyPassword=***
    -Dhttps.proxyHost=***
    -Dhttps.proxyPort=***
    -Dhttps.proxyUser=***
    -Dhttps.proxyPassword=***
Once done, don't forget to restart the IDE; it should then connect to the external world through the proxy.
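If you want to check that -D flags like these actually reach a JVM, a small sketch such as the one below can print what the process sees when it is launched with the same VM parameters. The class and method names are hypothetical; the property keys are the ones from the flags above. Whether a given tool honors the user/password keys depends on that tool, so treat this only as a visibility check, not as proof that the proxy works.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ProxyPropertyCheck {

    // Property keys copied from the -D flags in the post above.
    static final String[] KEYS = {
        "http.proxyHost", "http.proxyPort", "http.proxyUser", "http.proxyPassword",
        "https.proxyHost", "https.proxyPort", "https.proxyUser", "https.proxyPassword"
    };

    /** Reads each proxy property from the running JVM; passwords are masked. */
    static Map<String, String> readProxySettings() {
        Map<String, String> settings = new LinkedHashMap<>();
        for (String key : KEYS) {
            String value = System.getProperty(key);
            if (value == null) {
                settings.put(key, "<not set>");
            } else if (key.endsWith("Password")) {
                settings.put(key, "<set>"); // never print the secret itself
            } else {
                settings.put(key, value);
            }
        }
        return settings;
    }

    public static void main(String[] args) {
        readProxySettings().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Run it with the same flags you gave the IDE, for example java -Dhttp.proxyHost=*** -Dhttp.proxyPort=*** ProxyPropertyCheck.java on a JDK that supports single-file launch.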
01-07-2018
09:04 PM
1 Kudo
Hi @Rachel Rui Liu, This can be done in two ways.
1. Using the logback filter mechanism. For the audit logs that record forbidden access, you can see "result":1 in the response, which means we can configure a filter in the logback settings in NiFi (log4j in the case of Kafka). Here is the code snippet for this (it may need to be modified for your setup):
    <filter class="ch.qos.logback.core.filter.EvaluatorFilter">
      <evaluator> <!-- defaults to type ch.qos.logback.classic.boolex.JaninoEventEvaluator -->
        <expression>return message.contains("\"result\":1");</expression>
      </evaluator>
      <OnMismatch>DENY</OnMismatch>
      <OnMatch>NEUTRAL</OnMatch>
    </filter>
So your nifi-node-logback-env file will have the following snippet:
    <appender name="RANGER_AUDIT" class="ch.qos.logback.core.rolling.RollingFileAppender">
      <file>${org.apache.nifi.bootstrap.config.log.dir}/ranger_nifi_audit.log</file>
      <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
        <fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/ranger_nifi_audit_%d{yyyy-MM-dd_HH}.%i.log</fileNamePattern>
        <maxFileSize>100MB</maxFileSize>
        <maxHistory>30</maxHistory>
      </rollingPolicy>
      <immediateFlush>true</immediateFlush>
      <filter class="ch.qos.logback.core.filter.EvaluatorFilter">
        <evaluator> <!-- defaults to type ch.qos.logback.classic.boolex.JaninoEventEvaluator -->
          <expression>return message.contains("\"result\":1");</expression>
        </evaluator>
        <OnMismatch>DENY</OnMismatch>
        <OnMatch>NEUTRAL</OnMatch>
      </filter>
      <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
        <pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
      </encoder>
    </appender>
In the case of log4j, that would be a regular expression filter:
    <RegexFilter regex=".*&quot;result&quot;:1.*" onMatch="ACCEPT" onMismatch="DENY"/>
More on this can be found in the log4j and logback documentation.
2. Using an out-of-the-box solution: a simple shell script that keeps the "result":1 lines and removes all the rest at periodic intervals:
    sed '/"result":1/!d' <logfile>
Hope this helps !!
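To make the keep-or-drop rule concrete, here is a minimal Java sketch that applies the same substring test to plain strings. It does not use logback or log4j; the class name, method name, and sample audit lines are hypothetical and only mirror the contains check and the sed command. One thing it makes visible, assuming the audit line is compact JSON: a plain substring match on "result":1 would also keep a line whose value merely starts with 1, such as "result":10.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class AuditLineFilter {

    // The marker the filter looks for, written with straight double quotes.
    static final String MARKER = "\"result\":1";

    /** Keeps only the lines that contain the marker, like sed '/"result":1/!d'. */
    static List<String> keepMatching(List<String> lines) {
        return lines.stream()
                .filter(line -> line.contains(MARKER))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical audit lines, for illustration only.
        List<String> sample = Arrays.asList(
                "{\"repo\":\"nifi\",\"result\":1}",
                "{\"repo\":\"nifi\",\"result\":0}",
                "{\"repo\":\"nifi\",\"result\":10}");
        // The first and third lines are kept: substring matching cannot tell 1 from 10.
        keepMatching(sample).forEach(System.out::println);
    }
}
```

If that ambiguity matters for your logs, matching the marker together with the character that follows it (a comma or closing brace) is one way to tighten the rule.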
12-29-2017
01:28 AM
1 Kudo
Hi @Muneesh, Hive clients can connect via JDBC. Here is sample code (it can easily be converted to Scala) that illustrates loading data into Hive and selecting it. Hope this helps!!

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class HiveJdbcClient {
        // HiveServer2 JDBC driver class
        private static String driverName = "org.apache.hive.jdbc.HiveDriver";

        public static void main(String[] args) throws SQLException {
            try {
                Class.forName(driverName);
            } catch (ClassNotFoundException e) {
                // The Hive JDBC driver jar is not on the classpath
                e.printStackTrace();
                System.exit(1);
            }
            Connection con = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "");
            Statement stmt = con.createStatement();
            String tableName = "testHiveDriverTable";
            // DDL returns no result set, so use execute() instead of executeQuery()
            stmt.execute("drop table if exists " + tableName);
            stmt.execute("create table " + tableName + " (key int, value string)");
            // show tables
            String sql = "show tables '" + tableName + "'";
            System.out.println("Running: " + sql);
            ResultSet res = stmt.executeQuery(sql);
            if (res.next()) {
                System.out.println(res.getString(1));
            }
            // describe table
            sql = "describe " + tableName;
            System.out.println("Running: " + sql);
            res = stmt.executeQuery(sql);
            while (res.next()) {
                System.out.println(res.getString(1) + "\t" + res.getString(2));
            }
            // load data into table
            // NOTE: filepath has to be local to the hive server
            // NOTE: /tmp/a.txt is a ctrl-A separated file with two fields per line
            String filepath = "/tmp/a.txt";
            sql = "load data local inpath '" + filepath + "' into table " + tableName;
            System.out.println("Running: " + sql);
            stmt.execute(sql);
            // select * query
            sql = "select * from " + tableName;
            System.out.println("Running: " + sql);
            res = stmt.executeQuery(sql);
            while (res.next()) {
                System.out.println(String.valueOf(res.getInt(1)) + "\t" + res.getString(2));
            }
            // regular hive query
            sql = "select count(1) from " + tableName;
            System.out.println("Running: " + sql);
            res = stmt.executeQuery(sql);
            while (res.next()) {
                System.out.println(res.getString(1));
            }
            // release the statement and the connection
            stmt.close();
            con.close();
        }
    }
12-21-2017
12:29 AM
Hi @PJ, '$' signifies the end of a line in a regular expression, which would be why you get the content back as a single split. You can use an escape sequence to handle that: split(all_comments,'\\$'). Hope this helps !!
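Hive's split takes a Java regular expression, so the same behaviour can be reproduced in a few lines of plain Java. This is only a sketch: the class name, method names, and the sample string are made up for illustration.

```java
import java.util.Arrays;

final class DollarSplitDemo {

    /** Unescaped: "$" is the end-of-input anchor, so nothing is actually split. */
    static String[] splitUnescaped(String text) {
        return text.split("$");
    }

    /** Escaped: the regex \$ matches a literal dollar sign. */
    static String[] splitEscaped(String text) {
        return text.split("\\$");
    }

    public static void main(String[] args) {
        String allComments = "first$second$third"; // hypothetical column value
        System.out.println(Arrays.toString(splitUnescaped(allComments))); // prints [first$second$third]
        System.out.println(Arrays.toString(splitEscaped(allComments)));   // prints [first, second, third]
    }
}
```

The doubled backslash is needed in both places for the same reason: the string literal consumes one backslash, and the regex engine needs the remaining one to treat $ as a literal character.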
12-21-2017
12:24 AM
1 Kudo
To secure the Spark Thrift server, first we need to change the transport mode from binary to http, then secure the channel with certificates.
Log in to Ambari -> Spark(2) -> Configs -> Custom spark-hive-site-override and set the following parameters:
    hive.server2.transport.mode : http
    hive.server2.thrift.http.port : 10015 / 10016 (in case of Spark 2)
    hive.server2.http.endpoint : cliservice
    # Enabling the SSL mode
    hive.server2.use.SSL : true
    hive.server2.keystore.path : </path/to/your/keystore/jks>
    hive.server2.keystore.password : <keystorepassword>
If server certs are not available, here is the process to create self-signed certs (from the Hive wiki page).
Setting up SSL with self-signed certificates
Use the following steps to create and verify self-signed SSL certificates for use with HiveServer2:
1. Create the self-signed certificate and add it to a keystore file using:
    keytool -genkey -alias example.com -keyalg RSA -keystore keystore.jks -keysize 2048
Ensure the name used in the self-signed certificate matches the hostname where the Thrift server will run.
2. List the keystore entries to verify that the certificate was added. Note that a keystore can contain multiple such certificates:
    keytool -list -keystore keystore.jks
3. Export this certificate from keystore.jks to a certificate file:
    keytool -export -alias example.com -file example.com.crt -keystore keystore.jks
4. Add this certificate to the client's truststore to establish trust:
    keytool -import -trustcacerts -alias example.com -file example.com.crt -keystore truststore.jks
5. Verify that the certificate exists in truststore.jks:
    keytool -list -keystore truststore.jks
Then start the Spark Thrift server and use spark-sql from the Spark bin directory, or try to connect with beeline using:
    jdbc:hive2://<host>:<port>/<database>;ssl=true;sslTrustStore=<path-to-truststore>;trustStorePassword=<truststore-password>
Because the server now runs in http transport mode, the client URL typically also needs ;transportMode=http;httpPath=cliservice appended.
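For a JDBC client, the connection string can be assembled as in the sketch below. The class name, method name, and the sample host, path, and password are hypothetical. The transportMode=http and httpPath=cliservice parts are my assumption based on the http transport mode and the cliservice endpoint configured above; older Hive clients may expect those two settings in a different form, so check the driver version you use.

```java
final class ThriftJdbcUrl {

    /** Builds a hive2 JDBC URL with SSL and HTTP transport options. */
    static String build(String host, int port, String database,
                        String trustStorePath, String trustStorePassword) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database
                + ";ssl=true"
                + ";sslTrustStore=" + trustStorePath
                + ";trustStorePassword=" + trustStorePassword
                // Assumption: matches hive.server2.transport.mode=http and
                // hive.server2.http.endpoint=cliservice from the settings above.
                + ";transportMode=http;httpPath=cliservice";
    }

    public static void main(String[] args) {
        // Hypothetical values, for illustration only.
        String url = build("thrift.example.com", 10016, "default",
                "/path/to/truststore.jks", "changeit");
        System.out.println(url);
        // With the Hive JDBC driver on the classpath, this string would be passed to
        // java.sql.DriverManager.getConnection(url, user, password).
    }
}
```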