Member since: 01-03-2017
Posts: 181
Kudos Received: 44
Solutions: 24
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1794 | 12-02-2018 11:49 PM |
| | 2398 | 04-13-2018 06:41 AM |
| | 1982 | 04-06-2018 01:52 AM |
| | 2283 | 01-07-2018 09:04 PM |
| | 5577 | 12-20-2017 10:58 PM |
04-06-2018
01:52 AM
1 Kudo
Hi @Matt Andruff, What I have come to know about this DML behaviour is that, when we insert data, Hive creates a temp table under the database you are currently in, rather than the database you are inserting into. In your case, since you have not switched databases (or you selected the default database), the session defaults to "default" and Hive tries to create the temp table there. Rather than provisioning Ranger policies for access on the default database, running "use <target/scratch database>" first will avoid the additional roles. There is an open JIRA linked to the issue you have raised: https://issues.apache.org/jira/browse/HIVE-15317 Hope this helps !!
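For illustration, a minimal sketch of that idea through a HiveServer2 JDBC session (like the Java example further down this page); the Hive JDBC driver is assumed to be on the classpath, and the database/table names are placeholders, not from your environment:

import java.sql.DriverManager

// Hypothetical sketch: point the session at a scratch database you already have
// Ranger access to, so the temp table is not created under "default".
val con = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
val stmt = con.createStatement()
stmt.execute("use scratch_db")                                            // placeholder database
stmt.execute("insert into target_db.target_table values (1, 'sample')")  // placeholder table
con.close()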
03-06-2018
01:50 AM
Hi @tomoya yoshida, the containers are allocated with bigger sizes, but I believe the mapper and reducer memory has not been increased to utilize the entire container memory you have allocated. Could you please look at the values of the following two variables, mapreduce.map.memory.mb and mapreduce.reduce.memory.mb, and set them to the same value as the container size? There is a nice HCC article explaining how these allocations work here. Alternatively, if you don't have any more resources, you may use the MR engine instead of Tez (though it is slower, it will complete the tasks with less concurrent memory utilization); to change the engine, run "set hive.execution.engine=mr" for this query. Hope this helps !!
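For reference, a rough sketch of setting these per session over a HiveServer2 JDBC connection (the Hive JDBC driver on the classpath and the 4096 MB value are assumptions; match the value to your actual container size):

import java.sql.DriverManager

val con = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
val stmt = con.createStatement()
// match the mapper/reducer memory to the (assumed) 4 GB container size
stmt.execute("set mapreduce.map.memory.mb=4096")
stmt.execute("set mapreduce.reduce.memory.mb=4096")
// optional fallback: run this session on the MR engine instead of Tez
stmt.execute("set hive.execution.engine=mr")
// ... run the failing query here via stmt.executeQuery(...) ...
con.close()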
03-06-2018
01:27 AM
Hi @yogesh turkane, As far as I am aware, we can achieve this in two ways.
1. After the data load, or at scheduled intervals, run "ALTER TABLE <table_name> CONCATENATE" on the table via the SQL API; this will merge all the small ORC files associated with that table. Please note that this is specific to ORC.
2. Use a data frame to load the data, then re-partition and write it back with overwrite in Spark. The code snippet would be:
val tDf = hiveContext.table("table_name")
tDf.repartition(<num_Files>).write.mode("overwrite").saveAsTable("targetDB.targetTable")
The second option will work with any type of file. Hope this helps !!
03-01-2018
04:00 AM
Hi @Manjunath Patel, "SparkListenerBus has already stopped!" is due to the program being interrupted without a proper shutdown of the context; it implies the program died before notifying the other executors on the platform. This occurs if you handle errors by terminating the program with sys.exit, so that the context JVM dies without notifying the other agents. The best you can do is stop the context gracefully (sc.stop or spark.stop) before you terminate the JVM, so that it is easier to debug any other errors in the program. Over-committing resources (memory) without swap can also cause this, as the OS abruptly kills the JVM. Hope this helps !!
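As a rough sketch of that graceful-shutdown pattern (the job body and app name here are just placeholders):

import org.apache.spark.{SparkConf, SparkContext}

object GracefulShutdownExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("graceful-shutdown-example"))
    try {
      // placeholder work: replace with your application logic
      val total = sc.parallelize(1 to 100).sum()
      println(s"total = $total")
    } finally {
      // always stop the context before the JVM exits, even on failure,
      // so the listener bus and executors are shut down cleanly
      sc.stop()
    }
  }
}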
02-26-2018
12:27 AM
1 Kudo
Hi @Abdou B., You need a key-store only if you configure two-way SSL with Kafka. As for your trust-store, you can have a common trust store across all the services you are using (as long as the nifi service user that runs the NiFi service on Linux/Windows has read access to that trust store). The cleanest way to keep things consistent is to have a common truststore and define your keys under different aliases to keep it organized. If you are using two-way SSL you need to configure the keystore as well; even that can be a common key-store, but to keep the private keys secure you need to set the key password (along with the store password). This lets multiple teams use the same store without having access to (being able to use) other teams' certs. Hope this helps !!
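To illustrate the same idea outside NiFi's SSL context service, here is a sketch of the equivalent Kafka client SSL settings (all hostnames, paths and passwords below are placeholders):

import java.util.Properties

// Hypothetical sketch: one shared truststore across teams, a team-specific
// keystore, and a separate key password protecting the private key itself.
val props = new Properties()
props.put("bootstrap.servers", "broker1.example.com:9093")
props.put("security.protocol", "SSL")
props.put("ssl.truststore.location", "/etc/security/shared-truststore.jks")  // common truststore
props.put("ssl.truststore.password", "********")
// the keystore entries are needed only for two-way SSL
props.put("ssl.keystore.location", "/etc/security/teamA-keystore.jks")
props.put("ssl.keystore.password", "********")   // store password
props.put("ssl.key.password", "********")        // key password keeps the private key usable only by this team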
02-26-2018
12:06 AM
1 Kudo
Hi @Fernando Lopez Bello, I did come across the same situation; by making the following changes I was able to connect through the proxy.
First location: under Settings --> System Settings --> HTTP Proxy, provide your proxy details.
Second location: under Build, Execution, Deployment --> Build Tools --> SBT, in the VM Parameters field of the JVM section, provide the proxy details:
-Dhttp.proxyHost=***
-Dhttp.proxyPort=***
-Dhttp.proxyUser=***
-Dhttp.proxyPassword=***
-Dhttps.proxyHost=***
-Dhttps.proxyPort=***
-Dhttps.proxyUser=***
-Dhttps.proxyPassword=***
Once done, don't forget to restart the IDE; then it should be able to connect to the external world through the proxy.
01-07-2018
09:04 PM
1 Kudo
Hi @Rachel Rui Liu, This can be done in two ways.
1. Using the logback filter mechanism. For the audit logs with forbidden access you can see "result":1 in the response, which means we can configure the logback settings in NiFi (whereas Kafka uses log4j). Here is a code snippet for the same (you may need to modify it accordingly):
<filter class="ch.qos.logback.core.filter.EvaluatorFilter">
  <evaluator> <!-- defaults to type ch.qos.logback.classic.boolex.JaninoEventEvaluator -->
    <expression>return message.contains('"result":1');</expression>
  </evaluator>
  <OnMismatch>DENY</OnMismatch>
  <OnMatch>NEUTRAL</OnMatch>
</filter>
so your nifi-node-logback-env file will have the following snippet:
<appender name="RANGER_AUDIT" class="ch.qos.logback.core.rolling.RollingFileAppender">
  <file>${org.apache.nifi.bootstrap.config.log.dir}/ranger_nifi_audit.log</file>
  <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
    <fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/ranger_nifi_audit_%d{yyyy-MM-dd_HH}.%i.log</fileNamePattern>
    <maxFileSize>100MB</maxFileSize>
    <maxHistory>30</maxHistory>
  </rollingPolicy>
  <immediateFlush>true</immediateFlush>
  <filter class="ch.qos.logback.core.filter.EvaluatorFilter">
    <evaluator> <!-- defaults to type ch.qos.logback.classic.boolex.JaninoEventEvaluator -->
      <expression>return message.contains('"result":1');</expression>
    </evaluator>
    <OnMismatch>DENY</OnMismatch>
    <OnMatch>NEUTRAL</OnMatch>
  </filter>
  <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
    <pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
  </encoder>
</appender>
In the case of log4j, that would be a regular-expression filter:
<RegexFilter regex=".*\"result\" \: 1.*" onMatch="ACCEPT" onMismatch="DENY"/>
More on this can be found in the log4j and logback documentation.
2. Using an out-of-the-box solution: a simple shell script which greps the "result":1 lines and removes all the rest at periodic intervals:
sed '/"result":1/!d' <logfile>
Hope this helps !!
12-29-2017
01:28 AM
1 Kudo
Hi @Muneesh, The Hive client does support connecting via JDBC; here is some sample code (it can easily be converted to Scala). This example illustrates loading data into Hive and selecting it back. Hope this helps!!
import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;
public class HiveJdbcClient {
private static String driverName = "org.apache.hive.jdbc.HiveDriver"; // HiveServer2 JDBC driver
public static void main(String[] args) throws SQLException {
try {
Class.forName(driverName);
} catch (ClassNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
System.exit(1);
}
Connection con = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "");
Statement stmt = con.createStatement();
String tableName = "testHiveDriverTable";
stmt.execute("drop table if exists " + tableName);
stmt.execute("create table " + tableName + " (key int, value string)");
// show tables
String sql = "show tables '" + tableName + "'";
System.out.println("Running: " + sql);
ResultSet res = stmt.executeQuery(sql);
if (res.next()) {
System.out.println(res.getString(1));
}
// describe table
sql = "describe " + tableName;
System.out.println("Running: " + sql);
res = stmt.executeQuery(sql);
while (res.next()) {
System.out.println(res.getString(1) + "\t" + res.getString(2));
}
// load data into table
// NOTE: filepath has to be local to the hive server
// NOTE: /tmp/a.txt is a ctrl-A separated file with two fields per line
String filepath = "/tmp/a.txt";
sql = "load data local inpath '" + filepath + "' into table " + tableName;
System.out.println("Running: " + sql);
stmt.execute(sql); // LOAD DATA does not return a result set
// select * query
sql = "select * from " + tableName;
System.out.println("Running: " + sql);
res = stmt.executeQuery(sql);
while (res.next()) {
System.out.println(String.valueOf(res.getInt(1)) + "\t" + res.getString(2));
}
// regular hive query
sql = "select count(1) from " + tableName;
System.out.println("Running: " + sql);
res = stmt.executeQuery(sql);
while (res.next()) {
System.out.println(res.getString(1));
}
con.close();
}
}
12-21-2017
12:29 AM
Hi @PJ, '$' signifies end of line in a regular expression; that would be the reason the content comes back as a single split. You can use an escape sequence to handle it: split(all_comments,'\\$'). Hope this helps !!
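A quick way to sanity-check the escaping from spark-shell (hiveContext here is the HiveContext, e.g. the sqlContext that spark-shell provides on Spark 1.6; on Spark 2.x use spark.sql instead; the literal value is just an illustration):

// '\\$' reaches the engine as the regex \$ , i.e. a literal dollar sign
hiveContext.sql("""SELECT split('a$b$c', '\\$') AS parts""").show(false)
// parts: [a, b, c]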
12-21-2017
12:24 AM
1 Kudo
To secure the Spark Thrift server, first we need to change the mode from binary to http and then secure the channel with certificates.
Login to Ambari -> Spark(2) -> Configs -> Custom spark-hive-site-override and set the following parameters:
hive.server2.transport.mode : http
hive.server2.thrift.http.port : 10015 / 10016 (in case of Spark 2)
hive.server2.http.endpoint : cliservice
#Enabling the SSL mode
hive.server2.use.SSL : true
hive.server2.keystore.path : </path/to/your/keystore/jks>
hive.server2.keystore.password : <keystorepassword>
In case server certs are not available, here is the process to create self-signed certs (from the Hive wiki page).
Setting up SSL with self-signed certificates. Use the following steps to create and verify self-signed SSL certificates for use with HiveServer2:
1. Create the self-signed certificate and add it to a keystore file using:
keytool -genkey -alias example.com -keyalg RSA -keystore keystore.jks -keysize 2048
Ensure the name used in the self-signed certificate matches the hostname where the Thrift server will run.
2. List the keystore entries to verify that the certificate was added. Note that a keystore can contain multiple such certificates:
keytool -list -keystore keystore.jks
3. Export this certificate from keystore.jks to a certificate file:
keytool -export -alias example.com -file example.com.crt -keystore keystore.jks
4. Add this certificate to the client's truststore to establish trust:
keytool -import -trustcacerts -alias example.com -file example.com.crt -keystore truststore.jks
5. Verify that the certificate exists in truststore.jks:
keytool -list -keystore truststore.jks
Then start the Spark Thrift server and use spark-sql from the Spark bin directory, or try to connect with beeline using:
jdbc:hive2://<host>:<port>/<database>;ssl=true;sslTrustStore=<path-to-truststore>;trustStorePassword=<truststore-password>