Member since: 01-03-2017
Posts: 181
Kudos Received: 44
Solutions: 24
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1794 | 12-02-2018 11:49 PM |
 | 2398 | 04-13-2018 06:41 AM |
 | 1982 | 04-06-2018 01:52 AM |
 | 2283 | 01-07-2018 09:04 PM |
 | 5576 | 12-20-2017 10:58 PM |
01-15-2019
10:56 PM
Hi @Rahul Bhargava, it looks like the Polling Interval is causing the problem: the processor waits 60s for the next fetch while the current batch is still processing, so the transfer goes stale. Could you please increase it to a longer value (as this is a one-off migration you can keep the larger value for the test)? On the other note, you can go with ListSFTP followed by FetchSFTP, which will do the same. Hope this helps !!
01-09-2019
05:25 AM
Hi @Rahul Bhargava, can you please reduce the Remote Poll Batch Size or leave it at the default value (which is 5000)? From the documentation: "The value specifies how many file paths to find in a given directory on the remote system when doing a file listing. This value in general should not need to be modified but when polling against a remote system with a tremendous number of files this value can be critical. Setting this value too high can result in very poor performance and setting it too low can cause the flow to be slower than normal."
I strongly suspect that the SFTP source end is timing out the open session and causing the data transfer to go stale. In addition, could you please set the Send Keep Alive On Timeout parameter to true and increase all the other timeout settings. Hope this helps !!
01-04-2019
05:30 AM
Hi @Dinesh Singh, if I understand correctly you need to select whichever of the two columns has a non-null value. For instance, if your data looks like this:
Col1   Col2
A1     B1
NULL   B2
A3     NULL
and you would like to select the data as (A1, B2, A3), you may use COALESCE(Col1, Col2, '<Your Default Value>'). On top of this table you may create a view so that end users directly see the result as a single column (see the sketch below). NB: if it is convenient, you may use a CASE statement for more complex logic to derive the single column. Thanks,
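A minimal sketch of the idea (the table, view and default value names below are hypothetical):
-- hypothetical base table with the two columns
CREATE TABLE src_table (Col1 STRING, Col2 STRING);
-- view that exposes whichever column is non-null as a single column
CREATE VIEW src_table_merged AS
SELECT COALESCE(Col1, Col2, 'N/A') AS merged_col
FROM src_table;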
12-02-2018
11:49 PM
1 Kudo
Hi @akash sharma, the requested functionality is implemented in the SmartSense Activity Explorer, which is an out-of-the-box solution; please have a look at it, as it comes with pre-built reporting and trend-analysis reports that will help assess capacity planning. However, if you still wish to implement an in-house solution, you can get the data using the REST API of YARN and derive the metrics for the cluster. The following URL has the REST specs so that you can get the required information: https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html The following Scala snippet extracts the same in JSON format, which can eventually be loaded to a database or another location.
// Required imports (assuming commons-io and org.json are on the classpath);
// `logger`, `properties` and `appFunctions` come from the surrounding application code.
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets
import java.util.Properties
import org.apache.commons.io.IOUtils
import org.json.JSONObject

// Returns a reachable ResourceManager URL, trying the primary host first
// and falling back to the secondary host.
def getURL(rmHost1: String, rmHost2: String, urlExtension: String): URL = {
  val url1 = new URL(rmHost1 + urlExtension)
  val url2 = new URL(rmHost2 + urlExtension)
  try {
    url1.openConnection().asInstanceOf[HttpURLConnection].getResponseCode
    url1
  } catch {
    case e: Exception =>
      logger.info("Unable to connect to primary RM host, trying secondary RM host")
      try {
        url2.openConnection().asInstanceOf[HttpURLConnection].getResponseCode
        url2
      } catch {
        case f: Exception =>
          logger.info("Unable to connect to either of the RM hosts, hence terminating ..")
          logger.error("Primary host stack trace !!", e)
          logger.error("Secondary host stack trace !!", f)
          throw f
      }
  }
}

// Fetches the response body of the given REST endpoint from whichever RM host answers.
def loadRM(props: Properties, urlExtension: String): String = {
  val url = getURL(props.getProperty("resoucemanagerHost1"), props.getProperty("resoucemanagerHost2"), urlExtension)
  if (url == null) { return null }
  try {
    val urlContext = url.openConnection().asInstanceOf[HttpURLConnection]
    val resStr = IOUtils.toString(urlContext.getInputStream, StandardCharsets.UTF_8)
    urlContext.disconnect()
    resStr
  } catch {
    case e: Exception =>
      logger.error("Unable to load the URL : " + url.toString, e)
      throw e
  }
}

// Example usage: pull the cluster application statistics and extract the JSON payload.
val dat = appFunctions.loadRM(properties, "/ws/v1/cluster/appstatistics")
if (dat == null) {
  logger.error("Could not get payload")
  return
}
val payload =
  try {
    new JSONObject(dat).getJSONObject("appStatInfo").getJSONArray("statItem")
  } catch {
    case e: Exception =>
      logger.error("Unable to extract the content from Json for Cluster Metrics " + dat)
      return
  }
08-02-2018
03:51 AM
Hi @Jeongmin Ryu, apparently what Spark sees on the underlying file system and what the metastore sees are out of sync. Can I request that you run ALTER TABLE table_name RECOVER PARTITIONS;
or
MSCK REPAIR TABLE table_name; This will ensure the metastore is updated with the relevant partitions (a small verification sketch follows below). At the same time, can you please ensure that your spark-conf (of the Zeppelin version) has the latest hive-site.xml, core-site.xml and hdfs-site.xml files, as these are important for Spark to determine the underlying file system.
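As a minimal sketch (the table name is hypothetical), after the repair you can confirm that the metastore now lists the partitions:
-- repair the metastore from the partitions present on the file system
MSCK REPAIR TABLE web_logs;
-- verify that the partitions are now registered
SHOW PARTITIONS web_logs;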
04-16-2018
11:40 PM
1 Kudo
Hi Eric, in response to "No Subject Alternative Name": as part of certificate validation this becomes a vital element, and it can be generated using an extension while you create the certificate: " -ext san=dns:<your server Name> " or " -ext san=ip:<your Server IP> ", or both: " -ext san=dns:server.com,ip:11.11.11.11 ". This is used when validating the certificate against the host, so that the certificate's legitimacy can be tested.
keytool -keystore server.keystore.jks -alias localhost -validity 365 -keyalg RSA -genkey -ext san=dns:<server.com>,ip:<11.11.11.11>
Please ensure you repeat all the remaining steps that sign the cert and import it into the keystore and truststore, as the key has changed. This will clear the error you are facing now. In response to your other questions: 1/2. You may leave the other fields as unknown, but they are meant to define where the certificate really belongs. 3. Some application processes rely directly on the cacerts (keystore) as the default store and may not let you pass a cert store externally, so it is good practice to have the certs added into the default Java cacerts. For instance, if you don't want to use the truststore location and truststore password parameters in the JDBC URL, the lookup will fall back to cacerts to find the certificate (I suppose, but have not tested this scenario). 4. Please use the same host name you provided in the certificate CN, as the validation will be performed against that cert. Hope this helps !!
04-16-2018
02:59 AM
1 Kudo
Hi Eric, it looks like you have not signed the cert yet and are just using the private key and public key. Can you please sign the certificates with the following commands, or use the tinycert.org utility to generate your server (keystore) / client (truststore) certs.
# Step 1 - Generate Key
keytool -keystore server.keystore.jks -alias localhost -validity 365 -keyalg RSA -genkey
# Step 2 - Create CA & upload the same into Trust Store.
openssl req -new -x509 -keyout ca-key -out ca-cert -days 365
keytool -keystore server.truststore.jks -alias CARoot -import -file ca-cert
keytool -keystore client.truststore.jks -alias CARoot -import -file ca-cert
# Step 3 - CA to sign the Certificate
keytool -keystore server.keystore.jks -alias localhost -certreq -file cert-file
openssl x509 -req -CA ca-cert -CAkey ca-key -in cert-file -out cert-signed -days 365 -CAcreateserial -passin pass:test1234
keytool -keystore server.keystore.jks -alias CARoot -import -file ca-cert
keytool -keystore server.keystore.jks -alias localhost -import -file cert-signed
The above commands are an example extracted from the Kafka documentation for generating self-signed certificates; the same steps apply to any self-signed certificate generation.
Hope this helps !!
04-13-2018
06:41 AM
1 Kudo
Hi @Jay, could you please update the parameter "nifi.zookeeper.root.node" to a different value in each cluster's nifi.properties: nifi.zookeeper.root.node=/nifi_cluster1 for the first cluster and something like /nifi_cluster2 for the second. If you configure them from Ambari, you can update the same under the Advanced nifi-ambari-config -- ZooKeeper Znode property. Hope this helps !!
04-11-2018
12:33 AM
Hi @Scott McDowell, when you initiate an interpreter it starts a YARN application in the cluster, and there are a couple of ways to clear the sessions. 1. Terminating from YARN (most effective): list the YARN applications, find the application id running from Zeppelin and terminate it. You may use the YARN UI (running applications --> select the application --> kill the application, right-side top corner of the app details). If you have YARN ACLs and a Kerberized cluster:
obtain the ticket (kinit <user>)
yarn application -list
yarn application -kill <appId>
2. By killing the interpreter process (from the terminal): you can clean up the PIDs of the interpreter processes on the Zeppelin server and kill the PID, which ensures that the AM gets aborted (might not be graceful). Hope this helps !!
04-11-2018
12:17 AM
Hi @Gaurang Shah, we need to note three main things about this character-encoding issue:
1. What type of data we have in HDFS/Hive. In this context, if the data is originally UTF-8 encoded and stored as UTF-8 encoded data, there should not be an issue. However, in some cases we load a language-specific encoding into Hive (it supports this) and then try to read the data with a different encoding technique; in such cases you will see the data with some weird characters. We must then ensure that we give the proper configuration to the deserializer so that it can extract the accurate data (without making any translations). For that you must specify serde properties at the Hive table level: ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe" WITH SERDEPROPERTIES("serialization.encoding"='UTF-8'); where UTF-8 can be any other charset supported by the serde library (see the sketch after this post).
2. Letting Sqoop know your character set. This ensures the character set is encoded and decoded with the same encoding module. On the Sqoop import/export, the following property ensures that you are not translating from one character set to another and producing untranslatable or otherwise junk, mocked-up characters (as described here): --default-character-set=utf8
3. Target character set. Ensure that your target table (in Netezza/Teradata/Oracle) has the same character set defined in the column properties, so that it won't reject rows while loading the data; in most cases this is the root cause of the failures. On the other note, even if you did not check the first and second points above, you may still be able to load the data into the target by making sure the target (Netezza) supports a rich character set, but that doesn't mean the data was loaded as-is (instead it was truncated and loaded). While exporting the data you may use HCat or a query so that the serde properties are enforced while extracting the data. Hope this helps !!
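For point 1, a minimal sketch of the table-level serde setting (the table and column names are hypothetical):
-- hypothetical table; the encoding value can be any charset the serde library supports
CREATE TABLE my_utf8_table (id STRING, description STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('serialization.encoding' = 'UTF-8')
STORED AS TEXTFILE;
-- for an existing table, the property can be set in place
ALTER TABLE my_utf8_table SET SERDEPROPERTIES ('serialization.encoding' = 'UTF-8');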