Member since 01-03-2017

181 Posts
44 Kudos Received
24 Solutions

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 2264 | 12-02-2018 11:49 PM |
|  | 3122 | 04-13-2018 06:41 AM |
|  | 2668 | 04-06-2018 01:52 AM |
|  | 2965 | 01-07-2018 09:04 PM |
|  | 6503 | 12-20-2017 10:58 PM |

12-20-2017 11:11 PM

Hi @PJ,

Could you please let me know which separator you are using? split returns an array of elements; you can test it with the following SQL:

select Ticket, split(All_Comments, '<separator you specified to split>') from <your table>;

Hope this helps!
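As a rough illustration (not from the original thread), here is a minimal Spark (Scala) sketch with made-up data showing that split returns an array; the ';' separator is just a stand-in for whatever delimiter All_Comments actually uses:

// Minimal sketch with hypothetical data; run in spark-shell or a %spark paragraph.
import spark.implicits._

val df = Seq(
  ("T-1", "opened;assigned;resolved"),
  ("T-2", "opened;closed")
).toDF("Ticket", "All_Comments")

// split() returns an array of strings, one element per delimited piece
df.selectExpr("Ticket", "split(All_Comments, ';') AS parts").show(false)
// T-1 -> [opened, assigned, resolved]
// T-2 -> [opened, closed]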
						
					
12-20-2017 10:58 PM
1 Kudo

Hi @Robert Jonczy,

The report you got is accurate. I would like to stress the parameter you have used, "threshold":

-threshold <threshold>    Percentage of disk capacity.

This is the value the balancer uses as a band of +/- the given percentage around the "average DFS usage" (which is: % of DFS Used / total capacity) when deciding what to move. In your scenario that average is almost 1%, and the threshold you specified (5%) only triggers balancing when a node falls outside a 10% band (+/- 5%) around that average, which is not the case here; hence it is no longer balancing the data.

Hope this clarifies!
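For illustration only (the numbers below are hypothetical, not taken from your report), a small Scala sketch of that threshold check:

// Sketch of the balancer's threshold test with made-up numbers.
// A DataNode only becomes a candidate for balancing when its utilisation
// falls outside average DFS usage +/- threshold.
val avgDfsUsedPct = 1.0   // cluster average: % of DFS Used / total capacity (hypothetical)
val threshold     = 5.0   // the -threshold 5 passed on the command line
val nodeUsedPct   = 3.5   // utilisation of one DataNode (hypothetical)

val withinBand = math.abs(nodeUsedPct - avgDfsUsedPct) <= threshold
println(s"within +/-$threshold% of average: $withinBand")   // true -> nothing to move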
						
					
12-20-2017 01:31 AM

Hi @PJ,

To get the splits you need to pass two arguments: the first is the column name and the second is the regular expression used to split the content of the column. The output column is an array of strings (a single element can be viewed by specifying its index, e.g. res[2]). explode then takes an array as input and converts it into rows (I ran the above pseudo code in my environment and was able to achieve the output you mentioned). On the other hand, expTBL is just an alias in the syntax; you can use any name in its place (it need not be expTBL).

Hope this helps!
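A short Spark (Scala) sketch of both points, again with made-up data and a made-up ';' separator:

// Hypothetical data: split() yields an array you can index; explode() turns it into rows.
import spark.implicits._

val df = Seq(("T-1", "09:00 opened;10:15 assigned;11:30 resolved"))
  .toDF("Ticket", "All_Comments")

// Pick one element of the array by index (index 2 = third element)
df.selectExpr("Ticket", "split(All_Comments, ';')[2] AS third_part").show(false)

// explode() converts each array element into its own row
df.selectExpr("Ticket", "explode(split(All_Comments, ';')) AS comment").show(false)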
						
					
12-19-2017 11:16 PM

Hi @jyothi k,

While doing a migration from RDBMS to Hive I came across the same scenario with BLOB and CLOB data. My approach was to convert the BLOB and CLOB data using Base64 encoding (which turns any kind of binary data into a readable text format) and store it in Hive.

select UTL_ENCODE.BASE64_ENCODE(blob_column) from oracle_tbl;  -- on the Oracle side

This gives you the Base64-encoded string, so you can store it as a String in Hive/HDFS and feed it to Sqoop as a string. To convert back to a BLOB you can use Hive's unbase64(), or the Java Base64 package (which can be used in either native Java apps or Spark, etc.).

Example:

select unbase64(converted_blob_column) from hive_table;

For native apps you may refer to the Java docs for Base64 conversion here.

Hope this helps!
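For the Java/Scala side of that round trip, a minimal sketch using the standard java.util.Base64 API (the payload below is just a stand-in for real BLOB bytes):

// Encode binary (BLOB-like) data to a Base64 string and decode it back.
import java.util.Base64
import java.nio.charset.StandardCharsets

val blobBytes: Array[Byte] = "some binary payload".getBytes(StandardCharsets.UTF_8)  // stand-in for BLOB content

// A plain string that can live in a Hive STRING column and travel through Sqoop
val encoded: String = Base64.getEncoder.encodeToString(blobBytes)

// Back to the original bytes (the same idea as Hive's unbase64())
val decoded: Array[Byte] = Base64.getDecoder.decode(encoded)

println(encoded)
println(new String(decoded, StandardCharsets.UTF_8))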
						
					
12-19-2017 04:56 AM

Hi @PJ,

You can perform the same in Hive (or through the Hive context SQL) using LATERAL VIEW explode:

SELECT Ticket, pld
FROM <your Table> LATERAL VIEW explode(split(All_Comments, "<expression to split>")) expTBL AS pld;

split converts All_Comments into an array of strings (you can use whatever regex you are after, e.g. to separate the timestamp from the comment text). explode then turns that uneven-length column (the array) into one row per element. I have tested the same in Spark as well and got the output in the desired manner:

yourDf.registerTempTable("someData")
hqlContext.sql("""SELECT Ticket, pld FROM someData LATERAL VIEW explode(split(All_Comments, "<expression to split>")) expTBL AS pld""")

Hope this helps!
						
					
12-19-2017 01:28 AM
1 Kudo

Hi @Eric H,

Could you please check that you are passing the complete class name including the package name:

--class "org.apache.spark.examples.sql.hive.JavaSparkHiveExample"

Since that class sits inside a package, it cannot be referenced directly by the class name alone.

Hope this helps!
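To make the package point concrete, a hypothetical sketch (class and package names invented here): the value given to --class must be the package plus the class name.

// Hypothetical Spark application; note the package declaration at the top.
package com.example.jobs

import org.apache.spark.sql.SparkSession

object MyHiveJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("MyHiveJob")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("SHOW DATABASES").show()
    spark.stop()
  }
}

// Submitted with the fully qualified name, not just the class name:
//   spark-submit --class "com.example.jobs.MyHiveJob" my-job.jar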
						
					
12-18-2017 12:45 AM

Hi @Mario Borys,

Glad that it helped! By accepting the solution, other HCC users can find the answer directly. Now on to your other question: yes, it is possible, by adding the spark.jars property in the interpreter configuration with the ojdbc driver jar file. After that you can create the context with the same process you used on the command line. More on how to configure the interpreter can be found here.
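For example, once the ojdbc driver jar is on the interpreter classpath via spark.jars, a Zeppelin %spark paragraph along these lines should work (host, SID, table and credentials below are placeholders):

// Sketch of reading an Oracle table from Zeppelin after adding the ojdbc jar
// through the interpreter's spark.jars setting; all connection details are placeholders.
val oracleDf = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@<ip/hostname>:1521:<SID name>")
  .option("driver", "oracle.jdbc.driver.OracleDriver")
  .option("dbtable", "<schema>.<table>")
  .option("user", "<userName>")
  .option("password", "<Password>")
  .load()

oracleDf.show(5)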
						
					
12-14-2017 11:00 PM
1 Kudo

Hi @JT Ng,

Yes, that is possible with the "Netcat TCP Source", without installing the agent on the application server. However, you need to tail the log and pass it on to the listener from the server whose logs you want to feed. That means starting the log push process on the source (application) server with:

tail -f <application Log file>.log | nc <flume_agent_host> <configured_netcat_source_port>

Before you trigger this, make sure that you have started the Flume agent in the HDP cluster (or wherever the Flume agent can be installed):

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 6666
a1.sources.r1.channels = c1

Ref

On the other side you can configure the HDFS sink to pump the events into the HDFS file system with the following configuration:

a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute

Ref

NB: Make sure that you handle the tail and nc processes when your server stops or your application shuts down completely; you can manage the tail process with a proper shell wrapper that includes restartability, e.g. as a service on the Linux host.

Hope this helps!
						
					
12-13-2017 12:40 AM
1 Kudo

Hi @Mario Borys,

The error originates on the database side, because the connection URL does not contain the SID information. As per the driver URL documentation, we must provide the SID along with the port number, so the new URL looks like:

"url" -> "jdbc:oracle:thin:<userName>/<Password>@<ip/hostname>:<port num ex: 1521>:<SID name>"
or
"url" -> "jdbc:oracle:thin:@<ip/hostname>:<port num ex: 1521>:<SID name>?user=USERNAME&password=PASSWORD" // for passwords with special characters

Hope this helps!
						
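Putting the corrected URL into the options map you are already using would look roughly like this (a sketch with placeholders throughout, not a tested snippet):

// Sketch: JDBC options with the SID included in the URL; values are placeholders.
val jdbcOptions = Map(
  "url"     -> "jdbc:oracle:thin:<userName>/<Password>@<ip/hostname>:1521:<SID name>",
  "driver"  -> "oracle.jdbc.driver.OracleDriver",
  "dbtable" -> "<schema>.<table>"
)

val df = spark.read.format("jdbc").options(jdbcOptions).load()
df.printSchema()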
					
12-12-2017 10:30 PM

Hi @Sudheer Velagapudi,

To update a policy you need to specify the policy ID at the end of the URL, whereas at creation time the policy ID is incremented automatically.

ex: http://hostname:6080/service/public/api/policy/{id}

Hope this helps!
						
					