Member since: 05-17-2016
Posts: 190
Kudos Received: 46
Solutions: 11
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 780 | 09-07-2017 06:24 PM |
 | 1010 | 02-24-2017 06:33 AM |
 | 1387 | 02-10-2017 09:18 PM |
 | 5438 | 01-11-2017 08:55 PM |
 | 2650 | 12-15-2016 06:16 PM |
01-09-2020
07:45 AM
Hi Matt, the case was a bit different from the one in the screenshot. This was a multi-node cluster, and instead of "localhost", @VijaySankar had one of the hostnames configured in the hostname field while the processor was configured to run on all nodes. That mismatch was causing the error messages. Clearing the hostname field lets the processor spin up an HTTP service on each host:port, and the error no longer occurs.
... View more
01-22-2019
06:24 PM
@Abhijeet Rajput : Is there a specific use case you are looking at? Big Data, containers, and microservices are all just keywords; whether they need to be tied together depends on the real problem you are trying to solve.
... View more
11-09-2018
04:22 AM
Tested against HDF Version 3.1.0
... View more
11-09-2018
04:11 AM
2 Kudos
Hi, in this article let us take a look at how to delete a schema from the Hortonworks Schema Registry. A word of caution: this approach is not recommended for production systems, so use these steps at your own risk. I would also like to thank Brian Goerlitz for his ideas towards this post. Currently it is not possible to delete a schema from the UI, so the steps below show how to delete the schema from its backend datastore. I am using MySQL as the backend datastore for the Schema Registry, so the queries are MySQL-specific; adjust them for your database type.
Step 1: Verify that the two tables schema_version_info and schema_field_info have ON UPDATE CASCADE and ON DELETE CASCADE enabled. This can be checked with the queries below against the information_schema database:
select UPDATE_RULE, DELETE_RULE, REFERENCED_TABLE_NAME from REFERENTIAL_CONSTRAINTS where table_name='schema_version_info';
select UPDATE_RULE, DELETE_RULE, REFERENCED_TABLE_NAME from REFERENTIAL_CONSTRAINTS where table_name='schema_field_info';
Step 2: Stop the Schema Registry service from Ambari.
Step 3: Back up the database. Below is the content of my Schema Registry before the delete operation; I am interested in deleting the person.demographic.details schema.
Step 4: Identify the id of the schema to be deleted. For this, switch to the database provisioned for the Schema Registry (in my case 'registry') and issue the select query:
select id from schema_metadata_info where name = 'person.demographic.details';
Step 5: Delete the schema from schema_serdes_mapping based on the id queried in Step 4:
delete from schema_serdes_mapping where schemaMetadataId = 1;
Step 6: Delete the schema from schema_metadata_info based on the same id:
delete from schema_metadata_info where id = 1;
We can see that the schema has been deleted from the tables.
Step 7: Start the Schema Registry service via Ambari and verify that the schema is gone. Optionally, recreate a schema with the same name in the UI and check the front end and back end to confirm the schema can be re-created without issues. In my case the new schema was created with the same name and a different id. Thanks -Arun A K-
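As a convenience, steps 4 through 6 can also be run as a single MySQL session. This is only a sketch, assuming the Schema Registry database is named 'registry' and the schema name is the one used above; verify the ids before deleting anything.
-- Sketch: consolidate steps 4-6 into one MySQL session
USE registry;
SET @schema_id = (SELECT id FROM schema_metadata_info WHERE name = 'person.demographic.details');
DELETE FROM schema_serdes_mapping WHERE schemaMetadataId = @schema_id;
DELETE FROM schema_metadata_info WHERE id = @schema_id;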
... View more
- Find more articles tagged with:
- Data Ingestion & Streaming
- delete
- FAQ
- schema-registry
- schema_registry
10-18-2018
01:27 AM
It was an access issue on the buckets. Setting the right permissions on the bucket fixed it.
... View more
08-03-2018
08:42 PM
You can use a flatMap followed by a mapToPair. See below; note that the list is created inside call() so no state leaks across records.
JavaRDD<String> flatMapRdd = fileRDD.flatMap(new FlatMapFunction<String, String>() {
    public Iterable<String> call(String line) throws Exception {
        // collect "key,value" strings: the key plus every other token after it
        List<String> dataList = new ArrayList<String>();
        String key = line.split(",")[0];
        line = line.replace(key + ",", "").trim();
        String[] splits = line.split(",");
        for (int i = 0; i < splits.length; i += 2) {
            dataList.add(key + "," + splits[i]);
        }
        return dataList;
    }
});
JavaPairRDD<String, String> kvRdd = flatMapRdd.mapToPair(new PairFunction<String, String, String>() {
    public Tuple2<String, String> call(String kv) throws Exception {
        return new Tuple2<String, String>(kv.split(",")[0], kv.split(",")[1]);
    }
});
Thanks -ak-
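If you prefer a single pass, a flatMapToPair sketch along the same lines should also work. This assumes the Spark 1.x Java API (where the function returns an Iterable) and the same "key,v1,x1,v2,x2,..." record layout; imports for PairFlatMapFunction and Tuple2 are needed in addition to the ones above.
JavaPairRDD<String, String> kvPairs = fileRDD.flatMapToPair(new PairFlatMapFunction<String, String, String>() {
    public Iterable<Tuple2<String, String>> call(String line) throws Exception {
        // tokens[0] is the key; every other token after it becomes a value
        String[] tokens = line.split(",");
        List<Tuple2<String, String>> pairs = new ArrayList<Tuple2<String, String>>();
        for (int i = 1; i < tokens.length; i += 2) {
            pairs.add(new Tuple2<String, String>(tokens[0], tokens[i]));
        }
        return pairs;
    }
});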
... View more
06-14-2018
11:28 PM
Output Data of the form
... View more
06-14-2018
11:19 PM
It may not be the best approach, but we could do this as a two-step process.
Step 1: Load the content into a data frame, apply a UDF to derive the set of period_end_dates for the given row, and explode the row based on the period_end_date.
Step 2: Derive the period_start_date for each period_end_date based on the pa_start_date.
You can either derive the end date first and the start date next, or vice versa. Below is a code snippet; it can be optimized further.
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType};
import org.apache.spark.sql.Row;
import java.util.Date
import scala.collection.mutable.ListBuffer
import java.util.GregorianCalendar
import java.util.Calendar
import java.text.SimpleDateFormat
val csv = sc.textFile("/user/hdfs/ak/spark/197905/")
val rows = csv.map(line => line.split(",").map(_.trim))
val rdd = rows.map(row => Row(row(0),row(1),row(2),row(3),row(4),row(5)))
val schema = new StructType().add(StructField("c0", StringType, true)).add(StructField("c1", StringType, true)).add(StructField("c2", StringType, true)).add(StructField("c3", StringType, true)).add(StructField("c4", StringType, true)).add(StructField("c5", StringType, true))
val df = sqlContext.createDataFrame(rdd, schema)
df.registerTempTable("raw_data");
def getLastDateOfMonth(date:Date) : Date = {
val cal = Calendar.getInstance()
cal.setTime(date);
cal.set(Calendar.DAY_OF_MONTH, cal.getActualMaximum(Calendar.DAY_OF_MONTH));
cal.getTime();
}
def getFirstDateOfMonth(date:Date) : Date ={
val cal = Calendar.getInstance()
cal.setTime(date);
cal.set(Calendar.DAY_OF_MONTH, cal.getActualMinimum(Calendar.DAY_OF_MONTH));
cal.getTime();
}
def getLastDaysBetweenDates = (formatString:String, startDateString:String, endDateString:String) => {
val format = new SimpleDateFormat(formatString)
val startdate = getLastDateOfMonth(format.parse(startDateString))
val enddate =getLastDateOfMonth(format.parse(endDateString))
var dateList = new ListBuffer[Date]()
var calendar = new GregorianCalendar()
calendar.setTime(startdate)
var yearMonth="";
var maxDates = scala.collection.mutable.Map[String, Date]()
while (calendar.getTime().before(enddate)) {
yearMonth = calendar.getTime().getYear()+"_"+calendar.getTime.getMonth()
maxDates += (yearMonth -> calendar.getTime())
calendar.add(Calendar.DATE, 1)
}
maxDates += (yearMonth -> calendar.getTime())
for(eachMonth <- maxDates.keySet){
dateList += maxDates(eachMonth)
}
var dateListString = "";
for( date <- dateList.sorted){
dateListString=dateListString+","+format.format(date)
}
dateListString.substring(1, dateListString.length())
}
def getFirstDateFromLastDateAndReference = (formatString:String, refDateString:String, lastDate:String) => {
val format = new SimpleDateFormat(formatString)
val firstDay = getFirstDateOfMonth(format.parse(lastDate))
val year = firstDay.getYear;
val month = firstDay.getMonth;
val refDate = format.parse(refDateString)
val cal = Calendar.getInstance()
cal.setTime(refDate)
val refDateTime = cal.getTime();
val refYear=refDateTime.getYear;
val refMonth = refDateTime.getMonth();
if(year==refYear&& month==refMonth){
refDateString
}else{
format.format(firstDay)
}
}
sqlContext.udf.register("lastday",getLastDaysBetweenDates)
sqlContext.udf.register("firstday",getFirstDateFromLastDateAndReference)
sqlContext.sql("select *,lastday('d-MMM-yy',c4,c5) from raw_data").show();
sqlContext.sql("select c0,c1,c2,c3,c4,c5,explode(split(lastday('d-MMM-yy',c4,c5),',')) as lastday from hello").registerTempTable("data_with_end_date");
sqlContext.sql("select c0,c1,c2,c3,c4,c5,lastday,firstday('d-MMM-yy',c4,lastday) from data_with_end_date").show()
I used 2 UDFs here: 1) getLastDaysBetweenDates consumes a date format, a start date, and an end date, and returns the list of month-end dates in that range. 2) getFirstDateFromLastDateAndReference consumes a date format, a reference (start) date, and a month-end date, and returns the first date of that month; for the first month, however, it returns the pa_start_date instead of the first calendar date.
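As a side note, on a newer Spark (2.4+) the same expansion can be sketched with the DataFrame API using sequence, explode, and last_day, avoiding the java.util.Calendar handling. This is only a sketch and assumes c4/c5 hold pa_start_date/pa_end_date in d-MMM-yy format:
import org.apache.spark.sql.functions._
val exploded = df
  .withColumn("start", to_date(col("c4"), "d-MMM-yy"))
  .withColumn("end", to_date(col("c5"), "d-MMM-yy"))
  // one row per month between the two dates
  .withColumn("month_start", explode(sequence(trunc(col("start"), "MM"), trunc(col("end"), "MM"), expr("interval 1 month"))))
  .withColumn("period_end_date", last_day(col("month_start")))
  // the first month keeps the original start date, later months start on day 1
  .withColumn("period_start_date", when(col("month_start") === trunc(col("start"), "MM"), col("start")).otherwise(col("month_start")))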
... View more
06-14-2018
12:23 PM
@AArora, is the requirement to create multiple rows from one row, where you need all the first and last days of each month between pa_start_date and pa_end_date as the period_start_date and period_end_date?
... View more
06-13-2018
06:26 PM
check out https://community.hortonworks.com/answers/77558/view.html
... View more
06-07-2018
07:38 PM
I have the hcat credentials as a part of my workflow. Somehow they aren't getting picked up.
... View more
06-07-2018
06:14 PM
Hello All, I have a simple PySpark application that queries a Hive table using HiveContext and dumps the result to a CSV file on HDFS. The application runs fine on my Kerberized cluster when I do a spark-submit; I have a valid ticket, so there isn't any issue submitting the job directly. When I run the same job via the Ambari Workflow Manager, I get a GSS exception: No valid credentials provided. How do I pass the user credentials on to the workflow? Thanks -ak-
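(For reference, the mechanism in question is Oozie's <credentials> element, which the action then references via cred=. A rough sketch is below; the credential name, metastore URI, principal, and action name are all placeholders that depend on the cluster.)
<credentials>
    <credential name="hcat_cred" type="hcat">
        <property>
            <name>hcat.metastore.uri</name>
            <value>thrift://metastore-host:9083</value>
        </property>
        <property>
            <name>hcat.metastore.principal</name>
            <value>hive/_HOST@EXAMPLE.COM</value>
        </property>
    </credential>
</credentials>
...
<action name="spark-node" cred="hcat_cred">
    <spark xmlns="uri:oozie:spark-action:0.1">
        ...
    </spark>
</action>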
... View more
Labels:
- Apache Ambari
- Apache Oozie
05-16-2018
07:03 PM
Let me know if this helps: https://community.hortonworks.com/questions/110519/extracting-substring-in-pig-latin.html?childToView=111854#answer-111854 Otherwise, I can provide more information.
... View more
03-27-2018
09:27 PM
@Scott Aslan : Thanks, the build was successful after skipping the tests. The test failure trace is in the previous comment.
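(For anyone following along: skipping the unit tests in a NiFi source build is done with Maven's standard flag. A generic sketch, not necessarily the exact command used here:)
# build NiFi from the source root without running unit tests
mvn clean install -DskipTests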
... View more
03-27-2018
08:41 PM
I will run again, skipping the tests. The relevant part of the failure trace is below:
[INFO]
[INFO] --- maven-remote-resources-plugin:1.5:process (process-resource-bundles) @ nifi-solr-processors ---
[INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.167 s - in org.apache.nifi.processors.standard.TestParseCEF
[INFO] Running org.apache.nifi.processors.standard.TestGetFile
[ERROR] Tests run: 7, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 0.138 s <<< FAILURE! - in org.apache.nifi.processors.standard.TestGetFile
[ERROR] testWithUnreadableDir(org.apache.nifi.processors.standard.TestGetFile) Time elapsed: 0.028 s <<< ERROR!
java.lang.NullPointerException
at org.apache.nifi.processors.standard.TestGetFile.testWithUnreadableDir(TestGetFile.java:92)
[ERROR] testWithInaccessibleDir(org.apache.nifi.processors.standard.TestGetFile) Time elapsed: 0.006 s <<< ERROR!
java.lang.NullPointerException
at org.apache.nifi.processors.standard.TestGetFile.testWithInaccessibleDir(TestGetFile.java:64)
[ERROR] testWithUnwritableDir(org.apache.nifi.processors.standard.TestGetFile) Time elapsed: 0.007 s <<< ERROR!
java.lang.NullPointerException
at org.apache.nifi.processors.standard.TestGetFile.testWithUnwritableDir(TestGetFile.java:120)
[INFO] Running org.apache.nifi.processors.standard.TestGenerateFlowFile
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.02 s - in org.apache.nifi.processors.standard.TestGenerateFlowFile
[INFO] Running org.apache.nifi.processors.standard.TestExtractGrok
... View more
03-27-2018
07:22 PM
build.txt
Hello all, any pointers would be helpful. I am trying to build NiFi from source on CentOS 7 and have the preconditions met as per https://nifi.apache.org/quickstart.html. EDIT: Attached the build log. The build fails with the trace below:
[INFO] dockermaven 1.6.0-SNAPSHOT ......................... SUCCESS [ 0.713 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:05 min (Wall Clock)
[INFO] Finished at: 2018-03-27T18:15:24Z
[INFO] ------------------------------------------------------------------------
Downloaded from central: https://repo1.maven.org/maven2/org/apache/curator/curator-framework/2.10.0/curator-framework-2.10.0.pom (2.5 kB at 34 kB/s)
[ERROR] Failed to execute goal com.github.eirslett:frontend-maven-plugin:1.1:npm (npm install) on project nifi-web-ui: Failed to run task: 'npm --cache-min Infinity install' failed. (error code 1) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <goals> -rf :nifi-web-ui
Destroying 6 processes
Destroying process..
Destroying process..
Destroying process..
Destroying process..
Destroying process..
Destroying process..
Destroyed 6 processes
... View more
Labels:
- Apache NiFi
03-12-2018
09:05 PM
NiFi failed to start with this change, so I rolled it back. I am assuming that it expects an expression in the filter.
... View more
03-07-2018
06:56 PM
@Matt Clarke I think I am getting close to the solution. I made the changes as you suggested; however, I get the error below on login: o.a.n.w.a.c.AccessDeniedExceptionMapper identity[user1], groups[] does not have permission to access the requested resource. No applicable policies could be found. Returning Forbidden response. The group name is empty / not picked up. What could be wrong here?
... View more
03-07-2018
05:52 PM
@Matt Clarke, thank you. I will update after I give this a try.
... View more
03-07-2018
05:29 PM
Hi All, is there a document that details how to configure LDAP group authorization for NiFi with Ranger? This is for HDF 3.1.1 / NiFi 1.5. With the default configuration, NiFi still needs policies to be defined for every individual user; group-level policies don't take effect, so I assume some configuration is missing. EDIT: userGroupProvider
<userGroupProvider>
<identifier>ldap-user-group-provider</identifier>
<class>org.apache.nifi.ldap.tenants.LdapUserGroupProvider</class>
<property name="Authentication Strategy">SIMPLE</property>
<property name="Manager DN">uid=admin,cn=blah,cn=blah,dc=blah,dc=com</property>
<property name="Manager Password">blah</property>
<property name="TLS - Keystore"></property>
<property name="TLS - Keystore Password"></property>
<property name="TLS - Keystore Type"></property>
<property name="TLS - Truststore"></property>
<property name="TLS - Truststore Password"></property>
<property name="TLS - Truststore Type"></property>
<property name="TLS - Client Auth"></property>
<property name="TLS - Protocol"></property>
<property name="TLS - Shutdown Gracefully"></property>
<property name="Referral Strategy">FOLLOW</property>
<property name="Connect Timeout">10 secs</property>
<property name="Read Timeout">10 secs</property>
<property name="Url">ldap://blah.ldap.com:389</property>
<property name="Page Size"></property>
<property name="Sync Interval">30 mins</property>
<property name="User Search Base">cn=users,cn=accounts,dc=blah,dc=blah,dc=com</property>
<property name="User Object Class">person</property>
<property name="User Search Scope">SUBTREE</property>
<property name="User Search Filter">(uid={0})</property>
<property name="User Identity Attribute">USE_USERNAME</property>
<property name="User Group Name Attribute"></property>
<property name="User Group Name Attribute - Referenced Group Attribute"></property>
<property name="Group Search Base">cn=groups,cn=accounts,dc=blah,dc=blah,dc=com</property>
<property name="Group Object Class">groupofnames</property>
<property name="Group Search Scope">SUBTREE</property>
<property name="Group Search Filter">(cn={0})</property>
<property name="Group Name Attribute">cn</property>
<property name="Group Member Attribute">member</property>
<property name="Group Member Attribute - Referenced User Attribute">uid</property>
</userGroupProvider>
Sample User - LDAP
Sample Group - LDAP
... View more
Labels:
- Apache NiFi
- Apache Ranger
02-16-2018
05:48 PM
3 Kudos
In this article, we will walk through integrating LDAP with NiFi Registry. The precondition for LDAP to work with NiFi Registry is that SSL needs to be enabled, so this article also covers how to enable SSL for NiFi Registry. For the sake of simplicity, I am using self-signed certificates (JKS, created with keytool). The steps for creating the self-signed certificates are:
Generate the keystore:
keytool -genkey -keyalg RSA -validity 3650 -alias <alias_name> -keypass <pwd> -storepass <pwd> -dname "cn=hostname, ou=home, o=ak, c=us" -keystore nifi_reg_keystore.jks
Export a certificate with the public key:
keytool -export -alias <alias_name> -file nifi_reg.cer -storepass <pwd> -keystore nifi_reg_keystore.jks
Generate the truststore:
keytool -import -noprompt -alias nr-c0 -file nifi_reg.cer -storepass changeitchangeit -keystore nifi_reg_truststore.jks
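(Optionally, before wiring the stores into Ambari, you may want to check what landed in them. A generic verification, assuming the file names above:)
# list the entries in the generated keystore and truststore
keytool -list -v -keystore nifi_reg_keystore.jks -storepass <pwd>
keytool -list -v -keystore nifi_reg_truststore.jks -storepass changeitchangeit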
Below is a representation of the NiFi Registry UI with the default HTTP, anonymous-user login. Now we log in to Ambari and use the certificate details generated above to complete the SSL setup: on the configuration tab, search for the SSL settings and populate the form with the details of the truststore and keystore we generated. At this stage, SSL setup for NiFi Registry is complete; however, we haven't assigned any users that can log in to the UI. We can either generate a certificate for an Initial Admin or create an Initial Admin from the LDAP user base; here we will use an LDAP user as the Initial Admin for the NiFi Registry. There are four sections that need to be edited:
1. Configure Initial Admin
2. Configure Security Identity Provider (nifi.registry.security.identity.provider)
3. Configure login-identity-providers.xml
For login-identity-providers.xml, remove the two lines that say "To enable the ldap-identity-provider remove 2 lines. This is 1 of 2." and "To enable the ldap-identity-provider remove 2 lines. This is 2 of 2.", then fill in the details specific to your LDAP server. I am using a SIMPLE authentication strategy with a non-SSL LDAP server; the relevant sections from my configuration window are below.
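(As an illustrative sketch only: the provider class and property names should be verified against the login-identity-providers.xml shipped with your HDF 3.1 / NiFi Registry version, and every value below is a placeholder. The enabled ldap-provider block ends up looking roughly like this:)
<provider>
    <identifier>ldap-identity-provider</identifier>
    <class>org.apache.nifi.registry.security.ldap.LdapIdentityProvider</class>
    <property name="Authentication Strategy">SIMPLE</property>
    <property name="Manager DN">uid=admin,cn=blah,dc=blah,dc=com</property>
    <property name="Manager Password">blah</property>
    <property name="Referral Strategy">FOLLOW</property>
    <property name="Connect Timeout">10 secs</property>
    <property name="Read Timeout">10 secs</property>
    <property name="Url">ldap://blah.ldap.com:389</property>
    <property name="User Search Base">cn=users,cn=accounts,dc=blah,dc=com</property>
    <property name="User Search Filter">(uid={0})</property>
    <property name="Identity Strategy">USE_USERNAME</property>
    <property name="Authentication Expiration">12 hours</property>
</provider>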
4. Configure authorizers.xml
In authorizers.xml, remove the two lines that say "To enable the ldap-user-group-provider remove 2 lines. This is 1 of 2." and "To enable the ldap-user-group-provider remove 2 lines. This is 2 of 2.", then configure the ldap-user-group-provider and the accessPolicyProvider. The screenshots below show the relevant section for configuring the ldap-user-group-provider and the changes needed to the access policy provider; in particular, set User Group Provider to ldap-user-group-provider. At this stage we can save all the configuration changes and restart the NiFi Registry service. Follow the Ambari prompts and you should see the service come up as below. Now we can access the NiFi Registry UI and log in as the configured Initial Admin (guest1 in my example):
- Access the NiFi Registry UI from Quick Links
- Log in using the Initial Admin credentials
- Verify the login is successful
- Verify users are available/synced
You should be able to proceed using the NiFi Registry from here on.
... View more
- Find more articles tagged with:
- Data Ingestion & Streaming
- hdf3.1
- How-ToTutorial
- integration
- nifi-registry
11-28-2017
10:31 PM
@Matt Clarke : question on the /resources policy - The server running Ranger should be granted “read” privileges to this resource. How do we accomplish this? Is SSL for Ranger mandatory in this case?
... View more
10-06-2017
06:47 PM
@Karthik Narayanan it was NiFi 1.1, and looking at the pom.xml I am assuming there is an Avro 1.7 dependency. The logical timestamp types were introduced in Avro 1.8.
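(For reference, in Avro 1.8 a logical timestamp is declared by annotating a long field in the schema; the field name here is just an example:)
{"name": "event_time", "type": {"type": "long", "logicalType": "timestamp-millis"}}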
... View more
10-05-2017
08:38 PM
Thanks @Karthik Narayanan, same reason here: I was on a NiFi that had Avro 1.7, so the logicalType doesn't work. If I remember correctly, it threw an invalid Avro schema error.
... View more
10-05-2017
05:46 PM
Thanks @Karthik Narayanan, I can use them now, but the question was from before NiFi 1.2 was released.
... View more
09-13-2017
02:16 PM
1 Kudo
We would not make a performance comparison of Spark and NiFi here for this use case, since we are not even talking about running on a cluster. Since the data volume is low and the processing is finite, I would prefer NiFi, as the entire life cycle gets reduced; plus, it is UI-driven.
... View more
09-12-2017
06:40 PM
From the problem statement, you just have small files on the local file system and do not want to use any HDFS layer here. Spark can do this, but may be overkill for the scenario; you could start with basic Java JDBC code to get the task accomplished. However, you have Apache NiFi within the HDF stack, which can easily solve this problem for you: simply design a dataflow that monitors for new files in your SFTP location, pulls those files into NiFi as soon as they land, optionally parses them or applies some transformations, and performs the MySQL operation. You can use this as a reference. Additionally, NiFi can also be used as a powerful MySQL CDC option; you can see more details in this 3-part article.
... View more
09-07-2017
06:24 PM
Looks like the SerDe is not in the classpath. Can you try add jar /usr/hdp/<version number>/hive/lib/hive-contrib-<version>.jar; and then create the table?
... View more