Member since: 05-17-2016
Posts: 190
Kudos Received: 46
Solutions: 11
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 780 | 09-07-2017 06:24 PM |
 | 1010 | 02-24-2017 06:33 AM |
 | 1387 | 02-10-2017 09:18 PM |
 | 5438 | 01-11-2017 08:55 PM |
 | 2650 | 12-15-2016 06:16 PM |
01-09-2020
07:45 AM
Hi Matt, the case was a bit different from the one in the screenshot. This was a multi-node cluster, and instead of "localhost", @VijaySankar had one of the hostnames configured in the hostname field while the processor was configured to run on all nodes. That mismatch was causing the error messages. Clearing the hostname field lets the processor spin up an HTTP service on each host:port, and the error no longer occurs.
... View more
01-22-2019
06:24 PM
@Abhijeet Rajput : Is there a specific use case you are looking at? Big Data, containers, and microservices are all just keywords; whether they need to be tied together depends on the real problem you are trying to solve.
... View more
11-09-2018
04:22 AM
Tested against HDF Version 3.1.0
... View more
11-09-2018
04:11 AM
2 Kudos
Hi, in this article let us take a look at how to delete a schema from the Hortonworks Schema Registry. A word of caution: this approach is not recommended for production systems, so use these steps at your own risk. I would also like to thank Brian Goerlitz for his ideas towards this post. Currently it is not possible to delete a schema from the UI, so the steps below show how to delete the schema from its backend datastore. I am using MySQL as the backend datastore for the Schema Registry, so the queries are MySQL-specific; adjust them for your database type.
Step 1: Verify that the two tables schema_version_info and schema_field_info have ON UPDATE CASCADE and ON DELETE CASCADE enabled. This can be checked with the queries below against the information_schema database:
select UPDATE_RULE, DELETE_RULE, REFERENCED_TABLE_NAME from REFERENTIAL_CONSTRAINTS where table_name='schema_version_info';
select UPDATE_RULE, DELETE_RULE, REFERENCED_TABLE_NAME from REFERENTIAL_CONSTRAINTS where table_name='schema_field_info';
Step 2: Stop the Schema Registry service from Ambari.
Step 3: Back up the database. Below is the content of my Schema Registry before the delete operation; I am interested in deleting the person.demographic.details schema.
Step 4: Identify the id of the schema to be deleted. For this, switch to the database provisioned for the Schema Registry (in my case 'registry') and issue the select query:
select id from schema_metadata_info where name = 'person.demographic.details';
Step 5: Delete the schema from schema_serdes_mapping based on the id queried in Step 4:
delete from schema_serdes_mapping where schemaMetadataId = 1;
Step 6: Delete the schema from schema_metadata_info based on the same id:
delete from schema_metadata_info where id = 1;
We can see that the schema has been deleted from the tables.
Step 7: Start the Schema Registry service via Ambari and verify that the schema is gone. Optionally, recreate a schema with the same name in the UI and check the front end and back end to confirm the schema can be re-created without issues. In my case the new schema was created with the same name and a different id. Thanks -Arun A K-
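As a convenience, steps 4 through 6 can also be run as a single MySQL session. This is only a sketch, assuming the Schema Registry database is named 'registry' and the schema name is the one used above; verify the ids before deleting anything.
-- Sketch: consolidate steps 4-6 into one MySQL session
USE registry;
SET @schema_id = (SELECT id FROM schema_metadata_info WHERE name = 'person.demographic.details');
DELETE FROM schema_serdes_mapping WHERE schemaMetadataId = @schema_id;
DELETE FROM schema_metadata_info WHERE id = @schema_id;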
... View more
- Find more articles tagged with:
- Data Ingestion & Streaming
- delete
- FAQ
- schema-registry
- schema_registry
10-18-2018
01:27 AM
It was an access issue on the buckets. Setting the right permissions on the bucket fixed it.
... View more
08-03-2018
08:42 PM
You can use a flatMap followed by a mapToPair. See below; note that the list is created inside call() so no state leaks across records.
JavaRDD<String> flatMapRdd = fileRDD.flatMap(new FlatMapFunction<String, String>() {
    public Iterable<String> call(String line) throws Exception {
        // collect "key,value" strings: the key plus every other token after it
        List<String> dataList = new ArrayList<String>();
        String key = line.split(",")[0];
        line = line.replace(key + ",", "").trim();
        String[] splits = line.split(",");
        for (int i = 0; i < splits.length; i += 2) {
            dataList.add(key + "," + splits[i]);
        }
        return dataList;
    }
});
JavaPairRDD<String, String> kvRdd = flatMapRdd.mapToPair(new PairFunction<String, String, String>() {
    public Tuple2<String, String> call(String kv) throws Exception {
        return new Tuple2<String, String>(kv.split(",")[0], kv.split(",")[1]);
    }
});
Thanks -ak-
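If you prefer a single pass, a flatMapToPair sketch along the same lines should also work. This assumes the Spark 1.x Java API (where the function returns an Iterable) and the same "key,v1,x1,v2,x2,..." record layout; imports for PairFlatMapFunction and Tuple2 are needed in addition to the ones above.
JavaPairRDD<String, String> kvPairs = fileRDD.flatMapToPair(new PairFlatMapFunction<String, String, String>() {
    public Iterable<Tuple2<String, String>> call(String line) throws Exception {
        // tokens[0] is the key; every other token after it becomes a value
        String[] tokens = line.split(",");
        List<Tuple2<String, String>> pairs = new ArrayList<Tuple2<String, String>>();
        for (int i = 1; i < tokens.length; i += 2) {
            pairs.add(new Tuple2<String, String>(tokens[0], tokens[i]));
        }
        return pairs;
    }
});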
... View more
06-14-2018
11:28 PM
Output Data of the form
... View more
06-14-2018
11:19 PM
It may not be the best approach, but we could do this as a two-step process.
Step 1: Load the content into a data frame, apply a UDF to derive the set of period_end_dates for the given row, and explode the row based on the period_end_date.
Step 2: Derive the period_start_date for each period_end_date based on the pa_start_date.
You can either derive the end date first and the start date next, or vice versa. Below is a code snippet; it can be optimized further.
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType};
import org.apache.spark.sql.Row;
import java.util.Date
import scala.collection.mutable.ListBuffer
import java.util.GregorianCalendar
import java.util.Calendar
import java.text.SimpleDateFormat
val csv = sc.textFile("/user/hdfs/ak/spark/197905/")
val rows = csv.map(line => line.split(",").map(_.trim))
val rdd = rows.map(row => Row(row(0),row(1),row(2),row(3),row(4),row(5)))
val schema = new StructType().add(StructField("c0", StringType, true)).add(StructField("c1", StringType, true)).add(StructField("c2", StringType, true)).add(StructField("c3", StringType, true)).add(StructField("c4", StringType, true)).add(StructField("c5", StringType, true))
val df = sqlContext.createDataFrame(rdd, schema)
df.registerTempTable("raw_data");
def getLastDateOfMonth(date:Date) : Date = {
val cal = Calendar.getInstance()
cal.setTime(date);
cal.set(Calendar.DAY_OF_MONTH, cal.getActualMaximum(Calendar.DAY_OF_MONTH));
cal.getTime();
}
def getFirstDateOfMonth(date:Date) : Date ={
val cal = Calendar.getInstance()
cal.setTime(date);
cal.set(Calendar.DAY_OF_MONTH, cal.getActualMinimum(Calendar.DAY_OF_MONTH));
cal.getTime();
}
def getLastDaysBetweenDates = (formatString:String, startDateString:String, endDateString:String) => {
val format = new SimpleDateFormat(formatString)
val startdate = getLastDateOfMonth(format.parse(startDateString))
val enddate =getLastDateOfMonth(format.parse(endDateString))
var dateList = new ListBuffer[Date]()
var calendar = new GregorianCalendar()
calendar.setTime(startdate)
var yearMonth="";
var maxDates = scala.collection.mutable.Map[String, Date]()
while (calendar.getTime().before(enddate)) {
yearMonth = calendar.getTime().getYear()+"_"+calendar.getTime.getMonth()
maxDates += (yearMonth -> calendar.getTime())
calendar.add(Calendar.DATE, 1)
}
maxDates += (yearMonth -> calendar.getTime())
for(eachMonth <- maxDates.keySet){
dateList += maxDates(eachMonth)
}
var dateListString = "";
for( date <- dateList.sorted){
dateListString=dateListString+","+format.format(date)
}
dateListString.substring(1, dateListString.length())
}
def getFirstDateFromLastDateAndReference = (formatString:String, refDateString:String, lastDate:String) => {
val format = new SimpleDateFormat(formatString)
val firstDay = getFirstDateOfMonth(format.parse(lastDate))
val year = firstDay.getYear;
val month = firstDay.getMonth;
val refDate = format.parse(refDateString)
val cal = Calendar.getInstance()
cal.setTime(refDate)
val refDateTime = cal.getTime();
val refYear=refDateTime.getYear;
val refMonth = refDateTime.getMonth();
if(year==refYear&& month==refMonth){
refDateString
}else{
format.format(firstDay)
}
}
sqlContext.udf.register("lastday",getLastDaysBetweenDates)
sqlContext.udf.register("firstday",getFirstDateFromLastDateAndReference)
sqlContext.sql("select *,lastday('d-MMM-yy',c4,c5) from raw_data").show();
sqlContext.sql("select c0,c1,c2,c3,c4,c5,explode(split(lastday('d-MMM-yy',c4,c5),',')) as lastday from hello").registerTempTable("data_with_end_date");
sqlContext.sql("select c0,c1,c2,c3,c4,c5,lastday,firstday('d-MMM-yy',c4,lastday) from data_with_end_date").show()
I used 2 UDFs here: 1) getLastDaysBetweenDates consumes a date format, a start date, and an end date, and returns the list of month-end dates in that range. 2) getFirstDateFromLastDateAndReference consumes a date format, a reference (start) date, and a month-end date, and returns the first date of that month; for the first month, however, it returns the pa_start_date instead of the first calendar date.
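As a side note, on a newer Spark (2.4+) the same expansion can be sketched with the DataFrame API using sequence, explode, and last_day, avoiding the java.util.Calendar handling. This is only a sketch and assumes c4/c5 hold pa_start_date/pa_end_date in d-MMM-yy format:
import org.apache.spark.sql.functions._
val exploded = df
  .withColumn("start", to_date(col("c4"), "d-MMM-yy"))
  .withColumn("end", to_date(col("c5"), "d-MMM-yy"))
  // one row per month between the two dates
  .withColumn("month_start", explode(sequence(trunc(col("start"), "MM"), trunc(col("end"), "MM"), expr("interval 1 month"))))
  .withColumn("period_end_date", last_day(col("month_start")))
  // the first month keeps the original start date, later months start on day 1
  .withColumn("period_start_date", when(col("month_start") === trunc(col("start"), "MM"), col("start")).otherwise(col("month_start")))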
... View more
06-14-2018
12:23 PM
@AArora, is the requirement to create multiple rows from one row, where you need all the first and last days of each month between pa_start_date and pa_end_date as the period_start_date and period_end_date?
... View more
06-13-2018
06:26 PM
check out https://community.hortonworks.com/answers/77558/view.html
... View more
06-07-2018
07:38 PM
I have the hcat credentials as a part of my workflow. Somehow they aren't getting picked up.
... View more
06-07-2018
06:14 PM
Hello All, I have a simple PySpark application that queries a Hive table using HiveContext and dumps the result to a CSV file on HDFS. The application runs fine on my Kerberized cluster when I do a spark-submit; I have a valid ticket, so there isn't any issue submitting the job directly. When I run the same job via the Ambari Workflow Manager, I get a GSS exception: No valid credentials provided. How do I pass the user credentials on to the workflow? Thanks -ak-
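(For reference, the mechanism in question is Oozie's <credentials> element, which the action then references via cred=. A rough sketch is below; the credential name, metastore URI, principal, and action name are all placeholders that depend on the cluster.)
<credentials>
    <credential name="hcat_cred" type="hcat">
        <property>
            <name>hcat.metastore.uri</name>
            <value>thrift://metastore-host:9083</value>
        </property>
        <property>
            <name>hcat.metastore.principal</name>
            <value>hive/_HOST@EXAMPLE.COM</value>
        </property>
    </credential>
</credentials>
...
<action name="spark-node" cred="hcat_cred">
    <spark xmlns="uri:oozie:spark-action:0.1">
        ...
    </spark>
</action>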
... View more
Labels:
- Apache Ambari
- Apache Oozie
05-16-2018
07:03 PM
Let me know if this helps: https://community.hortonworks.com/questions/110519/extracting-substring-in-pig-latin.html?childToView=111854#answer-111854 Otherwise, I can provide more information.
... View more
03-27-2018
09:27 PM
@Scott Aslan : Thanks, the build was successful after skipping the tests. The test failure trace is in the previous comment.
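(For anyone following along: skipping the unit tests in a NiFi source build is done with Maven's standard flag. A generic sketch, not necessarily the exact command used here:)
# build NiFi from the source root without running unit tests
mvn clean install -DskipTests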
... View more
03-27-2018
08:41 PM
I will run again, skipping the tests. The relevant part of the failure trace is below:
[INFO]
[INFO] --- maven-remote-resources-plugin:1.5:process (process-resource-bundles) @ nifi-solr-processors ---
[INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.167 s - in org.apache.nifi.processors.standard.TestParseCEF
[INFO] Running org.apache.nifi.processors.standard.TestGetFile
[ERROR] Tests run: 7, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 0.138 s <<< FAILURE! - in org.apache.nifi.processors.standard.TestGetFile
[ERROR] testWithUnreadableDir(org.apache.nifi.processors.standard.TestGetFile) Time elapsed: 0.028 s <<< ERROR!
java.lang.NullPointerException
at org.apache.nifi.processors.standard.TestGetFile.testWithUnreadableDir(TestGetFile.java:92)
[ERROR] testWithInaccessibleDir(org.apache.nifi.processors.standard.TestGetFile) Time elapsed: 0.006 s <<< ERROR!
java.lang.NullPointerException
at org.apache.nifi.processors.standard.TestGetFile.testWithInaccessibleDir(TestGetFile.java:64)
[ERROR] testWithUnwritableDir(org.apache.nifi.processors.standard.TestGetFile) Time elapsed: 0.007 s <<< ERROR!
java.lang.NullPointerException
at org.apache.nifi.processors.standard.TestGetFile.testWithUnwritableDir(TestGetFile.java:120)
[INFO] Running org.apache.nifi.processors.standard.TestGenerateFlowFile
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.02 s - in org.apache.nifi.processors.standard.TestGenerateFlowFile
[INFO] Running org.apache.nifi.processors.standard.TestExtractGrok
... View more
03-27-2018
07:22 PM
build.txt
Hello all, any pointers would be helpful. I am trying to build NiFi from source on CentOS 7 and have the preconditions met as per https://nifi.apache.org/quickstart.html. EDIT: Attached the build log. The build fails with the trace below:
[INFO] dockermaven 1.6.0-SNAPSHOT ......................... SUCCESS [ 0.713 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:05 min (Wall Clock)
[INFO] Finished at: 2018-03-27T18:15:24Z
[INFO] ------------------------------------------------------------------------
Downloaded from central: https://repo1.maven.org/maven2/org/apache/curator/curator-framework/2.10.0/curator-framework-2.10.0.pom (2.5 kB at 34 kB/s)
[ERROR] Failed to execute goal com.github.eirslett:frontend-maven-plugin:1.1:npm (npm install) on project nifi-web-ui: Failed to run task: 'npm --cache-min Infinity install' failed. (error code 1) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <goals> -rf :nifi-web-ui
Destroying 6 processes
Destroying process..
Destroying process..
Destroying process..
Destroying process..
Destroying process..
Destroying process..
Destroyed 6 processes
... View more
Labels:
- Apache NiFi
03-12-2018
09:05 PM
NiFi failed to start with this change, so I rolled it back. I am assuming that it expects an expression in the filter.
... View more
03-07-2018
06:56 PM
@Matt Clarke I think I am getting close to the solution. I made the changes as you suggested; however, I get the error below on login: o.a.n.w.a.c.AccessDeniedExceptionMapper identity[user1], groups[] does not have permission to access the requested resource. No applicable policies could be found. Returning Forbidden response. The group name is empty / not picked up. What could be wrong here?
... View more
03-07-2018
05:52 PM
@Matt Clarke, thank you. I will update after I give this a try.
... View more
03-07-2018
05:29 PM
Hi All, is there a document that details how to configure LDAP group authorization for NiFi with Ranger? This is for HDF 3.1.1 / NiFi 1.5. With the default configuration, NiFi still needs policies to be defined for every individual user; group-level policies don't take effect, so I assume some configuration is missing. EDIT: userGroupProvider
<userGroupProvider>
<identifier>ldap-user-group-provider</identifier>
<class>org.apache.nifi.ldap.tenants.LdapUserGroupProvider</class>
<property name="Authentication Strategy">SIMPLE</property>
<property name="Manager DN">uid=admin,cn=blah,cn=blah,dc=blah,dc=com</property>
<property name="Manager Password">blah</property>
<property name="TLS - Keystore"></property>
<property name="TLS - Keystore Password"></property>
<property name="TLS - Keystore Type"></property>
<property name="TLS - Truststore"></property>
<property name="TLS - Truststore Password"></property>
<property name="TLS - Truststore Type"></property>
<property name="TLS - Client Auth"></property>
<property name="TLS - Protocol"></property>
<property name="TLS - Shutdown Gracefully"></property>
<property name="Referral Strategy">FOLLOW</property>
<property name="Connect Timeout">10 secs</property>
<property name="Read Timeout">10 secs</property>
<property name="Url">ldap://blah.ldap.com:389</property>
<property name="Page Size"></property>
<property name="Sync Interval">30 mins</property>
<property name="User Search Base">cn=users,cn=accounts,dc=blah,dc=blah,dc=com</property>
<property name="User Object Class">person</property>
<property name="User Search Scope">SUBTREE</property>
<property name="User Search Filter">(uid={0})</property>
<property name="User Identity Attribute">USE_USERNAME</property>
<property name="User Group Name Attribute"></property>
<property name="User Group Name Attribute - Referenced Group Attribute"></property>
<property name="Group Search Base">cn=groups,cn=accounts,dc=blah,dc=blah,dc=com</property>
<property name="Group Object Class">groupofnames</property>
<property name="Group Search Scope">SUBTREE</property>
<property name="Group Search Filter">(cn={0})</property>
<property name="Group Name Attribute">cn</property>
<property name="Group Member Attribute">member</property>
<property name="Group Member Attribute - Referenced User Attribute">uid</property>
</userGroupProvider>
Sample User - LDAP
Sample Group - LDAP
... View more
Labels:
- Apache NiFi
- Apache Ranger
02-16-2018
05:48 PM
3 Kudos
In this article, we will walk through integrating LDAP with NiFi Registry. The precondition for LDAP to work with NiFi Registry is that SSL needs to be enabled, so this article also covers how to enable SSL for NiFi Registry. For the sake of simplicity, I am using self-signed certificates (JKS, created with keytool). The steps for creating the self-signed certificates are:
Generate the keystore:
keytool -genkey -keyalg RSA -validity 3650 -alias <alias_name> -keypass <pwd> -storepass <pwd> -dname "cn=hostname, ou=home, o=ak, c=us" -keystore nifi_reg_keystore.jks
Export a certificate with the public key:
keytool -export -alias <alias_name> -file nifi_reg.cer -storepass <pwd> -keystore nifi_reg_keystore.jks
Generate the truststore:
keytool -import -noprompt -alias nr-c0 -file nifi_reg.cer -storepass changeitchangeit -keystore nifi_reg_truststore.jks
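(Optionally, before wiring the stores into Ambari, you may want to check what landed in them. A generic verification, assuming the file names above:)
# list the entries in the generated keystore and truststore
keytool -list -v -keystore nifi_reg_keystore.jks -storepass <pwd>
keytool -list -v -keystore nifi_reg_truststore.jks -storepass changeitchangeit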
Below is a representation of the NiFi Registry UI with the default HTTP, anonymous-user login. Now we log in to Ambari and use the certificate details generated above to complete the SSL setup: on the configuration tab, search for the SSL settings and populate the form with the details of the truststore and keystore we generated. At this stage, SSL setup for NiFi Registry is complete; however, we haven't assigned any users that can log in to the UI. We can either generate a certificate for an Initial Admin or create an Initial Admin from the LDAP user base; here we will use an LDAP user as the Initial Admin for the NiFi Registry. There are four sections that need to be edited:
1. Configure Initial Admin
2. Configure Security Identity Provider (nifi.registry.security.identity.provider)
3. Configure login-identity-providers.xml
For login-identity-providers.xml, remove the two lines that say "To enable the ldap-identity-provider remove 2 lines. This is 1 of 2." and "To enable the ldap-identity-provider remove 2 lines. This is 2 of 2.", then fill in the details specific to your LDAP server. I am using a SIMPLE authentication strategy with a non-SSL LDAP server; the relevant sections from my configuration window are below.
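(As an illustrative sketch only: the provider class and property names should be verified against the login-identity-providers.xml shipped with your HDF 3.1 / NiFi Registry version, and every value below is a placeholder. The enabled ldap-provider block ends up looking roughly like this:)
<provider>
    <identifier>ldap-identity-provider</identifier>
    <class>org.apache.nifi.registry.security.ldap.LdapIdentityProvider</class>
    <property name="Authentication Strategy">SIMPLE</property>
    <property name="Manager DN">uid=admin,cn=blah,dc=blah,dc=com</property>
    <property name="Manager Password">blah</property>
    <property name="Referral Strategy">FOLLOW</property>
    <property name="Connect Timeout">10 secs</property>
    <property name="Read Timeout">10 secs</property>
    <property name="Url">ldap://blah.ldap.com:389</property>
    <property name="User Search Base">cn=users,cn=accounts,dc=blah,dc=com</property>
    <property name="User Search Filter">(uid={0})</property>
    <property name="Identity Strategy">USE_USERNAME</property>
    <property name="Authentication Expiration">12 hours</property>
</provider>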
4. Configure authorizers.xml
In authorizers.xml, remove the two lines that say "To enable the ldap-user-group-provider remove 2 lines. This is 1 of 2." and "To enable the ldap-user-group-provider remove 2 lines. This is 2 of 2.", then configure the ldap-user-group-provider and the accessPolicyProvider. The screenshots below show the relevant section for configuring the ldap-user-group-provider and the changes needed to the access policy provider; in particular, set User Group Provider to ldap-user-group-provider. At this stage we can save all the configuration changes and restart the NiFi Registry service. Follow the Ambari prompts and you should see the service come up as below. Now we can access the NiFi Registry UI and log in as the configured Initial Admin (guest1 in my example):
- Access the NiFi Registry UI from Quick Links
- Log in using the Initial Admin credentials
- Verify the login is successful
- Verify users are available/synced
You should be able to proceed using the NiFi Registry from here on.
... View more
- Find more articles tagged with:
- Data Ingestion & Streaming
- hdf3.1
- How-ToTutorial
- integration
- nifi-registry
11-28-2017
10:31 PM
@Matt Clarke : question on the /resources policy - The server running Ranger should be granted “read” privileges to this resource. How do we accomplish this? Is SSL for Ranger mandatory in this case?
... View more
10-06-2017
06:47 PM
@Karthik Narayanan it was NiFi 1.1, and looking at the pom.xml I am assuming there is an Avro 1.7 dependency. The logical timestamp types were introduced in Avro 1.8.
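(For reference, in Avro 1.8 a logical timestamp is declared by annotating a long field in the schema; the field name here is just an example:)
{"name": "event_time", "type": {"type": "long", "logicalType": "timestamp-millis"}}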
... View more
10-05-2017
08:38 PM
Thanks @Karthik Narayanan, same reason here: I was on a NiFi that had Avro 1.7, so the logicalType doesn't work. If I remember correctly, it threw an invalid Avro schema error.
... View more
10-05-2017
05:46 PM
Thanks @Karthik Narayanan, I can use them now, but the question was from before NiFi 1.2 was released.
... View more
09-13-2017
02:16 PM
1 Kudo
We would not make a performance comparison of Spark and NiFi here for this use case, since we are not even talking about running on a cluster. Since the data volume is low and the processing is finite, I would prefer NiFi, as the entire life cycle gets reduced; plus, it is UI-driven.
... View more
09-12-2017
06:40 PM
From the problem statement, you just have small files on the local file system and do not want to use any HDFS layer here. Spark can do this, but may be overkill for the scenario; you could start with basic Java JDBC code to get the task accomplished. However, you have Apache NiFi within the HDF stack, which can easily solve this problem for you: simply design a dataflow that monitors for new files in your SFTP location, pulls those files into NiFi as soon as they land, optionally parses them or applies some transformations, and performs the MySQL operation. You can use this as a reference. Additionally, NiFi can also be used as a powerful MySQL CDC option; you can see more details in this 3-part article.
... View more
09-07-2017
06:24 PM
Looks like the SerDe is not in the classpath. Can you try add jar /usr/hdp/<version number>/hive/lib/hive-contrib-<version>.jar; and then create the table?
... View more