12-15-2017 02:09 PM
Did you reboot? Did you do the upgrade with Ambari? Livy and Spark should be updated. Make sure you submit jobs to Livy2. I would uninstall the existing Livy, reboot, and install Livy2.
12-15-2017 12:25 AM
I will try that, thanks!!!!
12-14-2017 09:57 PM
2 Kudos
I needed to build a quick SQL table from a JSON. There are some online tools, but I'd rather do this in Java. It works well enough; now I am wondering if this would make a good Apache NiFi processor.

package com.dataflowdeveloper.processors.process;
import java.util.Iterator;
import java.util.Map;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonToDDL {

    /**
     * Build a CREATE TABLE statement by guessing column types from a flat JSON object.
     *
     * @param tableName name of the table to create
     * @param json      flat JSON object whose fields become columns
     * @return String DDL SQL, or null if the JSON could not be parsed
     */
    public String parse(String tableName, String json) {
        JsonFactory factory = new JsonFactory();
        StringBuilder sql = new StringBuilder(256);
        sql.append("CREATE TABLE ").append(tableName).append(" ( ");
        ObjectMapper mapper = new ObjectMapper(factory);
        JsonNode rootNode = null;
        try {
            rootNode = mapper.readTree(json);
        } catch (Exception e) {
            e.printStackTrace();
        }
        if (rootNode == null) {
            return null; // guard against unparseable JSON instead of an NPE below
        }
        Iterator<Map.Entry<String, JsonNode>> fieldsIterator = rootNode.fields();
        while (fieldsIterator.hasNext()) {
            Map.Entry<String, JsonNode> field = fieldsIterator.next();
            System.out.println("Key: " + field.getKey() + "\tValue:" + field.getValue());
            sql.append(field.getKey());
            // Guess a column type from the JSON value.
            if (field.getValue().canConvertToInt()) {
                sql.append(" INT, ");
            } else if (field.getValue().canConvertToLong()) {
                sql.append(" LONG, ");
            } else if (field.getValue().asText().contains("/")
                    || field.getValue().asText().contains("-")) {
                // crude date detection: strings like 2017-11-07 or 11/07/2017
                sql.append(" DATE, ");
            } else if (field.getValue().asText().length() > 25) {
                // longer strings: size the column with some headroom
                sql.append(" VARCHAR(").append(field.getValue().asText().length() + 25).append("), ");
            } else {
                sql.append(" VARCHAR(25), ");
            }
        }
        // remove the trailing comma and close the statement
        sql.deleteCharAt(sql.length() - 2);
        sql.append(" ) ");
        return sql.toString();
    }

    public static void main(String[] args) {
        JsonToDDL ddl = new JsonToDDL();
        String json = "{\"EMP_ID\":3001,\"DURATION_SEC\":288000,\"LOG_DATE\":\"2017-11-07 10:00:00\"}";
        String ddlSQL = ddl.parse("TIME_LOG", json);
        System.out.println("DDL=" + ddlSQL);
        json = "{\"EMP_ID\":4001,\"GENDER\":\"M\",\"DEPT_ID\":4,\"FIRST_NAME\":\"Brett\",\"LAST_NAME\":\"Lee\"}";
        ddlSQL = ddl.parse("EMPLOYEE", json);
        System.out.println("DDL=" + ddlSQL);
        json = "{\"DEPT_ID\":1,\"CODE\":\"FN\",\"NAME\":\"Finance\",\"DESCRIPTION\":\"Finance Department\",\"ACTIVE\":1}";
        ddlSQL = ddl.parse("DEPARTMENT", json);
        System.out.println("DDL=" + ddlSQL);
    }
}
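For reference, tracing the first example through the type guesses above, the program should print something like:

DDL=CREATE TABLE TIME_LOG ( EMP_ID INT, DURATION_SEC INT, LOG_DATE DATE )

Note that LOG_DATE lands on DATE only because of the crude dash check; the value is really a timestamp.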
In tests it looks okay. There is some guessing of what type a field should be based on its JSON value. I was also thinking I could hook this up to Avro Tools to do some other type investigation. So should I make this a processor, or just keep it as a little script?
References:
https://github.com/kite-sdk/kite
http://kitesdk.org/docs/1.1.0/Inferring-a-Schema-from-a-Java-Class.html
https://avro.apache.org/docs/1.8.1/api/java/org/apache/avro/reflect/package-summary.html
https://avro.apache.org/docs/1.8.2/gettingstartedjava.html#Serializing+and+deserializing+with+code+generation
12-14-2017 04:29 PM
This was tried in Apache NiFi 1.3 and 1.4. Is the DB2 Type 4 driver supported by Apache NiFi? This is the DB2 JDBC 4 driver that supports Java 8: jdbc-drivers/db2jcc4.jar. Permissions on the jar are good.
Driver class: com.ibm.db2.jcc.DB2Driver
Connection URL: jdbc:db2://test.myserverrocks.com:50000/MYDB2DB
Error in logs:
ERROR [Timer-Driven Process Thread-1] o.a.n.p.standard.QueryDatabaseTable QueryDatabaseTable[id=516a17e9-0160-1000-6fbe-de7e80b72c3b] Unable to execute SQL select query SELECT * FROM MYDATA.MYTABLE due to org.apache.nifi.processor.exception.ProcessException: Error during database query or conversion of records to Avro.: {}
org.apache.nifi.processor.exception.ProcessException: Error during database query or conversion of records to Avro.
at org.apache.nifi.processors.standard.QueryDatabaseTable.lambda$onTrigger$13(QueryDatabaseTable.java:305)
at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2529)
at org.apache.nifi.processors.standard.QueryDatabaseTable.onTrigger(QueryDatabaseTable.java:299)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1120)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.ibm.db2.jcc.am.SqlException: [jcc][t4][10120][10898][4.16.53] Invalid operation: result set is closed. ERRORCODE=-4470, SQLSTATE=null
at com.ibm.db2.jcc.am.fd.a(fd.java:723)
at com.ibm.db2.jcc.am.fd.a(fd.java:60)
at com.ibm.db2.jcc.am.fd.a(fd.java:103)
at com.ibm.db2.jcc.am.ResultSet.checkForClosedResultSet(ResultSet.java:4598)
at com.ibm.db2.jcc.am.ResultSet.getMetaDataX(ResultSet.java:1899)
at com.ibm.db2.jcc.am.ResultSet.getMetaData(ResultSet.java:1891)
at org.apache.commons.dbcp.DelegatingResultSet.getMetaData(DelegatingResultSet.java:322)
at org.apache.commons.dbcp.DelegatingResultSet.getMetaData(DelegatingResultSet.java:322)
at org.apache.nifi.processors.standard.util.JdbcCommon.createSchema(JdbcCommon.java:422)
at org.apache.nifi.processors.standard.util.JdbcCommon.convertToAvroStream(JdbcCommon.java:242)
at org.apache.nifi.processors.standard.QueryDatabaseTable.lambda$onTrigger$13(QueryDatabaseTable.java:303)
... 13 common frames omitted
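One possible workaround, not from the original post: the JCC driver's -4470 "Invalid operation: result set is closed" error is commonly addressed by setting the driver property allowNextOnExhaustedResultSet. In the JCC URL syntax, properties follow a colon after the database name and each ends with a semicolon, so the connection URL above would become:

jdbc:db2://test.myserverrocks.com:50000/MYDB2DB:allowNextOnExhaustedResultSet=1;

This keeps the driver from throwing when the result set is read past exhaustion, which matches the getMetaData failure in the stack trace.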
Labels:
- Apache NiFi
12-13-2017 04:23 PM
1 Kudo
Probably not a good idea to do that; Zeppelin is a notebook for exploration. Any Spark or SQL code in there you should run as a Spark job and call from Livy: https://community.hortonworks.com/articles/148730/integrating-apache-spark-2x-jobs-with-apache-nifi.html
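As a minimal sketch of that pattern (the Livy host, jar path, and class name below are hypothetical, and Livy's default port 8999 is assumed), a Spark job can be submitted to Livy's batch REST API from plain Java:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class LivySubmit {
    public static void main(String[] args) throws Exception {
        // POST a batch job definition to the Livy server
        URL url = new URL("http://livy-host:8999/batches"); // hypothetical host
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        String body = "{\"file\":\"hdfs:///jobs/my-spark-job.jar\","   // hypothetical jar
                + "\"className\":\"com.example.MySparkJob\"}";         // hypothetical class
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body.getBytes("UTF-8"));
        }
        // 201 Created means Livy accepted the batch
        System.out.println("Livy response code: " + conn.getResponseCode());
    }
}

NiFi's InvokeHTTP processor can make the same call from a flow, so the notebook never has to be in the production path.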
12-11-2017 04:12 PM
1 Kudo
Apache NiFi can ingest and clean up this MongoDB data and monitor for errors. Machine learning and deep learning flows can be triggered from Apache NiFi via Apache Livy for Apache Spark ML, and also for Apache MXNet and TensorFlow deep learning. This can also be done via Kafka, Site-to-Site (S2S), and other streaming mechanisms.
https://community.hortonworks.com/content/kbentry/53554/using-apache-nifi-100-with-mongodb.html
https://community.hortonworks.com/content/kbentry/146198/data-flow-enrichment-with-nifi-part-3-lookuprecord.html
https://community.hortonworks.com/articles/148730/integrating-apache-spark-2x-jobs-with-apache-nifi.html
12-11-2017 03:02 PM
2 Kudos
Building schemas is tedious work and fraught with errors. The InferAvroSchema processor can get you started: it generates a compliant schema for use. There is one caveat: you have to make sure you are using Apache Avro-safe field names. I have a custom processor that will clean your attributes if you need them Avro-safe; see the processor linked below.
[Image: Example Flow Utilizing InferAvroSchema]
[Image: InferAvroSchema Details]
Step 0: Use Apache NiFi to convert data to JSON or CSV.
Step 1: Send the JSON or CSV data to InferAvroSchema. I recommend setting the output destination to flowfile-attribute, the input content type to json, and pretty Avro output to true.
Step 2: The new schema is now in the attribute inferred.avro.schema:
{ "type" : "record", "name" : "schema1", "fields" : [ { "name" : "table", "type" : "string", "doc" : "Type inferred from '\"schema1.tableName\"'" } ] } This schema can then be used for conversions directly or stored in Hortonworks Schema Registry or Apache NiFi Built-in Avro Registry. Now you can use it for ConvertRecord, QueryRecord and other Record processing. Example Generated Schema in Avro-JSON Format Stored in Hortonworks Schema Registry: Source: https://github.com/tspannhw/nifi-attributecleaner-processor
12-08-2017 02:18 PM
See https://community.hortonworks.com/articles/148730/integrating-apache-spark-2x-jobs-with-apache-nifi.html
12-06-2017 05:16 PM
Yes, continuously and automatically. By default it polls for new files every 60 seconds; you can shrink that. You can also convert those files to Apache ORC and auto-build new Hive tables on them if the files are CSV, TSV, Avro, Excel, JSON, XML, EDI, HL7, or C-CDA. Install Apache NiFi on an edge node; there are ways to combine HDP 2.6 and HDF 3 with the new Ambari, but it's easiest to start with a separate node for Apache NiFi. You can also just download NiFi, unzip it, and run it on a laptop that has JDK 8 installed: https://www.apache.org/dyn/closer.lua?path=/nifi/1.4.0/nifi-1.4.0-bin.zip