12-15-2017 02:09 PM
Did you reboot? Did you do the upgrade with Ambari? Livy and Spark should be updated. Make sure you submit jobs to Livy2. I would uninstall the existing Livy, reboot, and install Livy2.
12-15-2017 12:25 AM
I will try that, thanks!!!!
12-14-2017 09:57 PM
2 Kudos
I needed to build a quick SQL table from a JSON. There are some online tools, but I'd rather do this in Java. It works well enough; now I am wondering if this would make a good Apache NiFi processor.

package com.dataflowdeveloper.processors.process;
import java.util.Iterator;
import java.util.Map;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonToDDL {

    /**
     * Build a CREATE TABLE statement by guessing column types from a flat JSON object.
     *
     * @param tableName name of the table to create
     * @param json      flat JSON object whose fields become columns
     * @return String DDL SQL, or null if the JSON could not be parsed
     */
    public String parse(String tableName, String json) {
        JsonFactory factory = new JsonFactory();
        StringBuilder sql = new StringBuilder(256);
        sql.append("CREATE TABLE ").append(tableName).append(" ( ");
        ObjectMapper mapper = new ObjectMapper(factory);
        JsonNode rootNode = null;
        try {
            rootNode = mapper.readTree(json);
        } catch (Exception e) {
            e.printStackTrace();
        }
        if (rootNode == null) {
            return null; // guard against unparseable JSON instead of an NPE below
        }
        Iterator<Map.Entry<String, JsonNode>> fieldsIterator = rootNode.fields();
        while (fieldsIterator.hasNext()) {
            Map.Entry<String, JsonNode> field = fieldsIterator.next();
            System.out.println("Key: " + field.getKey() + "\tValue:" + field.getValue());
            sql.append(field.getKey());
            // Guess a column type from the JSON value.
            if (field.getValue().canConvertToInt()) {
                sql.append(" INT, ");
            } else if (field.getValue().canConvertToLong()) {
                sql.append(" LONG, ");
            } else if (field.getValue().asText().contains("/")
                    || field.getValue().asText().contains("-")) {
                // crude date detection: strings like 2017-11-07 or 11/07/2017
                sql.append(" DATE, ");
            } else if (field.getValue().asText().length() > 25) {
                // longer strings: size the column with some headroom
                sql.append(" VARCHAR(").append(field.getValue().asText().length() + 25).append("), ");
            } else {
                sql.append(" VARCHAR(25), ");
            }
        }
        // remove the trailing comma and close the statement
        sql.deleteCharAt(sql.length() - 2);
        sql.append(" ) ");
        return sql.toString();
    }

    public static void main(String[] args) {
        JsonToDDL ddl = new JsonToDDL();
        String json = "{\"EMP_ID\":3001,\"DURATION_SEC\":288000,\"LOG_DATE\":\"2017-11-07 10:00:00\"}";
        String ddlSQL = ddl.parse("TIME_LOG", json);
        System.out.println("DDL=" + ddlSQL);
        json = "{\"EMP_ID\":4001,\"GENDER\":\"M\",\"DEPT_ID\":4,\"FIRST_NAME\":\"Brett\",\"LAST_NAME\":\"Lee\"}";
        ddlSQL = ddl.parse("EMPLOYEE", json);
        System.out.println("DDL=" + ddlSQL);
        json = "{\"DEPT_ID\":1,\"CODE\":\"FN\",\"NAME\":\"Finance\",\"DESCRIPTION\":\"Finance Department\",\"ACTIVE\":1}";
        ddlSQL = ddl.parse("DEPARTMENT", json);
        System.out.println("DDL=" + ddlSQL);
    }
}
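For reference, tracing the first example through the type guesses above, the program should print something like:

DDL=CREATE TABLE TIME_LOG ( EMP_ID INT, DURATION_SEC INT, LOG_DATE DATE )

Note that LOG_DATE lands on DATE only because of the crude dash check; the value is really a timestamp.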
In tests it looks okay. There is some guessing of what type a field should be based on its JSON value. I was also thinking I could hook this up to Avro Tools to do some other type investigation. So should I make this a processor, or just keep it as a little script?
References:
https://github.com/kite-sdk/kite
http://kitesdk.org/docs/1.1.0/Inferring-a-Schema-from-a-Java-Class.html
https://avro.apache.org/docs/1.8.1/api/java/org/apache/avro/reflect/package-summary.html
https://avro.apache.org/docs/1.8.2/gettingstartedjava.html#Serializing+and+deserializing+with+code+generation
12-14-2017 04:29 PM
This was tried in Apache NiFi 1.3 and 1.4. Is the DB2 Type 4 driver supported by Apache NiFi? This is the DB2 JDBC 4 driver that supports Java 8: jdbc-drivers/db2jcc4.jar. Permissions on the jar are good.
Driver class: com.ibm.db2.jcc.DB2Driver
Connection URL: jdbc:db2://test.myserverrocks.com:50000/MYDB2DB
Error in logs:
ERROR [Timer-Driven Process Thread-1] o.a.n.p.standard.QueryDatabaseTable QueryDatabaseTable[id=516a17e9-0160-1000-6fbe-de7e80b72c3b] Unable to execute SQL select query SELECT * FROM MYDATA.MYTABLE due to org.apache.nifi.processor.exception.ProcessException: Error during database query or conversion of records to Avro.: {}
org.apache.nifi.processor.exception.ProcessException: Error during database query or conversion of records to Avro.
at org.apache.nifi.processors.standard.QueryDatabaseTable.lambda$onTrigger$13(QueryDatabaseTable.java:305)
at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2529)
at org.apache.nifi.processors.standard.QueryDatabaseTable.onTrigger(QueryDatabaseTable.java:299)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1120)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.ibm.db2.jcc.am.SqlException: [jcc][t4][10120][10898][4.16.53] Invalid operation: result set is closed. ERRORCODE=-4470, SQLSTATE=null
at com.ibm.db2.jcc.am.fd.a(fd.java:723)
at com.ibm.db2.jcc.am.fd.a(fd.java:60)
at com.ibm.db2.jcc.am.fd.a(fd.java:103)
at com.ibm.db2.jcc.am.ResultSet.checkForClosedResultSet(ResultSet.java:4598)
at com.ibm.db2.jcc.am.ResultSet.getMetaDataX(ResultSet.java:1899)
at com.ibm.db2.jcc.am.ResultSet.getMetaData(ResultSet.java:1891)
at org.apache.commons.dbcp.DelegatingResultSet.getMetaData(DelegatingResultSet.java:322)
at org.apache.commons.dbcp.DelegatingResultSet.getMetaData(DelegatingResultSet.java:322)
at org.apache.nifi.processors.standard.util.JdbcCommon.createSchema(JdbcCommon.java:422)
at org.apache.nifi.processors.standard.util.JdbcCommon.convertToAvroStream(JdbcCommon.java:242)
at org.apache.nifi.processors.standard.QueryDatabaseTable.lambda$onTrigger$13(QueryDatabaseTable.java:303)
... 13 common frames omitted
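One possible workaround, not from the original post: the JCC driver's -4470 "Invalid operation: result set is closed" error is commonly addressed by setting the driver property allowNextOnExhaustedResultSet. In the JCC URL syntax, properties follow a colon after the database name and each ends with a semicolon, so the connection URL above would become:

jdbc:db2://test.myserverrocks.com:50000/MYDB2DB:allowNextOnExhaustedResultSet=1;

This keeps the driver from throwing when the result set is read past exhaustion, which matches the getMetaData failure in the stack trace.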
Labels:
- Apache NiFi
12-13-2017 04:23 PM
1 Kudo
Probably not a good idea to do that; Zeppelin is a notebook for exploration. Any Spark or SQL code in there you should run as a Spark job and call from Livy: https://community.hortonworks.com/articles/148730/integrating-apache-spark-2x-jobs-with-apache-nifi.html
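As a minimal sketch of that pattern (the Livy host, jar path, and class name below are hypothetical, and Livy's default port 8999 is assumed), a Spark job can be submitted to Livy's batch REST API from plain Java:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class LivySubmit {
    public static void main(String[] args) throws Exception {
        // POST a batch job definition to the Livy server
        URL url = new URL("http://livy-host:8999/batches"); // hypothetical host
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        String body = "{\"file\":\"hdfs:///jobs/my-spark-job.jar\","   // hypothetical jar
                + "\"className\":\"com.example.MySparkJob\"}";         // hypothetical class
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body.getBytes("UTF-8"));
        }
        // 201 Created means Livy accepted the batch
        System.out.println("Livy response code: " + conn.getResponseCode());
    }
}

NiFi's InvokeHTTP processor can make the same call from a flow, so the notebook never has to be in the production path.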
12-11-2017 04:12 PM
1 Kudo
Apache NiFi can ingest and clean up this MongoDB data and monitor for errors. Machine learning and deep learning flows can be triggered from Apache NiFi via Apache Livy for Apache Spark ML, and also for Apache MXNet and TensorFlow deep learning. This can also be done via Kafka, Site-to-Site (S2S), and other streaming mechanisms.
https://community.hortonworks.com/content/kbentry/53554/using-apache-nifi-100-with-mongodb.html
https://community.hortonworks.com/content/kbentry/146198/data-flow-enrichment-with-nifi-part-3-lookuprecord.html
https://community.hortonworks.com/articles/148730/integrating-apache-spark-2x-jobs-with-apache-nifi.html
12-11-2017 03:02 PM
2 Kudos
Building schemas is tedious work and fraught with errors. The InferAvroSchema processor can get you started: it generates a compliant schema for use. There is one caveat: you have to make sure you are using Apache Avro-safe field names. I have a custom processor that will clean your attributes if you need them Avro-safe; see the processor linked below.
[Image: Example Flow Utilizing InferAvroSchema]
[Image: InferAvroSchema Details]
Step 0: Use Apache NiFi to convert data to JSON or CSV.
Step 1: Send the JSON or CSV data to InferAvroSchema. I recommend setting the output destination to flowfile-attribute, the input content type to json, and pretty Avro output to true.
Step 2: The new schema is now in the attribute inferred.avro.schema:
{ "type" : "record", "name" : "schema1", "fields" : [ { "name" : "table", "type" : "string", "doc" : "Type inferred from '\"schema1.tableName\"'" } ] } This schema can then be used for conversions directly or stored in Hortonworks Schema Registry or Apache NiFi Built-in Avro Registry. Now you can use it for ConvertRecord, QueryRecord and other Record processing. Example Generated Schema in Avro-JSON Format Stored in Hortonworks Schema Registry: Source: https://github.com/tspannhw/nifi-attributecleaner-processor
12-08-2017 02:18 PM
See https://community.hortonworks.com/articles/148730/integrating-apache-spark-2x-jobs-with-apache-nifi.html
12-06-2017 05:16 PM
Yes, continuously and automatically. By default it polls for new files every 60 seconds; you can shrink that. You can also convert those files to Apache ORC and auto-build new Hive tables on them if the files are CSV, TSV, Avro, Excel, JSON, XML, EDI, HL7, or C-CDA. Install Apache NiFi on an edge node; there are ways to combine HDP 2.6 and HDF 3 with the new Ambari, but it's easiest to start with a separate node for Apache NiFi. You can also just download NiFi, unzip it, and run it on a laptop that has JDK 8 installed: https://www.apache.org/dyn/closer.lua?path=/nifi/1.4.0/nifi-1.4.0-bin.zip