Member since 09-06-2016
108 posts, 36 kudos received, 11 solutions
My Accepted Solutions
Views | Posted |
---|---|
2097 | 05-11-2017 07:41 PM |
866 | 05-06-2017 07:36 AM |
5403 | 05-05-2017 07:00 PM |
1978 | 05-05-2017 06:52 PM |
5196 | 05-02-2017 03:56 PM |
11-23-2023
02:54 PM
CREATE EXTERNAL TABLE dwsimp.dim_agrupamento (
  id INT,
  agrupamento_nome STRING,
  agrupamento_ordem INT,
  dim_relatorio_id INT,
  agrupamento_campo STRING
)
STORED AS ORC
TBLPROPERTIES (
  org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler,
  mapred.jdbc.driver.class = "oracle.jdbc.OracleDriver",
  mapred.jdbc.url = "jdbc:oracle:thin:@//jdbc:oracle:thin:@//host:port/servicename",
  mapred.jdbc.username = "user",
  mapred.jdbc.password= "password",
  mapred.jdbc.input.table.name="JDBCTable",
  mapred.jdbc.output.table.name="JDBCTable",
  mapred.jdbc.hive.lazy.split"= "false");
Error: Error while compiling statement: FAILED: ParseException line 10:2 cannot recognize input near 'org' '.' 'apache' in table properties list (state=42000,code=40000)
06-23-2020
02:01 AM
In HDP 3.1 the files are in /apps/hbase/data/WALs (on HDFS). There you will find a directory named after the server that won't start: delete it.
05-19-2017
01:49 PM
Ingest data with NIFI to Hive LLAP and Druid
Setting up Hive LLAP
Setting up Druid
Configuring the dimensions
Setting up Superset
Connection to Druid
Creating a dashboard with visualisations
Differences with ES/Kibana and SOLR/Banana
05-10-2017
08:01 PM
The standard solution
Let's say you want to collect log messages from an edge cluster with NIFI and push them to a central NIFI cluster via the Site-to-Site (S2S) protocol. This is exactly what NIFI is designed for, and it results in a simple flow setup like this:
On the edge cluster, a processor tails the log file and sends its flowfiles to a remote process group, which is configured with the FQDN URL of the central NIFI cluster.
On the central NIFI cluster, an input port is defined, and from that input port the rest of the flow does its thing with the incoming flowfiles, such as filtering, transformations and eventually sinking them into Kafka, HDFS or SOLR.
The NIFI S2S protocol is used for the connection between the edge NIFI cluster and the central NIFI cluster, and it PUSHES the flowfiles from the edge cluster to the central NIFI cluster.
And now with a firewall blocking incoming connections in between
This standard setup, however, assumes the central NIFI cluster has a public FQDN and isn't behind a firewall blocking incoming connections. But what if there is a firewall blocking incoming connections? Fear not! The flexibility of NIFI comes to the rescue once again.
The solution is to move the initiation of the S2S connection from the edge NIFI cluster to the central NIFI cluster:
The remote process group is defined on the central NIFI cluster and connects to an output port on the edge node. This works because the edge NIFI node has a public FQDN (this is required!). Instead of an S2S PUSH, the data is effectively PULLED from the edge NIFI cluster to the central NIFI cluster.
To be clear: this setup has the downside that the central NIFI cluster needs to know about all edge clusters. This is not necessarily a big deal; it just means the flow in the central NIFI cluster needs to be updated when edge clusters or nodes are added. But if you can't get rid of the firewall blocking incoming connections, it does the job.
Example solution NIFI flow setup
[Screenshot: flow on the edge node, with a TailFile processor that sends its flowfiles to the output port named `logs`]
[Screenshot: flow on the central NIFI cluster, with a remote process group pointed at the FQDN of the edge node and a connection from the output port `logs` to the rest of the flow]
[Screenshot: configuration of the remote process group]
[Screenshot: details of the `logs` connection]
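To make the direction of the connection concrete, here is a toy model with plain Ruby TCP sockets. It is not NIFI's Site-to-Site protocol or API; the helpers start_edge_output_port and pull_from_edge are hypothetical names for illustration. The edge side listens (like the output port on the edge node with the public FQDN) and the central side opens the outbound connection and pulls the queued records, so only outbound connections from the central side are needed.

```ruby
require 'socket'

# Toy model of the "pull" setup with plain TCP sockets (NOT NIFI's S2S protocol).
# The edge side listens (it is the side with the public FQDN); the central side
# only makes an outbound connection, so a firewall that blocks incoming
# connections to the central side does not get in the way.

# Edge side (hypothetical helper): expose queued log lines on a listening socket.
def start_edge_output_port(lines)
  server = TCPServer.new('127.0.0.1', 0) # port 0 lets the OS pick a free port
  thread = Thread.new do
    client = server.accept
    lines.each { |line| client.puts(line) } # hand every queued record to the puller
    client.close
    server.close
  end
  [server.addr[1], thread] # addr[1] is the port the OS assigned
end

# Central side (hypothetical helper): open the connection and pull until EOF.
def pull_from_edge(host, port)
  TCPSocket.open(host, port) { |sock| sock.each_line.map(&:chomp) }
end

port, edge_thread = start_edge_output_port(['log line 1', 'log line 2'])
pulled = pull_from_edge('127.0.0.1', port)
edge_thread.join
p pulled
```

In the real setup the same trade-off applies: the central side must know every edge address to connect to, which is the downside mentioned above.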
05-09-2017
09:22 AM
To get an idea of the write performance of a Spark cluster, I've created a Spark version of the standard TestDFSIO tool, which measures the I/O performance of HDFS in your cluster. Lies, damn lies and benchmarks: the goal of this tool is to provide a sanity check of your Spark setup, focusing on HDFS write performance, not on compute performance. Think the tool can be improved? Feel free to submit a pull request or raise a GitHub issue.
Getting the Spark Jar
Download the Spark Jar from here:
https://github.com/wardbekker/benchmark/releases/download/v0.1/benchmark-1.0-SNAPSHOT-jar-with-dependencies.jar
It's built for Spark 1.6.2 / Scala 2.10.5.
Or build it from source:
$ git clone https://github.com/wardbekker/benchmark
$ cd benchmark && mvn clean package
Submit arguments explained
<files/partitions> : should ideally equal the recommended spark.default.parallelism (cores x instances).
<bytes_per_file> : should fit in memory, for example 90000000.
<write_repetitions> : number of times the test RDD is rewritten to disk; the benchmark result is averaged over the repetitions.
spark-submit --class org.ward.Benchmark --master yarn --deploy-mode cluster --num-executors X --executor-cores Y --executor-memory Z target/benchmark-1.0-SNAPSHOT-jar-with-dependencies.jar <files/partitions> <bytes_per_file> <write_repetitions>
CLI example for 12 workers with 30 GB of memory per node
It's important to get the number of executors and cores right: you want the maximum amount of parallelism without exceeding the capacity of the cluster.
This command writes out the generated RDD 10 times and calculates an aggregate throughput over all writes.
spark-submit --class org.ward.Benchmark --master yarn --deploy-mode cluster --num-executors 60 --executor-cores 3 --executor-memory 4G target/benchmark-1.0-SNAPSHOT-jar-with-dependencies.jar 180 90000000 10
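The sizing in this example can be checked with a little arithmetic: 60 executors x 3 cores gives 180 parallel tasks, which matches the 180 passed as <files/partitions>, and 60 executors over 12 workers is 5 executors per node. The Ruby sketch below (check_layout is a hypothetical helper, not part of the benchmark) does that check, assuming executors are spread evenly over the nodes, that "30GB" means 30 x 1024 MB, and that Spark on YARN's default executor memory overhead of max(384 MB, 10% of executor memory) applies.

```ruby
# Hypothetical helper: sanity-check an executor layout before running the benchmark.
# Assumes executors are spread evenly over the worker nodes and Spark on YARN's
# default memory overhead of max(384 MB, 10% of the executor memory).
def check_layout(nodes:, node_mem_mb:, num_executors:, executor_cores:, executor_mem_mb:)
  overhead_mb = [384, (executor_mem_mb * 0.10).ceil].max
  executors_per_node = (num_executors.to_f / nodes).ceil
  mem_per_node_mb = executors_per_node * (executor_mem_mb + overhead_mb)
  {
    partitions: num_executors * executor_cores, # value for <files/partitions>
    executors_per_node: executors_per_node,
    mem_per_node_mb: mem_per_node_mb,
    fits: mem_per_node_mb <= node_mem_mb
  }
end

# The CLI example: 12 workers with 30 GB each, 60 executors x 3 cores x 4 GB.
layout = check_layout(nodes: 12, node_mem_mb: 30 * 1024, num_executors: 60,
                      executor_cores: 3, executor_mem_mb: 4 * 1024)
p layout
```

For the example this gives 5 x (4096 + 410) = 22530 MB per node, well under 30 GB. YARN may also round each container up to a multiple of its minimum allocation, so some headroom is useful.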
Retrieving benchmark results
You can retrieve the benchmark results by running yarn logs like this:
yarn logs -applicationId <application_id> | grep 'Benchmark'
For example:
Benchmark: Total volume : 81000000000 Bytes
Benchmark: Total write time : 74.979 s
Benchmark: Aggregate Throughput : 1.08030246E9 Bytes per second
So that's about 1 GB written per second for this run.
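The aggregate throughput in this output is consistent with total volume divided by total write time: 81000000000 bytes / 74.979 s ≈ 1.0803 × 10^9 bytes per second, about 1.08 GB/s in decimal units or about 1.01 GiB/s in binary units. The sketch below recomputes it from the log lines; throughput_from_log is a hypothetical helper, and the volume/time formula is an assumption checked only against this sample output.

```ruby
# Sketch: recompute the aggregate throughput from the "Benchmark:" log lines,
# assuming it is simply the total volume divided by the total write time.
def throughput_from_log(text)
  volume = text[/Total volume\s*:\s*(\d+)\s*Bytes/, 1]
  seconds = text[/Total write time\s*:\s*([\d.]+)\s*s/, 1]
  raise ArgumentError, 'benchmark lines not found' unless volume && seconds

  volume.to_f / seconds.to_f # bytes per second
end

log = <<~LOG
  Benchmark: Total volume : 81000000000 Bytes
  Benchmark: Total write time : 74.979 s
LOG

bps = throughput_from_log(log)
printf("%.4e bytes/s = %.2f GB/s (decimal) = %.2f GiB/s\n", bps, bps / 1e9, bps / 2**30)
```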
05-06-2017
06:12 AM
Mindwave Neurosky
The Mindwave Neurosky is a headset that lets you record your brainwaves using EEG technology. In this article we show you how to ingest these brainwaves with NIFI.
Mindwave Neurosky driver installation for OSX Sierra
Download and install the latest driver from:
http://download.neurosky.com/public/Products/MindWave%20headset/RF%20driver%20for%20Mac/MindWaveDriver5.1.pkg
After the driver is installed, download and install the latest MindWave Manager from:
http://download.neurosky.com/public/Products/MindWave%20headset/RF%20driver%20for%20Mac/MindWave%20Manager4.0.4.zip
Launch the MindWave Manager, navigate to the "Pairing" section and click "Search for MindWave", then follow the instructions to pair the headset.
Install NIFI on OSX Sierra with Homebrew
Install Homebrew from the terminal:
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
Install NIFI (v1.1.2 at the time of writing):
brew install nifi
Import NIFI Flow Template
An example flow template can be downloaded using Curl:
curl -O https://gist.githubusercontent.com/wardbekker/a80cbe7d12bc1866f393c5a74bf417a0/raw/9d64daa748dec352ebe8cc350e3fb34e65130ec3/mindwave_nifi_ingest_template.xml
The most important processor here is the ListenTCP processor, which listens on port 20000 and receives the JSON payload. The flow also contains a Site-to-Site NIFI connection to a remote process group with the URL
http://wbekkerhdf0.field.hortonworks.com:9090/nifi . Change it to point to your own remote NIFI cluster.
Get Ruby 'forward' script
The Mindwave ThinkGear driver creates a socket from which we can consume the sensor data as JSON messages. To ingest it with the current vanilla version of NIFI, we need to 'forward' the messages from the ThinkGear port to the port of the NIFI ListenTCP processor. Upcoming versions of NIFI will have a GetTCP processor, making this Ruby script obsolete.
Save this Ruby script as thinkgear.rb. Run it with ruby thinkgear.rb AFTER you have connected your headset AND started the ListenTCP processor in the NIFI flow; otherwise you will run into connection errors.
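require 'socket'
require 'json'
require 'date'

# ThinkGear Connector socket (13854) and NIFI ListenTCP socket (20000)
thinkgear_server_socket = TCPSocket.new 'localhost', 13854
nifi_server_socket = TCPSocket.new 'localhost', 20000

begin
  # ask the ThinkGear Connector to stream readings, including raw EEG, as JSON
  thinkgear_server_socket.puts "{\"enableRawOutput\": true, \"format\": \"Json\"}\n"

  while line = thinkgear_server_socket.gets # read one JSON message per line
    begin
      hash = JSON.parse(line)
    rescue JSON::ParserError
      next # skip empty or partial lines instead of crashing the forwarder
    end
    hash['timestamp'] = DateTime.now.strftime('%Q') # milliseconds since the epoch
    hash['user_id'] = 1
    json = JSON.generate(hash)
    puts json
    nifi_server_socket.puts json # one message per line for ListenTCP
  end
ensure
  thinkgear_server_socket.close
  nifi_server_socket.close
end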
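Before pairing the headset, it can help to check that ListenTCP is reachable on its own. The sketch below sends one newline-terminated JSON message, the same framing the forward script uses, since ListenTCP treats each line as one message. send_test_message is a hypothetical helper, and test_reading is a made-up placeholder field, not a ThinkGear field.

```ruby
require 'socket'
require 'json'

# Hypothetical helper: send one fake reading to NIFI's ListenTCP processor.
# 'test_reading' is a made-up placeholder field; timestamp and user_id mirror
# the fields the forward script adds.
def send_test_message(host, port)
  message = {
    'test_reading' => 42,
    'timestamp' => (Time.now.to_f * 1000).to_i.to_s, # milliseconds since the epoch
    'user_id' => 1
  }
  # puts appends the newline that ListenTCP uses to separate messages
  TCPSocket.open(host, port) { |sock| sock.puts(JSON.generate(message)) }
  message
end
```

With the ListenTCP processor running, calling send_test_message('localhost', 20000) should produce one new flowfile in the flow.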
Start ingestion of your brainwaves
Connect your headset by launching the MindWave Manager, navigating to the "Pairing" section and clicking "Search for MindWave", then follow the instructions to pair the headset. Start the NIFI flow, or at least the ListenTCP processor. Start the Ruby script with ruby thinkgear.rb.
At this point you should see JSON output from your Mindwave headset in your terminal, and new flowfiles arriving in NIFI. Have fun with your brainwaves!
05-10-2017
07:21 PM
@Bryan Bende, I placed the Phoenix jar file in NiFi's work directory and restarted the NiFi instance. Now I see a different error in the log when I enable the HBase client service: "Failed to invoke @OnEnabled method due to java.lang.NoSuchMethodError: org.apache.hadoop.security.authentication.util.KerberosUtil.hasKerberosKeyTab(Ljavax/security/auth/Subject;)Z"
2017-05-10 14:13:50,016 ERROR [NiFi logging handler] org.apache.nifi.StdErr [StandardProcessScheduler Thread-6] ERROR org.apache.nifi.controller.service.StandardControllerServiceNode -
2017-05-10 14:13:50,016 ERROR [NiFi logging handler] org.apache.nifi.StdErr java.lang.NoSuchMethodError: org.apache.hadoop.security.authentication.util.KerberosUtil.hasKerberosKeyTab(Ljavax/security/auth/Subject;)Z
2017-05-10 14:13:50,016 ERROR [NiFi logging handler] org.apache.nifi.StdErr at org.apache.hadoop.security.UserGroupInformation.<init>(UserGroupInformation.java:623)
2017-05-10 14:13:50,017 ERROR [NiFi logging handler] org.apache.nifi.StdErr at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:1200)
2017-05-10 14:13:50,017 ERROR [NiFi logging handler] org.apache.nifi.StdErr at org.apache.nifi.hadoop.SecurityUtil.loginKerberos(SecurityUtil.java:52)
2017-05-10 14:13:50,017 ERROR [NiFi logging handler] org.apache.nifi.StdErr at org.apache.nifi.hbase.HBase_1_1_2_ClientService.createConnection(HBase_1_1_2_ClientService.java:226)
2017-05-10 14:13:50,017 ERROR [NiFi logging handler] org.apache.nifi.StdErr at org.apache.nifi.hbase.HBase_1_1_2_ClientService.onEnabled(HBase_1_1_2_ClientService.java:178)
2017-05-10 14:13:50,017 ERROR [NiFi logging handler] org.apache.nifi.StdErr at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2017-05-10 14:13:50,017 ERROR [NiFi logging handler] org.apache.nifi.StdErr at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2017-05-10 14:13:50,017 ERROR [NiFi logging handler] org.apache.nifi.StdErr at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2017-05-10 14:13:50,017 ERROR [NiFi logging handler] org.apache.nifi.StdErr at java.lang.reflect.Method.invoke(Method.java:498)
2017-05-10 14:13:50,018 ERROR [NiFi logging handler] org.apache.nifi.StdErr at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:137)
2017-05-10 14:13:50,018 ERROR [NiFi logging handler] org.apache.nifi.StdErr at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:125)
2017-05-10 14:13:50,018 ERROR [NiFi logging handler] org.apache.nifi.StdErr at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:70)
2017-05-10 14:13:50,018 ERROR [NiFi logging handler] org.apache.nifi.StdErr at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotation(ReflectionUtils.java:47)
2017-05-10 14:13:50,018 ERROR [NiFi logging handler] org.apache.nifi.StdErr at org.apache.nifi.controller.service.StandardControllerServiceNode$2.run(StandardControllerServiceNode.java:348)
2017-05-10 14:13:50,018 ERROR [NiFi logging handler] org.apache.nifi.StdErr at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
2017-05-10 14:13:50,018 ERROR [NiFi logging handler] org.apache.nifi.StdErr at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2017-05-10 14:13:50,018 ERROR [NiFi logging handler] org.apache.nifi.StdErr at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
2017-05-10 14:13:50,019 ERROR [NiFi logging handler] org.apache.nifi.StdErr at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
2017-05-10 14:13:50,020 ERROR [NiFi logging handler] org.apache.nifi.StdErr at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
2017-05-10 14:13:50,020 ERROR [NiFi logging handler] org.apache.nifi.StdErr at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2017-05-10 14:13:50,020 ERROR [NiFi logging handler] org.apache.nifi.StdErr at java.lang.Thread.run(Thread.java:745)
2017-05-10 14:13:50,020 ERROR [NiFi logging handler] org.apache.nifi.StdErr [StandardProcessScheduler Thread-6] ERROR org.apache.nifi.controller.service.StandardControllerServiceNode - Failed to invoke @OnEnabled method of HBase_1_1_2_ClientService[id=102e119a-19a2-1409-671f-dddd93a063de] due to java.lang.NoSuchMethodError: org.apache.hadoop.security.authentication.util.KerberosUtil.hasKerberosKeyTab(Ljavax/security/auth/Subject;)Z
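A NoSuchMethodError like this one usually means the class loaded at runtime is a different version from the one the calling code was compiled against, for instance when a jar added to the classpath bundles its own copy of the Hadoop classes. One way to check is to list which jars contain KerberosUtil. The Ruby sketch below is a heuristic: it relies on entry names in a JAR (ZIP) being stored as plain bytes in the archive headers. The helper jars_containing, the script name find_class.rb and the directory you scan are illustrative choices; it does not inspect NiFi's actual classloaders.

```ruby
# Heuristic sketch: list the jars under a directory that bundle a given class.
# Entry names in a JAR (ZIP) archive are stored as plain bytes in its headers,
# so searching the raw bytes for "org/.../KerberosUtil.class" finds the jars
# that ship their own copy. Reads each jar fully into memory.
def jars_containing(dir, class_name)
  entry = class_name.tr('.', '/') + '.class'
  Dir.glob(File.join(dir, '**', '*.jar')).sort.select do |jar|
    File.binread(jar).include?(entry)
  end
end

# Usage: ruby find_class.rb <dir> <fully.qualified.ClassName>
puts jars_containing(ARGV[0], ARGV[1]) if ARGV.size == 2
```

If more than one jar turns up for org.apache.hadoop.security.authentication.util.KerberosUtil, comparing their versions is a reasonable next step.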
03-16-2018
03:29 AM
Hi @Ward Bekker, I am trying the same code but getting an error. The code is below; could you let me know what is wrong? I have added the dependency as well.
import net.sf.json.JSON
import net.sf.json.JSONObject
import net.sf.json.JSONSerializer
import net.sf.json.xml.XMLSerializer

String str = '''{
  "glossary": {
    "title": "example glossary",
    "GlossDiv": {
      "title": "S",
      "GlossList": {
        "GlossEntry": {
          "ID": "SGML",
          "SortAs": "SGML",
          "GlossTerm": "Standard Generalized Markup Language",
          "Acronym": "SGML",
          "Abbrev": "ISO 8879:1986",
          "GlossDef": {
            "para": "A meta-markup language, used to create markup languages such as DocBook.",
            "GlossSeeAlso": ["GML", "XML"]
          },
          "GlossSee": "markup"
        }
      }
    }
  }
}'''
JSON json = JSONSerializer.toJSON( str )
XMLSerializer xmlSerializer = new XMLSerializer()
xmlSerializer.setTypeHintsCompatibility( false )
String xml = xmlSerializer.write( json )
print(xml)
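For comparison, the same kind of JSON-to-XML conversion can be sketched in plain Ruby with the standard json library, without json-lib's XMLSerializer. This swaps in a different technique: a simple recursive walk. The element names root and item are arbitrary choices for this sketch, and it assumes every JSON key is a valid XML element name (true for the glossary example).

```ruby
require 'json'

# Sketch of a recursive JSON-to-XML conversion (not json-lib's XMLSerializer).
# Object keys become element names, array items become <item> elements, and
# scalar values are written as XML-escaped text.
ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, ESCAPES)
end

def to_xml(value, name = 'root')
  inner =
    case value
    when Hash  then value.map { |key, child| to_xml(child, key) }.join
    when Array then value.map { |child| to_xml(child, 'item') }.join
    else xml_escape(value)
    end
  "<#{name}>#{inner}</#{name}>"
end

json = '{"glossary": {"title": "example glossary", "GlossSeeAlso": ["GML", "XML"]}}'
puts to_xml(JSON.parse(json))
```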
05-05-2017
06:52 PM
Hi @Alexander Daher. In the commits dropdown (see image) you can probably select a previous commit to return to that version.
05-18-2018
04:01 PM
In my experience, if you remove the indicated flags you still get audit logging, but those logs never get purged. Perhaps it would be better to leave the flags in place and change "INFO" to "OFF", giving something like -Dhdfs.audit.logger=OFF,DRFAAUDIT?