Member since: 12-06-2016
Posts: 15
Kudos Received: 4
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 271 | 07-19-2017 06:20 PM |
| | 173 | 12-07-2016 03:04 PM |
04-04-2018
04:13 PM
Currently I have a dataflow in which the GetFile processor picks up TSV files from a directory. I want to convert these TSV files to CSV for later work using the ConvertCSVToAvro processor. I've created this Python script, with a bash wrapper, to test the conversion:

    import sys
    import csv

    # Read tab-separated rows from stdin and write them to stdout as CSV.
    tsvin = csv.reader(sys.stdin, dialect=csv.excel_tab)
    commaout = csv.writer(sys.stdout, dialect=csv.excel)
    for row in tsvin:
        commaout.writerow(row)

Bash wrapper:

    for file in *.tsv
    do
        python tsv2csv.py < "$file" > "${file%.*}.csv"
    done

I see the ExecuteScript processor as a possible option. How would I use it to run this Python script? For example, would the processor know where to import modules from? Or is there a better way to do the conversion?
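
One way to frame the question, offered only as a sketch: if the script stays a plain stream filter, it needs nothing beyond the standard-library csv module, and a processor that pipes flowfile content through an external command (ExecuteStreamCommand is one candidate, which is an assumption here and not something tested in this flow) could call it much as the bash wrapper does. The function name tsv_to_csv and the sample data are hypothetical.

```python
import csv
import io


def tsv_to_csv(tsv_in, csv_out):
    """Copy rows from a tab-separated stream to a comma-separated stream.

    Returns the number of rows written. Fields that contain commas are
    quoted by the csv module's excel dialect.
    """
    reader = csv.reader(tsv_in, dialect=csv.excel_tab)
    # lineterminator is overridden so output uses "\n" instead of "\r\n"
    writer = csv.writer(csv_out, dialect=csv.excel, lineterminator="\n")
    rows = 0
    for row in reader:
        writer.writerow(row)
        rows += 1
    return rows


# Small in-memory demonstration with made-up data.
sample = io.StringIO("name\tnote\nwidget\tred, large\n")
result = io.StringIO()
tsv_to_csv(sample, result)
print(result.getvalue(), end="")
```

To use the same function as a command-line filter, call tsv_to_csv(sys.stdin, sys.stdout) after importing sys.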
03-30-2018
10:36 PM
3 Kudos
Is Hadoop/Hive loaded on the HDF sandbox? It does not appear to be. Do I need to download the HDP sandbox? If so, how do I configure both virtual guests to successfully process a NiFi dataflow from the HDF virtual sandbox to the HDP sandbox?
07-20-2017
03:58 PM
Fellow Techies-- I've set up my local open source development environment under the Eclipse Mars 4.5 IDE bench with Git, PyDev, Ivy, Gradle, Scala, Oozie, Pig and HDF/MapReduce plugins all accessing a VMWare Player Hortonworks 2.4 sandbox. This works for general small-scale development before deployment to other similar installations. I will soon be working on an Azure/Microsoft installation. This development environment is centered around Visual Studio with NuGet plugin equivalents. Can anyone tell me what the recommended development/deployment environment is for this ecosystem?
07-19-2017
06:20 PM
To those techie pioneers who may be following behind me: I decided to follow the trail of the missing address assignments behind the "device eth0 does not seem to be present" error from the guest OS. I checked my laptop's network devices and confirmed that the network adapter I needed (VMnet8) was missing in action. Here are the steps I followed.
07-14-2017
03:40 AM
While I've been able to get a 2.4 vmware player sandbox to work on my desktop under a NAT adapter without issue, I cannot get a second vmware player to work in a wireless environment on a laptop. The choices I've made so far for the network settings on the guest vm have been NAT or NAT Networking. Both of these choices return the same results-- an error on startup which reads: device eth0 does not seem to be present, then: Connectivity Issues Detected! when the startup process has completed. No IP address is displayed. Well, the vm isn't wrong since I don't have any ports/addresses set up--but I haven't a clue how to bridge the dynamic wireless environment to ip addresses/ports. I am not a network geek, so this has turned out to be a frustrating stumbling block. Please advise.
06-28-2017
04:36 PM
I am primarily a .NET/C# and JBoss/J2EE database and applications integration developer beginning to work with NiFi. Since my previous work has been mostly on the back end, I find myself at a disadvantage when working with third-party API examples that reference JavaScript/web page access. I am interested in a public API available from DOT: https://vpic.nhtsa.dot.gov/api/ . One of its methods, DecodeVINValuesBatch, decodes an array of vehicle identification numbers (VINs). DOT offers the C# coding example below. How do I express or reinterpret this example in the context of NiFi? Assuming InvokeHTTP is the processor used to call the service, how would I pass it the collection of VINs (perhaps through a different processor that supplies the collection before InvokeHTTP)? How would I address System.Threading? Would it be necessary? I am looking for some guidance on how to approach this, with a practical example.

    using System;
    using System.Collections.Generic;
    using System.Net.Http;
    using System.Net.Http.Headers;

    // VINs are separated (and terminated) by semicolons
    string text = "3GNDA13D76S000000;5XYKT3A12CG000000;";
    string url = @"https://vpic.nhtsa.dot.gov/api/vehicles/DecodeVINValuesBatch/";
    var nameValues = new Dictionary<string, string>();
    nameValues.Add("data", text);
    nameValues.Add("format", "json");
    HttpClient client = new HttpClient();
    client.BaseAddress = new Uri(url);
    // using FormUrlEncodedContent
    var name = new FormUrlEncodedContent(nameValues);
    client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
    System.Threading.CancellationToken token = new System.Threading.CancellationToken();
    try
    {
        var tmp = client.PostAsync(client.BaseAddress, name, token).Result;
        // .Result unwraps the Task<string> so result holds the response body
        var result = tmp.Content.ReadAsStringAsync().Result;
    }
    catch (Exception err)
    {
        // error handling
    }
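
As a rough sketch of what the C# sample actually sends, the request reduces to a POST whose body is the form-encoded pair data=<VINs joined by semicolons> and format=json. The Python below builds that same body and request without sending it; the function names build_batch_body and build_batch_request are hypothetical. In NiFi terms, the assumption would be that an upstream processor sets the flowfile content to this body and InvokeHTTP posts it with a form-encoded content type. Nothing in the sketch needs an equivalent of the CancellationToken from System.Threading, which in the sample only serves the async call.

```python
from urllib.parse import urlencode
from urllib.request import Request

# URL taken from the C# sample above.
VPIC_BATCH_URL = "https://vpic.nhtsa.dot.gov/api/vehicles/DecodeVINValuesBatch/"


def build_batch_body(vins):
    """Form-encode a list of VINs the way the C# sample does.

    The sample terminates every VIN with ';', including the last one.
    """
    data = "".join(vin + ";" for vin in vins)
    return urlencode({"data": data, "format": "json"})


def build_batch_request(vins):
    """Build (but do not send) the POST request for a batch of VINs."""
    body = build_batch_body(vins).encode("ascii")
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return Request(VPIC_BATCH_URL, data=body, headers=headers, method="POST")


# The two VINs are the ones used in the C# sample.
print(build_batch_body(["3GNDA13D76S000000", "5XYKT3A12CG000000"]))
```

Passing the request to urllib.request.urlopen would send it; that step is left out so the sketch runs without network access.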
06-02-2017
11:56 PM
For others behind me: I found this advice on clearing off the canvas here. From what I can tell, you cannot select a previous template from the template management window. Instead, there is another template icon next to the funnel icon. Drag it onto the canvas and you'll see a drop-down that lists the templates. The one you select will be instantiated.
06-02-2017
11:48 PM
Techies, Can someone please tell me how to do the following 2 things: 1. Clear the flow diagramming canvas of all previous dataflows. 2. Load an existing template onto the diagramming canvas workspace.
12-20-2016
10:53 PM
1 Kudo
Techies-- I am working on a POC to replace some of our traditional ETL processes, using NiFi to import data to HDFS/Hive, where the data can later be accessed by Microsoft's PolyBase and visualization tools. As an adjunct, some SQL power users are interested in accessing the preliminary data streams imported to HDFS/Hive.

In our production environment there are about 40,000 such files daily. Each file arrives with a fixed-length header and multiple fixed-length line items. The file names include the datetime stamp. The files are sourced from multiple locations and multiple partners, so an immediate example of a SQL-like query in HDFS/Hive would involve (1) partner identification [derived from header], (2) location [derived from header], (3) date/time [derived from filename] and (4) products [derived from line items].

From a guidance perspective, how should I approach this project? Is NiFi robust enough to import from so many active streams? Is the data arriving with PutHDFS inherently "Hive" access ready, or is there another step to consider? For audit purposes, how should I approach capturing things like line item counts, the file names the data arrived on, etc.? Which examples from here: https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates are the best to follow? I am looking forward to your advice.
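
To make the four query fields and the audit counts concrete, here is a minimal sketch of pulling them out of one file. Everything about the layout is hypothetical: the post gives no field widths, field names, or filename pattern, so the offsets in HEADER_FIELDS and ITEM_FIELDS, the 14-digit timestamp pattern, and the function names are assumptions made purely for illustration.

```python
import re
from datetime import datetime

# Hypothetical layout: none of these widths or names come from the post.
HEADER_FIELDS = {"partner": (0, 8), "location": (8, 14)}
ITEM_FIELDS = {"product": (0, 10), "quantity": (10, 16)}
# Assumes a yyyymmddHHMMSS stamp somewhere in the file name.
STAMP_IN_NAME = re.compile(r"(\d{14})")


def slice_fields(line, layout):
    """Cut a fixed-length line into named, whitespace-trimmed fields."""
    return {name: line[start:end].strip() for name, (start, end) in layout.items()}


def parse_feed(filename, text):
    """Return the query fields plus audit details for one file."""
    lines = text.splitlines()
    header = slice_fields(lines[0], HEADER_FIELDS)
    items = [slice_fields(line, ITEM_FIELDS) for line in lines[1:] if line.strip()]
    match = STAMP_IN_NAME.search(filename)
    stamp = datetime.strptime(match.group(1), "%Y%m%d%H%M%S") if match else None
    return {
        "filename": filename,            # audit: file the data arrived on
        "partner": header["partner"],    # (1) from header
        "location": header["location"],  # (2) from header
        "received": stamp,               # (3) from file name
        "products": [item["product"] for item in items],  # (4) from line items
        "line_item_count": len(items),   # audit: number of line items
    }
```

Once fields like these are separated, each line item can be written out as one delimited row carrying the header and filename values, which is the shape a SQL-style query would expect.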
12-17-2016
12:49 AM
Neeraj, thanks for your advice.
12-07-2016
07:33 PM
Karthik, That is a good idea. I can manually install Maven! Thank you so much for answering my question.
12-07-2016
04:10 PM
Thanks Karthik--I am not directly using either. The curl statement is embedded somewhere. Do you know where, or how I can bypass the failure event?
12-07-2016
04:00 PM
Fellow Techies-- The VNC install on Ambari failed on:
resource_management.core.exceptions.Fail: Execution of 'curl -o /etc/yum.repos.d/epel-apache-maven.repo https://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo' returned 35. curl: (35) SSL connect error
When I issue the statement interactively through my ssh terminal, to avoid the SSL error, as:
curl -o /etc/yum.repos.d/epel-apache-maven.repo http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo
the file downloads as expected.
Where or how do I edit this? Here's what I see on the config page for the VNC Server install. The curl statement must be embedded in one of the *.tar files. Please advise on what the next step is toward resolution.
12-07-2016
03:04 PM
For those newbies following this thread: The file was empty because the default vnc-env-template did not have at least the vnc_user and vnc_geometry set. While this may not be what you want, at least the default entry will be there to work with.
12-07-2016
02:04 AM
Hi Fellow Techies-- I'm pretty close to standing up VNC--however the install failed with the only clue in Ambari that I could find. The file /etc/sysconfig/vncservers and the file /etc/sysconfig/vncservers.bak exist (read: re-install after the first failure). What should these empty files contain--or is this a "red herring" message?
2016-12-07 01:35:59,048 - Execute['mv /etc/sysconfig/vncservers /etc/sysconfig/vncservers.bak >> /var/log/vnc-stack.log'] {'ignore_failures': True}
2016-12-07 01:35:59,391 - File['/etc/sysconfig/vncservers'] {'owner': 'root', 'content': InlineTemplate(...), 'group': 'root', 'mode': 0644}
2016-12-07 01:35:59,397 - Writing File['/etc/sysconfig/vncservers'] because it doesn't exist