Member since: 09-05-2016
Posts: 24
Kudos Received: 2
Solutions: 3
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 484 | 10-01-2019 09:45 AM |
 | 1196 | 11-01-2016 06:47 PM |
 | 546 | 10-19-2016 05:07 PM |
12-16-2019
09:17 AM
There is no solution. The FTP server has a single location to grab files from. There is currently no easy way to do what I was asking.
10-01-2019
09:45 AM
Tried to delete this as no one seems to answer here. But if someone has a similar issue: just use UpdateAttribute to set the attributes you want on a new flow file, and then pass it to the AttributesToJSON processor.
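To sketch it, assuming AttributesToJSON is configured with Destination = flowfile-content and Attributes List = name,date,json_content, the new flow file's content would come out something like this (sample values are made up; the processor emits all attribute values as strings):

{"name":"conf1","date":"1569919500000","json_content":"{\"key\":\"value\"}"}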
10-01-2019
05:12 AM
I have a complex flow where I created 3 attributes, [name, date, json_content], extracted from other flow file data, that need to go into a database. How can I take these 3 attributes and convert them into a new flow file with those columns? The schema I will use has those names.
Schema:
{
  "type": "record",
  "name": "mytable",
  "fields": [
    { "name": "name", "type": [ "string" ] },
    { "name": "date", "type": [ "null", { "type": "long", "logicalType": "timestamp-millis" } ], "default": null },
    { "name": "json_content", "type": [ "string" ] }
  ]
}
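For reference, a record fitting this schema would carry values like the following (made-up sample; note that Avro's strict JSON encoding would wrap the nullable union value, e.g. "date": {"long": ...}):

{ "name": "conf1", "date": 1569919500000, "json_content": "{\"key\":\"value\"}" }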
Labels:
- Apache Hadoop
09-25-2019
10:10 AM
Can anyone at Cloudera/Hortonworks answer this??? Having the same issues.
09-18-2019
05:13 AM
I have an FTP location where I must grab specific files, *.tar, from within named subdirectories and only those subdirectories. The layout is like so:
path/conf1/conf1.tar
path/conf2/conf2.tar
path/conf3/conf3.tar
path/support/support.tar
I only want tar files from path/conf*/. Is this possible using Path Filter Regex or some combination of properties? I do not want to look into support/ at all. In fact, some directories I do not have permission to list, so I get permission exceptions on those. How can I limit the listing to only the conf*/ folders? See the configuration sketch below.
Thanks
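For anyone landing here, this is the untested configuration sketch I had in mind, assuming NiFi's ListFTP processor (GetFTP exposes the same filter properties); per my later update above (12-16-2019) I never found an easy way, since the listing still walks directories it cannot read:

Remote Path: path
Search Recursively: true
Path Filter Regex: conf[^/]*
File Filter Regex: .*\.tar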
Labels:
- Apache Hadoop
09-08-2017
02:36 AM
Are there instructions on how to install Spark 2 on an HDP 2.5.6 cluster? We are currently running 1.6.3, and we need to use the Magellan spatial libs, but these do not function under Spark 1.6.3. Can you point me to Spark 2 installation instructions?
Labels:
- Apache Spark
09-07-2017
08:05 PM
I am running Spark version 1.6.3 under HDP 2.5.6. What version of Magellan should I use with this Spark version?
06-23-2017
01:54 AM
At the moment I have not worked on this issue further, but I will resurrect it and try some things out. First, you could start with the Databricks libraries. I did try one library off GitHub, but it was too difficult to work with; the schema I am using is quite complex. What schema do you have for your data? Some ideas I learned about, but have not tried, include pre-converting the XML to CSV or Avro before consuming it into Spark, and using the Databricks CSV or another lib to process it in the stream portion. Let me know how you are ingesting the XML. I still need to do this at some point.
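To make the pre-convert idea concrete, a minimal untested sketch, assuming Spark 1.6 with the Databricks spark-csv package (com.databricks:spark-csv_2.10:1.5.0) on the classpath; the HDFS path here is a placeholder:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object CsvPreconvertedRead {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("csv-preconverted"))
    val sqlContext = new SQLContext(sc)

    // Read CSV files that an upstream job pre-converted from XML.
    // "header" = first line holds column names; "inferSchema" scans the
    // data to guess column types instead of treating everything as string.
    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("hdfs:///data/xml-as-csv/*.csv") // placeholder path

    df.printSchema()
    df.show(10)
  }
}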
05-03-2017
06:15 PM
I need to upgrade a cluster from 2.4.2 to 2.5. Shouldn't we be following this link: http://docs.hortonworks.com/HDPDocuments/Ambari-2.5.0.3/bk_ambari-upgrade/bk_ambari-upgrade.pdf? We are going TO 2.5.
04-13-2017
11:47 AM
1 Kudo
I run HDP Spark 1.4.1 and 1.6.1. I have to process rapidly arriving XML from a Kafka topic with Spark Streaming. I am able to use the .print function and see that my data is indeed coming into Spark; I have it batched now at 10 s. Now I need to know:
1) Is there a way to delimit each XML message?
2) How can I apply a JAXB-like schema function to each message?
I have a process already doing this in plain Java, and it works fine using the standard Kafka APIs and JAXB. Sample output, where I write the data with saveAsTextFiles(), shows broken messages: they seem to be split on spaces, and large XML messages are spread across more than one file. Thanks, M
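A rough untested sketch of the direction I was thinking, assuming Spark 1.6 with spark-streaming-kafka_2.10 and a JAXB-annotated class Reading generated from the XSD with xjc (broker, topic, and class name are placeholders). With the direct stream, each Kafka record value is one whole message, so the XML arrives already delimited:

import java.io.StringReader
import javax.xml.bind.JAXBContext

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object XmlKafkaStream {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("xml-kafka"), Seconds(10))

    // Direct stream: each (key, value) pair is one complete Kafka message,
    // so each value is one complete XML document -- no extra delimiting needed.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // placeholder broker
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("xml-topic")) // placeholder topic

    // Unmarshal per partition: JAXB contexts/unmarshallers are not
    // serializable, so build them on the executor, once per partition.
    // Reading is the xjc-generated JAXB class (placeholder name).
    val parsed = stream.map(_._2).mapPartitions { xmlDocs =>
      val unmarshaller = JAXBContext.newInstance(classOf[Reading]).createUnmarshaller()
      xmlDocs.map(xml => unmarshaller.unmarshal(new StringReader(xml)).asInstanceOf[Reading])
    }

    parsed.print()
    ssc.start()
    ssc.awaitTermination()
  }
}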
Labels:
- Apache Spark