Member since: 09-21-2015
Posts: 133
Kudos Received: 130
Solutions: 24
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 7321 | 12-17-2016 09:21 PM |
| | 4788 | 11-01-2016 02:28 PM |
| | 2393 | 09-23-2016 09:50 PM |
| | 3675 | 09-21-2016 03:08 AM |
| | 1897 | 09-19-2016 06:41 PM |
07-27-2016
05:22 AM
"ETL" which is embarassingly parallel (all processing logic can execute completely based purely on the contents of the incoming record itself) is in NiFi's sweet spot. ETL which requires lookups for billions of records, or which must perform "group by" operations fits better in traditional Hadoop solutions like Hive, Pig, or Spark.
07-19-2016
03:24 PM
@Ash Pad, how big are your PDFs? As in all things, it depends on your use case. If your PDFs are not in the multi-megabyte range, you may be fine storing them in a second column family today. This has the advantage of letting you query document metadata very quickly without needing to load full file contents into RegionServer memory. In most document management systems this is highly desirable, as there is far more searching/querying than actual full-content access.
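For illustration only, here is a minimal sketch of that layout with the HBase Java client: one small family for the metadata you query often, and a separate family for the raw PDF bytes. The table and family names are placeholders, not anything from this thread.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class CreateDocTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // Hypothetical "documents" table: "meta" holds small, frequently
            // queried attributes; "doc" holds the large PDF blobs read rarely.
            TableDescriptor table = TableDescriptorBuilder
                    .newBuilder(TableName.valueOf("documents"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("meta"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("doc"))
                    .build();
            admin.createTable(table);
        }
    }
}
```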
07-18-2016
07:56 PM
Hi @Ash Pad, Phoenix has JDBC and REST APIs today; there is also an ODBC driver under development, which I believe is currently in beta. So you can run reporting-style queries against whatever document metadata you store in "normal" column types. To access the PDF object itself, you can use the JDBC/ODBC/REST APIs to read/write the column as raw bytes; see the Phoenix DataTypes page for the column types that support binary values. Re: HBase REST, you could use it if desired, though I don't see why you would rather than using the built-in JDBC/ODBC capabilities.
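As a hedged sketch of what that looks like over JDBC (the table, column names, and the ZooKeeper quorum in the URL are all assumptions for illustration), the PDF bytes go into a VARBINARY column while the metadata stays in regular columns:

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PhoenixPdfExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")) {
            // Queryable metadata in normal columns, raw PDF bytes in VARBINARY.
            conn.createStatement().execute(
                    "CREATE TABLE IF NOT EXISTS DOCS ("
                    + " ID VARCHAR PRIMARY KEY, TITLE VARCHAR, CONTENT VARBINARY)");

            byte[] pdf = Files.readAllBytes(Paths.get("report.pdf")); // placeholder file
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPSERT INTO DOCS (ID, TITLE, CONTENT) VALUES (?, ?, ?)")) {
                ps.setString(1, "doc-001");
                ps.setString(2, "Quarterly report");
                ps.setBytes(3, pdf);
                ps.executeUpdate();
            }
            conn.commit(); // Phoenix connections are not auto-commit by default
        }
    }
}
```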
07-15-2016
07:46 PM
I don't know what, if any, additional interfaces Vora supports, but traditional Spark has a JDBC/ODBC Thrift server that you can use with ExecuteSQL, so you can run queries that way. For traditional programmatic jobs, you'd need to use InvokeHTTP to talk to YARN's ResourceManager to launch a job, or use ExecuteCommand to launch a job via the spark-submit scripts.
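For the query path, the Spark Thrift Server speaks the HiveServer2 protocol, so the kind of JDBC call ExecuteSQL issues can be sketched like this (host, port, and table name are assumptions, not anything from this thread):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SparkThriftQuery {
    public static void main(String[] args) throws Exception {
        // The Hive JDBC driver is used because the Thrift server implements
        // the HiveServer2 protocol; "sparkhost" and "events" are placeholders.
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://sparkhost:10000/default");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM events")) {
            while (rs.next()) {
                System.out.println(rs.getLong(1));
            }
        }
    }
}
```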
07-15-2016
07:42 PM
Labels:
- Apache NiFi
07-15-2016
07:23 PM
To do what? Launch a Spark job? Run a query against Spark Thrift Server? I haven't seen a mention of a processor with SAP or Vora in the name. You can always check the docs here to see what processors are supported out of the box.
07-15-2016
05:46 PM
One thing to understand here is that going directly against SAP tables has traditionally been a no-no. There is a lot of relational modeling that the SAP APIs do for you behind the scenes. It just depends on what you need.
07-14-2016
05:12 PM
2 Kudos
SAP has a large set of REST APIs. That's a good starting point.
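As a purely illustrative sketch (the gateway host, service path, and credentials below are made up), calling one of those REST/OData endpoints is just an HTTP request, which is also what NiFi's InvokeHTTP would issue:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class SapRestExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical SAP Gateway OData endpoint; replace with a real service.
        URL url = new URL("https://sap-gateway.example.com/sap/opu/odata/sap/ZMY_SERVICE/Orders?$format=json");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Authorization",
                "Basic " + Base64.getEncoder().encodeToString("user:password".getBytes()));
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```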
07-14-2016
03:57 PM
You can use ExecuteSQL to run any query that the target DB supports. See docs here.
07-07-2016
11:04 PM
I'll hazard a guess that your query exceeds the buffer size for data piped to the processor. As a workaround, try using PutFile to write the full Hive query to a local file, then run "hive -f myQuery.sql" instead.
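Outside of NiFi, that workaround boils down to writing the query to a file and invoking hive with -f. A rough standalone sketch, with the file path and query as placeholders:

```java
import java.nio.file.Files;
import java.nio.file.Paths;

public class HiveFileQuery {
    public static void main(String[] args) throws Exception {
        // Placeholder for the long query that overflowed the inline buffer.
        String query = "SELECT col, COUNT(*) FROM my_table GROUP BY col";
        Files.write(Paths.get("/tmp/myQuery.sql"), query.getBytes());

        // Run hive against the file instead of piping the SQL to the process.
        Process p = new ProcessBuilder("hive", "-f", "/tmp/myQuery.sql")
                .inheritIO()
                .start();
        System.exit(p.waitFor());
    }
}
```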