Member since: 09-21-2015
Posts: 133
Kudos Received: 130
Solutions: 24
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 7321 | 12-17-2016 09:21 PM |
| | 4788 | 11-01-2016 02:28 PM |
| | 2393 | 09-23-2016 09:50 PM |
| | 3675 | 09-21-2016 03:08 AM |
| | 1897 | 09-19-2016 06:41 PM |
07-27-2016
05:22 AM
"ETL" which is embarassingly parallel (all processing logic can execute completely based purely on the contents of the incoming record itself) is in NiFi's sweet spot. ETL which requires lookups for billions of records, or which must perform "group by" operations fits better in traditional Hadoop solutions like Hive, Pig, or Spark.
07-19-2016
03:24 PM
@Ash Pad, how big are your PDFs? As in all things, it depends on your use case. If your PDFs are not in the multi-megabyte range, you may be fine storing them in a second column family today. This has the advantage of letting you query document metadata very quickly without needing to load full file contents into RegionServer memory. In most document management systems this is highly desirable, as there is far more searching/querying than actual full-content access.
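For illustration only, here is a minimal sketch of that layout with the HBase Java client: one small family for the metadata you query often, and a separate family for the raw PDF bytes. The table and family names are placeholders, not anything from this thread.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class CreateDocTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // Hypothetical "documents" table: "meta" holds small, frequently
            // queried attributes; "doc" holds the large PDF blobs read rarely.
            TableDescriptor table = TableDescriptorBuilder
                    .newBuilder(TableName.valueOf("documents"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("meta"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("doc"))
                    .build();
            admin.createTable(table);
        }
    }
}
```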
07-18-2016
07:56 PM
Hi @Ash Pad, Phoenix has JDBC and REST APIs today; there is also an ODBC driver under development, which I believe is currently in beta. So you can run reporting-style queries against whatever document metadata you store in "normal" column types. To access the PDF object itself, you can use the JDBC/ODBC/REST APIs to read/write the column as raw bytes; see the Phoenix DataTypes page for the column types that support binary values. Re: HBase REST, you could use it if desired, though I don't see why you would rather than using the built-in JDBC/ODBC capabilities.
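As a hedged sketch of what that looks like over JDBC (the table, column names, and the ZooKeeper quorum in the URL are all assumptions for illustration), the PDF bytes go into a VARBINARY column while the metadata stays in regular columns:

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PhoenixPdfExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")) {
            // Queryable metadata in normal columns, raw PDF bytes in VARBINARY.
            conn.createStatement().execute(
                    "CREATE TABLE IF NOT EXISTS DOCS ("
                    + " ID VARCHAR PRIMARY KEY, TITLE VARCHAR, CONTENT VARBINARY)");

            byte[] pdf = Files.readAllBytes(Paths.get("report.pdf")); // placeholder file
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPSERT INTO DOCS (ID, TITLE, CONTENT) VALUES (?, ?, ?)")) {
                ps.setString(1, "doc-001");
                ps.setString(2, "Quarterly report");
                ps.setBytes(3, pdf);
                ps.executeUpdate();
            }
            conn.commit(); // Phoenix connections are not auto-commit by default
        }
    }
}
```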
07-15-2016
07:46 PM
I don't know what, if any, additional interfaces Vora supports, but traditional Spark has a JDBC/ODBC Thrift server that you can use with ExecuteSQL, so you can run queries that way. For traditional programmatic jobs, you'd need to use InvokeHTTP to talk to YARN's ResourceManager to launch a job, or use ExecuteCommand to launch a job via the spark-submit scripts.
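For the query path, the Spark Thrift Server speaks the HiveServer2 protocol, so the kind of JDBC call ExecuteSQL issues can be sketched like this (host, port, and table name are assumptions, not anything from this thread):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SparkThriftQuery {
    public static void main(String[] args) throws Exception {
        // The Hive JDBC driver is used because the Thrift server implements
        // the HiveServer2 protocol; "sparkhost" and "events" are placeholders.
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://sparkhost:10000/default");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM events")) {
            while (rs.next()) {
                System.out.println(rs.getLong(1));
            }
        }
    }
}
```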
07-15-2016
07:42 PM
Labels:
- Apache NiFi
07-15-2016
07:23 PM
To do what? Launch a Spark job? Run a query against Spark Thrift Server? I haven't seen a mention of a processor with SAP or Vora in the name. You can always check the docs here to see what processors are supported out of the box.
07-15-2016
05:46 PM
One thing to understand here is that going directly against SAP tables has traditionally been a no-no. There is a lot of relational modeling that the SAP APIs do for you behind the scenes. It just depends on what you need.
07-14-2016
05:12 PM
2 Kudos
SAP has a large set of REST APIs. That's a good starting point.
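As a purely illustrative sketch (the gateway host, service path, and credentials below are made up), calling one of those REST/OData endpoints is just an HTTP request, which is also what NiFi's InvokeHTTP would issue:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class SapRestExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical SAP Gateway OData endpoint; replace with a real service.
        URL url = new URL("https://sap-gateway.example.com/sap/opu/odata/sap/ZMY_SERVICE/Orders?$format=json");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Authorization",
                "Basic " + Base64.getEncoder().encodeToString("user:password".getBytes()));
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```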
07-14-2016
03:57 PM
You can use ExecuteSQL to run any query that the target DB supports. See docs here.
07-07-2016
11:04 PM
I'll hazard a guess that your query exceeds the buffer size for data piped to the processor. As a workaround, try using PutFile to write the full Hive query to a local file, then run "hive -f myQuery.sql" instead.
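Outside of NiFi, that workaround boils down to writing the query to a file and invoking hive with -f. A rough standalone sketch, with the file path and query as placeholders:

```java
import java.nio.file.Files;
import java.nio.file.Paths;

public class HiveFileQuery {
    public static void main(String[] args) throws Exception {
        // Placeholder for the long query that overflowed the inline buffer.
        String query = "SELECT col, COUNT(*) FROM my_table GROUP BY col";
        Files.write(Paths.get("/tmp/myQuery.sql"), query.getBytes());

        // Run hive against the file instead of piping the SQL to the process.
        Process p = new ProcessBuilder("hive", "-f", "/tmp/myQuery.sql")
                .inheritIO()
                .start();
        System.exit(p.waitFor());
    }
}
```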