I have a problem with PXF and external tables in HAWQ.
I have seven data nodes in the cluster (and seven HAWQ segments), and I installed PXF on each of them. The layout looks like this:
Node1 - NameNode, HiveMetastore, HAWQ master, Hive Client, HCat Client, PXF
Node2 - SNameNode, DataNode, HAWQ Segment, PXF
Node3-7 - DataNode, HAWQ Segment, PXF
Node8-9 - HiveClient, HCat Clients
I created a table based on the "Test PXF" example on this site. But I have a problem accessing the data. When I try to run a simple query:
SELECT * FROM pxf_hdfs_textsimple;
I get this error:
ERROR: remote component error (0): (libchurl.c:897)
That's all. In the HAWQ master node log I see this:
2018-01-04 18:43:06.902998 CET,"hawq","postgres",p19781,th-319940160,"[local]",,2018-01-04 18:43:06 CET,16768,con26,,seg-10000,,,x16768,sx1,"LOG","00000","no master mirroring standby configuration found",,,,,,,0,,"cdblink.c",159,
2018-01-04 18:43:10.759145 CET,"hawq","poligon",p19820,th-319940160,"[local]",,2018-01-04 18:43:10 CET,0,con27,,seg-10000,,,,,"LOG","00000","getLocalTmpDirFromMasterConfig session_id:27 tmpdir:/tmp",,,,,,,0,,"postinit.c",470,
2018-01-04 18:43:20.600046 CET,"hawq","poligon",p19820,th-319940160,"[local]",,2018-01-04 18:43:10 CET,16773,con27,cmd7,seg-10000,,,x16773,sx1,"ERROR","XX000","remote component error (0): (libchurl.c:897)",,,,,,"select * from pxf_hdfs_textsimple ;",0,,"libchurl.c",897,"Stack trace:
1  0x8c165e postgres errstart (??:?)
2  0x8c34fb postgres elog_finish (??:?)
3  0x5124d6 postgres check_response_code (??:?)
4  0x512686 postgres churl_read_check_connectivity (??:?)
5  0x517b22 postgres <symbol not found> (pxfutils.c:?)
6  0x517d66 postgres call_rest (??:?)
7  0x5168c0 postgres <symbol not found> (pxfmasterapi.c:?)
8  0x516f97 postgres get_data_fragment_list (??:?)
9  0x512ff5 postgres map_hddata_2gp_segments (??:?)
10 0x73f8a2 postgres <symbol not found> (createplan.c:?)
11 0x73fdb5 postgres <symbol not found> (createplan.c:?)
12 0x741dec postgres create_plan (??:?)
13 0x74d1a6 postgres <symbol not found> (planner.c:?)
14 0x74eb3c postgres subquery_planner (??:?)
15 0x74f177 postgres <symbol not found> (planner.c:?)
16 0x74f72e postgres planner (??:?)
17 0x7e496a postgres pg_plan_queries (??:?)
18 0x7e4e05 postgres <symbol not found> (postgres.c:?)
19 0x7e6560 postgres PostgresMain (??:?)
20 0x799860 postgres <symbol not found> (postmaster.c:?)
21 0x79c5e9 postgres PostmasterMain (??:?)
22 0x4a2dff postgres main (??:?)
23 0x7fdfe9e2ab35 libc.so.6 __libc_start_main (??:0)
24 0x4a2e7c postgres <symbol not found> (??:?)"
When I try to access data in Hive, I get an error too. The query:
select count(*) from hcatalog.default.aaa;
ERROR: remote component error (500) from '127.0.0.1:51200': type Exception report
message java.lang.Exception: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/metastore/api/MetaException
description The server encountered an internal error that prevented it from fulfilling this request.
exception javax.servlet.ServletException: java.lang.Exception: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/metastore/api/MetaException (libchurl.c:897)
LINE 1: select count(*) from hcatalog.default.aaa;
Does anyone know what I am doing wrong? What could cause a problem accessing data from HDFS and Hive through PXF?
Please take a look at pxf-private.classpath to make sure the Hive jars have the correct paths. Also take a look at pxf-service.log under the Tomcat instance where PXF is running. The above error is being returned from the PXF Java server to the C client.
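A quick way to spot a bad classpath entry is to walk the paths listed in pxf-private.classpath and test whether each one exists on disk. This is a minimal sketch: the real file lives under the PXF conf directory, and its one-path-per-line layout is an assumption here, so a small generated sample stands in for it to keep the sketch self-contained.

```shell
# Check every path listed in a PXF classpath file and report missing ones.
# CLASSPATH_FILE would normally point at pxf-private.classpath (assumed
# location under the PXF conf dir); we generate a sample file instead.
CLASSPATH_FILE=$(mktemp)
cat > "$CLASSPATH_FILE" <<'EOF'
/usr/lib/hive/lib/hive-metastore.jar
/tmp
EOF

found=0
missing=0
while IFS= read -r entry; do
  case "$entry" in ''|'#'*) continue ;; esac   # skip blanks and comments
  if [ -e "$entry" ]; then
    found=$((found + 1));   echo "OK      $entry"
  else
    missing=$((missing + 1)); echo "MISSING $entry"
  fi
done < "$CLASSPATH_FILE"
rm -f "$CLASSPATH_FILE"
```

Running this on each PXF node shows at a glance which Hive jars the service cannot load.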
Thanks for the answer. I checked my pxf-private.classpath. The problem was that I had installed PXF on all data nodes, but no Hive clients (and therefore no Hive jars) were installed there. I copied the jars to all machines and it works now. But there is another problem.
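For reference, distributing the jars came down to a loop like the one below. This is only a sketch: the node names, the jar directory, and the DRY_RUN guard are illustrative, not taken from the actual cluster.

```shell
# Copy Hive client jars to every PXF node. With DRY_RUN=1 (the default)
# the scp commands are only printed, so the loop can be reviewed before
# it touches the cluster.
NODES="Node2 Node3 Node4 Node5 Node6 Node7"   # segment hosts running PXF
JAR_DIR=/usr/lib/hive/lib                     # assumed Hive jar location
DRY_RUN=${DRY_RUN:-1}

count=0
for node in $NODES; do
  cmd="scp $JAR_DIR/*.jar $node:$JAR_DIR/"
  if [ "$DRY_RUN" = 1 ]; then
    echo "would run: $cmd"
  else
    $cmd
  fi
  count=$((count + 1))
done
```

After copying, the PXF service on each node has to be restarted so Tomcat picks up the new jars.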
I have a problem querying tables stored in formats other than text file. If I execute the query:
select count(*) from hcatalog.default.sales_info;
and the table sales_info is a text file, I get correct results:
 count
-------
  7540
(1 row)
But if the table format is ORC, I get this error:
ERROR: pxfwritable_import_beginscan function was not found (nodeExternalscan.c:310) (seg0 Node4:40000 pid=42109) (dispatcher.c:1805)
Does anyone know how to query ORC data?
And I still have a problem with external tables. When I create a table:
CREATE EXTERNAL TABLE pxf_hdfs_textsimple(location text, month text, num_orders int, total_sales float8)
LOCATION ('pxf://Node1:51200/tmp/pxf_test/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E',');
I am not able to query this table. There is no detail beyond this one error:
ERROR: remote component error (0):(libchurl.c:897)
I checked the HAWQ segment logs, the HAWQ master logs, and the Catalina and PXF service logs, but I cannot find anything about this error. I would appreciate it if anyone could help me.
What kind of platform is your Hadoop cluster running on (e.g. OSX, CentOS 6, etc.)? You must have the PXF server running on all the data nodes as well as on the name node. It is best to use Ambari to install HAWQ and PXF, as it guides you through the setup.
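One quick way to confirm the PXF server is actually up on every node is to probe its REST endpoint from the master. This is a sketch under assumptions: port 51200 matches the LOCATION clause above, and /pxf/ProtocolVersion is the version-check path used by HAWQ-era PXF; verify both against your installation. The node list here defaults to localhost so the loop can be tried anywhere.

```shell
# Probe the PXF REST service on each node; a node that does not answer
# is a likely source of the "remote component error" seen from libchurl.
NODES=${NODES:-localhost}        # replace with: Node1 Node2 ... Node7
checked=0
for node in $NODES; do
  if out=$(curl -s --max-time 5 "http://$node:51200/pxf/ProtocolVersion"); then
    echo "$node: $out"
  else
    echo "$node: PXF not reachable on port 51200"
  fi
  checked=$((checked + 1))
done
```

Any node reported as unreachable needs its PXF service (re)started before the external table queries can succeed.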