Created 01-08-2018 09:25 AM
I have a problem with PXF and external tables in HAWQ.
I have seven data nodes in the cluster (and seven HAWQ segments), and I installed PXF on each of them. The layout looks like this:
Node1 - NameNode, HiveMetastore, HAWQ master, Hive Client, HCat Client, PXF
Node2 - SNameNode, DataNode, HAWQ Segment, PXF
Node3-7 - DataNode, HAWQ Segment, PXF
Node8-9 - HiveClient, HCat Clients
I created a table following the "Test PXF" example on this site, but I have a problem with accessing the data. When I try to run a simple query:
SELECT * FROM pxf_hdfs_textsimple;
I get this error:
ERROR: remote component error (0): (libchurl.c:897)
That's all. In the HAWQ master node log I see this:
2018-01-04 18:43:06.902998 CET,"hawq","postgres",p19781,th-319940160,"[local]",,2018-01-04 18:43:06 CET,16768,con26,,seg-10000,,,x16768,sx1,"LOG","00000","no master mirroring standby configuration found",,,,,,,0,,"cdblink.c",159,
2018-01-04 18:43:10.759145 CET,"hawq","poligon",p19820,th-319940160,"[local]",,2018-01-04 18:43:10 CET,0,con27,,seg-10000,,,,,"LOG","00000","getLocalTmpDirFromMasterConfig session_id:27 tmpdir:/tmp",,,,,,,0,,"postinit.c",470,
2018-01-04 18:43:20.600046 CET,"hawq","poligon",p19820,th-319940160,"[local]",,2018-01-04 18:43:10 CET,16773,con27,cmd7,seg-10000,,,x16773,sx1,"ERROR","XX000","remote component error (0): (libchurl.c:897)",,,,,,"select * from pxf_hdfs_textsimple ;",0,,"libchurl.c",897,"Stack trace:
1  0x8c165e postgres errstart (??:?)
2  0x8c34fb postgres elog_finish (??:?)
3  0x5124d6 postgres check_response_code (??:?)
4  0x512686 postgres churl_read_check_connectivity (??:?)
5  0x517b22 postgres <symbol not found> (pxfutils.c:?)
6  0x517d66 postgres call_rest (??:?)
7  0x5168c0 postgres <symbol not found> (pxfmasterapi.c:?)
8  0x516f97 postgres get_data_fragment_list (??:?)
9  0x512ff5 postgres map_hddata_2gp_segments (??:?)
10 0x73f8a2 postgres <symbol not found> (createplan.c:?)
11 0x73fdb5 postgres <symbol not found> (createplan.c:?)
12 0x741dec postgres create_plan (??:?)
13 0x74d1a6 postgres <symbol not found> (planner.c:?)
14 0x74eb3c postgres subquery_planner (??:?)
15 0x74f177 postgres <symbol not found> (planner.c:?)
16 0x74f72e postgres planner (??:?)
17 0x7e496a postgres pg_plan_queries (??:?)
18 0x7e4e05 postgres <symbol not found> (postgres.c:?)
19 0x7e6560 postgres PostgresMain (??:?)
20 0x799860 postgres <symbol not found> (postmaster.c:?)
21 0x79c5e9 postgres PostmasterMain (??:?)
22 0x4a2dff postgres main (??:?)
23 0x7fdfe9e2ab35 libc.so.6 __libc_start_main (??:0)
24 0x4a2e7c postgres <symbol not found> (??:?)
"
When I try to access data in Hive I get an error too. The query:
select count(*) from hcatalog.default.aaa;
Error:
ERROR: remote component error (500) from '127.0.0.1:51200':
type Exception report
message java.lang.Exception: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/metastore/api/MetaException
description The server encountered an internal error that prevented it from fulfilling this request.
exception javax.servlet.ServletException: java.lang.Exception: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/metastore/api/MetaException (libchurl.c:897)
LINE 1: select count(*) from hcatalog.default.aaa;
Does anyone know what I am doing wrong? What could cause a problem with accessing data from HDFS or Hive through PXF?
Created 01-08-2018 06:40 PM
Please take a look at pxf-private.classpath to make sure the Hive jars have the correct paths. Also, take a look at pxf-service.log under Tomcat on the node where PXF is running. The above error is being returned from the PXF Java server to the C client.
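A quick way to do that check is to expand every entry in the classpath file and report the ones that match no file on disk. This is only a sketch: the conf file location (commonly /etc/pxf/conf/pxf-private.classpath) varies by install, so pass your own path.

```shell
# check_classpath: print classpath entries that resolve to no file.
# The usual conf location (/etc/pxf/conf/pxf-private.classpath) is an
# assumption -- pass whatever path your install uses.
check_classpath() {
    classpath_file="$1"
    missing=0
    while IFS= read -r entry; do
        case "$entry" in ''|\#*) continue ;; esac   # skip blanks and comments
        found=0
        for f in $entry; do                         # let the shell expand globs like .../*.jar
            [ -e "$f" ] && found=1 && break
        done
        if [ "$found" -eq 0 ]; then
            echo "MISSING: $entry"
            missing=$((missing + 1))
        fi
    done < "$classpath_file"
    echo "$missing entries unresolved"
}

# example: check_classpath /etc/pxf/conf/pxf-private.classpath
```

Any "MISSING" line is a jar (or jar glob) the PXF JVM cannot load, which would produce exactly the NoClassDefFoundError shown above.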
Created 01-09-2018 11:14 AM
Thanks for the answer. I checked my pxf-private.classpath. The problem was that I had installed PXF on all data nodes, but no Hive clients (and therefore no Hive jars) were installed on them. I copied the jars to all machines and it works now. But there is another problem.
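For anyone repeating this fix: the copy step can be scripted. The sketch below only prints the scp commands (a dry run), because the jar directory and node names here are hypothetical; edit them for your cluster, review the output, and then pipe it to `sh`.

```shell
# Sketch: emit one scp command per PXF node to push the Hive client jars.
# The jar directory and hostnames are assumptions -- adjust, inspect the
# printed commands, then pipe to `sh` to execute.
distribute_hive_jars() {
    jar_dir="$1"; shift      # e.g. /usr/hdp/current/hive-client/lib
    for node in "$@"; do
        echo "scp $jar_dir/*.jar $node:$jar_dir/"
    done
}

# example (dry run):
# distribute_hive_jars /usr/hdp/current/hive-client/lib Node2 Node3 Node4 Node5 Node6 Node7
```

Remember to restart the PXF service on each node afterwards so the new jars are picked up on the classpath.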
I have a problem with querying tables whose format is anything other than text file. If I execute the query:
select count(*) from hcatalog.default.sales_info;
and the table sales_info is a text file, I get correct results:
 count
-------
  7540
(1 row)
But if the table format is ORC, I get this error:
ERROR: pxfwritable_import_beginscan function was not found (nodeExternalscan.c:310) (seg0 Node4:40000 pid=42109) (dispatcher.c:1805)
Does anyone know how to query ORC data?
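(For anyone hitting the same error: one workaround sometimes suggested is to bypass hcatalog and define an explicit external table using PXF's Hive profile, which routes the read through the Hive accessors instead of the plain-text importer. The sketch below just generates the DDL; the profile name, host:port, and column list are assumptions borrowed from the sales_info table above, so verify them against your PXF version before feeding the output to psql.)

```shell
# Sketch: emit CREATE EXTERNAL TABLE DDL for reading a Hive table (e.g. ORC)
# through PXF's Hive profile. Host:port, Hive table name, and the column
# list are assumptions -- check them before running the output via psql.
orc_external_ddl() {
    host_port="$1"   # e.g. Node1:51200
    hive_table="$2"  # e.g. default.sales_info
    pg_table="$3"    # name for the new external table in HAWQ
    cat <<EOF
CREATE EXTERNAL TABLE $pg_table (location text, month text, num_orders int, total_sales float8)
LOCATION ('pxf://$host_port/$hive_table?PROFILE=Hive')
FORMAT 'CUSTOM' (formatter='pxfwritable_import');
EOF
}

# example: orc_external_ddl Node1:51200 default.sales_info pxf_hive_orc | psql -d poligon
```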
And I still have a problem with external tables. When I create a table:
CREATE EXTERNAL TABLE pxf_hdfs_textsimple(location text, month text, num_orders int, total_sales float8)
LOCATION ('pxf://Node1:51200/tmp/pxf_test/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E',');
I am not able to query this table. There is no detail anywhere; I see only this:
ERROR: remote component error (0):(libchurl.c:897)
I checked the HAWQ segment logs, HAWQ master logs, and the Catalina and pxf-service logs, but I cannot find anything about this error. I would appreciate it if anyone could help.
Created 01-09-2018 10:57 PM
@grabowski14
What platform is your Hadoop cluster running on (e.g. OS X, CentOS 6, etc.)? You must have the PXF server running on all the data nodes as well as on the name node. It is best to use Ambari to install HAWQ and PXF, as it guides you through the setup.
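To confirm PXF is actually listening on every node, a quick port sweep helps; "remote component error (0)" typically means the C side could not reach the PXF REST service at all. The sketch below only prints the probe commands (port 51200 and the node names are assumptions for this setup); review them and pipe to `sh` to run the probes, which requires nc/netcat on the calling host.

```shell
# Sketch: print a TCP port-probe command for each PXF node.
# Port and hostnames are assumptions -- adjust, then pipe to `sh`.
pxf_port_sweep() {
    port="$1"; shift
    for node in "$@"; do
        echo "nc -z -w 3 $node $port && echo $node: PXF port open || echo $node: PXF port CLOSED"
    done
}

# example (dry run):
# pxf_port_sweep 51200 Node1 Node2 Node3 Node4 Node5 Node6 Node7
```

Any node reporting CLOSED is a candidate for the libchurl connectivity error above.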