Reading Hive table using Pig Shell - ERROR 2245: Cannot get schema from loadFunc

Expert Contributor

Hi,

I am trying to read a Hive table (an external table whose underlying table is in HBase) using the Pig shell.

I registered the following jars, and also the hive-site.xml file, in the Grunt shell. But when I try to load the data as shown below, I get "ERROR 2245: Cannot get schema from loadFunc org.apache.hive.hcatalog.pig.HCatLoader".

REGISTER /usr/hdp/2.3.4.0-3485/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar
REGISTER /usr/hdp/2.3.4.0-3485/hive-hcatalog/share/hcatalog/hive-hcatalog-pig-adapter.jar
REGISTER /usr/hdp/2.3.4.0-3485/hive-hcatalog/share/hcatalog/hive-hcatalog-server-extensions.jar
REGISTER /usr/hdp/2.3.4.0-3485/hive-hcatalog/share/hcatalog/hive-hcatalog-streaming.jar
REGISTER /usr/hdp/2.3.4.0-3485/hive/lib/hive-metastore.jar
REGISTER /usr/hdp/2.3.4.0-3485/hive/lib/libthrift-0.9.2.jar
REGISTER /usr/hdp/2.3.4.0-3485/hive/lib/hive-exec.jar
REGISTER /usr/hdp/2.3.4.0-3485/hive/lib/libfb303-0.9.2.jar
REGISTER /usr/hdp/2.3.4.0-3485/hive/conf/
REGISTER /usr/hdp/2.3.4.0-3485/hadoop/conf/
REGISTER /usr/hdp/2.3.4.0-3485/hive/lib/libfb303-0.9.2.jar
REGISTER /usr/hdp/2.3.4.0-3485/hive/lib/jdo-api-3.0.1.jar
REGISTER /usr/hdp/2.3.4.0-3485/hadoop/client/slf4j-api.jar
REGISTER /usr/hdp/2.3.4.0-3485/pig/lib/hive-shims-common-1.2.1.2.3.4.0-3485.jar
REGISTER /usr/hdp/2.3.4.0-3485/etc/hive/conf.dist/hive-site.xml
REGISTER /usr/hdp/2.3.4.0-3485/etc/hive-hcatalog/conf.dist/proto-hive-site.xml
REGISTER /usr/hdp/2.3.4.0-3485/hive/lib/datanucleus-rdbms-3.2.9.jar
REGISTER /usr/hdp/2.3.4.0-3485/hive/lib/datanucleus-core-3.2.10.jar
REGISTER /usr/hdp/2.3.4.0-3485/hive/lib/datanucleus-api-jdo-3.2.6.jar
REGISTER /usr/hdp/2.3.4.0-3485/hive/lib/bonecp-0.8.0.RELEASE.jar
REGISTER /usr/hdp/2.3.4.0-3485/hive/lib/derby-10.10.2.0.jar
REGISTER /usr/hdp/2.3.4.0-3485/hive/conf/hive-site.xml

A = LOAD 'TABLE_TEST' USING org.apache.hive.hcatalog.pig.HCatLoader(); 
Error Message: 2016-02-18 16:56:13,642 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2245: Cannot get schema from loadFunc org.apache.hive.hcatalog.pig.HCatLoader

When I use pig -useHCatalog to invoke the shell, I don't get any errors while reading from the Hive table. But when I issue DUMP, I receive the error below.

Can you please help me understand the issue here in both these cases?

grunt> DUMP A;
2016-02-18 17:17:05,605 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-02-18 17:17:05,642 [main] WARN  org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.pre-event.listeners does not exist
2016-02-18 17:17:05,642 [main] WARN  org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.semantic.analyzer.factory.impl does not exist
2016-02-18 17:17:05,689 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1003: Unable to find an operator for alias A
1 ACCEPTED SOLUTION

Master Mentor

@R M is the Hive table also an HCatalog table? Run describe on test_table before running dump; that will usually give you the schema. Most likely, though, you're not loading the table correctly.

8 REPLIES

Expert Contributor

Hi, aren't Hive tables and HCat tables the same? Anyway, this time I created a new table via HCat using hcat -e 'create table ...'. I can see it in Hive now. I tried reading from that table as well, using Pig's HCatLoader() as described above (without -useHCatalog, but with all the jars I listed above registered). Again I get the same error.

What is this loadFunc? Where does it get the schema from? Can you please help me understand what's happening behind the scenes?

2016-02-19 12:29:58,238 [main] INFO  org.apache.hadoop.hive.metastore.ObjectStore - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
2016-02-19 12:29:58,282 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2245: Cannot get schema from loadFunc org.apache.hive.hcatalog.pig.HCatLoader

Thanks!

Master Mentor

@R M do not register any jars; that's the purpose of the -useHCatalog switch. The schema is in HCatalog, whose purpose is to maintain metadata across different components like Pig, Hive, HBase, and MR. Do not register the jars, and use the proper package name as in the link.
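
A minimal sketch of that advice, for anyone who wants to try it. It writes a small Pig script with no REGISTER statements and runs it through pig -useHCatalog, which is meant to put the HCatalog and Hive jars on the classpath for you. The table name TABLE_TEST comes from the question; the script file name is an assumption made for illustration, and the pig call is skipped when pig is not on the PATH.

```shell
#!/bin/sh
# Hypothetical file name for the Pig script; any writable path works.
PIG_SCRIPT="${TMPDIR:-/tmp}/read_hcat_table.pig"

# No jar registration in the script: -useHCatalog is expected to supply the jars.
cat > "$PIG_SCRIPT" <<'EOF'
A = LOAD 'TABLE_TEST' USING org.apache.hive.hcatalog.pig.HCatLoader();
DESCRIBE A;  -- print the schema HCatLoader fetched from the metastore
DUMP A;
EOF

# Run the script only where Pig is installed.
if command -v pig >/dev/null 2>&1; then
    pig -useHCatalog -f "$PIG_SCRIPT"
else
    echo "pig not found on PATH; script written to $PIG_SCRIPT"
fi
```

If DESCRIBE already fails, the problem is in loading the table definition rather than in the DUMP job itself.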

Expert Contributor

Thanks. Using -useHCatalog works for this table. Can you please help me with the questions below too?

1. Aren't Hive tables and HCat tables the same? (I think the issue with the table above was that it was an external table managed by HBase. Could this be the reason it didn't work the first time?)

2. If I launch the shell with -useHCatalog, would I be able to use HBaseStorage to read from HBase in the same shell too? And how do I call pig -useHCatalog from Oozie?

3. Instead of registering the jars, would it work if I added them to PIG_CLASSPATH? What is the difference between adding a path to PIG_CLASSPATH and using REGISTER?

Master Mentor

@R M

1. No, they are not the same; HCatalog has different data types than Hive in some instances. It's possible that was the reason; make sure HCatalog knows about the HBase table.

2. Yes, you can use HBaseStorage, HCatStorer, and HCatLoader, but keep in mind that HCatalog needs to be the storage for all data. So in your case you can load from HBaseStorage, but HCatStorer needs to be the output; or read from HCat and store to HBaseStorage. Makes sense? There is an HCatalog Pig example in Oozie here: https://github.com/dbist/oozie/tree/master/apps/hcatalog

3. You don't always want to load the libs. Global settings are not always recommended, because you will be pulling in a lot of libs unnecessarily. Use REGISTER for any small number of libs. Essentially it's the same thing.
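
For readers who want to try the PIG_CLASSPATH route from question 3, here is a rough sketch under some assumptions. The install prefix and jar paths are copied from the REGISTER list in the question, and only a subset is shown, so this is not a verified complete list. As far as I understand, PIG_CLASSPATH is picked up by the pig launcher before the JVM starts, while REGISTER is a statement that runs inside the script.

```shell
#!/bin/sh
# Install prefix taken from the REGISTER lines in the question; adjust for your cluster.
HDP_HOME=/usr/hdp/2.3.4.0-3485

# Illustrative subset of the jars and conf directories from the question, one per line.
ENTRIES="
$HDP_HOME/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar
$HDP_HOME/hive-hcatalog/share/hcatalog/hive-hcatalog-pig-adapter.jar
$HDP_HOME/hive/lib/hive-metastore.jar
$HDP_HOME/hive/lib/hive-exec.jar
$HDP_HOME/hive/conf
$HDP_HOME/hadoop/conf
"

# Drop the empty lines and join the rest with ':' to form a classpath string.
PIG_CLASSPATH=$(printf '%s' "$ENTRIES" | grep -v '^$' | paste -sd: -)
export PIG_CLASSPATH
echo "$PIG_CLASSPATH"
```

Exporting the variable in the same shell session before starting pig is what makes the launcher see it.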

Expert Contributor

Hi Artem,

I already accepted the answer. I could not reply to your previous answer, so I am adding another comment. Would you please answer these? Thanks for your time and help!

1. If Hive and HCat are not the same, how do I ensure that HCat is aware of all the tables that I create in Hive?

2. You said HCat does not support all the types that Hive does. Would it then be possible to make HCat aware of this table so that I can use HCatLoader to read it? Is there any other way to read from a Hive table using Pig besides HCatLoader?

3. How do I ensure that the HBase tables are known to HCat? Can you please explain the second statement a little more, or point to some documentation about it? I can read from an HBase table using HBaseStorage, but when I write to HBase I should be using HCatStorer? Is that right?

4. One link says to either use -useHCatalog or add these jars to PIG_CLASSPATH. If REGISTER and PIG_CLASSPATH work in a similar way, then why didn't it work when I registered these jars? Sorry for bringing up the same question again, but I just want to understand it better.

Master Mentor

1. As long as you conform to HCat rules, you will be able to see Hive tables. I suggest you read https://cwiki.apache.org/confluence/display/Hive/HCatalog+UsingHCat

2. I can't think of any other way. HCatalog serves as the common metadata repository. I would be curious to learn if there's another approach.

3. Yes, you can do this:

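A = LOAD 'fromHCat' USING org.apache.hive.hcatalog.pig.HCatLoader();
-- 'cf:col' is a placeholder; list the HBase columns you actually need
B = LOAD 'hbase://fromHBase' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:col');
-- ... do something, make sure data types don't conflict, and combine the data
STORE joined_dataset INTO 'toHCat' USING org.apache.hive.hcatalog.pig.HCatStorer();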

https://cwiki.apache.org/confluence/display/HCATALOG/HCatalog+HBase+Integration+Design

4. Maybe you missed a jar or something? I don't know. I tried that once and it didn't work for me; it seems like more pain than just calling -useHCatalog.

Expert Contributor

Hi Artem,

Thanks for the info! I will go through those links.

Yes, pig -useHCatalog is easier, but I am just curious to find out what I am doing wrong with the other method.

I think I failed to set up PIG_OPTS properly. I have now set it up again, and I am getting a different issue when connecting to the Hive metastore URI. Below is the error.

2016-02-19 15:07:43,839 [main] INFO  hive.metastore - Trying to connect to metastore with URI 
2016-02-19 15:08:49,718 [main] WARN  hive.metastore - set_ugi() not successful, Likely cause: new client talking to old server. Continuing without it.

Thank you!
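
For the PIG_OPTS setup mentioned in this last post, one thing that may be worth checking is the metastore URI that Pig is given. The sketch below shows the general shape of passing hive.metastore.uris through PIG_OPTS. The host name is a placeholder, and 9083 is only the commonly used metastore port, so both are assumptions; the real value should match hive.metastore.uris in your hive-site.xml.

```shell
#!/bin/sh
# Placeholder host and the commonly used metastore port; check hive-site.xml for the real values.
METASTORE_HOST="metastore.example.com"
METASTORE_PORT=9083

# Pass the metastore URI to Pig as a JVM system property.
PIG_OPTS="-Dhive.metastore.uris=thrift://${METASTORE_HOST}:${METASTORE_PORT}"
export PIG_OPTS
echo "$PIG_OPTS"
```

This only addresses which URI the client tries; it does not by itself explain the set_ugi() warning in the log.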