Access HDFS extended attributes (xattrs) in HiveQL

Contributor

Is there a way to retrieve extended attributes (xattrs) defined on the underlying HDFS files and use them in HiveQL?

If raw source data files stored in HDFS are tagged with metadata (e.g. original source, import time) at the file level, the data within those files could automatically inherit that metadata.

Is this something a custom SerDe could be built for?

Is there an alternate way to achieve this?

Thanks

16 REPLIES

Contributor

I actually don't need to store it. I want to be able to refer to HDFS file metadata as some kind of virtual column in a Hive query. For instance, suppose an existing HDFS file testdata.csv contains my data and has an extended attribute defined:

hdfs dfs -setfattr -n user.src -v my_src testdata.csv

I then want to query a Hive external table whose location contains this HDFS file (or several similar files), retrieving columns from both the file content and the file's extended attributes (using xattrs or something similar):

select col1, col2, xattrs.user.src from Testdata;

Contributor

And thanks again for your replies, which helped me build this!

Contributor

A similar UDF written as Groovy code, for more direct use in Hive:

compile `import org.apache.hadoop.hive.ql.exec.UDF \;
import org.apache.hadoop.io.Text \;
import org.apache.hadoop.conf.Configuration \;
import org.apache.hadoop.fs.FileSystem \;
import org.apache.hadoop.fs.Path \;
import java.net.URI \;
public class XAttr extends UDF {
  public Text evaluate(Text uri, Text attr) {
    if (uri == null || attr == null) return null \;
    URI myURI = URI.create(uri.toString()) \;
    Configuration myConf = new Configuration() \;
    FileSystem fs = FileSystem.get(myURI, myConf) \;
    return new Text(fs.getXAttr(new Path(myURI), attr.toString())) \;
  }
} ` AS GROOVY NAMED XAttr.groovy;

It is used in a similar way:

XAttr(INPUT__FILE__NAME,'user.src')
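
For context, here is a minimal sketch of a full session around the XAttr call above. It assumes the compile statement has already been run in the same Hive session. The table definition, its two string columns, and the /data/testdata location are hypothetical, chosen only to match the Testdata example from earlier in this thread. The CREATE TEMPORARY FUNCTION step follows the pattern the Hive documentation shows for compiled Groovy code; it may be unnecessary if the function is already registered in your setup.

```sql
-- Register the Groovy class compiled above under a function name.
CREATE TEMPORARY FUNCTION XAttr AS 'XAttr';

-- Hypothetical external table over the directory holding testdata.csv
-- (column names, delimiter, and location are assumptions for illustration).
CREATE EXTERNAL TABLE IF NOT EXISTS Testdata (col1 STRING, col2 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/testdata';

-- INPUT__FILE__NAME is Hive's virtual column holding the source file URI of each row;
-- the UDF looks up the named extended attribute on that file.
SELECT col1, col2, XAttr(INPUT__FILE__NAME, 'user.src') AS src
FROM Testdata;
```

Because the UDF is evaluated once per row, the attribute lookup is repeated for every row read from the same file, which may be worth keeping in mind on large tables.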

Master Mentor

@Claude Villermain, please create an article from this; it is really great.

Contributor

I had not written one before, but here is my contribution to the community: Access HDFS file extended attributes in Hive with Groovy UDF