Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

How do I programmatically query atlas to return a list of hive tables in certain hive databases?

avatar

I wish to programmatically query atlas to provide a list of hive tables that are in certain hive databases. I only want to see hive tables that are in databases that contain a certain string in their name. In the hive_table atlas type, the db property is a reference to an entity of type hive_db, so I cannot use a simple where clause.

For example pretend I have many hive databases, some end with '_temp' some end with '_final'. Each database may have several tables. I want to generate a list of all hive tables in databases that end with '_final.' I would also like to exclude hive tables that have been deleted.

I have been experimenting with using the /api/atlas/discovery/search/dsl rest endpoint, but I have had no success.

There is documentation for the dsl at http://atlas.apache.org/Search.html, but this documentation is very esoteric, and I cannot figure out how to use it.

Does anyone have examples of returning lists of entities in atlas bases on properties of referred-to entities?

Is there a more user-friendly or complete source of documentation for the atlas query dsl?

Also note that I do not wish to query the hive metastore directly, I wish to use atlas.

Thank you for any help!

1 ACCEPTED SOLUTION

avatar
Expert Contributor
@David Miller

DSL query should help you with this. Each hive_table has hive_db name in its property.

This DSL query should help you: hive_table where (db.name like '*_final' or db.name like '*_temp')

About filtering deleted entities in DSL, there isn't a way to do it yet.

We are in process of improving DSL. As for the documentation, I agree, it needs improvement. There is no firm ETA on this yet. Given the current state, my suggestion would be to use basic query as much as possible.

Hope this helps.

View solution in original post

4 REPLIES 4

avatar
Expert Contributor
@David Miller

DSL query should help you with this. Each hive_table has hive_db name in its property.

This DSL query should help you: hive_table where (db.name like '*_final' or db.name like '*_temp')

About filtering deleted entities in DSL, there isn't a way to do it yet.

We are in process of improving DSL. As for the documentation, I agree, it needs improvement. There is no firm ETA on this yet. Given the current state, my suggestion would be to use basic query as much as possible.

Hope this helps.

avatar

@Ashutosh Mestry

Thanks Ashuthosh,

when I run the query: "hive_table where db.name like '*_final'" I get an error in the webui:

Gremlin script execution failed: L:{def r=(([]) as Set);def f1={GremlinPipeline x->x.as('a0').out('__hive_table.db').as('__res') [0..<25].select(['a0', '__res']).fill(r)};f1(g.V().has('__typeName','hive_table'));f1(g.V().has('__superTypeNames','hive_table'));r._().as('__tmp').transform({((Row)it).getColumn('a0')}).as('a0').back('__tmp').transform({((Row)it).getColumn('__res')}).as('__res').filter({it.'Asset.name'.matches('.*_final')}).back('a0') [0..<25].toList()}

We are running atlas 0.8.0.2, perhaps like clauses are unsupported on our version? I can use an equal sign and successfully retrieve tables in a certain database.

do you know of a way to get the same information with a basic query?

avatar
Expert Contributor

That's strange. Would it be possible to attach the log?

I am using data from the sandbox VM and I am able to run the DSL queries just fine.

I will get back to your question on basic query. I need to confirm a few things with someone from my team.

avatar

I will accept the answer because it seems this might be an issue on my side. Thank you

I will open up a ticket with hortonworks support if further troubleshooting is needed after I take a look at the logs. thanks again.