Created 11-17-2017 04:19 PM
I wish to programmatically query atlas to provide a list of hive tables that are in certain hive databases. I only want to see hive tables that are in databases that contain a certain string in their name. In the hive_table atlas type, the db property is a reference to an entity of type hive_db, so I cannot use a simple where clause.
For example pretend I have many hive databases, some end with '_temp' some end with '_final'. Each database may have several tables. I want to generate a list of all hive tables in databases that end with '_final.' I would also like to exclude hive tables that have been deleted.
I have been experimenting with using the /api/atlas/discovery/search/dsl rest endpoint, but I have had no success.
There is documentation for the dsl at http://atlas.apache.org/Search.html, but this documentation is very esoteric, and I cannot figure out how to use it.
Does anyone have examples of returning lists of entities in atlas bases on properties of referred-to entities?
Is there a more user-friendly or complete source of documentation for the atlas query dsl?
Also note that I do not wish to query the hive metastore directly, I wish to use atlas.
Thank you for any help!
Created 11-17-2017 07:57 PM
DSL query should help you with this. Each hive_table has hive_db name in its property.
This DSL query should help you: hive_table where (db.name like '*_final' or db.name like '*_temp')
About filtering deleted entities in DSL, there isn't a way to do it yet.
We are in process of improving DSL. As for the documentation, I agree, it needs improvement. There is no firm ETA on this yet. Given the current state, my suggestion would be to use basic query as much as possible.
Hope this helps.
Created 11-17-2017 07:57 PM
DSL query should help you with this. Each hive_table has hive_db name in its property.
This DSL query should help you: hive_table where (db.name like '*_final' or db.name like '*_temp')
About filtering deleted entities in DSL, there isn't a way to do it yet.
We are in process of improving DSL. As for the documentation, I agree, it needs improvement. There is no firm ETA on this yet. Given the current state, my suggestion would be to use basic query as much as possible.
Hope this helps.
Created 11-20-2017 06:30 PM
Thanks Ashuthosh,
when I run the query: "hive_table where db.name like '*_final'" I get an error in the webui:
Gremlin script execution failed: L:{def r=(([]) as Set);def f1={GremlinPipeline x->x.as('a0').out('__hive_table.db').as('__res') [0..<25].select(['a0', '__res']).fill(r)};f1(g.V().has('__typeName','hive_table'));f1(g.V().has('__superTypeNames','hive_table'));r._().as('__tmp').transform({((Row)it).getColumn('a0')}).as('a0').back('__tmp').transform({((Row)it).getColumn('__res')}).as('__res').filter({it.'Asset.name'.matches('.*_final')}).back('a0') [0..<25].toList()}
We are running atlas 0.8.0.2, perhaps like clauses are unsupported on our version? I can use an equal sign and successfully retrieve tables in a certain database.
do you know of a way to get the same information with a basic query?
Created 11-25-2017 09:47 PM
That's strange. Would it be possible to attach the log?
I am using data from the sandbox VM and I am able to run the DSL queries just fine.
I will get back to your question on basic query. I need to confirm a few things with someone from my team.
Created 11-27-2017 04:27 PM
I will accept the answer because it seems this might be an issue on my side. Thank you
I will open up a ticket with hortonworks support if further troubleshooting is needed after I take a look at the logs. thanks again.