Created 05-20-2016 05:51 AM
Hi:
When I run a query on Hive with Tez, opening the session takes a long time. How can I improve that? The query itself takes 1 second, but creating the connection takes 2 or 3 seconds.
INFO : Tez session hasn't been created yet. Opening session
DEBUG : Adding local resource: scheme: "hdfs" host: "hostname" port: xxxx file: "/tmp/hive/hdfs/_tez_session_dir/e5c7e17d-5ab5-4e92-b468-d10e81198c13/hive-hcatalog-core.jar"
INFO : Dag name: SELECT count(*) FROM ...aoprcnf='2016-02-14'(Stage-1)
DEBUG : DagInfo: {"context":"Hive","description":"SELECT count(*) FROM canal_bucketed_v3 where fechaoprcnf='2016-02-14'"}
DEBUG : Setting Tez DAG access for hdfs
INFO :
INFO : Status: Running (Executing on YARN cluster with App id application_1463562631519_0062)
INFO : Map 1: -/- Reducer 2: 0/1
INFO : Map 1: 0/1 Reducer 2: 0/1
INFO : Map 1: 0(+1)/1 Reducer 2: 0/1
INFO : Map 1: 0/1 Reducer 2: 0/1
INFO : Map 1: 1/1 Reducer 2: 0(+1)/1
INFO : Map 1: 1/1 Reducer 2: 1/1
Created 05-20-2016 09:30 AM
While Hive is perfect for analytical queries and is amazing for highly parallel workloads with lots of parallel queries, it is not as fast for small queries as traditional databases yet. You will not get queries faster than 2-3 seconds in total even under perfect circumstances. This is due to the architecture.
Rule of thumb:
- If Tez has to create a new session (application master), i.e. a query on a cold system, you can expect 10-15s of startup time. You can fix this by pre-creating sessions. However, that takes up a bit of the cluster even when you don't need it.
- If Tez has to create task containers, you can expect 2-3s extra. Tez can reuse containers, and there is also prewarm to pre-create containers, but its tuning depends a lot on your use case. If you don't know what you are doing, you can make it worse.
- In general, HiveServer2 has a bit of overhead (around 1s) for plan compilation, communication with the metastore, etc.
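For reference, session pre-creation, container reuse and prewarm map to a handful of Hive and Tez properties. Below is a small Python sketch that renders them as Hadoop-style `<property>` entries. The property names are the ones I know from Hive and Tez, but the values and the helper name `to_site_xml` are only illustrative assumptions, so check them against the documentation for your version before using them.

```python
import xml.etree.ElementTree as ET

# Illustrative values only (assumptions); tune them for your own cluster.
SETTINGS = {
    # Pre-create Tez sessions when HiveServer2 starts
    "hive.server2.tez.initialize.default.sessions": "true",
    "hive.server2.tez.default.queues": "default",
    "hive.server2.tez.sessions.per.default.queue": "2",
    # Reuse task containers between tasks (normally set in tez-site.xml)
    "tez.am.container.reuse.enabled": "true",
    # Prewarm containers; easy to get wrong, test before enabling
    "hive.prewarm.enabled": "true",
    "hive.prewarm.numcontainers": "4",
}


def to_site_xml(settings):
    """Render a dict of settings as a Hadoop-style <configuration> block."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


print(to_site_xml(SETTINGS))
```

Pre-created sessions and prewarmed containers hold cluster resources while idle, so start with small numbers.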
So yes, at the moment you will not get faster than 2-3s, realistically 4-5s. If you need sub-second responses, look at Phoenix, for example.
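To make the arithmetic behind these numbers explicit, here is a rough Python sketch that simply adds up the rule-of-thumb overheads. The ranges are the estimates from this answer, not measurements, and the function name `estimate_latency` is made up for illustration.

```python
# (low, high) overhead in seconds, taken from the rule of thumb above
SESSION_START = (10, 15)    # new Tez session / application master
CONTAINER_START = (2, 3)    # new task containers
HS2_OVERHEAD = (1, 1)       # plan compilation, metastore calls, etc.


def estimate_latency(query_s, cold_session, new_containers):
    """Return a (low, high) estimate of total response time in seconds."""
    parts = [HS2_OVERHEAD]
    if cold_session:
        parts.append(SESSION_START)
    if new_containers:
        parts.append(CONTAINER_START)
    low = query_s + sum(p[0] for p in parts)
    high = query_s + sum(p[1] for p in parts)
    return (low, high)


# The 1-second query from the question:
print(estimate_latency(1, cold_session=False, new_containers=False))  # (2, 2)
print(estimate_latency(1, cold_session=False, new_containers=True))   # (4, 5)
print(estimate_latency(1, cold_session=True, new_containers=True))    # (14, 20)
```

The middle case (warm session, fresh containers) is where the "realistically 4-5s" figure comes from.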
However things will soon get better for these short queries:
- LLAP is already available as a tech preview (long-running processes that have an ORC data cache and remove the startup overhead).
- Hive will have an HBase-backed metastore, which should speed up HiveServer2.
And more. In short, watch this space.