1. Impala is generally faster for interactive queries. It does not run on YARN, it caches catalog metadata locally so table information is fetched quickly, and its backend is written in C++, which is very fast.
2. Impala is not fault tolerant, so it is best suited for ad hoc queries. ETL is best suited for Hive, because Hive is fault tolerant. If a query fails because of a network or disk failure, Hive will retry it, but Impala will fail.
3. For streaming/ingestion flows such as Kafka, put the data in EXTERNAL tables, not in managed (ACID) tables. Managed tables can be used if you need to alter the data with UPDATE/DELETE.
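To illustrate point 3, here is a minimal HiveQL sketch, assuming Hive 3 with ACID enabled. The table names, columns, and the LOCATION path are hypothetical placeholders, not taken from your cluster, so adjust them to your environment.

```sql
-- Hypothetical landing table for the Kafka/ingestion flow.
-- EXTERNAL: Hive does not own the files, so the ingestion job can keep
-- writing to the directory. The path below is an assumed example.
CREATE EXTERNAL TABLE events_raw (
  id      BIGINT,
  payload STRING
)
STORED AS PARQUET
LOCATION '/data/ingest/events_raw';

-- Hypothetical managed (ACID) table for data that must be modified later.
-- Full ACID tables in Hive are stored as ORC.
CREATE TABLE events_curated (
  id      BIGINT,
  payload STRING
)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Copy ingested rows into the managed table.
INSERT INTO events_curated
SELECT id, payload FROM events_raw;

-- Row-level changes are run from Hive against the managed (ACID) table.
UPDATE events_curated SET payload = 'corrected' WHERE id = 1;
DELETE FROM events_curated WHERE id = 2;
```

With this split, the external table stays append-only for the ingestion flow, and the UPDATE/DELETE statements are run from Hive against the managed table only.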
Please let me know if you have any questions. Please click "Accept As Solution" if your query is answered.