Member since: 04-24-2017
Posts: 82
Kudos Received: 11
Solutions: 1

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1152 | 01-20-2020 03:17 AM |
02-27-2022
11:00 PM
1 Kudo
This solution works for me on HDP 3.1.4 / Ambari 2.7. Thanks for sharing.
01-20-2020
03:17 AM
@ChineduLB What is your exact query? You can write count queries in SQL against the Hive table; a small sketch follows below. In general, you can refer to these articles:
https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/performance-tuning/content/hive_prepare_to_tune_performance.html
https://www.qubole.com/blog/5-tips-for-efficient-hive-queries/
Thanks, Tamil Selvan K
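A minimal sketch of running a count query through Beeline; the database and table names (sales_db.orders) are hypothetical, and the JDBC URL should point at your own HiveServer2 host:

# hypothetical example: count the rows of sales_db.orders via Beeline
beeline -u "jdbc:hive2://<hiveserver2-host>:10000/default" \
        -e "SELECT COUNT(*) FROM sales_db.orders;"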
01-08-2020
07:52 AM
1 Kudo
1. Use Ranger auditing for Hive to check the details of queries run by a user; Hive does not store this detail in the metastore. https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.4/audit-ref/content/managing_auditing_in_ranger_access.html
2. You can use the ResourceManager REST API below to get all apps in the FINISHED or KILLED states for a specific user and time period (a curl sketch follows this list): GET "http://Resource-Manager-Address:8088/ws/v1/cluster/apps?limit=20&states=FINISHED,KILLED&user=<user-id>&startedTimeBegin={time in epoch}&startedTimeEnd={time in epoch}"
3. Simply use the Tez View if your execution engine is Tez.
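A sketch of the ResourceManager call from step 2 using curl; the host, user id, and epoch-millisecond timestamps are placeholders to fill in for your cluster:

# hypothetical example: list FINISHED/KILLED apps for one user in a time window
curl "http://<resource-manager-host>:8088/ws/v1/cluster/apps?limit=20&states=FINISHED,KILLED&user=<user-id>&startedTimeBegin=<epoch-ms>&startedTimeEnd=<epoch-ms>"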
12-26-2019
12:30 PM
@Prakashcit To ensure that data from multiple sources is ingested so that business insights can be discovered at a later stage, we usually dump everything. We then compare the source data with the ingested data to validate that all of it has been pushed, and verify that the correct data files are generated and loaded into HDFS in the desired location. A smart data-lake ingestion tool or solution like Kylo should enable self-service data ingestion, data wrangling, data profiling, data validation, and data cleansing/standardization; see the attached architecture.

/landing_Zone/Raw_data/ [corresponds to stage 1]
/landing_Zone/Raw_data/refined [corresponds to stage 2]
/landing_Zone/Raw_data/refined/Trusted Data [corresponds to stage 3]
/landing_Zone/Raw_data/refined/Trusted Data/sandbox [corresponds to stage 4]

The data lake can also be used to feed upstream systems for real-time monitoring, or long-term storage like HDFS or Hive for analytics.

Data quality is often seen as the unglamorous part of working with data. Ironically, it usually makes up the majority of a data engineer's time. Data quality might very well be the single most important component of a data pipeline, since without a level of confidence and reliability in your data, the dashboards and analysis generated from it are useless. The challenge with data quality is that there are no clear and simple formulas for determining whether data is correct; it is a continuous data-engineering task as more data sources are incorporated into the pipeline.

Typically Hive is plugged in at stage 3, and tables are created after the data validation of stage 2. This ensures that data scientists have cleansed data to run their models on and that analysts can work with BI tools. At least, these have been my tasks across many projects. HTH
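A minimal sketch of laying out the zones above in HDFS; the paths are taken from the stages listed and should be adapted to your own naming standards and permissions:

# hypothetical example: create the four zone directories (quotes needed because of the space in "Trusted Data")
hdfs dfs -mkdir -p /landing_Zone/Raw_data                                  # stage 1: raw
hdfs dfs -mkdir -p /landing_Zone/Raw_data/refined                          # stage 2: refined
hdfs dfs -mkdir -p "/landing_Zone/Raw_data/refined/Trusted Data"           # stage 3: trusted
hdfs dfs -mkdir -p "/landing_Zone/Raw_data/refined/Trusted Data/sandbox"   # stage 4: sandbox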
05-28-2018
02:17 PM
@Tamil Selvan K If the above answer helped address your question, please take a moment to log in and click the "Accept" link on the answer.
02-07-2018
06:45 PM
@SMACH H You can follow the below:
1. Lock down the location in HDFS: set permission 700 on /apps/hive/warehouse (a small sketch follows this list).
2. Add a policy to Ranger/Hive for database: *, allowing users to create databases. (Note that the ambari-qa user also needs access to database: * to complete the service check.)
3. Allow access to individual databases via Ranger/Hive policies.
This blog post may be of interest: http://hortonworks.com/blog/best-practices-for-hive-authorization-using-apache-ranger-in-hdp-2-2/
Also, you may explore the options with "hive.server2.enable.doAs".
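A minimal sketch of step 1, run as the HDFS superuser; whether you also apply it recursively depends on your existing layout:

# hypothetical example: restrict the Hive warehouse directory to its owner only
hdfs dfs -chmod 700 /apps/hive/warehouse
hdfs dfs -ls /apps/hive          # verify the new permissions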
12-28-2018
03:49 AM
I believe the Beeline JDBC client can also be used to connect to Spark SQL through the Spark Thrift Server.
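A sketch of such a connection; the host is a placeholder, and the port 10016 is only a common HDP default for the Spark2 Thrift Server, so check hive.server2.thrift.port in your Spark thrift-server configuration:

# hypothetical example: connect Beeline to the Spark Thrift Server
beeline -u "jdbc:hive2://<spark-thrift-server-host>:10016/default" -n <user>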
10-01-2017
08:33 PM
1 Kudo
@Tamil Selvan K If you want to access the db and fetch the details, this can be done by executing the following commands (the Postgres db is the default):
docker exec -it cbreak_commondb_1 su postgres -c 'psql'
postgres=# \l
postgres=# \c cbdb
You are now connected to database "cbdb" as user "postgres".
cbdb=# select id, name, stack_id, status from cluster;   -- queries the cluster id, name, and status of the clusters
06-01-2017
03:52 PM
@Jay SenSharma Thanks for that. And is there a way to go the other way round as well? For a particular rpmlib, can we find the list of HDP packages as well?
05-31-2017
01:45 PM
@Satish Sarapuri Thanks, but when I tried to check its behavior (expecting it to return only the duplicate records), it returned every record in that table. Hence, I wanted to know a simple implementation of it.
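For reference, the behaviour I was expecting is roughly what this sketch produces; the table and column names (my_table, col1, col2) are hypothetical:

# hypothetical example: return only the duplicated combinations of col1 and col2 via Beeline
beeline -u "jdbc:hive2://<hiveserver2-host>:10000/default" \
        -e "SELECT col1, col2, COUNT(*) AS cnt FROM my_table GROUP BY col1, col2 HAVING COUNT(*) > 1;"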