Member since: 01-19-2017
Posts: 3676
Kudos Received: 632
Solutions: 372
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 557 | 06-04-2025 11:36 PM |
|  | 1111 | 03-23-2025 05:23 AM |
|  | 561 | 03-17-2025 10:18 AM |
|  | 2110 | 03-05-2025 01:34 PM |
|  | 1319 | 03-03-2025 01:09 PM |
09-25-2019
10:59 AM
1 Kudo
@parthk You can definitely use Sentry for RBAC-style authorization in Impala, and you don't strictly need Kerberos, but Kerberos is highly advised. Why? Historically, Sentry has been the weakest link in Cloudera's security architecture, which is the reason it was dropped in favour of Ranger in the upcoming new offering, CDP.

Having said that, Sentry's role-based access control (RBAC) is an approach to restricting system access to authorized users, whereas Kerberos with keytabs is like a biometric passport: the password is known only to the keytab and principal, and it allows a process (a client) running on behalf of a principal (a user) to prove its identity to a verifier (an application server, or just a server) without sending data across the network that might allow an attacker or the verifier to subsequently impersonate the principal. Kerberos optionally provides integrity and confidentiality for data sent between the client and the server.

You can safely build your cluster without Kerberos, especially for self-study and development, but not for production. There are two types of Kerberos setup: MIT and AD. Active Directory is a directory services implementation that provides all sorts of functionality like authentication, group and user management, policy administration and more in a centralized manner. LDAP (Lightweight Directory Access Protocol) is an open, cross-platform protocol used for directory services authentication, hence the pointer in the Cloudera documentation to use LDAP/LDAPS. A small Sentry grant example is sketched below.

HTH Happy hadooping
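To make the RBAC side concrete, here is a minimal sketch of Sentry-style grants issued from impala-shell (the role, group, and database names are illustrative assumptions, not from this thread):

```sql
-- Hypothetical names, purely for illustration
CREATE ROLE analyst_role;

-- Sentry maps roles to groups, not to individual users
GRANT ROLE analyst_role TO GROUP analysts;

-- Give the role read-only access to one database
GRANT SELECT ON DATABASE sales_db TO ROLE analyst_role;

-- Verify what the role has been granted
SHOW GRANT ROLE analyst_role;
```

Sentry answers "what is this user allowed to do"; Kerberos is still what proves "this user really is who they claim to be", which is why the two are usually deployed together.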
09-25-2019
10:29 AM
@elmismo999 Sqoop uses MapReduce, so first make sure MapReduce and YARN are running. Secondly, validate that the database and table exist by following the steps below:

# mysql -u root -p[root_password]
mysql> show databases;

If the sqoop database exists, then run:

mysql> use sqoop;
mysql> show tables;

This MUST show the table result; if it doesn't, your job cannot work. Also, in your command I don't see the MySQL database port (default 3306) or the root password placeholder -P (or simply -p[root_password]):

# sqoop import --connect jdbc:mysql://127.0.0.1:3306/sqoop --username root -P --table result --target-dir /user/results10/

One extra sanity check is sketched below. Can you confirm the above and revert?
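As that extra sanity check before the import (a minimal sketch; the sqoop database and result table names are taken from your command), confirm the table actually holds rows:

```sql
-- Run inside the mysql client
USE sqoop;
SELECT COUNT(*) AS row_count FROM result;  -- should be greater than 0 before importing
SELECT * FROM result LIMIT 5;              -- eyeball a few rows
```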
09-21-2019
10:53 AM
1 Kudo
@rvillanueva To add on to what @jsensharma commented, it's always a good idea to have separate databases for Druid and Superset! If you run into issues, then only one component's data is in jeopardy 🙂
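For example, if MySQL is the backing metadata store, separating the two could look roughly like the sketch below (the database names, users, and passwords are illustrative assumptions, not from this thread):

```sql
-- One metadata database per component, so a problem in one does not touch the other
CREATE DATABASE druid CHARACTER SET utf8;
CREATE DATABASE superset CHARACTER SET utf8;

-- Dedicated users per component (hypothetical credentials)
CREATE USER 'druid'@'%' IDENTIFIED BY 'druid_password';
CREATE USER 'superset'@'%' IDENTIFIED BY 'superset_password';

GRANT ALL PRIVILEGES ON druid.* TO 'druid'@'%';
GRANT ALL PRIVILEGES ON superset.* TO 'superset'@'%';
FLUSH PRIVILEGES;
```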
09-21-2019
03:12 AM
@jhc I have downloaded an HDP 3.0 sandbox to try to get around your problem. After successful deployment on VirtualBox, when you access DAS the default user is hive (see screenshot "DAS default user"), and the same applies on beeline:

[root@sandbox-hdp ~]# su - hive
Last login: Sat Sep 21 08:07:40 UTC 2019
[hive@sandbox-hdp ~]$ hive
Connecting to jdbc:hive2://sandbox-hdp.hortonworks.com:2181/default;password=hive;serviceDiscoveryMode=zooKeeper;user=hive;zooKeeperNamespace=hiveserver2
19/09/21 08:38:06 [main]: INFO jdbc.HiveConnection: Connected to sandbox-hdp.hortonworks.com:10000
Connected to: Apache Hive (version 3.1.0.3.0.1.0-187)
Driver: Hive JDBC (version 3.1.0.3.0.1.0-187)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.0.3.0.1.0-187 by Apache Hive
0: jdbc:hive2://sandbox-hdp.hortonworks.com:2> show databases;
INFO : Completed executing command(queryId=hive_20190921083816_c231488f-8a6f-4fb1-bdfb-48493a3cb98e); Time taken: 0.064 seconds
INFO : OK
+---------------------+
|    database_name    |
+---------------------+
| default             |
| foodmart            |
| information_schema  |
| sys                 |
+---------------------+
4 rows selected (0.443 seconds)

Now, to demonstrate, I will create a new database jhc using DAS as the default user hive [see Create DB with DAS]. You will see a "waiting for the database to be created" warning; it should succeed, and the database should then be available in the drop-down list [see JHC database]. Now choose this database and populate it with a sample table cloudera [see create_table_cloudera]. The DAS view should update with the table [cloudera] in the [jhc] database, while beeline also proves the successful creation of the database and the table therein:

0: jdbc:hive2://sandbox-hdp.hortonworks.com:2> show databases;
INFO : OK
+---------------------+
|    database_name    |
+---------------------+
| default             |
| foodmart            |
| information_schema  |
| jhc                 |
| sys                 |
+---------------------+
5 rows selected (0.092 seconds)
0: jdbc:hive2://sandbox-hdp.hortonworks.com:2> use jhc;
0: jdbc:hive2://sandbox-hdp.hortonworks.com:2> show tables;
+-----------+
| tab_name  |
+-----------+
| cloudera  |
+-----------+
0: jdbc:hive2://sandbox-hdp.hortonworks.com:2> describe cloudera;
INFO : OK
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| id        | tinyint    |          |
| username  | tinyint    |          |
| position  | tinyint    |          |
| dept      | tinyint    |          |
+-----------+------------+----------+
4 rows selected (0.173 seconds)

The DAS view now shows the new table cloudera in the jhc database. Now getting back to your question, "When you query the databases the table doesn't appear in the drop-down list": are you sure you selected the same DB where you created the table? I realized I had to wait for the refresh, so maybe you should refresh manually 🙂 A beeline-only version of the same steps is sketched after this post. Hope that helps, please revert.
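If you want to script the same steps from beeline instead of DAS, a minimal sketch would be as follows (the column types below are illustrative; adjust them to whatever your table really needs):

```sql
-- Equivalent of the DAS walkthrough, run from the beeline prompt
CREATE DATABASE IF NOT EXISTS jhc;
USE jhc;

CREATE TABLE IF NOT EXISTS cloudera (
  id       INT,
  username STRING,
  position STRING,
  dept     STRING
);

SHOW TABLES;        -- should list cloudera
DESCRIBE cloudera;  -- confirms the columns
```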
09-17-2019
02:11 PM
1 Kudo
@psilvarochagome In this community we share knowledge to advance the Cloudera community and don't get cash for that, even though some of the questions are real production issues. Having said that, it's unfortunate that people like you get a solution to a problem being faced by a member and then don't want to share it, as requested by @slim_abderrahim. It's very unfortunate; I hope members see this and tag you ... we are open source as opposed to proprietary code. 🙂 Happy hadooping
09-17-2019
01:45 PM
@ranger What is your HDP version? I think you have hit this bug, despite your Ranger version not matching exactly. Try the workaround and revert: https://issues.apache.org/jira/browse/RANGER-1342
09-17-2019
11:43 AM
4 Kudos
@mike_bronson7 For sure that is the last major version of HDP, but there could be minor releases to correct bugs. Cloudera is keen to release CDP sometime before December, according to insiders. Here is a link to a webinar available on-demand; you will need to register to view it. It gives a preview of what you should expect: the best of the two worlds (Hortonworks & Cloudera). Cloudera Streams Management is GA, so CDP should be around the corner. This is my personal view and not that of Cloudera.
09-14-2019
10:46 AM
1 Kudo
@ThriftTran I don't know how you expect any member to help on a subject with no context. The least you could do is provide some logs, screenshots, a description of the environment or components, etc.
09-14-2019
05:37 AM
@ranger Can you try something like this? It shows how to connect to Hive running on a remote host (HiveServer2) using a commonly used Python package, PyHive. There are a lot of other Python packages available to connect to remote Hive, but PyHive is one of the easiest, best-maintained and supported packages. Here I am assuming you have already installed the PyHive package; if not, please do that first!

from pyhive import hive
import re, os, time

host_name = "localhost"
port = 10001
user = "hive"
password = "hive"
database = "employeeDB"

def hiveconnection(host_name, port, user, password, database):
    conn = hive.Connection(host=host_name, port=port, username=user,
                           password=password, database=database, auth='CUSTOM')
    cur = conn.cursor()
    cur.execute('select * from employees limit 5')
    result = cur.fetchall()
    return result

# Call the above function
output = hiveconnection(host_name, port, user, password, database)
print(output)

Before you attempt to connect using PyHive, you should execute the steps below to install the PyHive package. These are the steps on an Ubuntu machine, as PyHive depends on these modules:

Install gcc:
sudo apt-get install gcc

Install Thrift:
pip install thrift

Install SASL:
pip install sasl

Install thrift_sasl:
pip install thrift_sasl

After the above steps have run successfully, you can go ahead and install PyHive using pip:
pip install pyhive

Should you encounter a PyHive SASL fatal error, install the dependency below:
sudo apt-get install libsasl2-dev

Now you can re-test your Hive database connection. Please let me know.
09-13-2019
05:13 PM
@budati There is a good response by Burgess that should work out for you as well: CSV with duplicate headers