Member since: 06-03-2019
Posts: 59
Kudos Received: 21
Solutions: 3

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1106 | 04-11-2023 07:41 AM
 | 7617 | 02-14-2017 10:34 PM
 | 1356 | 02-14-2017 05:31 AM
04-11-2023 07:41 AM
1 Kudo
Follow these docs: https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/migrating-data-into-hive/topics/hive_moving_data_from_databases_to_hive.html

Command example (import from SQL Server to Hadoop). Note that sqoop passes everything after the bare "--" to the SQL Server connector, so the --schema argument has to come last:

sqoop import -Dmapreduce.job.queuename=<your queue> --connect "jdbc:sqlserver://<sqlserver host>:<sqlserver port>;database=<sqlserver database>;username=<sqlserver user>;password=<sqlserver password>" --table "<sqlserver table>" -m 1 --hive-import --hive-database "<hive database>" --hive-table "<hive table>" --hive-overwrite --direct -- --schema <sqlserver schema>
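For illustration, here is the same command with the placeholders filled in; every hostname, credential, queue, and table name below is a made-up example:

# Hypothetical values throughout -- replace with your own environment.
sqoop import -Dmapreduce.job.queuename=etl \
  --connect "jdbc:sqlserver://sqlhost.example.com:1433;database=sales;username=sqoop_user;password=secret" \
  --table "orders" \
  -m 1 \
  --hive-import --hive-database "staging" --hive-table "orders" \
  --hive-overwrite --direct \
  -- --schema dbo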
09-21-2018 06:11 PM
This article explains how to delete a registered HDP cluster from DPS.

Steps:
1. Download the attached file dp_cluster_remove.txt and store it on the postgres DB server.
2. Run:
   psql -f ./dp_cluster_remove.txt -d dataplane -v cluster_name=c149
   -d --> the dataplane database name
   cluster_name --> the HDP cluster name that should be removed from DPS

Example output:
[root@rraman bin]# docker exec -it dp-database /bin/bash
bash-4.3# su - postgres
de0ff40ad912:~$ ls -lrt
total 8
drwx------ 19 postgres postgres 4096 Sep 21 01:27 data
-rwxrwxrwx  1 postgres postgres  892 Sep 21 17:59 dp_cluster_remove.txt
de0ff40ad912:~$ psql -f ./dp_cluster_remove.txt -d dataplane -v cluster_name=c149
CREATE FUNCTION
 remove_cluster_from_dps
-------------------------

(1 row)

DROP FUNCTION
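If you need to do this for several clusters, the steps above can be wrapped in a small helper run from the DPS host; a sketch, where dp_remove_cluster.sh is a hypothetical name, assuming the dp-database container name and script location shown in the example output above:

#!/bin/bash
# Hypothetical helper (dp_remove_cluster.sh): removes one HDP cluster
# from DPS; assumes the dp-database container and the script location
# shown in the example output above.
CLUSTER_NAME="$1"
if [ -z "$CLUSTER_NAME" ]; then
  echo "Usage: $0 <cluster_name>" >&2
  exit 1
fi
docker exec -i dp-database su - postgres -c \
  "psql -f ./dp_cluster_remove.txt -d dataplane -v cluster_name=${CLUSTER_NAME}"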
06-13-2018 04:26 PM
2 Kudos
In DPS-1.1.0 we can't edit all the LDAP configuration properties after the initial setup. If the LDAP configs have to be corrected, you need to re-initialize the DPS setup to change them, which can be a painful task. To avoid re-initializing the DPS setup, we can make the changes directly in the postgres database of dataplane.

Step 1: Find the container id of dp-database on the DPS machine
docker ps

Step 2: Connect to the container
docker exec -it cf3f4a31e146 /bin/bash

Step 3: Log in to the postgres database (dataplane)
su - postgres
psql -d dataplane

Take a backup of the table:
create table dataplane.ldap_configs_bkp as select * from dataplane.ldap_configs;

To view the existing configuration:
select * from dataplane.ldap_configs;

Sample output (expanded display):
id                        : 1
url                       : ldap://ldap.hortonworks.com:389
bind_dn                   : uid=xyz,ou=users,dc=support,dc=hortonworks,dc=com
user_searchbase           : ou=users,dc=support,dc=hortonworks,dc=com
usersearch_attributename  : uid
group_searchbase          : ou=groups,dc=support,dc=hortonworks,dc=com
groupsearch_attributename : cn
group_objectclass         : posixGroup
groupmember_attributename : memberUid
user_object_class         : posixAccount

Step 4: Make the change in the database for the required field. For example, if I need to change usersearch_attributename from uid to cn, I can issue this command:
update dataplane.ldap_configs set usersearch_attributename='cn';

That's it! It should reflect immediately on the dataplane UI.

Note: Use this doc only when you are newly installing DPS and have made a mistake in the LDAP configs.
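The same change can also be applied non-interactively from the DPS host; a sketch, reusing the container id from Step 2 and the example value 'cn':

# Apply the update and verify it, without an interactive psql session.
docker exec -i cf3f4a31e146 su - postgres -c \
  "psql -d dataplane -c \"update dataplane.ldap_configs set usersearch_attributename='cn';\""
docker exec -i cf3f4a31e146 su - postgres -c \
  "psql -d dataplane -c 'select id, usersearch_attributename from dataplane.ldap_configs;'"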
04-17-2018 03:36 PM
Just a caution note: when you set "server.jdbc.properties.loglevel=2", it adds additional database query logging, and it should be enabled only while troubleshooting performance issues. We advise removing this property once the troubleshooting is over: we have seen 2x to 3x Ambari performance degradation while it is enabled.
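For reference, a sketch of how this property is usually toggled; the ambari.properties path below is the default install location and may differ on your server:

# Enable verbose JDBC logging temporarily (default config path assumed):
echo "server.jdbc.properties.loglevel=2" >> /etc/ambari-server/conf/ambari.properties
ambari-server restart
# After troubleshooting, remove the property and restart again:
sed -i '/^server\.jdbc\.properties\.loglevel/d' /etc/ambari-server/conf/ambari.properties
ambari-server restart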
02-02-2018 10:53 PM
1 Kudo
To print GC details, add the following line under Spark --> Configs --> Advanced spark-env --> spark-env template, and restart the Spark History Server:

export SPARK_DAEMON_JAVA_OPTS=" -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:{{spark_log_dir}}/spark_history_server.gc.`date +'%Y%m%d%H%M'`"
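After the restart, a timestamped GC log should show up in the Spark log directory; a quick check, assuming {{spark_log_dir}} resolves to the common default /var/log/spark:

# List the GC logs and tail the newest one (default log dir assumed):
ls -lt /var/log/spark/spark_history_server.gc.*
tail -n 20 "$(ls -t /var/log/spark/spark_history_server.gc.* | head -1)"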
01-03-2018 07:25 PM
Command to delete documents from a particular Solr collection. Example:

curl "http://<SOLR-HOSTNAME>:8886/solr/audit_logs/update?commit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>evtTime:[* TO NOW-7DAYS]</query></delete>"

In this example, audit_logs is the collection name, and the query evtTime:[* TO NOW-7DAYS] deletes the data older than 7 days.
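Before (or after) running the delete, you can sanity-check how many documents match the same range query with the standard select handler; a sketch using the same placeholder host and collection:

# Count documents older than 7 days without fetching them (rows=0);
# look at "numFound" in the response.
curl -G "http://<SOLR-HOSTNAME>:8886/solr/audit_logs/select" \
  --data-urlencode "q=evtTime:[* TO NOW-7DAYS]" \
  --data-urlencode "rows=0"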
01-03-2018 07:22 PM
Hive metastore DB connection verification from the command line:

You can run the following on any node in the cluster where the ambari-agent is installed. You can use this to validate that the password stored in Ambari and the actual mysql DB password are the same. You need the following information to run this test:
1. mysql DB hostname
2. mysql DB port number
3. mysql database name which the hive metastore uses
4. mysql username
5. mysql password

Syntax:
java -cp /usr/lib/ambari-agent/DBConnectionVerification.jar:/usr/share/java/mysql-connector-java.jar -Djava.library.path=/usr/lib/ambari-agent org.apache.ambari.server.DBConnectionVerification "jdbc:mysql://<mysql db hostname>:<mysql db port number>/<mysql database name>" "<mysql username>" "<mysql password>" com.mysql.jdbc.Driver

Example:
/usr/jdk64/jdk1.8.0_112/bin/java -cp /usr/lib/ambari-agent/DBConnectionVerification.jar:/usr/share/java/mysql-connector-java.jar -Djava.library.path=/usr/lib/ambari-agent org.apache.ambari.server.DBConnectionVerification "jdbc:mysql://test.openstacklocal:50001/hive" hive hive com.mysql.jdbc.Driver
Connected to DB Successfully!
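If you run this check often, it can be wrapped in a small script; a sketch, where verify_hms_db.sh is a hypothetical name and the JAR, driver, and library paths are the defaults from this post:

#!/bin/bash
# Hypothetical wrapper (verify_hms_db.sh) around the check above;
# JAR, driver, and library paths are the defaults shown in this post.
DB_HOST="$1"; DB_PORT="$2"; DB_NAME="$3"; DB_USER="$4"; DB_PASS="$5"
java -cp /usr/lib/ambari-agent/DBConnectionVerification.jar:/usr/share/java/mysql-connector-java.jar \
  -Djava.library.path=/usr/lib/ambari-agent \
  org.apache.ambari.server.DBConnectionVerification \
  "jdbc:mysql://${DB_HOST}:${DB_PORT}/${DB_NAME}" "${DB_USER}" "${DB_PASS}" com.mysql.jdbc.Driver

Usage example: ./verify_hms_db.sh test.openstacklocal 50001 hive hive hive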
09-21-2017 08:51 PM
@Anwaar Siddiqui Great, it works for you. Please accept the answer. You can't just upgrade one component in the stack; you have to consider moving to the latest HDP-2.6.x version, which has Knox 0.12. Hope this helps.
09-20-2017 02:17 PM
@Khaja Hussain You haven't mentioned the workload timings when they run separately. For example, run the Jan 2017 data processing separately and record the number of containers it requires and the time it takes (say, 10 min). Repeat the same for the Feb 2017 data processing separately (say, 12 min). Compare the number of containers each job demands. Then set the ordering policy to FAIR, set the minimum user limit to 50% for the queue, and run both jobs in parallel. Now the resource allocation should be distributed equally; observe the increased run time for the jobs, since the number of containers available to each job is reduced. The corresponding scheduler properties are sketched below. Also, please refer to this article on the user-limit-factor in a YARN queue: https://community.hortonworks.com/content/supportkb/49640/what-does-the-user-limit-factor-do-when-used-in-ya.html Accept the answer if this helps with your query.
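For reference, the queue changes described above map to these capacity-scheduler properties; a sketch assuming the queue path is root.default:

# capacity-scheduler.xml equivalents (queue path root.default is an example):
yarn.scheduler.capacity.root.default.ordering-policy=fair
yarn.scheduler.capacity.root.default.minimum-user-limit-percent=50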
09-20-2017 02:42 AM
@Khaja Hussain Please find the attached YARN Queue Manager screenshot as an example.
1. You can control resources by creating separate queues for different applications; with that you can control the resources spent at the queue level.
2. Within a single queue, you can set the "Minimum User Limit %" to control a single user's resource percentage. You can also choose the ordering policy "Fair" instead of "FIFO" (first in, first out).
3. You can also control the maximum percentage of the total cluster capacity a single queue can take; the equivalent configuration properties are sketched below.
Hope the attached screenshot helps. yarn-queue-config-to-control-min-max.jpg
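For illustration, the corresponding capacity-scheduler properties for the three points above; the queue names and percentages below are made-up examples:

# Two example queues splitting the cluster (names and values are placeholders):
yarn.scheduler.capacity.root.queues=analytics,etl
yarn.scheduler.capacity.root.analytics.capacity=60
yarn.scheduler.capacity.root.analytics.maximum-capacity=80
yarn.scheduler.capacity.root.analytics.minimum-user-limit-percent=25
yarn.scheduler.capacity.root.etl.capacity=40
yarn.scheduler.capacity.root.etl.maximum-capacity=60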