Member since: 01-19-2017
Posts: 3676
Kudos Received: 632
Solutions: 372
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 605 | 06-04-2025 11:36 PM |
| | 1150 | 03-23-2025 05:23 AM |
| | 573 | 03-17-2025 10:18 AM |
| | 2169 | 03-05-2025 01:34 PM |
| | 1362 | 03-03-2025 01:09 PM |
07-06-2021
11:50 PM
Hi @Faizan123, I hope the replies provided by @Shelton or @shobikas have helped you resolve your issue. If so, could you kindly accept one as the solution?
07-06-2021
03:58 AM
Sending basic examples of getting the license expiration date via the API.

For a single host:

import cm_client
from cm_client.rest import ApiException
from pprint import pprint

# Configure HTTP basic authorization: basic
cm_client.configuration.username = '<USERNAME>'
cm_client.configuration.password = '<PASSWORD>'

# Create an instance of the API class
api_host = 'http://<Cloudera host>'
port = '7180'
api_version = 'v41'

# Construct base URL for API
api_url = api_host + ':' + port + '/api/' + api_version
api_client = cm_client.ApiClient(api_url)
cluster_api_instance = cm_client.ClouderaManagerResourceApi(api_client)

try:
    # Retrieve information about the Cloudera Manager license.
    api_response = cluster_api_instance.read_license()
    pprint(api_response.expiration)
except ApiException as e:
    print("Exception when calling ClouderaManagerResourceApi->read_license: %s\n" % e)

For several hosts:

import cm_client
from cm_client.rest import ApiException
from pprint import pprint

# Configure HTTP basic authorization: basic
cm_client.configuration.username = '<USERNAME>'
cm_client.configuration.password = '<PASSWORD>'

hosts_list = ["<host 1>", "<host 2>", "<host 3>"]  # add as many hosts as needed

for cdh_host in hosts_list:
    # Create an instance of the API class for each host
    api_host = 'http://' + cdh_host
    port = '7180'
    api_version = 'v41'

    # Construct base URL for API
    api_url = api_host + ':' + port + '/api/' + api_version
    api_client = cm_client.ApiClient(api_url)
    cluster_api_instance = cm_client.ClouderaManagerResourceApi(api_client)

    print(cdh_host + ":")
    try:
        # Retrieve information about the Cloudera Manager license.
        api_response = cluster_api_instance.read_license()
        expiration_date = api_response.expiration[0:10]
        print(expiration_date)
        print("##############")
    except ApiException as e:
        print("Exception when calling ClouderaManagerResourceApi->read_license: %s\n" % e)
07-05-2021
02:09 AM
Check your Kerberos credentials cache. Also note that the keyring credential cache is not completely compatible; you may have to use a file-based credential cache. A quick way to check and switch is sketched below. If this doesn't work, please send some stack trace so we can understand the problem.
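For reference, a rough check could look like the following (the principal/realm is a placeholder, and FILE:/tmp/krb5cc_$(id -u) is just a common convention, not a required path):

# Show the current cache; if "Ticket cache:" starts with KEYRING, you are on the keyring cache
klist

# Point the client tools at a file-based cache instead
export KRB5CCNAME=FILE:/tmp/krb5cc_$(id -u)

# Get a fresh ticket into that file cache (replace with your principal)
kinit user@EXAMPLE.COM

# Verify the ticket now lives in the FILE: cache, then retry your command
klist

# Alternatively, set default_ccache_name = FILE:/tmp/krb5cc_%{uid}
# in the [libdefaults] section of /etc/krb5.conf to make this the default.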
07-04-2021
11:03 PM
@ask_bill_brooks Thanks for the addendum and official context. Happy hadooping
07-03-2021
02:39 PM
@harsh8 Happy to help with that question. The simple answer is YES. Below I am demonstrating with the table staff created in the previous post.

Before the import:

[hdfs@bern ~]$ hdfs dfs -ls /tmp
Found 5 items
drwxrwxr-x - druid hadoop 0 2020-07-06 02:04 /tmp/druid-indexing
drwxr-xr-x - hdfs hdfs 0 2020-07-06 01:50 /tmp/entity-file-history
drwx-wx-wx - hive hdfs 0 2020-07-06 01:59 /tmp/hive
-rw-r--r-- 3 hdfs hdfs 1024 2020-07-06 01:57 /tmp/ida8c04300_date570620
drwxr-xr-x - hdfs hdfs 0 2021-06-29 10:14 /tmp/sqoop

When you run the sqoop import, ensure the destination directory sqoop_harsh8 doesn't already exist in HDFS:

$ sqoop import --connect jdbc:mysql://localhost/harsh8 --table staff --username root -P --target-dir /tmp/sqoop_harsh8 -m 1

Here I am importing the table harsh8.staff I created in the previous session. The sqoop import will create 2 files, _SUCCESS and part-m-00000, in the HDFS directory as shown below. After the import, the directory /tmp/sqoop_harsh8 has been created:

[hdfs@bern ~]$ hdfs dfs -ls /tmp
Found 5 items
drwxrwxr-x - druid hadoop 0 2020-07-06 02:04 /tmp/druid-indexing
drwxr-xr-x - hdfs hdfs 0 2020-07-06 01:50 /tmp/entity-file-history
drwx-wx-wx - hive hdfs 0 2020-07-06 01:59 /tmp/hive
-rw-r--r-- 3 hdfs hdfs 1024 2020-07-06 01:57 /tmp/ida8c04300_date570620
drwxr-xr-x - hdfs hdfs 0 2021-06-29 10:14 /tmp/sqoop
-rw-r--r-- 3 hdfs hdfs 0 2021-07-03 22:04 /tmp/sqoop_harsh8

Check the contents of /tmp/sqoop_harsh8:

[hdfs@bern ~]$ hdfs dfs -ls /tmp/sqoop_harsh8
Found 2 items
-rw-r--r-- 3 hdfs hdfs 0 2021-07-03 22:04 /tmp/sqoop_harsh8/_SUCCESS
-rw-r--r-- 3 hdfs hdfs 223 2021-07-03 22:04 /tmp/sqoop_harsh8/part-m-00000

The _SUCCESS file is just an empty marker, so cat the contents of part-m-00000; this is the data from our table harsh8.staff:

[hdfs@bern ~]$ hdfs dfs -cat /tmp/sqoop_harsh8/part-m-00000
100,Geoffrey,manager,50000,Admin
101,Thomas,Oracle Consultant,15000,IT
102,Biden,Project Manager,28000,PM
103,Carmicheal,Bigdata developer,30000,BDS
104,Johnson,Treasurer,21000,Accounts
105,Gerald,Director,30000,Management

I piped the contents to a text file hr2.txt in my local /tmp directory so I could run the sqoop export with the data in an acceptable format:

[hdfs@bern ~]$ hdfs dfs -cat /tmp/sqoop_harsh8/part-m-00000 > /tmp/hr2.txt

Validate the hr2.txt contents:

[hdfs@bern ~]$ cat /tmp/hr2.txt
100,Geoffrey,manager,50000,Admin
101,Thomas,Oracle Consultant,15000,IT
102,Biden,Project Manager,28000,PM
103,Carmicheal,Bigdata developer,30000,BDS
104,Johnson,Treasurer,21000,Accounts
105,Gerald,Director,30000,Management

Copied hr2.txt to HDFS and validated that the file was copied:

[hdfs@bern ~]$ hdfs dfs -copyFromLocal /tmp/hr2.txt /tmp

Validation:

[hdfs@bern ~]$ hdfs dfs -ls /tmp
Found 7 items
drwxrwxr-x - druid hadoop 0 2020-07-06 02:04 /tmp/druid-indexing
drwxr-xr-x - hdfs hdfs 0 2020-07-06 01:50 /tmp/entity-file-history
drwx-wx-wx - hive hdfs 0 2020-07-06 01:59 /tmp/hive
-rw-r--r-- 3 hdfs hdfs 223 2021-07-03 22:41 /tmp/hr2.txt
-rw-r--r-- 3 hdfs hdfs 1024 2020-07-06 01:57 /tmp/ida8c04300_date570620
drwxr-xr-x - hdfs hdfs 0 2021-06-29 10:14 /tmp/sqoop
drwxr-xr-x - hdfs hdfs 0 2021-07-03 22:04 /tmp/sqoop_harsh8

Connected to MySQL and switched to the harsh8 database:

[root@bern ~]# mysql -uroot -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 179
Server version: 5.5.65-MariaDB MariaDB Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> use harsh8;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed

Check the existing tables in the harsh8 database before the export:

MariaDB [harsh8]> show tables;
+------------------+
| Tables_in_harsh8 |
+------------------+
| staff |
+------------------+
1 row in set (0.00 sec)

Pre-create a table staff2 to receive the hr2.txt data:

MariaDB [harsh8]> CREATE TABLE staff2 ( id INT NOT NULL PRIMARY KEY, Name VARCHAR(20), Position VARCHAR(20), Salary INT, Department VARCHAR(10));
Query OK, 0 rows affected (0.57 sec)
MariaDB [harsh8]> show tables;
+------------------+
| Tables_in_harsh8 |
+------------------+
| staff |
| staff2 |
+------------------+
2 rows in set (0.00 sec)

Load data into staff2 with a sqoop export:
[hdfs@bern ~]$ sqoop export --connect jdbc:mysql://localhost/harsh8 --username root --password 'w3lc0m31' --table staff2 --export-dir /tmp/hr2.txt
...
21/07/03 22:44:36 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7.3.1.4.0-315
21/07/03 22:44:37 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
21/07/03 22:44:37 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
21/07/03 22:44:37 INFO tool.CodeGenTool: Beginning code generation
21/07/03 22:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `staff2` AS t LIMIT 1
21/07/03 22:44:40 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `staff2` AS t LIMIT 1
21/07/03 22:44:40 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/3.1.4.0-315/hadoop-mapreduce
21/07/03 22:45:48 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/a53bb813b88ab155201196658f3ee001/staff2.jar
21/07/03 22:45:48 INFO mapreduce.ExportJobBase: Beginning export of staff2
21/07/03 22:47:59 INFO client.RMProxy: Connecting to ResourceManager at bern.swiss.ch/192.168.0.139:8050
21/07/03 22:48:07 INFO client.AHSProxy: Connecting to Application History server at bern.swiss.ch/192.168.0.139:10200
21/07/03 22:48:18 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /user/hdfs/.staging/job_1625340048377_0003
21/07/03 22:49:05 INFO input.FileInputFormat: Total input files to process : 1
21/07/03 22:49:05 INFO input.FileInputFormat: Total input files to process : 1
21/07/03 22:49:12 INFO mapreduce.JobSubmitter: number of splits:4
21/07/03 22:49:24 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1625340048377_0003
21/07/03 22:49:24 INFO mapreduce.JobSubmitter: Executing with tokens: []
21/07/03 22:49:26 INFO conf.Configuration: found resource resource-types.xml at file:/etc/hadoop/3.1.4.0-315/0/resource-types.xml
21/07/03 22:49:32 INFO impl.YarnClientImpl: Submitted application application_1625340048377_0003
21/07/03 22:49:33 INFO mapreduce.Job: The url to track the job: http://bern.swiss.ch:8088/proxy/application_1625340048377_0003/
21/07/03 22:49:33 INFO mapreduce.Job: Running job: job_1625340048377_0003
21/07/03 22:52:15 INFO mapreduce.Job: Job job_1625340048377_0003 running in uber mode : false
21/07/03 22:52:15 INFO mapreduce.Job: map 0% reduce 0%
21/07/03 22:56:45 INFO mapreduce.Job: map 75% reduce 0%
21/07/03 22:58:10 INFO mapreduce.Job: map 100% reduce 0%
21/07/03 22:58:13 INFO mapreduce.Job: Job job_1625340048377_0003 completed successfully
21/07/03 22:58:14 INFO mapreduce.Job: Counters: 32
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=971832
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1132
HDFS: Number of bytes written=0
HDFS: Number of read operations=19
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Launched map tasks=4
Data-local map tasks=4
Total time spent by all maps in occupied slots (ms)=1733674
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=866837
Total vcore-milliseconds taken by all map tasks=866837
Total megabyte-milliseconds taken by all map tasks=1775282176
Map-Reduce Framework
Map input records=6
Map output records=6
Input split bytes=526
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=1565
CPU time spent (ms)=6710
Physical memory (bytes) snapshot=661999616
Virtual memory (bytes) snapshot=12958916608
Total committed heap usage (bytes)=462422016
Peak Map Physical memory (bytes)=202506240
Peak Map Virtual memory (bytes)=3244965888
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
21/07/03 22:58:14 INFO mapreduce.ExportJobBase: Transferred 1.1055 KB in 630.3928 seconds (1.7957 bytes/sec)
21/07/03 22:58:14 INFO mapreduce.ExportJobBase: Exported 6 records.

Log back onto MariaDB, switch to the harsh8 database, and query for the new table staff2:

[root@bern ~]# mysql -uroot -pwelcome1
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 266
Server version: 5.5.65-MariaDB MariaDB Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> use harsh8;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
MariaDB [harsh8]> show tables;
+------------------+
| Tables_in_harsh8 |
+------------------+
| staff |
| staff2 |
+------------------+
2 rows in set (2.64 sec)
MariaDB [harsh8]> SELECT NOW();
+---------------------+
| NOW() |
+---------------------+
| 2021-07-03 23:02:38 |
+---------------------+
1 row in set (0.00 sec)

After the export, query staff2:

MariaDB [harsh8]> select * from staff2;
+-----+------------+-------------------+--------+------------+
| id | Name | Position | Salary | Department |
+-----+------------+-------------------+--------+------------+
| 100 | Geoffrey | manager | 50000 | Admin |
| 101 | Thomas | Oracle Consultant | 15000 | IT |
| 102 | Biden | Project Manager | 28000 | PM |
| 103 | Carmicheal | Bigdata developer | 30000 | BDS |
| 104 | Johnson | Treasurer | 21000 | Accounts |
| 105 | Gerald | Director | 30000 | Management |
+-----+------------+-------------------+--------+------------+
6 rows in set (0.00 sec)

Check the source table used in the export; see the timestamp:

MariaDB [harsh8]> SELECT NOW();
+---------------------+
| NOW() |
+---------------------+
| 2021-07-03 23:04:50 |
+---------------------+
1 row in set (0.00 sec)

Comparison:

MariaDB [harsh8]> select * from staff;
+-----+------------+-------------------+--------+------------+
| id | Name | Position | Salary | Department |
+-----+------------+-------------------+--------+------------+
| 100 | Geoffrey | manager | 50000 | Admin |
| 101 | Thomas | Oracle Consultant | 15000 | IT |
| 102 | Biden | Project Manager | 28000 | PM |
| 103 | Carmicheal | Bigdata developer | 30000 | BDS |
| 104 | Johnson | Treasurer | 21000 | Accounts |
| 105 | Gerald | Director | 30000 | Management |
+-----+------------+-------------------+--------+------------+

You have successfully created a table from a Sqoop export! Et voilà! The conversion from part-m-00000 to a .txt file did the trick, which proves it's doable, so your question is answered 🙂 You can revalidate by following my steps. Happy hadooping!
07-02-2021
02:51 AM
@dooby There is a Jira out there for this; see the solution in https://issues.apache.org/jira/browse/SPARK-32536
07-01-2021
11:23 AM
@drgenious First, Impala shares metadata (data about data) with HMS, the Hive Metastore. Impala uses HDFS caching to provide performance and scalability benefits in production environments where Impala queries and other Hadoop jobs operate on quantities of data much larger than the physical RAM on the DataNodes, making it impractical to rely on the Linux OS cache, which only keeps the most recently used data in memory. Data read from the HDFS cache avoids the overhead of checksumming and memory-to-memory copying involved when using data from the Linux OS cache.

Having said that, when you restart Impala you discard all the cached metadata (table locations, permissions, query execution plans, statistics) that makes it efficient. That explains why your queries are so slow after the restart. Impala is very efficient when it reads data that is pinned in memory through HDFS caching; it takes advantage of the HDFS API and reads the data from memory rather than from disk, whether the data files are pinned using Impala DDL statements or using the command-line mechanism where you specify HDFS paths.

There is no better source of Impala information than Cloudera, so I urge you to take the time to read the documentation below to pin the option in your memory 🙂

Using HDFS Caching with Impala
Configuring HDFS Caching for Impala

There are 2 other options that are less expensive than restarting Impala (I can't imagine you have more than 70 data nodes):

INVALIDATE METADATA is an asynchronous operation that simply discards the loaded metadata from the catalog and coordinator caches. After that operation, the catalog and all the Impala coordinators only know about the existence of databases and tables and nothing more. Metadata loading for tables is triggered by any subsequent queries.

REFRESH reloads the metadata synchronously. REFRESH is more lightweight than doing a full metadata load after a table has been invalidated. REFRESH cannot detect changes in block locations triggered by operations like the HDFS balancer, hence causing remote reads during query execution with negative performance implications.

The INVALIDATE METADATA statement marks the metadata for one or all tables as stale. The next time the Impala service performs a query against a table whose metadata is invalidated, Impala reloads the associated metadata before the query proceeds. As this is a very expensive operation compared to the incremental metadata update done by the REFRESH statement, when possible, prefer REFRESH rather than INVALIDATE METADATA.

INVALIDATE METADATA is required when the following changes are made outside of Impala, in Hive and other Hive clients, such as SparkSQL:

Metadata of existing tables changes.
New tables are added, and Impala will use the tables.
The SERVER or DATABASE level Sentry privileges are changed.
Block metadata changes, but the files remain the same (HDFS rebalance).
UDF jars change.
Some tables are no longer queried, and you want to remove their metadata from the catalog and coordinator caches to reduce memory requirements.
No INVALIDATE METADATA is needed when the changes are made by impalad.

I hope that explains why, and gives you options to use rather than a warm restart of Impala. If you know which table you want to query, run this first, qualifying the table with its database name; this has saved my data scientists time, and encapsulating it in their scripts is a good practice:

INVALIDATE METADATA [[db_name.]table_name]

Recomputing the statistics is another solution:

COMPUTE STATS <table name>;

The COMPUTE STATS statement gathers information about the volume and distribution of data in a table and all associated columns and partitions. The information is stored in the Hive metastore database and used by Impala to help optimize queries. A short sketch of running these statements from the shell follows below. Hope that enlightens you.
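As a minimal sketch (the coordinator hostname and the mydb.mytable names below are placeholders, and this assumes impala-shell is available on the node you run it from), the three statements could be scripted like this:

# Lightweight: reload metadata for one table after new data files were added outside Impala
impala-shell -i <coordinator-host> -q "REFRESH mydb.mytable"

# Heavier: discard and reload metadata after schema changes made in Hive or SparkSQL
impala-shell -i <coordinator-host> -q "INVALIDATE METADATA mydb.mytable"

# Recompute table and column statistics so the planner can optimize queries
impala-shell -i <coordinator-host> -q "COMPUTE STATS mydb.mytable"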
06-30-2021
08:48 AM
@mike_bronson7 Here you go: how to determine YARN and MapReduce Memory Configuration Settings. Happy hadooping!
06-09-2021
04:24 AM
The problem is still there:

21/06/09 16:50:22 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
21/06/09 16:50:22 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
21/06/09 16:50:22 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
21/06/09 16:50:22 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
21/06/09 16:50:22 INFO zookeeper.ZooKeeper: Client environment:os.version=3.10.0-1127.19.1.el7.x86_64
21/06/09 16:50:22 INFO zookeeper.ZooKeeper: Client environment:user.name=eagledev
21/06/09 16:50:22 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/user1
21/06/09 16:50:22 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/user1
21/06/09 16:50:22 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=hdp-slave1.mydomain.com:2181,hdp-slave2.mydomain.com:2181,hdp-master.mydomain.com:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@5ace1ed4
21/06/09 16:50:22 INFO zookeeper.ClientCnxn: Opening socket connection to server hdp-slave1.mydomain.com/10.200.104.188:2181. Will not attempt to authenticate using SASL (unknown error)
21/06/09 16:50:22 INFO zookeeper.ClientCnxn: Socket connection established to hdp-slave1.mydomain.com/10.200.104.188:2181, initiating session
21/06/09 16:50:22 INFO imps.CuratorFrameworkImpl: backgroundOperationsLoop exiting
21/06/09 16:50:22 INFO zookeeper.ClientCnxn: Session establishment complete on server hdp-slave1.mydomain.com/10.200.104.188:2181, sessionid = 0x279ef5fd2c3006b, negotiated timeout = 60000
21/06/09 16:50:22 INFO zookeeper.ZooKeeper: Session: 0x279ef5fd2c3006b closed
21/06/09 16:50:22 INFO zookeeper.ClientCnxn: EventThread shut down
org.apache.curator.CuratorZookeeperClient.startAdvancedTracer(Ljava/lang/String;)Lorg/apache/curator/drivers/OperationTrace;
Beeline version 3.1.0.3.1.4.0-315 by Apache Hive
0: jdbc:hive2://hdp-slave1.mydomain.com:2 (closed)>
06-03-2021
03:54 PM
Also, SQuirreL seems to be connecting to the dev cluster; it just times out when running a query such as "show databases". If SQuirreL stays connected for a long time, I noticed that the query will eventually return results instead of timing out. Per the Cloudera documentation (https://docs.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_hive_metastore_configure.html#concept_jsw_bnc_rp), a minimum of 4 dedicated cores is recommended for HiveServer2 and 4 for the Hive Metastore. The server that hosts HS2 and the metastore has only 8 cores in total. Could this be a reason for the performance issue? Any help on this is much appreciated. Thanks,