Member since: 01-19-2017
Posts: 3676
Kudos Received: 632
Solutions: 372
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 605 | 06-04-2025 11:36 PM |
| | 1150 | 03-23-2025 05:23 AM |
| | 573 | 03-17-2025 10:18 AM |
| | 2169 | 03-05-2025 01:34 PM |
| | 1362 | 03-03-2025 01:09 PM |
07-06-2021
11:50 PM
Hi @Faizan123, I hope the replies provided by @Shelton or @shobikas have helped you resolve your issue. If so, could you kindly accept one as the solution?
07-06-2021
03:58 AM
Sending basic examples of getting the license expiration date via the API.

For a single host:

import cm_client
from cm_client.rest import ApiException
from pprint import pprint

# Configure HTTP basic authorization: basic
cm_client.configuration.username = '<USERNAME>'
cm_client.configuration.password = '<PASSWORD>'

# Create an instance of the API class
api_host = 'http://<Cloudera host>'
port = '7180'
api_version = 'v41'

# Construct base URL for API
api_url = api_host + ':' + port + '/api/' + api_version
api_client = cm_client.ApiClient(api_url)
cluster_api_instance = cm_client.ClouderaManagerResourceApi(api_client)

try:
    # Retrieve information about the Cloudera Manager license.
    api_response = cluster_api_instance.read_license()
    pprint(api_response.expiration)
except ApiException as e:
    print("Exception when calling ClouderaManagerResourceApi->read_license: %s\n" % e)

For several hosts:

import cm_client
from cm_client.rest import ApiException
from pprint import pprint

# Configure HTTP basic authorization: basic
cm_client.configuration.username = '<USERNAME>'
cm_client.configuration.password = '<PASSWORD>'

hosts_list = ["<host 1>", "<host 2>", "<host 3>"]  # add as many hosts as needed

for cdh_host in hosts_list:
    # Create an instance of the API class for each host
    api_host = 'http://' + cdh_host
    port = '7180'
    api_version = 'v41'

    # Construct base URL for API
    api_url = api_host + ':' + port + '/api/' + api_version
    api_client = cm_client.ApiClient(api_url)
    cluster_api_instance = cm_client.ClouderaManagerResourceApi(api_client)

    print(cdh_host + ":")
    try:
        # Retrieve information about the Cloudera Manager license.
        api_response = cluster_api_instance.read_license()
        expiration_date = api_response.expiration[0:10]
        print(expiration_date)
        print("##############")
    except ApiException as e:
        print("Exception when calling ClouderaManagerResourceApi->read_license: %s\n" % e)
07-05-2021
02:09 AM
Check your Kerberos credentials cache. Also note that the keyring credential cache is not completely compatible; you may have to use a file-based credential cache. A quick way to check and switch is sketched below. If this doesn't work, please send some stack trace so we can understand the problem.
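For reference, a rough check could look like the following (the principal/realm is a placeholder, and FILE:/tmp/krb5cc_$(id -u) is just a common convention, not a required path):

# Show the current cache; if "Ticket cache:" starts with KEYRING, you are on the keyring cache
klist

# Point the client tools at a file-based cache instead
export KRB5CCNAME=FILE:/tmp/krb5cc_$(id -u)

# Get a fresh ticket into that file cache (replace with your principal)
kinit user@EXAMPLE.COM

# Verify the ticket now lives in the FILE: cache, then retry your command
klist

# Alternatively, set default_ccache_name = FILE:/tmp/krb5cc_%{uid}
# in the [libdefaults] section of /etc/krb5.conf to make this the default.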
07-04-2021
11:03 PM
@ask_bill_brooks Thanks for the addendum and official context. Happy hadooping
07-03-2021
02:39 PM
@harsh8 Happy to help with that question. The simple answer is YES. Below I am demonstrating with the table staff created in the previous post.

Before the import:

[hdfs@bern ~]$ hdfs dfs -ls /tmp
Found 5 items
drwxrwxr-x - druid hadoop 0 2020-07-06 02:04 /tmp/druid-indexing
drwxr-xr-x - hdfs hdfs 0 2020-07-06 01:50 /tmp/entity-file-history
drwx-wx-wx - hive hdfs 0 2020-07-06 01:59 /tmp/hive
-rw-r--r-- 3 hdfs hdfs 1024 2020-07-06 01:57 /tmp/ida8c04300_date570620
drwxr-xr-x - hdfs hdfs 0 2021-06-29 10:14 /tmp/sqoop

When you run the sqoop import, ensure the destination directory sqoop_harsh8 doesn't already exist in HDFS:

$ sqoop import --connect jdbc:mysql://localhost/harsh8 --table staff --username root -P --target-dir /tmp/sqoop_harsh8 -m 1

Here I am importing the table harsh8.staff I created in the previous session. The sqoop import will create 2 files, _SUCCESS and part-m-00000, in the HDFS directory as shown below. After the import, the directory /tmp/sqoop_harsh8 has been created:

[hdfs@bern ~]$ hdfs dfs -ls /tmp
Found 5 items
drwxrwxr-x - druid hadoop 0 2020-07-06 02:04 /tmp/druid-indexing
drwxr-xr-x - hdfs hdfs 0 2020-07-06 01:50 /tmp/entity-file-history
drwx-wx-wx - hive hdfs 0 2020-07-06 01:59 /tmp/hive
-rw-r--r-- 3 hdfs hdfs 1024 2020-07-06 01:57 /tmp/ida8c04300_date570620
drwxr-xr-x - hdfs hdfs 0 2021-06-29 10:14 /tmp/sqoop
-rw-r--r-- 3 hdfs hdfs 0 2021-07-03 22:04 /tmp/sqoop_harsh8

Check the contents of /tmp/sqoop_harsh8:

[hdfs@bern ~]$ hdfs dfs -ls /tmp/sqoop_harsh8
Found 2 items
-rw-r--r-- 3 hdfs hdfs 0 2021-07-03 22:04 /tmp/sqoop_harsh8/_SUCCESS
-rw-r--r-- 3 hdfs hdfs 223 2021-07-03 22:04 /tmp/sqoop_harsh8/part-m-00000

The _SUCCESS file is just an empty marker, so cat the contents of part-m-00000; this is the data from our table harsh8.staff:

[hdfs@bern ~]$ hdfs dfs -cat /tmp/sqoop_harsh8/part-m-00000
100,Geoffrey,manager,50000,Admin
101,Thomas,Oracle Consultant,15000,IT
102,Biden,Project Manager,28000,PM
103,Carmicheal,Bigdata developer,30000,BDS
104,Johnson,Treasurer,21000,Accounts
105,Gerald,Director,30000,Management

I piped the contents to a text file hr2.txt in my local /tmp directory so I could run the sqoop export with the data in an acceptable format:

[hdfs@bern ~]$ hdfs dfs -cat /tmp/sqoop_harsh8/part-m-00000 > /tmp/hr2.txt

Validate the hr2.txt contents:

[hdfs@bern ~]$ cat /tmp/hr2.txt
100,Geoffrey,manager,50000,Admin
101,Thomas,Oracle Consultant,15000,IT
102,Biden,Project Manager,28000,PM
103,Carmicheal,Bigdata developer,30000,BDS
104,Johnson,Treasurer,21000,Accounts
105,Gerald,Director,30000,Management

Copied hr2.txt to HDFS and validated that the file was copied:

[hdfs@bern ~]$ hdfs dfs -copyFromLocal /tmp/hr2.txt /tmp

Validation:

[hdfs@bern ~]$ hdfs dfs -ls /tmp
Found 7 items
drwxrwxr-x - druid hadoop 0 2020-07-06 02:04 /tmp/druid-indexing
drwxr-xr-x - hdfs hdfs 0 2020-07-06 01:50 /tmp/entity-file-history
drwx-wx-wx - hive hdfs 0 2020-07-06 01:59 /tmp/hive
-rw-r--r-- 3 hdfs hdfs 223 2021-07-03 22:41 /tmp/hr2.txt
-rw-r--r-- 3 hdfs hdfs 1024 2020-07-06 01:57 /tmp/ida8c04300_date570620
drwxr-xr-x - hdfs hdfs 0 2021-06-29 10:14 /tmp/sqoop
drwxr-xr-x - hdfs hdfs 0 2021-07-03 22:04 /tmp/sqoop_harsh8

Connected to MySQL and switched to the harsh8 database:

[root@bern ~]# mysql -uroot -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 179
Server version: 5.5.65-MariaDB MariaDB Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> use harsh8;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed

Check the existing tables in the harsh8 database before the export:

MariaDB [harsh8]> show tables;
+------------------+
| Tables_in_harsh8 |
+------------------+
| staff |
+------------------+
1 row in set (0.00 sec)

Pre-create a table staff2 to receive the hr2.txt data:

MariaDB [harsh8]> CREATE TABLE staff2 ( id INT NOT NULL PRIMARY KEY, Name VARCHAR(20), Position VARCHAR(20), Salary INT, Department VARCHAR(10));
Query OK, 0 rows affected (0.57 sec)
MariaDB [harsh8]> show tables;
+------------------+
| Tables_in_harsh8 |
+------------------+
| staff |
| staff2 |
+------------------+
2 rows in set (0.00 sec)

Load data into staff2 with a sqoop export:
[hdfs@bern ~]$ sqoop export --connect jdbc:mysql://localhost/harsh8 --username root --password 'w3lc0m31' --table staff2 --export-dir /tmp/hr2.txt
...
21/07/03 22:44:36 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7.3.1.4.0-315
21/07/03 22:44:37 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
21/07/03 22:44:37 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
21/07/03 22:44:37 INFO tool.CodeGenTool: Beginning code generation
21/07/03 22:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `staff2` AS t LIMIT 1
21/07/03 22:44:40 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `staff2` AS t LIMIT 1
21/07/03 22:44:40 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/3.1.4.0-315/hadoop-mapreduce
21/07/03 22:45:48 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/a53bb813b88ab155201196658f3ee001/staff2.jar
21/07/03 22:45:48 INFO mapreduce.ExportJobBase: Beginning export of staff2
21/07/03 22:47:59 INFO client.RMProxy: Connecting to ResourceManager at bern.swiss.ch/192.168.0.139:8050
21/07/03 22:48:07 INFO client.AHSProxy: Connecting to Application History server at bern.swiss.ch/192.168.0.139:10200
21/07/03 22:48:18 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /user/hdfs/.staging/job_1625340048377_0003
21/07/03 22:49:05 INFO input.FileInputFormat: Total input files to process : 1
21/07/03 22:49:05 INFO input.FileInputFormat: Total input files to process : 1
21/07/03 22:49:12 INFO mapreduce.JobSubmitter: number of splits:4
21/07/03 22:49:24 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1625340048377_0003
21/07/03 22:49:24 INFO mapreduce.JobSubmitter: Executing with tokens: []
21/07/03 22:49:26 INFO conf.Configuration: found resource resource-types.xml at file:/etc/hadoop/3.1.4.0-315/0/resource-types.xml
21/07/03 22:49:32 INFO impl.YarnClientImpl: Submitted application application_1625340048377_0003
21/07/03 22:49:33 INFO mapreduce.Job: The url to track the job: http://bern.swiss.ch:8088/proxy/application_1625340048377_0003/
21/07/03 22:49:33 INFO mapreduce.Job: Running job: job_1625340048377_0003
21/07/03 22:52:15 INFO mapreduce.Job: Job job_1625340048377_0003 running in uber mode : false
21/07/03 22:52:15 INFO mapreduce.Job: map 0% reduce 0%
21/07/03 22:56:45 INFO mapreduce.Job: map 75% reduce 0%
21/07/03 22:58:10 INFO mapreduce.Job: map 100% reduce 0%
21/07/03 22:58:13 INFO mapreduce.Job: Job job_1625340048377_0003 completed successfully
21/07/03 22:58:14 INFO mapreduce.Job: Counters: 32
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=971832
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1132
HDFS: Number of bytes written=0
HDFS: Number of read operations=19
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Launched map tasks=4
Data-local map tasks=4
Total time spent by all maps in occupied slots (ms)=1733674
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=866837
Total vcore-milliseconds taken by all map tasks=866837
Total megabyte-milliseconds taken by all map tasks=1775282176
Map-Reduce Framework
Map input records=6
Map output records=6
Input split bytes=526
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=1565
CPU time spent (ms)=6710
Physical memory (bytes) snapshot=661999616
Virtual memory (bytes) snapshot=12958916608
Total committed heap usage (bytes)=462422016
Peak Map Physical memory (bytes)=202506240
Peak Map Virtual memory (bytes)=3244965888
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
21/07/03 22:58:14 INFO mapreduce.ExportJobBase: Transferred 1.1055 KB in 630.3928 seconds (1.7957 bytes/sec)
21/07/03 22:58:14 INFO mapreduce.ExportJobBase: Exported 6 records.

Log back onto MariaDB, switch to the harsh8 database, and query for the new table staff2:

[root@bern ~]# mysql -uroot -pwelcome1
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 266
Server version: 5.5.65-MariaDB MariaDB Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> use harsh8;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
MariaDB [harsh8]> show tables;
+------------------+
| Tables_in_harsh8 |
+------------------+
| staff |
| staff2 |
+------------------+
2 rows in set (2.64 sec)
MariaDB [harsh8]> SELECT NOW();
+---------------------+
| NOW() |
+---------------------+
| 2021-07-03 23:02:38 |
+---------------------+
1 row in set (0.00 sec)

After the export, query staff2:

MariaDB [harsh8]> select * from staff2;
+-----+------------+-------------------+--------+------------+
| id | Name | Position | Salary | Department |
+-----+------------+-------------------+--------+------------+
| 100 | Geoffrey | manager | 50000 | Admin |
| 101 | Thomas | Oracle Consultant | 15000 | IT |
| 102 | Biden | Project Manager | 28000 | PM |
| 103 | Carmicheal | Bigdata developer | 30000 | BDS |
| 104 | Johnson | Treasurer | 21000 | Accounts |
| 105 | Gerald | Director | 30000 | Management |
+-----+------------+-------------------+--------+------------+
6 rows in set (0.00 sec)

Check the source table used in the export; see the timestamp:

MariaDB [harsh8]> SELECT NOW();
+---------------------+
| NOW() |
+---------------------+
| 2021-07-03 23:04:50 |
+---------------------+
1 row in set (0.00 sec)

Comparison:

MariaDB [harsh8]> select * from staff;
+-----+------------+-------------------+--------+------------+
| id | Name | Position | Salary | Department |
+-----+------------+-------------------+--------+------------+
| 100 | Geoffrey | manager | 50000 | Admin |
| 101 | Thomas | Oracle Consultant | 15000 | IT |
| 102 | Biden | Project Manager | 28000 | PM |
| 103 | Carmicheal | Bigdata developer | 30000 | BDS |
| 104 | Johnson | Treasurer | 21000 | Accounts |
| 105 | Gerald | Director | 30000 | Management |
+-----+------------+-------------------+--------+------------+

You have successfully created a table from a Sqoop export! Et voilà! The conversion from part-m-00000 to a .txt file did the trick, which proves it's doable, so your question is answered 🙂 You can revalidate by following my steps. Happy hadooping!
07-02-2021
02:51 AM
@dooby There is a Jira out there for this; see the solution in https://issues.apache.org/jira/browse/SPARK-32536
07-01-2021
11:23 AM
@drgenious First, Impala shares metadata (data about data) with HMS, the Hive Metastore. Impala uses HDFS caching to provide performance and scalability benefits in production environments where Impala queries and other Hadoop jobs operate on quantities of data much larger than the physical RAM on the DataNodes, making it impractical to rely on the Linux OS cache, which only keeps the most recently used data in memory. Data read from the HDFS cache avoids the overhead of checksumming and memory-to-memory copying involved when using data from the Linux OS cache.

Having said that, when you restart Impala you discard all the cached metadata (table locations, permissions, query execution plans, statistics) that makes it efficient. That explains why your queries are so slow after the restart. Impala is very efficient when it reads data that is pinned in memory through HDFS caching; it takes advantage of the HDFS API and reads the data from memory rather than from disk, whether the data files are pinned using Impala DDL statements or using the command-line mechanism where you specify HDFS paths.

There is no better source of Impala information than Cloudera, so I urge you to take the time to read the documentation below to pin the option in your memory 🙂

Using HDFS Caching with Impala
Configuring HDFS Caching for Impala

There are 2 other options that are less expensive than restarting Impala (I can't imagine you have more than 70 data nodes):

INVALIDATE METADATA is an asynchronous operation that simply discards the loaded metadata from the catalog and coordinator caches. After that operation, the catalog and all the Impala coordinators only know about the existence of databases and tables and nothing more. Metadata loading for tables is triggered by any subsequent queries.

REFRESH reloads the metadata synchronously. REFRESH is more lightweight than doing a full metadata load after a table has been invalidated. REFRESH cannot detect changes in block locations triggered by operations like the HDFS balancer, hence causing remote reads during query execution with negative performance implications.

The INVALIDATE METADATA statement marks the metadata for one or all tables as stale. The next time the Impala service performs a query against a table whose metadata is invalidated, Impala reloads the associated metadata before the query proceeds. As this is a very expensive operation compared to the incremental metadata update done by the REFRESH statement, when possible, prefer REFRESH rather than INVALIDATE METADATA.

INVALIDATE METADATA is required when the following changes are made outside of Impala, in Hive and other Hive clients, such as SparkSQL:

Metadata of existing tables changes.
New tables are added, and Impala will use the tables.
The SERVER or DATABASE level Sentry privileges are changed.
Block metadata changes, but the files remain the same (HDFS rebalance).
UDF jars change.
Some tables are no longer queried, and you want to remove their metadata from the catalog and coordinator caches to reduce memory requirements.
No INVALIDATE METADATA is needed when the changes are made by impalad.

I hope that explains why, and gives you options to use rather than a warm restart of Impala. If you know which table you want to query, run this first, qualifying the table with its database name; this has saved my data scientists time, and encapsulating it in their scripts is a good practice:

INVALIDATE METADATA [[db_name.]table_name]

Recomputing the statistics is another solution:

COMPUTE STATS <table name>;

The COMPUTE STATS statement gathers information about the volume and distribution of data in a table and all associated columns and partitions. The information is stored in the Hive metastore database and used by Impala to help optimize queries. A short sketch of running these statements from the shell follows below. Hope that enlightens you.
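As a minimal sketch (the coordinator hostname and the mydb.mytable names below are placeholders, and this assumes impala-shell is available on the node you run it from), the three statements could be scripted like this:

# Lightweight: reload metadata for one table after new data files were added outside Impala
impala-shell -i <coordinator-host> -q "REFRESH mydb.mytable"

# Heavier: discard and reload metadata after schema changes made in Hive or SparkSQL
impala-shell -i <coordinator-host> -q "INVALIDATE METADATA mydb.mytable"

# Recompute table and column statistics so the planner can optimize queries
impala-shell -i <coordinator-host> -q "COMPUTE STATS mydb.mytable"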
06-30-2021
08:48 AM
@mike_bronson7 Here you go: how to determine YARN and MapReduce Memory Configuration Settings. Happy hadooping!
06-09-2021
04:24 AM
The problem is still there:

21/06/09 16:50:22 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
21/06/09 16:50:22 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
21/06/09 16:50:22 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
21/06/09 16:50:22 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
21/06/09 16:50:22 INFO zookeeper.ZooKeeper: Client environment:os.version=3.10.0-1127.19.1.el7.x86_64
21/06/09 16:50:22 INFO zookeeper.ZooKeeper: Client environment:user.name=eagledev
21/06/09 16:50:22 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/user1
21/06/09 16:50:22 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/user1
21/06/09 16:50:22 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=hdp-slave1.mydomain.com:2181,hdp-slave2.mydomain.com:2181,hdp-master.mydomain.com:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@5ace1ed4
21/06/09 16:50:22 INFO zookeeper.ClientCnxn: Opening socket connection to server hdp-slave1.mydomain.com/10.200.104.188:2181. Will not attempt to authenticate using SASL (unknown error)
21/06/09 16:50:22 INFO zookeeper.ClientCnxn: Socket connection established to hdp-slave1.mydomain.com/10.200.104.188:2181, initiating session
21/06/09 16:50:22 INFO imps.CuratorFrameworkImpl: backgroundOperationsLoop exiting
21/06/09 16:50:22 INFO zookeeper.ClientCnxn: Session establishment complete on server hdp-slave1.mydomain.com/10.200.104.188:2181, sessionid = 0x279ef5fd2c3006b, negotiated timeout = 60000
21/06/09 16:50:22 INFO zookeeper.ZooKeeper: Session: 0x279ef5fd2c3006b closed
21/06/09 16:50:22 INFO zookeeper.ClientCnxn: EventThread shut down
org.apache.curator.CuratorZookeeperClient.startAdvancedTracer(Ljava/lang/String;)Lorg/apache/curator/drivers/OperationTrace;
Beeline version 3.1.0.3.1.4.0-315 by Apache Hive
0: jdbc:hive2://hdp-slave1.mydomain.com:2 (closed)>
06-03-2021
03:54 PM
Also, SQuirreL seems to be connecting to the dev cluster; it just times out when running a query such as "show databases". If SQuirreL stays connected for a long time, I noticed that the query will eventually return results instead of timing out. Per the Cloudera documentation (https://docs.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_hive_metastore_configure.html#concept_jsw_bnc_rp), a minimum of 4 dedicated cores is recommended for HiveServer2 and 4 for the Hive Metastore. The server that hosts HS2 and the metastore has only 8 cores in total. Could this be a reason for the performance issue? Any help on this is much appreciated. Thanks,