Member since: 01-19-2017
Posts: 3676
Kudos Received: 632
Solutions: 372
11-03-2019
09:15 PM
@Ani73 Azure controls access to ports through the firewall (network security group), which is easily configured from the Networking pane. Locate the VM running Ambari (Home --> Resource Group --> filter by type and look for the Ambari VM), then under Settings open Networking; a network interface pane opens on the right where you set the inbound and outbound rules. See the attached screenshot. If you are not a network expert, the safest option is to lock the inbound port rules down from Source * and Destination * to your own IP, which you can get from www.whatismyipaddress.com
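If you prefer the CLI over the portal, a rule like the one described above can also be created with the Azure CLI. This is only a minimal sketch: the resource group, NSG name, port and source IP below are placeholder assumptions you would replace with your own values.

```bash
# Sketch: allow Ambari (port 8080) inbound only from your own public IP.
# "my-rg", "ambari-vm-nsg" and the IP address are placeholders -- substitute your values.
az network nsg rule create \
  --resource-group my-rg \
  --nsg-name ambari-vm-nsg \
  --name Allow-Ambari-From-MyIP \
  --priority 100 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --source-address-prefixes 203.0.113.25 \
  --destination-port-ranges 8080
```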
11-03-2019
06:47 AM
@mike_bronson7 Yes, it's possible to deploy HDF using Ambari blueprints. If you compare an HDP blueprint with an HDF one, you will notice a difference in the components section only. These links show the possibility: Deploy HDF 1 using a blueprint, Deploy HDF 2 using a blueprint, Deploy HDF 3 using a blueprint.
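For reference, the general blueprint workflow is the same for HDP and HDF: register a blueprint, then post a cluster-creation template that maps hosts to host groups. A minimal sketch against the standard Ambari REST endpoints is below; the file names, blueprint name, cluster name and credentials are assumptions for illustration only.

```bash
# Sketch: register a blueprint, then create a cluster from it.
# blueprint.json, hostmapping.json, "hdf-blueprint", "hdfcluster" and admin/admin are placeholders.

# 1. Register the blueprint under a name of your choice
curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
  -d @blueprint.json \
  http://ambari-server:8080/api/v1/blueprints/hdf-blueprint

# 2. Create the cluster by mapping hosts to the blueprint's host groups
curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
  -d @hostmapping.json \
  http://ambari-server:8080/api/v1/clusters/hdfcluster
```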
10-31-2019
01:45 PM
1 Kudo
@moubaba Here is a fantastic document by Artem Ervits: Ambari Views REST API Overview. Hope that answers your query. Happy hadooping
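As a quick taste of what that document covers, the Views endpoint can be queried with curl. A minimal sketch; the host, port and credentials are placeholder values.

```bash
# Sketch: list the views deployed on an Ambari server.
# "ambari-server", 8080 and admin/admin are placeholders.
curl -u admin:admin -H 'X-Requested-By: ambari' -X GET \
  http://ambari-server:8080/api/v1/views
```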
10-31-2019
01:13 PM
1 Kudo
@mike_bronson7 Confluent and Kafka are inseparable 🙂 HDF also has good tooling around Kafka, but what you decide on usually depends on the skill sets at hand. Containerized apps are now the norm for the reasons shared before; nevertheless, HDF 3.1 is packaged with SAM, NiFi, Ambari, Registry and Ranger, quite a complete offering. With the Dockerized version you have many moving parts, and synchronizing Kafka, ZooKeeper and the Registry can be a challenge without the right skill sets, but the positives are upgrades, deployment and portability, being OS-agnostic. The choice is yours 🙂
10-31-2019
12:47 PM
@RjsChw The error you are encountering is Oracle related: you should validate the username/password against the Oracle database. If you have an Oracle client installed on your laptop, try one of the variants below (there are many), but let your DBA give you the username/password for the database you are trying to export.

Variant 1
sqlplus /nolog
connect user/password@dbname

Variant 2
sqlplus user@orcl

Variant 3
sqlplus user/password@hostname:port/sid

-----------
ERROR manager.SqlManager: Error executing statement: java.sql.SQLException: ORA-01017: invalid username/password; logon denied
java.sql.SQLException: ORA-01017: invalid username/password; logon denied

Having said that, Sqoop on the command line will display your password in clear text, which is not secure, so below is a way to encrypt your password so that your Sqoop jobs are protected from prying eyes. To do that you use the hadoop credential command to encrypt your Oracle credential. In the example below I am creating a password for my fictitious testDB and using the name in the alias to easily identify it among hundreds of DBs. The example uses a MySQL database; the alias name doesn't matter (i.e. oracle.testDB.alias or db2.testDB.alias), what matters is that the password matches the password of the Oracle/MySQL/other DB user.

Encrypting the Sqoop password
Generating the jceks file: you MUST provide a path to your HDFS home, so create one before executing this command.
$ hadoop credential create mysql.testDB.alias -provider jceks://hdfs/user/george/mysql.testDB.password.jceks
Enter password: [database_password]
Enter password again: [database_password]
mysql.testDB.alias has been successfully created.
org.apache.hadoop.security.alias.JavaKeyStoreProvider has been updated.

Validating the encrypted password creation
The encrypted password jceks file MUST be written to your HDFS home.
$ hdfs dfs -ls /user/george
Found 1 items
-rwx------ 3 george hdfs 503 2018-09-02 01:40 /user/george/mysql.testDB.password.jceks

Running Sqoop with the jceks alias
Assumption: my MySQL database test is running on host pomme.cloudera.com, port 3306.
$ sqoop import -Dhadoop.security.credential.provider.path=jceks://hdfs/user/george/mysql.testDB.password.jceks --driver com.mysql.jdbc.Driver --connect jdbc:mysql://pomme.cloudera.com:3306/test --username george --password-alias mysql.testDB.alias --table "customer" --target-dir /user/george/test

Success output
Warning: /usr/hdp/2.6.2.0-205/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/09/02 02:08:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.2.0-205
18/09/02 02:08:06 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
18/09/02 02:08:06 INFO manager.SqlManager: Using default fetchSize of 1000
18/09/02 02:08:06 INFO tool.CodeGenTool: Beginning code generation
.......... some text removed here ..............
18/09/02 02:08:18 INFO mapreduce.Job: The url to track the job: http://pomme.cloudera.com:8088/proxy/application_1535835049607_0002/
18/09/02 02:08:18 INFO mapreduce.Job: Running job: job_1535835049607_0002
18/09/02 02:08:55 INFO mapreduce.Job: Job job_1535835049607_0002 running in uber mode : false
Total megabyte-milliseconds taken by all map tasks=917431296
Map-Reduce Framework
Map input records=2170
Map output records=2170
Input split bytes=396
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=944
CPU time spent (ms)=11690
Physical memory (bytes) snapshot=669270016
Virtual memory (bytes) snapshot=18275794944
Total committed heap usage (bytes)=331350016
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=243892
18/09/02 02:11:48 INFO mapreduce.ImportJobBase: Transferred 238.1758 KB in 218.8164 seconds (1.0885 KB/sec)
18/09/02 02:11:48 INFO mapreduce.ImportJobBase: Retrieved 2170 records.

Sqoop import in HDFS
Check the import was successful.
$ hdfs dfs -ls /user/george/test
Found 5 items
-rw-r--r-- 3 george hdfs 0 2018-09-02 02:11 /user/george/test/_SUCCESS
-rw-r--r-- 3 george hdfs 60298 2018-09-02 02:10 /user/george/test/part-m-00000
-rw-r--r-- 3 george hdfs 60894 2018-09-02 02:10 /user/george/test/part-m-00001
-rw-r--r-- 3 george hdfs 62050 2018-09-02 02:11 /user/george/test/part-m-00002
-rw-r--r-- 3 george hdfs 60650 2018-09-02 02:11 /user/george/test/part-m-00003

Check the values in the splits.
$ hdfs dfs -cat /user/george/test/part-m-00000
1,Julian Stuart,sagittis.felis@sedhendrerit.com,Suspendisse Tristique Neque Associates,9230 Metus. Av.,Pemberton,Mexico
2,Ferris Fulton,condimentum@morbitristique.co.uk,Nunc Ltd,256-788 At Avenue,Northampton,China
3,Byron Irwin,adipiscing.Mauris@DonecnibhQuisque.edu,Nascetur Ridiculus Foundation,4042 Non, St.,Gattatico,Lithuania
.......................... some text removed ................
18,Peter Middleton,purus.Nullam.scelerisque@egetdictumplacerat.com,Erat In Consectetuer Associates,1618 Donec St.,Grand Island,Thailand

Voila
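Since the original question was about Oracle rather than MySQL, the same jceks approach applies with an Oracle alias and an Oracle thin JDBC URL. A minimal sketch only: the host, service name, user, table and alias below are placeholder assumptions, and the Oracle JDBC driver must be on Sqoop's classpath.

```bash
# Sketch: the same credential-provider pattern against Oracle.
# Host, port, service name, user, table and alias below are placeholders.
hadoop credential create oracle.testDB.alias \
  -provider jceks://hdfs/user/george/oracle.testDB.password.jceks

sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/george/oracle.testDB.password.jceks \
  --connect jdbc:oracle:thin:@//oradb.example.com:1521/ORCLPDB1 \
  --username scott \
  --password-alias oracle.testDB.alias \
  --table CUSTOMER \
  --target-dir /user/george/oracle_customer
```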
10-31-2019
11:30 AM
@nirajp Either way, Hive CLI or Beeline, you MUST provide a username/password to authenticate before you can execute any SQL statement against the DB. See the examples below.

Hive CLI
[hive@calgary ~]$ hive
..........
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://calgary.canada.ca:2181,ottawa.canada.ca:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
Enter username for jdbc:hive2://calgary.canada.ca:2181,ottawa.canada.ca:2181/default: hive
Enter password for jdbc:hive2://calgary.canada.ca:2181,ottawa.canada.ca:2181/default: ****

Beeline connection
[hive@london ~]$ beeline
Beeline version 1.2.1000.2.5.3.0-37 by Apache Hive
beeline> ! connect jdbc:hive2://london.tesco.co.uk:10000/;principal=hive/london.tesco.co.uk@TESCO.CO.UK
Connecting to jdbc:hive2://london.tesco.co.uk:10000/;principal=hive/london.tesco.co.uk@TESCO.CO.UK
Enter username for jdbc:hive2://london.tesco.co.uk:10000/;principal=hive/london.tesco.co.uk@TESCO.CO.UK: xxxxx
Enter password for jdbc:hive2://london.tesco.co.uk:10000/;principal=hive/london.tesco.co.uk@TESCO.CO.UK: xxxxx
Connected to: Apache Hive (version 1.2.1000.2.5.3.0-37)
Driver: Hive JDBC (version 1.2.1000.2.5.3.0-37)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://london.tesco.co.uk:10000/> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| default        |
| uxbribge       |
| White_city     |
+----------------+--+
3 rows selected (2.863 seconds)

If you have the Ranger plugin enabled for Hive, then authorization is centrally handled by Ranger. HTH
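If you want to avoid the interactive prompts, Beeline also accepts the credentials on the command line. A minimal sketch; the host, database, user and password are placeholder values (and note a password passed this way is visible in the shell history).

```bash
# Sketch: non-interactive Beeline connection and query.
# Host, port, database, user and password are placeholders.
beeline -u "jdbc:hive2://london.tesco.co.uk:10000/default" -n hive -p 'secret' -e "show databases;"
```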
10-31-2019
07:02 AM
1 Kudo
@saivenkatg55 This is not a silver bullet but it is worth trying: your Ambari database could be overwhelmed scanning through old data. To help narrow down the problem, please do the steps below.

Stop the Ambari Server:
# ambari-server stop

Run db-purge-history (use the correct date format for your server):
# ambari-server db-purge-history --cluster-name [PROD] --from-date 2016-04-01

Start the Ambari Server:
# ambari-server start

Ref: Tuning Ambari server performance

Please revert
10-30-2019
01:32 PM
@mike_bronson7 Docker containers provide an ideal foundation for running Kafka-as-a-Service on-premises or in the public cloud. However, using Docker containers in production environments poses some challenges, including container management, scheduling, network configuration, security, and performance. By default, containerized applications have no resource constraints and can use as much of a given resource as the host's kernel scheduler allows, and each container's access to the host machine's CPU cycles is unlimited. It is important not to allow a running container to consume too much of the host machine's memory. As you are aware, Kafka needs ZooKeeper, so you have to architect your Kafka deployment well, but once you master it, it's a piece of cake and it brings a lot of advantages like upgrades, scaling out, etc. As I reiterated, it's a good move, so get your hands dirty 🙂
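To put such limits in place, Docker exposes memory and CPU flags on docker run. A minimal sketch; the image name and the limit values are placeholder assumptions, not sizing advice for a real broker.

```bash
# Sketch: cap a Kafka container's memory and CPU so it cannot starve the host.
# The image name and the limit values below are placeholders.
docker run -d --name kafka-broker \
  --memory="4g" \
  --memory-swap="4g" \
  --cpus="2.0" \
  my-kafka-image:latest
```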
10-30-2019
12:24 PM
1 Kudo
@moubaba I know it's a daunting task, but unfortunately I couldn't link the article by Jordan Moore, so let me try to elaborate. Simple advice: always run a crontab job to dump all your databases, or at least the Ambari DB, nightly!

Firstly, Ambari doesn't control any data residing on the data nodes, so you should be safe there.

Stop all ambari-agents in the cluster, maybe even uninstall them, so that all the Hadoop components remain running "in the dark".

Install and set up a new Ambari server; add a cluster but do not register any hosts [very important].

If you uninstalled the ambari-agents on the nodes, please re-install them and ensure the ambari-server and the agents are the same versions. Do the below if you removed Ambari and the agents:

# yum repolist

On the ambari-server:
# yum install ambari-server
# yum install ambari-agent

On the other hosts:
# yum install ambari-agent

Manually reconfigure the ambari-agents [ambari-agent.ini] to point at the new Ambari server address, and start them.

Add the hosts in the Ambari server UI, selecting the "manual registration" option; the hosts register successfully since the agents are already running. A point to note: install and configure an ambari-agent on the Ambari server to point to itself!

After this, you get the option of installing clients and servers. Now, you could try to "reinstall" what is already there, but you might want to deselect all the servers in the data node column. In theory, it will try to perform the OS package installation, say that the service already exists, and not error out. If it does error, restart the install process but deselect everything. At that point it should continue, and now you have Ambari back up and running with all the hosts monitored, just with no processes to configure.

To add the services back, you need to use the Ambari REST API to add back the respective Services, Components, and Host Components that you have running on the cluster. If you can't remember all of them, go to each host and do a process check to see what's running.

Variables
export AMBARI_USER=admin
export AMBARI_PASSWD=admin
export CLUSTER_NAME=<New_Cluster_name>
export AMBARI_SERVER=<Your_new_ambari_server_FQDN>
export AMBARI_PORT=8080

List all services related to Hive
In the example below I am listing all Hive components:
curl -u $AMBARI_USER:$AMBARI_PASSWD -H 'X-Requested-By: ambari' -X GET "http://$AMBARI_SERVER:$AMBARI_PORT/api/v1/clusters/$CLUSTER_NAME/services/HIVE"

Adding back Pig
curl -k -u $AMBARI_USER:$AMBARI_PASSWD -H "X-Requested-By: ambari" -i -X POST -d '{"RequestInfo":{"context":"Install PIG"}, "Body":{"HostRoles":{"state":"INSTALLED"}}}' "http://$AMBARI_SERVER:$AMBARI_PORT/api/v1/clusters/$CLUSTER_NAME/services/PIG/components/PIG"

This is the time to get your hands dirty 🙂 This isn't production, so it should train you for a real situation. Happy hadooping!!
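To round this out, host components are usually attached the same way. A hedged sketch (not from the original post) of the common Ambari REST pattern for adding a component to a specific host and then asking Ambari to install it; the host name is a placeholder and the variables from above are reused.

```bash
# Sketch: attach a component to a host, then request the INSTALLED state.
# "worker01.example.com" is a placeholder host name; reuse the exported variables above.
curl -u $AMBARI_USER:$AMBARI_PASSWD -H 'X-Requested-By: ambari' -X POST \
  "http://$AMBARI_SERVER:$AMBARI_PORT/api/v1/clusters/$CLUSTER_NAME/hosts/worker01.example.com/host_components/PIG"

curl -u $AMBARI_USER:$AMBARI_PASSWD -H 'X-Requested-By: ambari' -X PUT \
  -d '{"HostRoles": {"state": "INSTALLED"}}' \
  "http://$AMBARI_SERVER:$AMBARI_PORT/api/v1/clusters/$CLUSTER_NAME/hosts/worker01.example.com/host_components/PIG"
```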
10-29-2019
10:19 PM
@Elephanta Okay, now with that information I get a better understanding and picture. By default, HDP 2.6 has a replication factor of 3, so HDFS is looking to place the other 2 copies on data nodes different from the existing one; unless you create new files with a replication factor of 1 you will continue to get the under-replicated block errors 🙂 but now that you know, it's manageable.

Maybe next time you delete files in HDFS use the -skipTrash option:
hdfs dfs -rm -skipTrash /path/to/hdfs/file/to/remove/permanently
or empty the existing .Trash.

Options:
Change the replication factor of a file:
hdfs dfs -setrep -w 1 /user/hdfs/file.txt

Or change the replication factor of a directory:
hdfs dfs -setrep -R 1 /user/hdfs/your_dir

Changing the replication factor for a directory will only affect the existing files; new files under the directory will be created with the default replication factor from dfs.replication in hdfs-site.xml. Maybe in your case that is what you should change to 1 for your Dev environment, as it takes effect cluster-wide. Happy hadooping
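For the "empty the existing .Trash" part, a minimal sketch of the usual commands; the trash path shown assumes the default /user/<username>/.Trash layout.

```bash
# Sketch: permanently remove files already sitting in the HDFS trash.
# Run as the user whose trash you want to clear.
hdfs dfs -expunge

# Or delete a specific user's trash directory outright
# (the path assumes the default trash location for the "hdfs" user).
hdfs dfs -rm -r -skipTrash /user/hdfs/.Trash
```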