Member since
09-18-2015
216
Posts
208
Kudos Received
49
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 497 | 09-13-2017 06:04 AM
 | 1021 | 06-27-2017 06:31 PM
 | 1205 | 06-27-2017 06:27 PM
 | 5694 | 11-04-2016 08:02 PM
 | 2164 | 05-25-2016 03:42 PM
12-15-2017
06:14 PM
Currently, using DistCp to copy data from a secure (encryption) zone in one cluster to a secure zone in another cluster is not possible unless you copy the encryption keys from the source cluster to the target cluster. If both clusters have the same keys, specify the -skipcrccheck and -update flags to avoid verifying checksums. AFAIK, an HDFS client (DistCp) patch will be required so that the HDFS client decrypts the source data in folder-1 using key-1, passes it over the wire, and encrypts the target data in folder-1 using key-2; that is currently a work in progress.
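For reference, a minimal sketch of the DistCp invocation when both clusters share the same keys (the NameNode hosts and paths are placeholders, not from the original question):
hadoop distcp -update -skipcrccheck hdfs://source_nn:8020/secure-zone/folder-1 hdfs://target_nn:8020/secure-zone/folder-1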
... View more
09-13-2017
06:27 AM
@Bhushan kumar What do you mean by ldapsearch not working on Knox? I am assuming that you are trying to run ldapsearch from the host where Knox is installed and getting "ldapsearch: command not found", which means the LDAP client utilities are not installed. Installing openldap-clients should fix the issue. Below is an example from CentOS 7:
[root@pk-test4 ~]# yum provides ldapsearch
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirror.sjc02.svwh.net
* extras: mirror.keystealth.org
* updates: centos.mia.host-engine.com
openldap-clients-2.4.40-13.el7.x86_64 : LDAP client utilities
Repo : base
Matched from:
Filename : /usr/bin/ldapsearch
openldap-clients-2.4.40-13.el7.x86_64 : LDAP client utilities
Repo : @base
Matched from:
Filename : /bin/ldapsearch
openldap-clients-2.4.40-13.el7.x86_64 : LDAP client utilities
Repo : @base
Matched from:
Filename : /usr/bin/ldapsearch
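If the package is not installed, a minimal sketch of installing it (assuming a yum-based system with access to the base repo):
[root@pk-test4 ~]# yum install -y openldap-clients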
... View more
09-13-2017
06:21 AM
@saanvi I suggest asking this question in the Microsoft forums. Here are some references, though: https://www.sqlservercentral.com/Forums/Topic1383362-2799-1.aspx https://technet.microsoft.com/en-us/library/ms170438(v=sql.110).aspx
... View more
09-13-2017
06:13 AM
1 Kudo
The closest supported version is CentOS 7.2; refer to the documentation below for the same. https://docs.hortonworks.com/HDPDocuments/Ambari-2.5.2.0/bk_ambari-installation/content/ch_Getting_Ready.html
... View more
09-13-2017
06:04 AM
There is currently no specific Architect certification or exam. The available certification exams are listed below: https://hortonworks.com/services/training/certification/
... View more
07-11-2017
07:03 PM
Muni, yes, you can shut down Phoenix while HBase keeps running. On another note, as Tim mentioned below, Phoenix is really good and doesn't consume many resources, so try exploring it.
... View more
07-08-2017
03:23 AM
Hello @Marian Canciu Can you try the link below? https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1004109
... View more
07-08-2017
03:21 AM
@Rishit shah Can you open a new question for this issue? This will help the community find and search for issues more easily.
... View more
07-08-2017
03:14 AM
@Sami Ahmad check this https://pig.apache.org/docs/r0.8.1/piglatin_ref2.html#REGEX_EXTRACT
... View more
06-27-2017
06:31 PM
1 Kudo
@Sami Ahmad DataNode directories are where HDFS data is actually stored, while the NameNode directory is used for storing NameNode metadata, i.e. the file and block information associated with the HDFS data stored on the DataNodes. To add additional space to HDFS with additional disks, you should add the new disk to the DataNode directories, i.e. if you add u02 to the server(s), add the same path to the DataNode directories: DataNode directories=/u01/hadoop1/hdfs/data,/u02/hadoop1/hdfs/data
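After updating the DataNode directories and restarting the DataNodes, you can confirm that the additional capacity was picked up (a quick sketch; output omitted here):
hdfs dfsadmin -report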
... View more
06-27-2017
06:27 PM
1 Kudo
There are many disadvantages to using a replication factor of 1, and we strongly recommend against it for the reasons below:
1. Data loss --> A single DataNode or disk failure will result in data loss.
2. Performance --> A replication factor of more than 1 allows more parallelization, since tasks can read from any replica.
3. Handling failure --> With a replication factor > 1, the failure of one or more DataNodes doesn't result in job failure.
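If you later want to raise the replication factor of existing data, a minimal sketch (the path is a placeholder):
hdfs dfs -setrep -w 3 /path/to/data
For new files, the default is controlled by dfs.replication in hdfs-site.xml.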
... View more
05-09-2017
01:33 AM
To troubleshoot Ambari, use the ambari-server log and not the ambari-agent log. The default location is /var/log/ambari-server/ambari-server.log
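For example, to follow the log while reproducing the problem (a simple sketch):
tail -f /var/log/ambari-server/ambari-server.log
You can also grep the same file for ERROR or Exception to narrow things down.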
... View more
03-25-2017
06:19 PM
Yes, there is always a performance overhead when using the upper/lower functions, as they will be called for each row.
... View more
03-25-2017
06:10 PM
You can download the MySQL RPM and then use yum to install it. https://www.linode.com/docs/databases/mysql/how-to-install-mysql-on-centos-7
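A minimal sketch of the steps on CentOS 7, assuming you use the MySQL community yum repository (the repo RPM file name is a placeholder; see the linked article for the current version):
rpm -Uvh <mysql-community-release-el7>.noarch.rpm
yum install mysql-community-server
systemctl start mysqld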
... View more
11-04-2016
08:02 PM
1 Kudo
It seems you are trying to install NiFi/HDF using the Ambari instance that is managing your HDP cluster. At present, HDF cannot be installed using the same Ambari that is already managing an HDP cluster; you will have to install HDF using a dedicated Ambari. Refer below: https://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.1/bk_ambari-installation/content/ch_getting_ready.html
... View more
11-03-2016
06:09 PM
Glad that it got resolved.
... View more
11-02-2016
06:08 PM
2 Kudos
To set up an Ambari-managed HDP cluster, the components of the services below require a database for their respective metastores:
- Ambari Server
- Hive
- Oozie
- Ranger
While a lab/sandbox environment can be set up with the default databases for these components, doing the same is strongly discouraged for Dev/QA/UAT/Production clusters. Due diligence and planning must be done to ensure that the database selection is appropriate for an enterprise-standard production cluster. Below are the key areas to take into consideration when selecting a database for Ambari and the HDP components.
Supported Databases
Refer below for the databases currently supported for Ambari and the different HDP components. https://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-installation/content/database_requirements.html
High Availability Support
It is strongly recommended to set up High Availability in a production cluster for the HDP components that support it. High Availability support for the different HDP components is described here: https://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-user-guide/content/ch_managing_service_high_availability.html
The relational database that backs the Hive Metastore, Ambari Server, Oozie Server, etc. should itself also be made highly available, using the best practices defined for the database system in use, so that the HDP services and Ambari Server are truly highly available and the database is not a single point of failure. Therefore, it is important to select a relational database that supports high availability, and this should be discussed with your in-house DBAs when planning a new database or when reusing an existing in-house database for an HDP deployment.
Cost of licensing and support
The HDP support subscription doesn't cover licensing or support for the databases used by Ambari Server and HDP components such as the Hive Metastore, so these would incur additional licensing and support costs. Therefore, the cost of licensing and support should be considered an important factor when selecting the appropriate database for Ambari Server and the HDP stack. Note: Contact your in-house database team or database vendor for details on licensing and support costs.
Database maintenance and management
The database used for Ambari Server and HDP components will need maintenance and management, which can be quite frequent/regular: backups, HA setup, recovery, etc. Therefore, while selecting a database for Ambari/HDP, ensure that your organization has skilled in-house people/DBAs available to perform these activities. It is not a good practice to use different relational databases for different components, e.g. Postgres for Ambari and MySQL for Hive, as that adds complexity to the management and maintenance of these databases. It is recommended to pick a relational database of your choice and use the same one throughout, i.e. MySQL for all components or Postgres for all components, and so on.
... View more
- Find more articles tagged with:
- Ambari
- Database
- Deployment
- Design & Architecture
- FAQ
- hdp-2.5.0
06-15-2016
09:58 PM
As discussed with @avoma, the issue is resolved; it was due to incorrect entries in the /etc/hosts file.
... View more
06-15-2016
07:55 PM
2 Kudos
@avoma Are you able to do a hadoop fs -ls against both clusters from the HDP 2.4.x cluster? Try:
hadoop fs -ls hdfs://source_cluster_Active_NN:8020/tmp
hadoop fs -ls hdfs://destination_cluster_Active_NN:8020/tmp
Also, check that the source host is reachable from the destination cluster host. If the above works, then try running DistCp with a user other than hdfs. Try the below:
hadoop distcp -strategy dynamic -prgbup \
-<overwrite/update> \
hdfs://source_cluster_Active_NN:8020/<test_file_path> \
hdfs://destination_cluster_Active_NN:8020/<Test_file_path>
... View more
05-26-2016
06:03 AM
3 Kudos
@Timothy Spann Here is a quick and dirty way to do it. I had some time, so I tried @Ravi Mutyala's suggestion, and it works :).
hive> create table test_orc_sqoop (name varchar(20)) ;
OK
Time taken: 0.729 seconds
hive> select * from test_orc_sqoop;
OK
Time taken: 0.248 seconds
hive> exit;
[hdfs@test-mon-wmt ~]$ sqoop import --connect jdbc:mysql://mon-WMT-upgrade.cloud.hortonworks.com/test --username test --password hadoop --table test --hcatalog-database default --hcatalog-table test_orc_sqoop --hcatalog-storage-stanza "stored as orcfile" -m 1
Warning: /usr/hdp/2.4.2.0-258/hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/hdp/2.4.2.0-258/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/05/26 05:56:03 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258
16/05/26 05:56:03 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/05/26 05:56:04 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/05/26 05:56:04 INFO tool.CodeGenTool: Beginning code generation
16/05/26 05:56:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
16/05/26 05:56:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
16/05/26 05:56:05 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce
Note: /tmp/sqoop-hdfs/compile/64f04ad998cebf113bf8ec1efdbf6b95/test.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/05/26 05:56:10 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/64f04ad998cebf113bf8ec1efdbf6b95/test.jar
16/05/26 05:56:10 WARN manager.MySQLManager: It looks like you are importing from mysql.
16/05/26 05:56:10 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
16/05/26 05:56:10 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
16/05/26 05:56:10 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
16/05/26 05:56:10 INFO mapreduce.ImportJobBase: Beginning import of test
16/05/26 05:56:11 INFO hcat.SqoopHCatUtilities: Configuring HCatalog for import job
16/05/26 05:56:11 INFO hcat.SqoopHCatUtilities: Configuring HCatalog specific details for job
16/05/26 05:56:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
16/05/26 05:56:11 INFO hcat.SqoopHCatUtilities: Database column names projected : [name]
16/05/26 05:56:11 INFO hcat.SqoopHCatUtilities: Database column name - info map :
name : [Type : 12,Precision : 20,Scale : 0]
16/05/26 05:56:12 INFO hive.metastore: Trying to connect to metastore with URI thrift://rm-wmt-upgrade.cloud.hortonworks.com:9083
16/05/26 05:56:12 INFO hive.metastore: Connected to metastore.
16/05/26 05:56:14 INFO hcat.SqoopHCatUtilities: HCatalog full table schema fields = [name]
16/05/26 05:56:18 INFO hcat.SqoopHCatUtilities: HCatalog table partitioning key fields = []
16/05/26 05:56:18 INFO hcat.SqoopHCatUtilities: HCatalog projected schema fields = [name]
16/05/26 05:56:18 INFO hcat.SqoopHCatUtilities: HCatalog job : Hive Home = /usr/hdp/current/hive-client
16/05/26 05:56:18 INFO hcat.SqoopHCatUtilities: HCatalog job: HCatalog Home = /usr/hdp/2.4.2.0-258//sqoop/../hive-hcatalog
16/05/26 05:56:18 INFO hcat.SqoopHCatUtilities: Adding jar files under /usr/hdp/2.4.2.0-258//sqoop/../hive-hcatalog/share/hcatalog to distributed cache
..............
...........
16/05/26 05:56:18 INFO hcat.SqoopHCatUtilities: Adding jar files under /usr/hdp/2.4.2.0-258//sqoop/../hive-hcatalog/share/hcatalog/storage-handlers to distributed cache (recursively)
16/05/26 05:56:18 WARN hcat.SqoopHCatUtilities: No files under /usr/hdp/2.4.2.0-258/sqoop/../hive-hcatalog/share/hcatalog/storage-handlers to add to distributed cache for hcatalog job
16/05/26 05:56:18 INFO hcat.SqoopHCatUtilities: Validating dynamic partition keys
16/05/26 05:56:18 WARN hcat.SqoopHCatUtilities: The HCatalog field name has type varchar(20). Expected = varchar based on database column type : VARCHAR
16/05/26 05:56:18 WARN hcat.SqoopHCatUtilities: The Sqoop job can fail if types are not assignment compatible
16/05/26 05:56:18 INFO mapreduce.DataDrivenImportJob: Configuring mapper for HCatalog import job
16/05/26 05:56:20 INFO impl.TimelineClientImpl: Timeline service address: http://rm-wmt-upgrade.cloud.hortonworks.com:8188/ws/v1/timeline/
16/05/26 05:56:21 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
16/05/26 05:57:07 INFO db.DBInputFormat: Using read commited transaction isolation
16/05/26 05:57:07 INFO mapreduce.JobSubmitter: number of splits:1
16/05/26 05:57:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1464036504999_0008
16/05/26 05:57:11 INFO impl.YarnClientImpl: Submitted application application_1464036504999_0008
16/05/26 05:57:11 INFO mapreduce.Job: The url to track the job: http://nn1-wmt-upgrade.cloud.hortonworks.com:8088/proxy/application_1464036504999_0008/
16/05/26 05:57:11 INFO mapreduce.Job: Running job: job_1464036504999_0008
16/05/26 05:57:56 INFO mapreduce.Job: Job job_1464036504999_0008 running in uber mode : false
16/05/26 05:57:56 INFO mapreduce.Job: map 0% reduce 0%
16/05/26 05:58:25 INFO mapreduce.Job: map 100% reduce 0%
16/05/26 05:58:27 INFO mapreduce.Job: Job job_1464036504999_0008 completed successfully
16/05/26 05:58:27 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=302189
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=22
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=17899
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=17899
Total vcore-seconds taken by all map tasks=17899
Total megabyte-seconds taken by all map tasks=12207118
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=189
CPU time spent (ms)=2610
Physical memory (bytes) snapshot=172310528
Virtual memory (bytes) snapshot=2470862848
Total committed heap usage (bytes)=58130432
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
16/05/26 05:58:27 INFO mapreduce.ImportJobBase: Transferred 22 bytes in 129.1108 seconds (0.1704 bytes/sec)
16/05/26 05:58:27 INFO mapreduce.ImportJobBase: Retrieved 3 records.
[hdfs@test-mon-wmt ~]$ hive
WARNING: Use "yarn jar" to launch YARN applications.
Logging initialized using configuration in file:/etc/hive/2.4.2.0-258/0/hive-log4j.properties
hive> select * from test_orc_sqoop;
OK
abcd
abcdef
abcdefghi
Time taken: 4.31 seconds, Fetched: 3 row(s)
... View more
05-26-2016
05:59 AM
1 Kudo
I agree with @Ravi Mutyala, so I un-accepted the answer to get the right answer in place.
... View more
05-26-2016
12:01 AM
Glad that it's resolved :).
... View more
05-25-2016
03:42 PM
1 Kudo
@Radhakrishnan Rk Refer to the article below, which I posted earlier, on migrating Hue from SQLite to MySQL. https://community.hortonworks.com/questions/399/how-to-move-hue-database-from-default-sqlite-datab.html
... View more
05-24-2016
06:21 PM
@chennuri gouri shankar Is Ambari Metrics running in embedded mode or distributed mode?
... View more
05-24-2016
04:43 PM
Yes, I think you can delete them if you don't want them.
... View more
05-24-2016
04:31 PM
3 Kudos
@Smart Solutions You can restrict which groups are synced using the Group search filter. Refer below for details. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_Ranger_Install_Guide/content/ranger_user_sync_ldap_ad.html Another option would be to use the Ranger FileSource. https://cwiki.apache.org/confluence/display/RANGER/File+Source+User+Group+Sync+process
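For example, the Group search filter takes a standard LDAP filter; a hypothetical value that syncs only two specific groups might look like:
(|(cn=hdp-admins)(cn=hdp-users))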
... View more
05-24-2016
04:26 PM
3 Kudos
@Smart Solutions There are two aspects - cluster size and load - as explained below.
1. Three ZooKeepers on decent hardware is fine for a cluster of up to 300-500 nodes, assuming you don't also have Kafka and Storm; having Kafka, Storm, HDFS, Hive, HBase, and YARN all depend on one quorum of 3 ZooKeepers can be really heavy for a single quorum. So, in your case with a 40-node cluster, 3 ZooKeepers should be good. But if you have all of these components and heavy lifting being done in Kafka/Storm as well, then go with 2 quorums of 3 ZooKeepers each, i.e. one quorum of 3 ZKs for HDFS, YARN, HBase, and Hive, and the other quorum of 3 ZKs for Storm and Kafka.
2. In terms of increasing the number of ZooKeeper servers in one quorum: 3 ZKs is the recommended minimum, so you are good there, but if your cluster grows to 500+ nodes, you should increase the number of ZKs in a quorum to 5 or 7 depending on cluster size. Organizations running large clusters, i.e. 1000-2500 nodes, have 5 to 7 ZooKeeper servers per quorum in production.
... View more
05-24-2016
12:39 PM
Try starting/stopping the services from the Ambari UI. screen-shot-2016-05-24-at-73650-am.png Then follow the operation link to see the commands Ambari executes. screen-shot-2016-05-24-at-73759-am.png
... View more
05-23-2016
11:58 PM
Thanks for posting the answer!!
... View more