Member since
09-29-2015
286
Posts
601
Kudos Received
60
Solutions
04-13-2021
02:22 AM
Hello, do you have a test to confirm the cluster is working correctly after all those steps? I moved one JournalNode out of three. I put some files on HDFS, but I don't see any files in dfs.journalnode.edits.dir for either the old or the new JournalNode. Best regards, Abdou
12-17-2019
07:48 AM
Hi all, here are more details on the above: https://community.cloudera.com/t5/Support-Questions/HDInsight-Vs-HDP-Service-on-Azure-Vs-HDP-on-Azure-IaaS/m-p/166424 Thanks, HadoopHelp
12-17-2018
07:59 PM
Thanks for this article. A follow-up question on the ephemeral storage consideration for HDFS. We use Hortonworks cluster nodes (m4.4xlarge). Is it recommended to have a cluster of 10 data nodes, of which 5 are ephemeral and 5 are EBS-backed instances? Assume the 5 ephemeral nodes are backed up to S3. What would the pros and cons be, especially regarding data loss, in the event of:
1. A crash of a few ephemeral or EBS-backed instances
2. An AWS outage in this AZ
3. A region outage (DR plan in another region with S3 cross-region replication?)
04-13-2018
04:35 PM
@Ancil McBarnett Thanks for letting me know. They are back online now.
02-10-2017
05:21 AM
This was awesome Tim
01-19-2017
04:10 PM
3 Kudos
HDB 2.1.1

Reference:
http://hdb.docs.pivotal.io/211
http://hdb.docs.pivotal.io/211/hdb/releasenotes/HAWQ211ReleaseNotes.html
http://hdb.docs.pivotal.io/211/hdb/install/install-ambari.html

Download HDB from Hortonworks at http://hortonworks.com/downloads/ or directly from Pivotal at https://network.pivotal.io/products/pivotal-hdb (you need to create a Pivotal account).

What to look out for
If you use only 1 master node, you cannot have both a HAWQ Master and a Standby.
If you install the HAWQ Master on the same node as Ambari, you need to change the Postgres port from 5432.

Install Prep
Ensure that httpd is installed:
yum install httpd
sudo service httpd status
sudo service httpd start

Get and Install repo
Log onto Pivotal and download hdb-2.1.1.0-7.tar
/* On Ambari Node */
1. mkdir /staging
2. chmod a+rx /staging
3. scp -i <<your key>> -o 'StrictHostKeyChecking=no' hdb-2.1.1.0-7.tar root@<<ambarinode>>:/staging
4. tar -zxvf hdb-2.1.1.0-7.tar
   cd /staging/hdb-2.1.1.0
   ./setup_repo.sh
/* You should see the message “hdb-2.1.1.0 Repo file successfully created at /etc/yum.repos.d/hdb-2.1.1.0.repo.” */
5. yum install -y hawq-ambari-plugin
6. cd /var/lib/hawq
7. ./add-hawq.py --user admin --password admin --stack HDP-2.5
/* The command above applies if the repo is on the same node as Ambari; otherwise point to where the repo lives: */
./add-hawq.py --user <admin-username> --password <admin-password> --stack HDP-2.5 --hawqrepo <hdb-2.1.x-url> --addonsrepo <hdb-add-ons-2.1.x-url>
8. ambari-server restart

Configurations during Install with Ambari
Set VM overcommit to 0 if you plan to use Hive and/or LLAP on the same cluster. Don't follow the Pivotal docs, which set this to 2, or else your Hive processes will have memory issues.

Advanced hdfs-site
Property                                   Setting
dfs.allow.truncate                         true
dfs.block.access.token.enable              false for an unsecured HDFS cluster, or true for a secure cluster
dfs.block.local-path-access.user           gpadmin
dfs.client.read.shortcircuit               true
dfs.client.socket-timeout***               300000000
dfs.client.use.legacy.blockreader.local    false
dfs.datanode.handler.count                 60
dfs.datanode.socket.write.timeout***       7200000
dfs.namenode.handler.count                 600
dfs.support.append                         true
Advanced core-site
Property                                   Setting
ipc.client.connection.maxidletime**        3600000
ipc.client.connect.timeout**               300000
ipc.server.listen.queue.size               3300

Some HAWQ Resources
Date Type Formatting Functions: https://www.postgresql.org/docs/8.2/static/functions-formatting.html
Date Time Functions: https://www.postgresql.org/docs/8.2/static/functions-datetime.html
HAWQ Date Functions: http://tapoueh.org/blog/2013/08/20-Window-Functions
HAWQ is better with dates; it can automatically handle '08/01/2016' and '01-Aug-2016'.
PostgreSQL Cheat Sheet Commands: http://www.postgresonline.com/downloads/special_feature/postgresql83_psql_cheatsheet.pdf
System Catalog Tables: http://hdb.docs.pivotal.io/131/docs-hawq-shared/ref_guide/system_catalogs/catalog_ref-tables.html

HAWQ Toolkit
Make use of the HAWQ Toolkit: http://hdb.docs.pivotal.io/211/hawq/reference/toolkit/hawq_toolkit.html
How to find the data files for specific tables: https://discuss.pivotal.io/hc/en-us/articles/204072646-Pivotal-HAWQ-find-data-files-for-specific-tables
Size of a table on disk: select * from hawq_toolkit.hawq_size_of_table_disk;
How to find the size of a database: select sodddatname, sodddatsize/(1024*1024) from hawq_toolkit.hawq_size_of_database;
How to find the size of partitioned tables: select * from hawq_toolkit.hawq_size_of_partition_and_indexes_disk;
Tip to find how many segments a HAWQ table is spread across:
SELECT gp_segment_id, COUNT(*)
FROM <<table>>
GROUP BY gp_segment_id
ORDER BY 1;

Creating Tables <<TBD>>
Make sure to ANALYZE the table after you create it. For example: vacuum analyze device.priority_counter_hist_rand;

Loading Data to Tables <<TBD>>

Potential HAWQ Errors

Too many open files in system
To fix this, check the value of fs.file-max in /etc/sysctl.conf. If the configured value is lower than the total number of open files across the entire system at a given point (lsof | wc -l), it needs to be increased. To increase this value, follow the steps below.
Check open files:
lsof | wc -l
ulimit -a | grep open
Edit the following line in the /etc/sysctl.conf file:
fs.file-max = value    # value is the new file descriptor limit that you want to set
Apply the change by running the following command:
# /sbin/sysctl -p
We can set the overcommit mode to 0 temporarily:
echo 0 > /proc/sys/vm/overcommit_memory
For a permanent solution:
Add vm.overcommit_memory = 0 in /etc/sysctl.conf:
#fs.file-max=65536
fs.file-max=2900000
#Added for Hortonworks HDB
kernel.threads-max=798720
vm.overcommit_memory=0
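The fs.file-max check described above can be sketched as a small script. This is only an illustration, not part of HAWQ or Ambari: headroom_pct is a hypothetical helper name, the 80% threshold is an arbitrary assumption, and the script reads the kernel's own counters from /proc/sys/fs/file-nr (allocated, unused, maximum) rather than counting lsof output.

```shell
# Hypothetical helper: percentage of fs.file-max currently in use.
# $1 = allocated file handles, $2 = fs.file-max
headroom_pct() {
  echo $(( $1 * 100 / $2 ))
}

# On Linux, /proc/sys/fs/file-nr holds three numbers:
# allocated handles, allocated-but-unused handles, and the maximum.
if [ -r /proc/sys/fs/file-nr ]; then
  read -r allocated _unused maximum < /proc/sys/fs/file-nr
  pct=$(headroom_pct "$allocated" "$maximum")
  echo "file handles: ${allocated}/${maximum} (${pct}% used)"
  # 80 is an assumed, illustrative threshold; pick one that suits the cluster.
  if [ "$pct" -ge 80 ]; then
    echo "consider raising fs.file-max in /etc/sysctl.conf"
  fi
fi
```

Running this on each HAWQ node before and during a heavy load gives a rough idea of whether the configured limit leaves enough room.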
11-11-2016
06:18 PM
5 Kudos
Here are the requirements:
Total data size - uncompressed: 13.5 TB; compressed: 2 TB
A large virtual fact table: a view containing a UNION ALL of 3 large tables, 11 billion records in total
Another view taking the large virtual fact table, with consecutive LEFT OUTER JOINs on 8 dimension tables, so that no matter what, the result is always 11 billion records
There is timestamp data that you can use to filter rows by.

Suppose you were given the following. How would you begin configuring Hortonworks for Hive? Would you focus on storage? How can we configure for compute?

Let's assume:
Platform: AWS
Data node instance: r3.4xlarge
Cores: 16
RAM: 122 GB
EBS storage: 2 x 1 TB disks

So where do we begin? First, some quick calculations:
Memory per core: 122 GB / 16 = 7.625 GB; approximately 8 GB per CPU core
This means our largest container size per node, per core, is 8 GB.
However, we should not reserve all 16 cores for Hadoop. Some cores are needed for the OS and other processes. Let's assume 14 cores are reserved for YARN.
Memory allocated for all YARN containers on a node = number of virtual cores x memory per core
114688 MB = 14 * 8192 MB (8 *1024)
Note also: at 8 GB, we can run 14 tasks (mappers or reducers) in parallel, one per CPU, without wasting RAM. We can certainly run container sizes smaller than 8 GB if we wish. Since our optimal container size per node is 8 GB, our YARN minimum container size must be a factor of 8 GB to prevent wasted memory, that is: 1, 2, 4, or 8 GB. However, the Tez container size for Hive is a multiple of the YARN minimum container size.
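The sizing arithmetic above can be reproduced with a short script. This is a minimal sketch using the figures from this post (14 cores for YARN, 8 GB per core); yarn_node_mb and factors_of are hypothetical helper names introduced only for illustration.

```shell
# Figures taken from the post.
yarn_cores=14                      # cores left for YARN out of 16
mem_per_core_mb=$(( 8 * 1024 ))    # ~8 GB per core, rounded up from 7.625 GB

# Hypothetical helper: total memory for YARN containers on one node, in MB.
# $1 = cores for YARN, $2 = memory per core in MB
yarn_node_mb() {
  echo $(( $1 * $2 ))
}

# Hypothetical helper: list the whole numbers that divide $1 evenly,
# i.e. the candidate minimum container sizes in GB.
factors_of() {
  n=$1; i=1; out=""
  while [ "$i" -le "$n" ]; do
    if [ $(( n % i )) -eq 0 ]; then
      out="${out:+$out,}$i"
    fi
    i=$(( i + 1 ))
  done
  echo "$out"
}

echo "YARN memory per node: $(yarn_node_mb "$yarn_cores" "$mem_per_core_mb") MB"
echo "candidate minimum container sizes (GB): $(factors_of 8)"
```

In a typical YARN setup the first number would presumably go into yarn.nodemanager.resource.memory-mb and the chosen minimum into yarn.scheduler.minimum-allocation-mb, but that mapping is an assumption here, since the post does not name the properties at this point.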
Memory Settings YARN Hive TEZ Running Application Error
02-07-2019
03:12 PM
I have tried the following parameters:
hive.tez.auto.reducer.parallelism=true;
hive.tez.min.partition.factor=0.25
hive.tez.max.partition.factor=2.0
set hive.exec.reducers.bytes.per.reducer = 134217728;
My output is 2.5 GB (2684354560 bytes), and based on the formula given above I was expecting max(1, min(1099, 2684354560 / 134217728)) * 2 = max(1, min(1099, 20)) * 2 = max(1, 20) * 2 = 40 reducers, but my query was assigned only 5 reducers. I am curious why. Are there any other parameters that can affect the number of reducers? Below is the query that I am using:
truncate table target_tab ;
INSERT INTO TABLE target_tab
SELECT * FROM src_tab WHERE 1=1 ORDER BY a, b,c
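The expected figure in this question can be checked with a few lines of shell. This sketch only reproduces the arithmetic as the post applies it; it is not how Hive or Tez actually picks the reducer count. estimate_reducers is a hypothetical helper name, and the multiplier is limited to whole numbers here.

```shell
# Hypothetical helper: max(1, min(max_reducers, size / bytes_per_reducer)) * factor
# $1 = output size in bytes
# $2 = bytes per reducer
# $3 = upper bound on reducers (1099 in the post)
# $4 = multiplier applied at the end (2 in the post)
estimate_reducers() {
  n=$(( $1 / $2 ))
  if [ "$n" -gt "$3" ]; then n=$3; fi
  if [ "$n" -lt 1 ]; then n=1; fi
  echo $(( n * $4 ))
}

# Values from the post: 2.5 GB output, 128 MB per reducer.
estimate_reducers 2684354560 134217728 1099 2   # prints 40
```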
04-26-2019
02:51 PM
Do you have the latest recommendations? Most of our Hadoop processing is on Hive/Tez and Spark.