Member since: 04-11-2016
Posts: 174
Kudos Received: 29
Solutions: 6

My Accepted Solutions
Title | Views | Posted
---|---|---
| 3398 | 06-28-2017 12:24 PM
| 2572 | 06-09-2017 07:20 AM
| 7134 | 08-18-2016 11:39 AM
| 5319 | 08-12-2016 09:05 AM
| 5491 | 08-09-2016 09:24 AM
06-23-2016
09:25 AM
Accumulo (as always) gave a nasty surprise: I was unable to log in to the shell using the credentials I provided during the installation, and the tracer failed to start. As per this thread, I kept a simple plain-text password and then executed the commands you provided.
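For reference, a minimal sketch of the kind of reset this thread describes, assuming Accumulo 1.x defaults; the trace user name and property names are assumptions, not values taken from this post.

```bash
# Minimal sketch (assumed Accumulo 1.x layout): log in to the Accumulo shell as
# root with the password chosen at install time, then reset the tracer user's
# password so it matches what accumulo-site.xml expects.
accumulo shell -u root

# Inside the shell (shown here as comments), assuming the tracer runs as "trace"
# and accumulo-site.xml carries trace.user / trace.token.property.password:
#   root@myinstance> passwd -u trace
#   Enter new password for 'trace': ********
# Afterwards, restart the tracer so it picks up the matching credentials.
```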
06-21-2016
04:59 PM
2 Kudos
Hi @Kaliyug Antagonist. The answers above from @slachterman and @mqureshi are excellent. Here is another way (at a higher level) to look at this problem, with some tips to plan out a DR strategy for the smoldering-datacenter scenario mentioned above.
1. Use the term Disaster Recovery instead of Backup. This gets the administrators to move away from the RDBMS-like idea that they can simply run a script and recover the entire cluster.
2. Discuss RTO/RPO and let the business answers drive the architecture. RTO and RPO requirements need to be defined by the business; these requirements drive all decisions around disaster recovery. A 1-hour/1-hour RTO/RPO is wildly different (in cost and architecture) from a 2-week/1-day RTO/RPO, so when the business chooses the RTO/RPO requirements, it is also choosing the required cost and architecture. Well-defined RTO/RPO requirements keep you from building an over-engineered solution (which may be far too expensive) as well as an under-engineered one (which may fail precisely when you need it most: during a disaster event).
3. 'Band' your data assets into different categories for RTO/RPO purposes. Example: Band 1 = 1-hour RTO, Band 2 = 1-day RTO, Band 3 = 1-week RTO, Band 4 = 1-month RTO, Band 5 = not required in the event of a disaster. You would be surprised how much data can wait in the event of a severe crash. For example, datasets that feed a report distributed once per month should never require a 1-hour RTO.
Hope that helps.
06-20-2016
09:55 AM
Option 1: reformat. You will need not only to "copyFromLocal" the data back but also to recreate the file system; see, for example, this for details. Option 2: exit safe mode and find out where you are. I'd recommend this one. You can also find out what caused the trouble; maybe all the corrupted blocks are on a bad disk or something like that. You can share the list of files you are uncertain about restoring.
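A minimal sketch of "Option 2" (leave safe mode and assess the damage); host and path values are placeholders.

```bash
# Leave safe mode so the NameNode accepts writes again.
hdfs dfsadmin -safemode leave

# List files with corrupt or missing blocks to decide what (if anything) to restore.
hdfs fsck / -list-corruptFileBlocks

# For a specific suspect path, show which blocks live on which DataNodes
# (useful to spot whether all bad blocks sit on a single failing disk or node).
hdfs fsck /path/to/suspect/dir -files -blocks -locations
```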
06-13-2016
09:35 AM
@Kaliyug Antagonist Set up a local repository in this scenario. Refer to the Ambari install guide for the steps.
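A rough sketch of what that looks like on a RHEL/CentOS mirror host; the repo IDs, versions, and paths below are illustrative assumptions, not the exact values from the Ambari install guide.

```bash
# Tools for mirroring and serving a yum repository.
yum install -y yum-utils createrepo httpd

# Mirror the Ambari and HDP repositories into the web server's document root
# (their .repo files must already be configured on this connected host).
reposync -r ambari-2.2.2.0 -p /var/www/html/repos/
reposync -r HDP-2.4 -p /var/www/html/repos/
reposync -r HDP-UTILS-1.1.0.20 -p /var/www/html/repos/

# Generate repo metadata and start the web server.
createrepo /var/www/html/repos/ambari-2.2.2.0
createrepo /var/www/html/repos/HDP-2.4
createrepo /var/www/html/repos/HDP-UTILS-1.1.0.20
systemctl start httpd

# On the cluster nodes, point a .repo file at http://<repo-host>/repos/... and use
# the same base URLs in the Ambari cluster-install wizard.
```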
06-07-2016
01:45 PM
2 Kudos
@Kaliyug Antagonist No, this does not qualify for the 'Temporary Access to Internet' case in the Hortonworks doc. We have to download the required packages and then install them.
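If only a handful of packages are needed, a hedged alternative to a full mirror is to pull the RPMs on a connected machine and carry them over; the package names below are illustrative.

```bash
# On a machine that does have internet access, pull the RPMs plus dependencies.
yum install -y yum-utils
yumdownloader --resolve --destdir=/tmp/ambari-rpms ambari-server ambari-agent

# Copy /tmp/ambari-rpms to the offline host (USB, scp via an allowed hop, etc.),
# then install from the local files without touching any remote repository.
yum localinstall -y /tmp/ambari-rpms/*.rpm
```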
06-07-2016
05:26 PM
@Kaliyug Antagonist The permissions given to the Ambari agent user include those required to create service accounts, install packages, start/stop all services, run commands as the service accounts, etc. Once the sudo rules are in place, you can install, start, stop, etc., all of the various services in the HDP stack.
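Illustration only: a trimmed sudoers fragment for a hypothetical non-root agent user named "ambari". The authoritative command list is the one in the Ambari sudo-rules documentation; the entries below are an example, not that full list.

```bash
# Lay down a minimal sudoers fragment for the agent user and lock its permissions.
cat > /etc/sudoers.d/ambari-agent <<'EOF'
Defaults:ambari !requiretty
ambari ALL=(ALL) NOPASSWD:SETENV: /usr/bin/yum, /usr/bin/rpm, /bin/systemctl
ambari ALL=(ALL) NOPASSWD:SETENV: /usr/sbin/useradd, /usr/sbin/groupadd, /bin/su
EOF
chmod 0440 /etc/sudoers.d/ambari-agent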
06-01-2016
06:15 PM
You are correct: use LVM for OS disks, but not data disks. In the end, the filesystem choice doesn't make a huge difference. ext4 everywhere would simplify the overall design and allow you to resize filesystems online in the future. Allocating a larger amount of storage to the OS filesystems does simplify the install. Otherwise, during the Ambari install wizard, you need to go through each service's configuration and change "/var/log" to one of the data disk mount points (e.g. /opt/dev/sdb from the example above). If you allocated more storage to the OS (and subsequently made /usr, say, 30GB and /var/log 200GB), you would not have to change as much during the Ambari install. Either approach is viable, so I would suggest discussing with your OS admin team to see if they have a preference. Also note that I'm referring to daemon logs (namenode, resource manager, etc.) that end up in /var/log, versus application logs. The YARN settings you show above are for the YARN application logs and local scratch space. You want to follow that same pattern in production.
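To make the "resize online" point concrete, a minimal sketch with placeholder volume-group and logical-volume names:

```bash
# Grow the /var/log logical volume by 50 GB while the filesystem stays mounted.
lvextend -L +50G /dev/vg_os/lv_var_log
resize2fs /dev/vg_os/lv_var_log   # ext4 supports growing online

# Check the result.
df -h /var/log
```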
05-27-2016
12:38 AM
I see you are using a constant value for the partition column. You might be hitting this issue: https://issues.apache.org/jira/browse/HIVE-12893
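To illustrate what "constant value for the partition column" means (a static-partition insert versus a dynamic one), a hedged sketch with hypothetical table and column names:

```bash
hive -e "
  -- Static partition: the partition value is a constant in the statement.
  INSERT INTO TABLE sales PARTITION (dt='2016-05-27')
  SELECT id, amount FROM staging_sales WHERE dt='2016-05-27';

  -- Dynamic partition: the value comes from the SELECT itself (last column).
  SET hive.exec.dynamic.partition.mode=nonstrict;
  INSERT INTO TABLE sales PARTITION (dt)
  SELECT id, amount, dt FROM staging_sales;
"
```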
05-29-2016
10:14 PM
@Kaliyug Antagonist "Does this mean that I have to explicitly set the no. of reducers on the Hive prompt? Is it mandatory for the CORRECT insertion of data?" It is not mandatory for correct insertion, but it matters for performance. If you have a hundred reducers you get a hundred files, and the smapiname_ver values are divided between them (all rows for one value ending up in the same file); if you have 10 you will have ten files. So there is a direct correlation with load speed (and, to a lesser extent, query performance), and yes, buckets might be your better bet. "Unfortunately, there is only one where condition (where smapiname_ver ='dist_1'), so I am left only with one column on which partitioning is already considered." Once you use buckets you don't use DISTRIBUTE BY anymore; it is either/or. You specify the sort in the table definition: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-BucketedSortedTables - see how they specify the SORTED BY keyword in the table definition? If you then load data into it, Hive will do the distribute/sort work itself.
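A hedged sketch of such a bucketed, sorted table; it reuses smapiname_ver from this thread, but the rest of the schema, the bucket count, and the storage format are illustrative assumptions.

```bash
hive -e "
  CREATE TABLE events_bucketed (
    event_id      BIGINT,
    payload       STRING,
    smapiname_ver STRING
  )
  CLUSTERED BY (smapiname_ver) SORTED BY (smapiname_ver) INTO 32 BUCKETS
  STORED AS ORC;

  -- On older Hive releases this must be set so inserts honour the bucketing.
  SET hive.enforce.bucketing=true;

  -- Loading via INSERT ... SELECT lets Hive do the distribute/sort itself;
  -- no explicit DISTRIBUTE BY is needed.
  INSERT INTO TABLE events_bucketed
  SELECT event_id, payload, smapiname_ver FROM events_staging;
"
```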
05-24-2016
12:48 PM
1 Kudo
Alternatively, use Kerberos and kerberize the HDFS UI. In that case only SPNEGO-enabled browsers will be able to access the UI, and you will have the same filesystem access restrictions as users have when directly accessing HDFS.
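Once the UI is kerberized, a quick way to check SPNEGO access from the command line (the principal and NameNode host/port are placeholders):

```bash
# Obtain a Kerberos ticket for a test user.
kinit alice@EXAMPLE.COM

# --negotiate makes curl perform the SPNEGO handshake with the ticket obtained
# above; without a valid ticket the kerberized UI answers with HTTP 401.
curl --negotiate -u : http://namenode.example.com:50070/dfshealth.html
```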