Member since: 04-11-2016
Posts: 174
Kudos Received: 29
Solutions: 6

My Accepted Solutions
Title | Views | Posted
---|---|---
| 3398 | 06-28-2017 12:24 PM
| 2572 | 06-09-2017 07:20 AM
| 7134 | 08-18-2016 11:39 AM
| 5319 | 08-12-2016 09:05 AM
| 5491 | 08-09-2016 09:24 AM
06-23-2016
09:25 AM
Accumulo (as always) gave a nasty surprise: I was unable to log in to the shell using the credentials I provided during the installation, and the tracer failed to start. As per this thread, I kept a simple plain-text password and then executed the commands you provided.
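For reference, a minimal sketch of the kind of reset this thread describes, assuming Accumulo 1.x defaults; the trace user name and property names are assumptions, not values taken from this post.

```bash
# Minimal sketch (assumed Accumulo 1.x layout): log in to the Accumulo shell as
# root with the password chosen at install time, then reset the tracer user's
# password so it matches what accumulo-site.xml expects.
accumulo shell -u root

# Inside the shell (shown here as comments), assuming the tracer runs as "trace"
# and accumulo-site.xml carries trace.user / trace.token.property.password:
#   root@myinstance> passwd -u trace
#   Enter new password for 'trace': ********
# Afterwards, restart the tracer so it picks up the matching credentials.
```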
06-21-2016
04:59 PM
2 Kudos
Hi @Kaliyug Antagonist. The answers above from @slachterman and @mqureshi are excellent. Here is another way (at a higher level) to look at this problem, with some tips to plan out a DR strategy for the smoldering-datacenter scenario mentioned above.
1. Use the term Disaster Recovery instead of Backup. This gets the administrators to move away from the RDBMS-like idea that they can simply run a script and recover the entire cluster.
2. Discuss RTO/RPO and let the business answers drive the architecture. RTO and RPO requirements need to be defined by the business; these requirements drive all decisions around disaster recovery. A 1-hour/1-hour RTO/RPO is wildly different (in cost and architecture) from a 2-week/1-day RTO/RPO, so when the business chooses the RTO/RPO requirements, it is also choosing the required cost and architecture. Well-defined RTO/RPO requirements keep you from building an over-engineered solution (which may be far too expensive) as well as an under-engineered one (which may fail precisely when you need it most: during a disaster event).
3. 'Band' your data assets into different categories for RTO/RPO purposes. Example: Band 1 = 1-hour RTO, Band 2 = 1-day RTO, Band 3 = 1-week RTO, Band 4 = 1-month RTO, Band 5 = not required in the event of a disaster. You would be surprised how much data can wait in the event of a severe crash. For example, datasets that feed a report distributed once per month should never require a 1-hour RTO.
Hope that helps.
06-20-2016
09:55 AM
Option 1: reformat. You will need not only to "copyFromLocal" the data back but also to recreate the file system; see, for example, this for details. Option 2: exit safe mode and find out where you are. I'd recommend this one. You can also find out what caused the trouble; maybe all the corrupted blocks are on a bad disk or something like that. You can share the list of files you are uncertain about restoring.
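A minimal sketch of "Option 2" (leave safe mode and assess the damage); host and path values are placeholders.

```bash
# Leave safe mode so the NameNode accepts writes again.
hdfs dfsadmin -safemode leave

# List files with corrupt or missing blocks to decide what (if anything) to restore.
hdfs fsck / -list-corruptFileBlocks

# For a specific suspect path, show which blocks live on which DataNodes
# (useful to spot whether all bad blocks sit on a single failing disk or node).
hdfs fsck /path/to/suspect/dir -files -blocks -locations
```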
06-13-2016
09:35 AM
@Kaliyug Antagonist Set up a local repository in this scenario. Refer to the Ambari install guide for the steps.
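A rough sketch of what that looks like on a RHEL/CentOS mirror host; the repo IDs, versions, and paths below are illustrative assumptions, not the exact values from the Ambari install guide.

```bash
# Tools for mirroring and serving a yum repository.
yum install -y yum-utils createrepo httpd

# Mirror the Ambari and HDP repositories into the web server's document root
# (their .repo files must already be configured on this connected host).
reposync -r ambari-2.2.2.0 -p /var/www/html/repos/
reposync -r HDP-2.4 -p /var/www/html/repos/
reposync -r HDP-UTILS-1.1.0.20 -p /var/www/html/repos/

# Generate repo metadata and start the web server.
createrepo /var/www/html/repos/ambari-2.2.2.0
createrepo /var/www/html/repos/HDP-2.4
createrepo /var/www/html/repos/HDP-UTILS-1.1.0.20
systemctl start httpd

# On the cluster nodes, point a .repo file at http://<repo-host>/repos/... and use
# the same base URLs in the Ambari cluster-install wizard.
```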
06-07-2016
01:45 PM
2 Kudos
@Kaliyug Antagonist No, this does not qualify for the 'Temporary Access to Internet' case in the Hortonworks doc. We have to download the required packages and then install them.
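If only a handful of packages are needed, a hedged alternative to a full mirror is to pull the RPMs on a connected machine and carry them over; the package names below are illustrative.

```bash
# On a machine that does have internet access, pull the RPMs plus dependencies.
yum install -y yum-utils
yumdownloader --resolve --destdir=/tmp/ambari-rpms ambari-server ambari-agent

# Copy /tmp/ambari-rpms to the offline host (USB, scp via an allowed hop, etc.),
# then install from the local files without touching any remote repository.
yum localinstall -y /tmp/ambari-rpms/*.rpm
```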
06-07-2016
05:26 PM
@Kaliyug Antagonist The permissions given to the Ambari agent user include those required to create service accounts, install packages, start/stop all services, run commands as the service accounts, etc. Once the sudo rules are in place, you can install, start, stop, etc., all of the various services in the HDP stack.
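Illustration only: a trimmed sudoers fragment for a hypothetical non-root agent user named "ambari". The authoritative command list is the one in the Ambari sudo-rules documentation; the entries below are an example, not that full list.

```bash
# Lay down a minimal sudoers fragment for the agent user and lock its permissions.
cat > /etc/sudoers.d/ambari-agent <<'EOF'
Defaults:ambari !requiretty
ambari ALL=(ALL) NOPASSWD:SETENV: /usr/bin/yum, /usr/bin/rpm, /bin/systemctl
ambari ALL=(ALL) NOPASSWD:SETENV: /usr/sbin/useradd, /usr/sbin/groupadd, /bin/su
EOF
chmod 0440 /etc/sudoers.d/ambari-agent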
06-01-2016
06:15 PM
You are correct: use LVM for OS disks, but not data disks. In the end, the filesystem choice doesn't make a huge difference. ext4 everywhere would simplify the overall design and allow you to resize filesystems online in the future. Allocating a larger amount of storage to the OS filesystems does simplify the install. Otherwise, during the Ambari install wizard, you need to go through each service's configuration and change "/var/log" to one of the data disk mount points (e.g. /opt/dev/sdb from the example above). If you allocated more storage to the OS (and subsequently made /usr, say, 30GB and /var/log 200GB), you would not have to change as much during the Ambari install. Either approach is viable, so I would suggest discussing with your OS admin team to see if they have a preference. Also note that I'm referring to daemon logs (namenode, resource manager, etc.) that end up in /var/log, versus application logs. The YARN settings you show above are for the YARN application logs and local scratch space. You want to follow that same pattern in production.
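To make the "resize online" point concrete, a minimal sketch with placeholder volume-group and logical-volume names:

```bash
# Grow the /var/log logical volume by 50 GB while the filesystem stays mounted.
lvextend -L +50G /dev/vg_os/lv_var_log
resize2fs /dev/vg_os/lv_var_log   # ext4 supports growing online

# Check the result.
df -h /var/log
```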
05-27-2016
12:38 AM
I see you are using a constant value for the partition column. You might be hitting this issue: https://issues.apache.org/jira/browse/HIVE-12893
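To illustrate what "constant value for the partition column" means (a static-partition insert versus a dynamic one), a hedged sketch with hypothetical table and column names:

```bash
hive -e "
  -- Static partition: the partition value is a constant in the statement.
  INSERT INTO TABLE sales PARTITION (dt='2016-05-27')
  SELECT id, amount FROM staging_sales WHERE dt='2016-05-27';

  -- Dynamic partition: the value comes from the SELECT itself (last column).
  SET hive.exec.dynamic.partition.mode=nonstrict;
  INSERT INTO TABLE sales PARTITION (dt)
  SELECT id, amount, dt FROM staging_sales;
"
```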
05-29-2016
10:14 PM
@Kaliyug Antagonist "Does this mean that I have to explicitly set the no. of reducers on the Hive prompt? Is it mandatory for the CORRECT insertion of data?" It is not mandatory for correct insertion, but it matters for performance. If you have a hundred reducers you get a hundred files, and the smapiname_ver values are divided between them (all rows for one value ending up in the same file); if you have 10 you will have ten files. So there is a direct correlation with load speed (and, to a lesser extent, query performance), and yes, buckets might be your better bet. "Unfortunately, there is only one where condition (where smapiname_ver ='dist_1'), so I am left only with one column on which partitioning is already considered." Once you use buckets you don't use DISTRIBUTE BY anymore; it is either/or. You specify the sort in the table definition: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-BucketedSortedTables - see how they specify the SORTED BY keyword in the table definition? If you then load data into it, Hive will do the distribute/sort work itself.
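A hedged sketch of such a bucketed, sorted table; it reuses smapiname_ver from this thread, but the rest of the schema, the bucket count, and the storage format are illustrative assumptions.

```bash
hive -e "
  CREATE TABLE events_bucketed (
    event_id      BIGINT,
    payload       STRING,
    smapiname_ver STRING
  )
  CLUSTERED BY (smapiname_ver) SORTED BY (smapiname_ver) INTO 32 BUCKETS
  STORED AS ORC;

  -- On older Hive releases this must be set so inserts honour the bucketing.
  SET hive.enforce.bucketing=true;

  -- Loading via INSERT ... SELECT lets Hive do the distribute/sort itself;
  -- no explicit DISTRIBUTE BY is needed.
  INSERT INTO TABLE events_bucketed
  SELECT event_id, payload, smapiname_ver FROM events_staging;
"
```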
05-24-2016
12:48 PM
1 Kudo
Alternatively, use Kerberos and kerberize the HDFS UI. In that case only SPNEGO-enabled browsers will be able to access the UI, and you will have the same filesystem access restrictions as users have when directly accessing HDFS.
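Once the UI is kerberized, a quick way to check SPNEGO access from the command line (the principal and NameNode host/port are placeholders):

```bash
# Obtain a Kerberos ticket for a test user.
kinit alice@EXAMPLE.COM

# --negotiate makes curl perform the SPNEGO handshake with the ticket obtained
# above; without a valid ticket the kerberized UI answers with HTTP 401.
curl --negotiate -u : http://namenode.example.com:50070/dfshealth.html
```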