06-27-2017 06:44 PM - 5 Kudos
Hi @Gunjan Dhawas.

For point 2 - "Also, as you mentioned, it is the NodeManager which communicates with the container. So can a NodeManager directly communicate with containers which are running on different nodes, or will it go through the RM to get container information?"

NodeManagers are YARN's per-node agents, and each one takes care of a single compute node in a Hadoop cluster. This includes keeping up to date with the ResourceManager (RM), overseeing the life-cycle management of containers, monitoring the resource usage (memory, CPU) of individual containers, tracking node health, managing logs, and running auxiliary services which may be exploited by different YARN applications.
So NodeManagers are the agents on the nodes where containers are launched. Yes, a NodeManager directly monitors the containers running on its own node and their resource consumption; it never needs to reach containers on other nodes, and the RM is not involved in that monitoring.

For point 1 - "The application code executing within the container then provides necessary information (progress, status etc.) to its ApplicationMaster via an application-specific protocol." - so how does the ApplicationMaster monitor the status of containers which are running on a different node than the ApplicationMaster?

Once the ApplicationMaster has negotiated resources with the RM, it launches each container by handing a container launch specification to the NodeManager hosting that container. The launch specification includes the information the container needs to communicate with the ApplicationMaster itself. The ApplicationMaster therefore receives progress/status over the application-specific protocol established through that launch specification; since this is ordinary network communication, it works regardless of which node the container runs on.
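Here is a minimal sketch of that hand-off from the AM side, using YARN's Java client API. It assumes the container was already allocated through AMRMClient; the task class MyTask, the jar task.jar, and the --am.host/--am.port flags are hypothetical stand-ins for whatever application-specific protocol your AM uses:

    import java.util.Collections;

    import org.apache.hadoop.yarn.api.ApplicationConstants;
    import org.apache.hadoop.yarn.api.records.Container;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.client.api.NMClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class ContainerLaunchSketch {

        // Launch an already-allocated container, telling it how to reach the AM.
        public static void launch(Container container, String amHost, int amPort) {
            NMClient nmClient = NMClient.createNMClient();
            nmClient.init(new YarnConfiguration());
            nmClient.start();

            // The command is the application-specific part: we pass the AM's
            // address on the command line so the task process can report
            // progress/status straight back to the AM from whatever node it
            // lands on. MyTask and the flags are hypothetical.
            String command = "java -cp task.jar MyTask"
                    + " --am.host=" + amHost + " --am.port=" + amPort
                    + " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout"
                    + " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr";

            ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
                    Collections.emptyMap(),               // local resources (jars, files)
                    Collections.emptyMap(),               // environment variables
                    Collections.singletonList(command),   // launch command
                    null, null, null);                    // service data, tokens, ACLs

            try {
                // This call goes straight to the NodeManager hosting the
                // container; the RM is not on this path.
                nmClient.startContainer(container, ctx);
            } catch (Exception e) {
                throw new RuntimeException("Container launch failed", e);
            }
        }
    }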
02-23-2017 04:54 PM - 1 Kudo
@Gunjan Dhawas Based on the wiki, it will take 8 bytes:

INT/INTEGER - 4-byte signed integer, from -2,147,483,648 to 2,147,483,647
BIGINT - 8-byte signed integer, from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-IntegralTypes(TINYINT,SMALLINT,INT/INTEGER,BIGINT) and https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-TypeSystem
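Since Hive's integral types map onto Java primitives (INT to a 4-byte int, BIGINT to an 8-byte long), you can sanity-check those sizes and ranges from plain Java; the output matches the wiki's numbers above:

    public class IntegralTypeRanges {
        public static void main(String[] args) {
            // Hive INT ~ Java int (4 bytes), Hive BIGINT ~ Java long (8 bytes).
            System.out.println("INT:    " + Integer.BYTES + " bytes, "
                    + Integer.MIN_VALUE + " to " + Integer.MAX_VALUE);
            System.out.println("BIGINT: " + Long.BYTES + " bytes, "
                    + Long.MIN_VALUE + " to " + Long.MAX_VALUE);
        }
    }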
11-04-2017 12:19 PM
Hi @Jeff Watson. You are correct about SAS's use of String datatypes. Good catch! One of my customers also had to deal with this: String datatype conversions can perform very poorly in SAS. With SAS/ACCESS to Hadoop you can set the libname option DBMAX_TEXT (added with the SAS 9.4M1 release) to globally restrict the character length of all columns read into SAS. For restricting column size, however, SAS specifically recommends using the VARCHAR datatype in Hive whenever possible. http://support.sas.com/documentation/cdl/en/acreldb/67473/HTML/default/viewer.htm#n1aqglg4ftdj04n1eyvh2l3367ql.htm

Use Case
Large Table, All Columns of Type String: Table A stored in Hive has 40 columns, all of type String, with 500M rows. By default, SAS/ACCESS converts each String column to a 32,767-byte character field ($32767.), so the math for this table yields roughly 1.2 MB per row (40 columns x 32,767 bytes) x 500M rows - over 600 TB in total. This brings the system to a halt: far too large to store in LASR or WORK. The following techniques can be used to work around the challenge in SAS, and they all work:
1. Use CHAR and VARCHAR in Hive instead of String.
2. Set the libname option DBMAX_TEXT to globally restrict the character length of all columns read in.
3. In Hive, use SET TBLPROPERTIES to add SASFMT formats for SAS on the schema in Hive (see the sketch after this list).
4. Add formatting to the SAS code during inbound reads - for example, a column Sequence read with Length 8, Informat 10. and Format 10.

I hope this helps.
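For technique 3, the SASFMT entries are ordinary Hive table properties, so they can be set from any Hive client. Below is a sketch over Hive JDBC (Java); the connection string, the table name table_a, the column comment_col, and the CHAR(100) width are all placeholders, and the 'SASFMT:<column>' property format is my understanding of the SAS/ACCESS convention, so check it against your SAS documentation:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class SasFmtTblProperties {
        public static void main(String[] args) throws Exception {
            // Placeholder HiveServer2 connection details.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:hive2://hiveserver:10000/default", "user", "");
                 Statement stmt = conn.createStatement()) {
                // Ask SAS/ACCESS to read table_a.comment_col as CHAR(100)
                // instead of the 32,767-byte default applied to STRING columns.
                stmt.execute("ALTER TABLE table_a SET TBLPROPERTIES "
                        + "('SASFMT:comment_col'='CHAR(100)')");
            }
        }
    }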