Member since: 03-11-2016
Posts: 73
Kudos Received: 16
Solutions: 16
My Accepted Solutions
Title | Views | Posted
---|---|---
| 1019 | 08-21-2019 06:03 AM
| 34229 | 05-24-2018 07:55 AM
| 4464 | 04-25-2018 08:38 AM
| 6174 | 01-23-2018 09:41 AM
| 1945 | 10-11-2017 09:44 AM
07-11-2017
12:36 PM
@swathi thukkaraju You can use the dict.update() method: read the default config into a dictionary first, then read your config and call default.update(yours). Note, however, that your two configs are not compatible: in your config "path_details" and "db_details" are children of "parser_config", while in the default config they are at the same level. If we assume that the original config is correct, you should update swathi_configure.json like this:
[{
    "parser_config": {
        "vista": {
            "Parser_info": {
                "module_name": "A",
                "class_name": "a",
                "standardformat1": {
                    "transaction": "filename",
                    "terminal": "filenme2",
                    "session": "filename3"
                }
            }
        }
    },
    "path_details": {
        "parent_path": "wasbs://XXXX@XXXXXstorage.blob.core.windows.net/tenantShortName/"
    },
    "db_details": {
        "datawarehouse_url": "",
        "datawarehouse_username": "",
        "datawarehouse_password": ""
    }
}]
With these two files the following code does what you want:
import json
from pprint import pprint

with open('config.json') as default_file, \
        open('swathi_configure.json') as current_file:
    # [0]: take the first and only item from your list
    # If you have more items, use a for loop
    default = json.load(default_file)[0]
    current = json.load(current_file)[0]
    default.update(current)
    # default is your merged config now
    pprint(default)
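One thing to keep in mind: dict.update() does a shallow merge, so a top-level key present in your config replaces the whole corresponding value from the default, it does not merge nested dictionaries. A minimal sketch (the values here are made up):
default = {"db_details": {"datawarehouse_url": "default-url", "datawarehouse_username": "admin"}}
current = {"db_details": {"datawarehouse_url": "my-url"}}
default.update(current)
# default is now {"db_details": {"datawarehouse_url": "my-url"}}
# "datawarehouse_username" from the default is gone, because the whole
# "db_details" value was replaced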
07-05-2017
01:04 PM
1 Kudo
@pavan p What kind of jobs exactly are you looking for? For example, you can find long-running YARN applications on the ResourceManager UI: select the running applications (<RM address>:8088/cluster/apps/RUNNING) and sort by StartTime.
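If you prefer to script this instead of using the UI, here is a minimal sketch against the ResourceManager REST API (the RM address, the requests library and the 24-hour threshold are assumptions, adjust them to your cluster):
import time
import requests  # assumption: the requests library is installed

RM_APPS_URL = 'http://<RM address>:8088/ws/v1/cluster/apps'  # placeholder address

def long_running_apps(max_hours=24):
    # Ask the ResourceManager for RUNNING applications only
    response = requests.get(RM_APPS_URL, params={'states': 'RUNNING'}).json()
    apps = (response.get('apps') or {}).get('app', [])
    now_ms = int(time.time() * 1000)
    for app in apps:
        # startedTime is reported in milliseconds since the epoch
        running_hours = (now_ms - app['startedTime']) / 1000 / 3600
        if running_hours > max_hours:
            yield app['id'], app['name'], round(running_hours, 1)

for app_id, name, hours in long_running_apps():
    print(app_id, name, hours)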
07-04-2017
07:03 AM
Hi, jq can be found in most Linux distributions. If you want to use basic unix commands, maybe try
date -d @$((1497691710912 / 1000))
Or maybe you can use python, it's also part of every distribution:
import json
from datetime import datetime

def timestamp_to_str(timestamp):
    # The timestamps are in milliseconds, convert to seconds first
    return datetime.fromtimestamp(timestamp / 1000).strftime('%Y-%m-%d')

def search(timestamp):
    with open('a') as f:
        data = json.loads(f.read())
    for cluster in data:
        cluster['original_timestamp'] = timestamp_to_str(cluster['original_timestamp'])
        if cluster['original_timestamp'] == timestamp:
            yield cluster
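For example, assuming the file 'a' contains the JSON list from the question, you can print the matching cluster names like this (the date is just an example):
for cluster in search('2017-06-17'):
    print(cluster['cluster_name'])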
07-03-2017
10:57 AM
@Anurag Mishra I recommend jq for this. Assuming you have a list of the JSON objects you pasted, like this: [{"cluster_name":...}, {"cluster_name": ...}], you can use this command to convert the original_timestamps to dates:
jq '.[].original_timestamp |= (. / 1000 | strftime("%Y-%m-%d"))' your.json
To filter by original timestamp you can add this select to the query:
jq '.[].original_timestamp |= (. / 1000 | strftime("%Y-%m-%d")) | map(select(.original_timestamp == "<<YOUR FILTER DATE>>"))' your.json
For example:
jq '.[].original_timestamp |= (. / 1000 | strftime("%Y-%m-%d")) | map(select(.original_timestamp == "2017-06-17"))' your.json
07-03-2017
08:17 AM
@Triffids G The dfsadmin report is not relevant in this case: the "No space left on device" error concerns the NameNode, not the DataNodes. Check "dfs.namenode.name.dir", I'm pretty sure it points to a volume that is in fact full. Note that you can use comma-separated paths, so I'd suggest adding a directory from the newly added partition too and restarting the NameNode.
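For example, the value could look something like this (the paths are only placeholders for your existing and the newly added NameNode directories):
dfs.namenode.name.dir = /hadoop/hdfs/namenode,/newdisk/hadoop/hdfs/namenode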
06-20-2017
01:29 PM
@btandel It's not a must: usually the default (FIFO) ordering policy works fine, because in the usual use case you need to be "fair" (in a sense) among the queues, not within one queue. But if you need equal resource sharing within one queue, it does make perfect sense.
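For reference, switching a single queue to fair ordering is a one-line capacity-scheduler setting, something like this (the queue name "default" is just an example):
yarn.scheduler.capacity.root.default.ordering-policy = fair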
06-20-2017
08:53 AM
@btandel
1) Minimum user limit percentage: this is the definition from the documentation, and I think it is as clear as it gets: "Each queue enforces a limit on the percentage of resources allocated to a user at any given time, if there is demand for resources. The user limit can vary between a minimum and maximum value. The former (the minimum value) is set to this property value and the latter (the maximum value) depends on the number of users who have submitted applications. For e.g., suppose the value of this property is 25. If two users have submitted applications to a queue, no single user can use more than 50% of the queue resources. If a third user submits an application, no single user can use more than 33% of the queue resources. With 4 or more users, no user can use more than 25% of the queues resources. A value of 100 implies no user limits are imposed. The default is 100. Value is specified as a integer."
2) Fair ordering policy: check this documentation: Using Flexible Scheduling Policies. Both settings concern a single queue's scheduling: minimum-user-limit-percentage defines how the queue's resources are distributed among users, and the ordering policy defines in which order the submitted jobs will be executed. If the minimum user limit is 100%, there are no actual user limits in place, so the fair ordering policy will do its best to give all the jobs a "fair" amount of resources.
3) "if i set the Minimum user limit to 50 % and, user1 job is utilizing 100 % of cluster resource, then user2 submit job who requires 20 % of cluster resource then will the resource get distributed as 80% and 20% or will it be 50% - 50%" It will be 80-20%, because user2 doesn't need any more resources. If they needed, let's say, 60% or more, then the distribution would be 50-50%.
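As a concrete illustration, the corresponding capacity-scheduler property could look like this (the queue name "default" and the value 25 are just examples):
yarn.scheduler.capacity.root.default.minimum-user-limit-percentage = 25
With 25%, a single active user can take up to 100% of the queue, two users up to 50% each, three users up to 33% each, and from four users on, up to 25% each.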
06-19-2017
06:51 AM
@Marcus Aidley You can run Spark in local mode against a kerberized cluster. Here are some configuration values to check:
In spark-defaults.conf (in an HDP cluster: /etc/spark/conf/spark-defaults.conf):
spark.history.kerberos.enabled true
spark.history.kerberos.keytab /your/path/to/spark.headless.keytab
spark.history.kerberos.principal your-principal@YOUR.DOMAIN
In spark-env.sh make sure you have:
export HADOOP_CONF_DIR=/your/path/to/hadoop/conf
In core-site.xml:
hadoop.security.authentication: kerberos
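Once those values are in place, a minimal way to test it is to obtain a ticket and run a job in local mode (the keytab path, principal and application file are placeholders):
kinit -kt /your/path/to/spark.headless.keytab your-principal@YOUR.DOMAIN
spark-submit --master local[*] your_app.py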
06-06-2017
01:06 PM
@Xiong Duan I'm afraid, as of now, there is no other way to remove dead/decommissioned datanodes from the WebUI (NameNode state) than restarting the NameNode.
06-01-2017
06:39 AM
You need to set yarn.scheduler.capacity.queue-mappings-override.enable to true if you want to override the setting from mapred-site.xml (queue 1) with your default mapping (queue 2).
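A sketch of what this could look like in the capacity-scheduler configuration (the user and queue names are just examples):
yarn.scheduler.capacity.queue-mappings-override.enable = true
yarn.scheduler.capacity.queue-mappings = u:someuser:queue2
With the override flag enabled, the u:someuser:queue2 mapping wins even if the job explicitly asks for queue1, for example via mapreduce.job.queuename in mapred-site.xml.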