Member since: 03-11-2016
Posts: 73
Kudos Received: 16
Solutions: 16
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1047 | 08-21-2019 06:03 AM |
| | 34770 | 05-24-2018 07:55 AM |
| | 4544 | 04-25-2018 08:38 AM |
| | 6269 | 01-23-2018 09:41 AM |
| | 1986 | 10-11-2017 09:44 AM |
07-11-2017
12:36 PM
@swathi thukkaraju You can use the dict.update() method: read the default config into a dictionary first, then read your config and call default.update(yours). Note, however, that your two configs are not compatible: in your config "path_details" and "db_details" are children of "parser_config", while in the default config they are on the same level. If we assume that the original config is correct, then you should update swathi_configure.json like this: [{
"parser_config": {
"vista": {
"Parser_info": {
"module_name": "A",
"class_name": "a",
"standardformat1": {
"transaction": "filename",
"terminal": "filenme2",
"session": "filename3"
}
}
}
},
"path_details": {
"parent_path": "wasbs://XXXX@XXXXXstorage.blob.core.windows.net/tenantShortName/"
},
"db_details": {
"datawarehouse_url": "",
"datawarehouse_username": "",
"datawarehouse_password": ""
}
}]
With these two files, the following code does what you want (note that dict.update() is a shallow merge, so each top-level key from your config replaces the corresponding default entry):

import json
from pprint import pprint

with open('config.json') as default_file, \
        open('swathi_configure.json') as current_file:
    # [0]: take the first and only item from your list
    # If you have more items, use a for loop
    default = json.load(default_file)[0]
    current = json.load(current_file)[0]
    default.update(current)
    # default is your merged config now
    pprint(default)
07-05-2017
01:04 PM
1 Kudo
@pavan p What kind of jobs exactly are you looking for? For example, you can find long-running YARN applications on the ResourceManager UI: select the running applications (<RM address>:8088/cluster/apps/RUNNING) and sort by StartTime.
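If you prefer to script this, here is a minimal sketch against the ResourceManager REST API; the hostname below is only a placeholder, and it sorts by startedTime (a millisecond epoch), so the oldest, i.e. longest-running, applications come first:

import json
from urllib.request import urlopen

# Placeholder address, use your own ResourceManager host and port
rm = 'http://resourcemanager.example.com:8088'
# List the currently running applications via the RM REST API
with urlopen(rm + '/ws/v1/cluster/apps?states=RUNNING') as response:
    apps = (json.load(response).get('apps') or {}).get('app', [])
# startedTime is a millisecond epoch timestamp, so after sorting
# the first entry is the longest-running application
for app in sorted(apps, key=lambda a: a['startedTime']):
    print(app['id'], app['name'], app['startedTime'])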
07-04-2017
07:03 AM
Hi, jq can be found in most Linux distributions. If you want to use basic Unix commands, you can try

date -d @$((1497691710912 / 1000))

(note the @, which tells date to interpret the value as an epoch timestamp in seconds). Or you can use Python, which is also part of every distribution:

import json
from datetime import datetime

def timestamp_to_str(timestamp):
    return datetime.fromtimestamp(timestamp / 1000).strftime('%Y-%m-%d')

def search(timestamp):
    with open('a') as f:
        data = json.loads(f.read())
    for cluster in data:
        cluster['original_timestamp'] = timestamp_to_str(cluster['original_timestamp'])
        if cluster['original_timestamp'] == timestamp:
            yield cluster
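For example, assuming the file contains the list of clusters you mentioned, you could print every cluster recorded on a given day like this:

for cluster in search('2017-06-17'):
    print(cluster)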
07-03-2017
10:57 AM
@Anurag Mishra I recommend jq for this. Assuming you have a list of the JSON objects you pasted, like this: [{"cluster_name": ...}, {"cluster_name": ...}], you can use this command to convert the original_timestamp values to dates:

jq '.[].original_timestamp |= (. / 1000 | strftime("%Y-%m-%d"))' your.json

To filter by original timestamp, you can add this select to the query:

jq '.[].original_timestamp |= (. / 1000 | strftime("%Y-%m-%d")) | map(select(.original_timestamp == "<<YOUR FILTER DATE>>"))' your.json

For example:

jq '.[].original_timestamp |= (. / 1000 | strftime("%Y-%m-%d")) | map(select(.original_timestamp == "2017-06-17"))' your.json
07-03-2017
08:17 AM
@Triffids G The dfsadmin report is not relevant in this case: the "No space left on device" error concerns the NameNode, not the DataNodes. Check "dfs.namenode.name.dir"; I'm pretty sure it points to a volume that is in fact full. Note that you can use comma-separated paths, so I'd suggest adding a directory from the newly added partition as well and restarting the NameNode.
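As a sketch, the property could then look like this; the paths are only an illustration, keep your existing directory and add one on the new partition:

dfs.namenode.name.dir=/hadoop/hdfs/namenode,/newpartition/hadoop/hdfs/namenode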
06-20-2017
01:29 PM
@btandel It's not a must; usually the default (FIFO) scheduling policy works fine, because in a typical use case you need to be "fair" (in a sense) among the queues, not within one queue. But if you need equal resource sharing within a single queue, it does make perfect sense.
06-20-2017
08:53 AM
@btandel
1) Minimum user limit percentage: this is the definition from the documentation, and I think it is as clear as it gets: "Each queue enforces a limit on the percentage of resources allocated to a user at any given time, if there is demand for resources. The user limit can vary between a minimum and maximum value. The former (the minimum value) is set to this property value and the latter (the maximum value) depends on the number of users who have submitted applications. For e.g., suppose the value of this property is 25. If two users have submitted applications to a queue, no single user can use more than 50% of the queue resources. If a third user submits an application, no single user can use more than 33% of the queue resources. With 4 or more users, no user can use more than 25% of the queues resources. A value of 100 implies no user limits are imposed. The default is 100. Value is specified as a integer."
2) Fair ordering policy: check this documentation: Using Flexible Scheduling Policies. Both settings concern a single queue's scheduling policy: minimum-user-limit-percentage defines how the queue's resources are distributed among users, while the ordering policy defines in which order the submitted jobs are executed (see the property sketch after this list). If the minimum user limit is 100%, there are no actual limits in place, so the fair ordering policy will do its best to give all jobs a "fair" amount of resources.
3) "if i set the Minimum user limit to 50 % and, user1 job is utilizing 100 % of cluster resource, then user2 submit job who requires 20 % of cluster resource then will the resource get distributed as 80% and 20% or will it be 50% - 50%" It will be 80-20%, because user2 doesn't need any more resources. If user2 needed, say, 60% or more, then the distribution would be 50-50%.
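Here is a minimal sketch of the two properties in the Capacity Scheduler configuration; the queue name "myqueue" is only an illustration, use your own queue path:

yarn.scheduler.capacity.root.myqueue.minimum-user-limit-percent=50
yarn.scheduler.capacity.root.myqueue.ordering-policy=fair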
06-19-2017
06:51 AM
@Marcus Aidley You can run Spark in local mode against a kerberized cluster. Here are some configuration values to check:

In spark-defaults.conf (in an HDP cluster: /etc/spark/conf/spark-defaults.conf):
spark.history.kerberos.enabled true
spark.history.kerberos.keytab /your/path/to/spark.headless.keytab
spark.history.kerberos.principal your-principal@YOUR.DOMAIN

In spark-env.sh make sure you have:
export HADOOP_CONF_DIR=/your/path/to/hadoop/conf

In core-site.xml:
hadoop.security.authentication: kerberos
06-06-2017
01:06 PM
@Xiong Duan I'm afraid that, as of now, there is no way to remove dead/decommissioned DataNodes from the Web UI (NameNode state) other than restarting the NameNode.
06-01-2017
06:39 AM
You need to set yarn.scheduler.capacity.queue-mappings-override.enable to true if you want to override the setting from mapred-site.xml (queue 1) with your default mapping (queue 2).
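As a sketch, assuming the mapping should send every user to a queue named queue2 (the queue name is only an illustration), the relevant Capacity Scheduler properties could look like this:

yarn.scheduler.capacity.queue-mappings=u:%user:queue2
yarn.scheduler.capacity.queue-mappings-override.enable=true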