Member since: 05-05-2016
Posts: 147
Kudos Received: 223
Solutions: 18
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1729 | 12-28-2018 08:05 AM
 | 1427 | 07-29-2016 08:01 AM
 | 1069 | 07-29-2016 07:45 AM
 | 3277 | 07-26-2016 11:25 AM
 | 562 | 07-18-2016 06:29 AM
12-28-2018
08:05 AM
OK, it seems the issue with the stop_all script in the Ambari UI is that it somehow pulls Zeppelin into its scope as well. Now if I start Zeppelin manually (it was not running earlier), I can use the Stop All Services option. At least the error message is helpful.
12-28-2018
07:54 AM
Thanks @Akhil S Naik for the reply. I don't think it's a database inconsistency issue, but I have tried your suggestion anyway and it is not working. I can't even see the individual service stop drop-down at the far right corner; it just shows busy.
12-28-2018
07:05 AM
Thanks in advance for trying to help me. I am running the HDP 3.0.1 VM and get an error whenever I try to stop all services using Ambari. The error message is: java.lang.IllegalArgumentException: Invalid transition for servicecomponenthost, clusterName=Sandbox, clusterId=2, serviceName=ZEPPELIN, componentName=ZEPPELIN_MASTER, hostname=sandbox-hdp.hortonworks.com, currentState=STARTING, newDesiredState=INSTALLED. I have also attached a screenshot (capture.png) of this issue. Thanks...
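One possible workaround, sketched under the assumption that Ambari is reachable on port 8080 with the default admin credentials, is to push the stuck ZEPPELIN_MASTER component back to the INSTALLED state through the Ambari REST API before retrying Stop All Services; if the component is still mid-transition, the API call may be rejected with the same error:
# Hypothetical example: adjust host, cluster name, and credentials for your environment
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT \
  -d '{"RequestInfo":{"context":"Stop ZEPPELIN_MASTER"},"Body":{"HostRoles":{"state":"INSTALLED"}}}' \
  http://sandbox-hdp.hortonworks.com:8080/api/v1/clusters/Sandbox/hosts/sandbox-hdp.hortonworks.com/host_components/ZEPPELIN_MASTER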
04-24-2017
01:24 PM
Even after changing the database to 'mysql' or 'postgresql', the Druid Superset service still does not start successfully; the error message is below: Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/DRUID/0.9.2/package/scripts/superset.py", line 169, in <module>
Superset().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 314, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/DRUID/0.9.2/package/scripts/superset.py", line 108, in start
self.configure(env, upgrade_type=upgrade_type)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 117, in locking_configure
original_configure(obj, *args, **kw)
File "/var/lib/ambari-agent/cache/common-services/DRUID/0.9.2/package/scripts/superset.py", line 84, in configure
user=params.druid_user)
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'source /etc/superset/conf/superset-env.sh ; /usr/hdp/current/druid-superset/bin/superset db upgrade' returned 1. /usr/hdp/current/druid-superset/lib/python3.4/importlib/_bootstrap.py:1161: ExtDeprecationWarning: Importing flask.ext.sqlalchemy is deprecated, use flask_sqlalchemy instead.
spec.loader.load_module(spec.name)
/usr/hdp/current/druid-superset/lib/python3.4/importlib/_bootstrap.py:1161: ExtDeprecationWarning: Importing flask.ext.script is deprecated, use flask_script instead.
spec.loader.load_module(spec.name)
Loaded your LOCAL configuration
Traceback (most recent call last):
File "/usr/hdp/current/druid-superset/bin/superset", line 84, in <module>
from superset.cli import manager
File "/usr/hdp/current/druid-superset/lib/python3.4/site-packages/superset/__init__.py", line 36, in <module>
utils.pessimistic_connection_handling(db.engine.pool)
File "/usr/hdp/current/druid-superset/lib/python3.4/site-packages/flask_sqlalchemy/__init__.py", line 816, in engine
return self.get_engine(self.get_app())
File "/usr/hdp/current/druid-superset/lib/python3.4/site-packages/flask_sqlalchemy/__init__.py", line 833, in get_engine
return connector.get_engine()
File "/usr/hdp/current/druid-superset/lib/python3.4/site-packages/flask_sqlalchemy/__init__.py", line 493, in get_engine
info = make_url(uri)
File "/usr/hdp/current/druid-superset/lib/python3.4/site-packages/sqlalchemy/engine/url.py", line 194, in make_url
return _parse_rfc1738_args(name_or_url)
File "/usr/hdp/current/druid-superset/lib/python3.4/site-packages/sqlalchemy/engine/url.py", line 240, in _parse_rfc1738_args
return URL(name, **components)
File "/usr/hdp/current/druid-superset/lib/python3.4/site-packages/sqlalchemy/engine/url.py", line 60, in __init__
self.port = int(port)
ValueError: invalid literal for int() with base 10: ''
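The final ValueError means int() was called on an empty string, i.e. the port field of Superset's metadata-database URI is blank, so SQLAlchemy receives something like mysql://user:pass@host:/superset. A minimal sketch of well-formed URIs, with hypothetical hosts and credentials and the usual default ports; check that the database port value in the Ambari Superset configuration is actually filled in:
# Hypothetical examples: replace host, user, password, and database name with your own
mysql://superset:superset@mysql-host:3306/superset
postgresql://superset:superset@postgres-host:5432/superset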
04-24-2017
01:24 PM
Thanks Devin, I have tried changing the database for Superset but am having a similar issue: Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/DRUID/0.9.2/package/scripts/superset.py", line 169, in <module>
Superset().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 314, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/DRUID/0.9.2/package/scripts/superset.py", line 108, in start
self.configure(env, upgrade_type=upgrade_type)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 117, in locking_configure
original_configure(obj, *args, **kw)
File "/var/lib/ambari-agent/cache/common-services/DRUID/0.9.2/package/scripts/superset.py", line 84, in configure
user=params.druid_user)
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'source /etc/superset/conf/superset-env.sh ; /usr/hdp/current/druid-superset/bin/superset db upgrade' returned 1. Traceback (most recent call last):
File "/usr/hdp/current/druid-superset/bin/superset", line 10, in <module>
import gunicorn.app.base
ImportError: No module named gunicorn.app.base
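This second traceback is a different problem: the gunicorn Python package is missing from the bundled Superset environment. A hedged sketch of one way to confirm and fix it, assuming the interpreter under /usr/hdp/current/druid-superset ships with pip; if it does not, the package has to be installed through whatever mechanism your environment provides:
# Hypothetical commands: verify these paths exist on your node before running them
/usr/hdp/current/druid-superset/bin/python -c "import gunicorn"      # should reproduce the ImportError
/usr/hdp/current/druid-superset/bin/python -m pip install gunicorn   # then retry starting Superset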
04-24-2017
10:34 AM
No luck; please see my reply to Deepesh above. Thanks!
04-21-2017
10:34 AM
Team, I am tracking this error separately; see below. If anybody finds the solution, please tag me. Thanks in advance! https://www.linkedin.com/pulse/hdp-26-issue-druid-superset-start-fail-importerror-module-kumar
04-21-2017
07:42 AM
I am also unable to open the given link for the solution: https://community.hortonworks.com/questions/96619/druid-superset-wont-start-on-hdp-26.html It returns: "Access Denied. We're sorry, but you do not have permission to do the activity you attempted. If you believe this to be in error, please contact the site administrator(s)."
04-11-2017
02:34 PM
First of all, thanks to everyone who answers this question, as I have been struggling with this issue for many hours and am unable to resolve it. I have HDP 2.5 running in my dev environment. After cleaning up all services except ZooKeeper, I am not able to add any service using the Ambari UI. The error message is: 500 status code received on POST method for API: /api/v1/stacks/HDP/versions/2.5/recommendations. Error message: Server Error. An error screenshot is also attached for reference. I have restarted the Ambari server many times but with no success. Thanks again.
04-11-2017
12:39 PM
I have manually removed all services from my Hadoop dev cluster; the only service that will not delete is ZooKeeper Server, as Ambari says "The ZooKeeper service can't be deleted, at least one service must be installed." How can I remove ZooKeeper so that I can start a fresh installation of the Hadoop cluster? Note: I am unable to add services while keeping the old ZooKeeper because of the error below: 500 status code received on POST method for API: /api/v1/stacks/HDP/versions/2.5/recommendations. Thanks in advance!
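For reference, a hedged sketch of how a stopped service can usually be deleted through the Ambari REST API, assuming the default port 8080 and admin credentials; the cluster name and host are placeholders, and it is unclear whether the "at least one service must be installed" rule is also enforced at the API level, so try this on a dev cluster only:
# Hypothetical example: stop ZooKeeper, then delete it via the REST API
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT \
  -d '{"RequestInfo":{"context":"Stop ZooKeeper"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
  http://<ambari-host>:8080/api/v1/clusters/<cluster>/services/ZOOKEEPER
curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE \
  http://<ambari-host>:8080/api/v1/clusters/<cluster>/services/ZOOKEEPER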
10-07-2016
03:42 PM
2 Kudos
The PostgreSQL extension PG-Strom allows users to customize data scans and run queries faster. CPU-intensive workloads are identified and offloaded to the GPU, taking advantage of the GPU's powerful parallel execution to complete the task. The combination of a few CPU cores, limited RAM bandwidth, and a GPU gives a unique advantage: GPUs typically have hundreds of processor cores and RAM bandwidth several times greater than CPUs, so they can handle large numbers of computations in parallel very efficiently. PG-Strom is based on two basic ideas:
1. On-the-fly native GPU code generation.
2. Asynchronous, pipelined execution.
The figure below shows how a query is submitted to the execution engine. During the query-optimization phase, PG-Strom detects whether a given query is fully or partially executable on the GPU and then determines whether the query can be offloaded. If it can, PG-Strom generates the source code for the GPU native binaries on the fly and starts just-in-time compilation before the execution phase. Next, PG-Strom loads the extracted row set into a DMA buffer (the size of a buffer defaults to 15 MB) and asynchronously starts DMA transfers and GPU kernel execution. The CUDA platform allows these tasks to run in the background, so PostgreSQL can keep the current process moving ahead, and through GPU acceleration these overlapping asynchronous slices also hide most of the latency.
After loading PG-Strom, running SQL on the GPU does not require special instructions. It lets the user customize the way PostgreSQL scans data and provides additional paths for scan/join logic that can run on the GPU. If the expected cost is reasonable, the query planner places the custom scan node instead of the built-in query-execution logic.
The graph below shows benchmark results for PG-Strom and plain PostgreSQL. The x-axis is the number of tables and the y-axis is query execution time. In this test, all relevant inner relations could be loaded into GPU RAM in one pass, and pre-aggregation greatly reduced the number of rows the CPU needed to process. For more details, the test code can be viewed at https://wiki.postgresql.org/wiki/PGStrom. As the figure shows, PG-Strom is much faster than PostgreSQL alone.
There are a few general ways to improve PostgreSQL performance: 1. Homogeneous vertical scaling. 2. Heterogeneous vertical scaling. 3. Horizontal scaling. PG-Strom uses the heterogeneous vertical-scaling approach, which maximizes the hardware benefit for the workload's characteristics: in other words, PG-Strom dispatches simple, numerically heavy calculations to the GPU device before they run on the CPU cores. https://www.linkedin.com/pulse/pg-storm-let-postgresql-run-faster-gpu-mukesh-kumar?trk=prof-post Evolution, right...
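A minimal sketch of what trying PG-Strom out typically looks like, based on the PG-Strom wiki; the database, table, and column names are made up for illustration, and the exact custom-scan node names can differ between PG-Strom versions:
# Load the extension at server start (postgresql.conf), then restart PostgreSQL:
#   shared_preload_libraries = 'pg_strom'
psql -d mydb -c "CREATE EXTENSION pg_strom;"
# Check whether the planner offloads the work to the GPU (hypothetical tables t0, t1):
psql -d mydb -c "EXPLAIN SELECT cat, avg(x) FROM t0 NATURAL JOIN t1 GROUP BY cat;"
# A GPU-enabled plan shows custom scan nodes such as GpuJoin or GpuPreAgg instead of the built-in operators.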
- Find more articles tagged with:
- Data Processing
- FAQ
- gpudb
- postgres
08-26-2016
06:43 AM
Thanks @lgeorge for your response. I have tried with and without the new consumer and get the same error message.
08-25-2016
11:52 AM
The mirror command is below: ./kafka-run-class.sh kafka.tools.MirrorMaker --consumer.config /usr/hdp/current/kafka-broker/config/consumer_mirr.properties --producer.config /usr/hdp/current/kafka-broker/config/producer_mirr.properties --whitelist MukeshTest --new.consumer
08-25-2016
11:50 AM
Hi, I am following the Kafka MirrorMaker steps at "http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.6/bk_kafka-user-guide/bk_kafka-user-guide-20160628.pdf" with two separate clusters, but when I run kafka.tools.MirrorMaker I get the error below: [2016-08-25 17:20:00,081] WARN The configuration serializer.class = kafka.serializer.DefaultEncoder was supplied but isn't a known config. (org.apache.kafka.clients.producer.ProducerConfig)
[2016-08-25 17:20:00,136] ERROR Exception when starting mirror maker. (kafka.tools.MirrorMaker$)
org.apache.kafka.common.config.ConfigException: Missing required configuration "bootstrap.servers" which has no default value.
at org.apache.kafka.common.config.ConfigDef.parse(ConfigDef.java:148)
at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:49)
at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:56)
at org.apache.kafka.clients.consumer.ConsumerConfig.<init>(ConsumerConfig.java:336)
at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:541)
at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:523)
at kafka.tools.MirrorMaker$$anonfun$4.apply(MirrorMaker.scala:330)
at kafka.tools.MirrorMaker$$anonfun$4.apply(MirrorMaker.scala:328)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.Range.foreach(Range.scala:141)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at kafka.tools.MirrorMaker$.createNewConsumers(MirrorMaker.scala:328)
at kafka.tools.MirrorMaker$.main(MirrorMaker.scala:246)
at kafka.tools.MirrorMaker.main(MirrorMaker.scala)
Exception in thread "main" java.lang.NullPointerException
at kafka.tools.MirrorMaker$.main(MirrorMaker.scala:276)
at kafka.tools.MirrorMaker.main(MirrorMaker.scala)
consumer_mirr.properties:
zookeeper.connect=sourceHOST:2181
zookeeper.connection.timeout.ms=6000
group.id=test-consumer-group-mirror
consumer.timeout.ms=5000
shallow.iterator.enable=true
mirror.topics.whitelist=app_log
producer_mirr.properties:
metadata.broker.list=targetHOST:6667
request.required.acks=0
producer.type=async
compression.codec=none
serializer.class=kafka.serializer.DefaultEncoder
queue.enqueue.timeout.ms=-1
max.message.size=1000000
queue.time=1000
Your help is really appreciated here. Thanks in advance!
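The ConfigException points at the root cause: with --new.consumer, MirrorMaker builds new-style KafkaConsumer instances, which ignore zookeeper.connect and require bootstrap.servers. A hedged sketch of what consumer_mirr.properties would need to contain in that mode; the broker host is a placeholder, and 6667 is only the usual HDP Kafka listener port:
# consumer_mirr.properties for the new consumer (hypothetical broker address)
bootstrap.servers=sourceBROKER:6667
group.id=test-consumer-group-mirror
# zookeeper.*, shallow.iterator.*, and mirror.topics.whitelist above apply only to the old consumer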
Labels: Apache Kafka
08-17-2016
07:18 AM
I do remember that I didn't delete or move the answer yesterday, but I think an admin moved it, or the URL's tag changed, or something else happened, as I can see the problem persists. https://community.hortonworks.com/answers/51757/view.html I think HCC needs to look at these issues more closely so that users don't lose the trail...
08-16-2016
01:27 PM
I am getting "Access Denied" error continuously. Steps to reproduce are:-1. Go to My Profile. 2. See question i have posted i.e ."Mukesh Kumar added an answer to the question "install LLAP but unable to find useful resource. Please help..."" i dont know about you but i am able to reproduce it many time using my credentials.. attached is screen shot.
- Tags:
- hcc
07-29-2016
11:12 AM
Hi @Himanshu Rawat There are two approaches to this problem.
1. Create separate files per partition using Unix or any other tool and load them individually into static partitions, like below:
ALTER TABLE Unm_Parti ADD PARTITION (Department='A') LOCATION '/user/mukesh/HIVE/HiveTrailFolder/A';
ALTER TABLE Unm_Parti ADD PARTITION (Department='B') LOCATION '/user/mukesh/HIVE/HiveTrailFolder/B';
ALTER TABLE Unm_Parti ADD PARTITION (Department='C') LOCATION '/user/mukesh/HIVE/HiveTrailFolder/C';
2. Create an external table and put the file into the external table's HDFS location; we can call it a staging table. Then create the final partitioned table and load it with dynamic partitioning enabled:
set hive.exec.dynamic.partition=true;
(This enables dynamic partitions; by default it is false.)
set hive.exec.dynamic.partition.mode=nonstrict;
(We are using a dynamic partition without a static partition (a table can be partitioned on multiple columns in Hive), so we have to enable non-strict mode. In strict mode, a dynamic partition can be used only together with a static partition.)
Now use the statement below to load the data:
INSERT OVERWRITE TABLE Final_Table PARTITION(c2) SELECT c1, c4, c3, c2 FROM stage_table;
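For approach 2, a minimal hedged sketch of the staging and final tables; the names stage_table, Final_Table, and c1..c4 follow the INSERT above, while the column types, delimiter, and HDFS path are assumptions for illustration only:
# Hedged sketch: adjust types, delimiter, and location to your data
hive -e "
CREATE EXTERNAL TABLE stage_table (c1 STRING, c2 STRING, c3 STRING, c4 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/mukesh/HIVE/staging';

CREATE TABLE Final_Table (c1 STRING, c4 STRING, c3 STRING)
PARTITIONED BY (c2 STRING);

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE Final_Table PARTITION(c2) SELECT c1, c4, c3, c2 FROM stage_table;
"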
07-29-2016
09:20 AM
Yes, because the schema needs to map: the last column in the file is the partition column. If you are loading from another table, keep your partition column last in the SELECT statement.
07-29-2016
08:01 AM
3 Kudos
We can create partitions on both external and managed tables. Yes, we need to define the partitioning when we create the table. For more on the performance aspects, see the link below: https://community.hortonworks.com/questions/15161/can-we-apply-the-partitioning-on-the-already-exist.html Below is an example of a partitioned external table. CREATE EXTERNAL TABLE `myTable`(
`ossc_rc` string,
`subnetwork1` string)
PARTITIONED BY (
`part` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://location/ready'
TBLPROPERTIES (
'transient_lastDdlTime'='1433520068')
;
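As a small follow-up, a hedged sketch of how data then gets attached to this external table; the partition value and directory are hypothetical:
# Register an existing HDFS directory as one partition of the table (hypothetical values)
hive -e "ALTER TABLE myTable ADD PARTITION (part='2016-07') LOCATION 'hdfs://location/ready/part=2016-07';"
hive -e "SHOW PARTITIONS myTable;"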
07-29-2016
07:45 AM
1 Kudo
Hope this helps: type the query below into Google, and the results page will give you the release notes for all versions: cloudbreak release notes site:sequenceiq.com
07-29-2016
06:58 AM
1 Kudo
I suspect the metastore already exists, but not in complete form. Follow the steps below, substituting a database such as MySQL as per your requirement:
1. Before you run Hive for the first time, run: schematool -initSchema -dbType derby
2. If you already ran Hive and then tried initSchema and it is failing, move the old metastore aside: mv metastore_db metastore_db.tmp
3. Re-run: schematool -initSchema -dbType derby
4. Run Hive again.
Also note that if you change directories, the metastore_db created above won't be found (embedded Derby creates it in the current working directory).
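If you switch the metastore from Derby to a database such as MySQL, the same initialization step applies with a different dbType; a hedged sketch, assuming the JDBC connection properties in hive-site.xml already point at the MySQL instance:
# Initialize and then verify the metastore schema on MySQL (assumes hive-site.xml is already configured)
schematool -initSchema -dbType mysql
schematool -info -dbType mysql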
07-27-2016
10:48 AM
1 Kudo
You can create an external table to map the schema and move the file to HDFS: CREATE EXTERNAL TABLE IF NOT EXISTS Cars(
Name STRING,
Miles_per_Gallon INT,
Cylinders INT,
Displacement INT,
Horsepower INT,
Weight_in_lbs INT,
Acceleration DECIMAL,
Year DATE,
Origin CHAR(1))
COMMENT 'Data about cars from a public database'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
location '/user/<username>/visdata';
Copy the file into that location: hdfs dfs -copyFromLocal cars.csv /user/<username>/visdata
Now create the ORC table: CREATE TABLE IF NOT EXISTS mycars(
Name STRING,
Miles_per_Gallon INT,
Cylinders INT,
Displacement INT,
Horsepower INT,
Weight_in_lbs INT,
Acceleration DECIMAL,
Year DATE,
Origin CHAR(1))
COMMENT 'Data about cars from a public database'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS ORC;
Insert the data from the external table into the Hive ORC table:
INSERT OVERWRITE TABLE mycars SELECT * FROM cars;
07-27-2016
10:39 AM
Hi @Arun A K, we have observed that most of the time is spent writing data downstream to Cassandra, since a single node is serving the Cassandra cluster. We are now planning to add multiple Cassandra nodes inside the Hadoop cluster for faster writes. I'll keep you updated on progress.
07-27-2016
10:18 AM
Thanks @Arun A K, I'll verify your suggestions on my test case and let you know if I make progress.
07-26-2016
02:36 PM
1 Kudo
I just received feedback from the developers that using the above approach they are able to utilize 61 of the 64 virtual cores. But performance is still a bottleneck: the file still takes the same time to process. Does anybody have an idea what is going wrong?
07-26-2016
11:25 AM
1 Kudo
I think applying different memory parameter sizes relative to the file size is the best we can do to optimize Spark performance, unless we have already tuned the underlying program. As I don't know the operations my team is performing in the program, I have suggested they verify the points below.
We can set parallelism at the RDD level, like below:
val rdd = sc.textFile("somefile", 8)
A second major factor in performance is security: wire encryption has roughly a 2x overhead, and data encryption (Ranger KMS) can cause a 15 to 20% overhead. Note: Kerberos has no impact.
Another parameter to look at is the default queue for your spark-submit job. If the job is going to the default queue, override it to a more specialized queue with the parameter below:
--queue <if you have queues set up>
Please let me know if we can check anything else to gain performance....
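Putting the suggestions above together, a hedged sketch of a spark-submit invocation; the queue name, memory sizes, class, and application jar are placeholders rather than tested values:
# Hypothetical example: tune the numbers for your cluster and YARN queue setup
spark-submit \
  --master yarn --deploy-mode cluster \
  --queue my_spark_queue \
  --num-executors 17 --executor-cores 5 --executor-memory 6g \
  --driver-memory 4g \
  --class class_name myapp.jar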
07-26-2016
08:58 AM
2 Kudos
I have an 8-node Amazon cluster and I am trying to optimize my Spark job, but I am unable to bring the program execution below 15 minutes. I have tried running my Spark job with different memory parameters, but it does not accept them and always executes with 16 executors, even when I supply 21 or 33. Please help me understand the possible reasons; my command is below:
nohup hadoop jar /var/lib/aws/emr/myjar.jar spark-submit --deploy-mode cluster --num-executors 17 --executor-cores 5 --driver-cores 2 --driver-memory 4g --class class_name s3:validator.jar -e runtime -v true -t true -r true &
Observation: when I pass 3 executors it defaults to 4 and execution takes longer, but the other parameters have no effect.
Labels: Apache Spark
07-22-2016
07:48 AM
1 Kudo
Could someone please point me to a repository where I can find a ready-made OS image for Hadoop installation? I don't want to spend time on other configuration such as Java, Python, rpm, yum, network issues, etc.; I am looking for an OS image I can just download and then start testing a few Hadoop components on. I actually have a few images, but they keep causing problems here and there before I reach the point where I can start the actual Hadoop installation... Lesser-known FTP URLs where such OS images are hosted are also welcome...
- Tags:
- hadoop
- Hadoop Core
- OS
Labels: Apache Hadoop
07-20-2016
02:48 PM
I am getting an unhandled error on the search page of examslocal (https://www.examslocal.com/). The error detail is below, and a screenshot of the error is attached:
Unhandled Error
You are signed in as mkumar13@xavient.com
sign off
An error has occured A run time error was generated while rendering. The exception message is: ID4223: The SamlSecurityToken is rejected because the SamlAssertion.NotOnOrAfter condition is not satisfied. NotOnOrAfter: '7/18/2016 1:52:35 PM' Current time: '7/20/2016 2:42:19 PM'
07-20-2016
02:24 PM
2 Kudos
Heterogeneous Storage in HDFS
Hadoop 2.6.0 introduced a new feature: heterogeneous storage. With heterogeneous storage, data can be placed on different storage media according to each medium's read/write characteristics, playing to their respective strengths. This is very well suited to cold data: data that needs large, inexpensive capacity but no high read/write performance can stay on ordinary disks, while hot data can be kept on SSDs. At the other end, when we need very efficient reads, at rates ten or even a hundred times faster than an ordinary disk, data can be stored directly in memory and lazily persisted to HDFS. The key point of HDFS heterogeneous storage is that we no longer need to build two separate clusters to hold cold and hot data; it can all be done within one cluster, which gives this feature great practical significance. Here I introduce the heterogeneous storage types and how they can be configured flexibly, with typical use cases:
- Ultra-cold data on inexpensive, high-capacity hard disks, e.g. bill/receipt or video archive systems with large-scale sequential I/O; ordinary disk remains the default storage type.
- SSD storage for efficient data query, visualization, and external data sharing, to improve performance.
- RAM_DISK for extreme performance.
- Hybrid setups mixing SSD with HDD (SATA or SAS).
HDFS Storage Types
ARCHIVE - Archival storage is for very dense storage and is useful for rarely accessed data. This storage type is typically cheaper per TB than normal hard disks.
DISK - Hard disk drives are relatively inexpensive and provide sequential I/O performance. This is the default storage type.
SSD - Solid state drives are useful for storing hot data and I/O-intensive applications.
RAM_DISK - This special in-memory storage type is used to accelerate low-durability, single-replica writes.
HDFS Storage Policies
HDFS has six preconfigured storage policies:
Hot - All replicas are stored on DISK.
Cold - All replicas are stored on ARCHIVE.
Warm - One replica is stored on DISK and the others are stored on ARCHIVE.
All_SSD - All replicas are stored on SSD.
One_SSD - One replica is stored on SSD and the others are stored on DISK.
Lazy_Persist - The replica is written to RAM_DISK and then lazily persisted to DISK.
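As a quick preview of the practical side, a hedged sketch of how storage types and policies are wired up from the command line; the data directories and HDFS path are examples only:
# Tag each DataNode data directory with its storage type in dfs.datanode.data.dir (hdfs-site.xml), e.g.:
#   [DISK]/hadoop/hdfs/data,[ARCHIVE]/hadoop/hdfs/archive,[SSD]/hadoop/hdfs/ssd
# Then assign and inspect a policy on a directory (hypothetical path):
hdfs storagepolicies -setStoragePolicy -path /apps/cold-logs -policy COLD
hdfs storagepolicies -getStoragePolicy -path /apps/cold-logs
hdfs storagepolicies -listPolicies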
In the next article I'll show practical usage with HDFS storage settings and a storage policy for HDFS using Ambari. To be continued...
- Find more articles tagged with:
- FAQ
- hadoop
- Hadoop Core
- HDFS
- storage