Member since: 02-17-2019
Posts: 26
Kudos Received: 1
Solutions: 1

My Accepted Solutions

| Title | Views | Posted |
| --- | --- | --- |
|  | 419 | 05-19-2020 08:47 AM |
03-07-2021
06:56 PM
Hi experts, we have a big data cluster in production, running Cloudera CDH 6.3.4 with Cloudera Manager. We want to reallocate memory between Impala and YARN from time to time, according to usage requirements. Is it simply a matter of changing the Allocation % among the static service pools and restarting the whole cluster, or is there something else to pay attention to? Thank you, Vincent
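For my own notes, this is roughly how I was planning to verify what the Static Service Pools wizard actually changes, by reading the relevant configs through the Cloudera Manager REST API before and after adjusting the pools. The host, cluster name, API version, role config group names and config keys below are my guesses for a default CM 6.x install, so treat this as a sketch only:
# Read the NodeManager container memory and the Impala daemon memory limit
# (names below are assumptions; adjust to your own cluster/service/group names)
CM=https://cm-host:7183/api/v19
curl -s -k -u admin "$CM/clusters/Cluster%201/services/yarn/roleConfigGroups/yarn-NODEMANAGER-BASE/config?view=full" | grep -A2 yarn_nodemanager_resource_memory_mb
curl -s -k -u admin "$CM/clusters/Cluster%201/services/impala/roleConfigGroups/impala-IMPALAD-BASE/config?view=full" | grep -A2 impalad_memory_limit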
01-21-2021
11:53 AM
Hi experts:
The Hadoop version that ships with CDH 6.3.4 is Hadoop 3.0.0-cdh6.3.4. The Apache Spark web site does not have a prebuilt tarball for Hadoop 3.0.0, so I downloaded "spark-3.0.1-bin-hadoop3.2.tgz", untarred it, and tried it on our CDH 6.3.4 cluster.
A simple Spark line-count job works fine. In a pyspark session, 'show tables' against a Hive database also works, but creating a table fails with this error:
pyspark.sql.utils.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table messages1. Invalid method name: 'get_table_req';
That is very similar to what is described here:
https://stackoverflow.com/questions/63476121/hive-queries-failing-with-unable-to-fetch-table-test-table-invalid-method-name
I tried replacing the Hive-related jars under the Spark 3.0.1 jars subdirectory with the corresponding ones in /opt/cloudera/parcels/CDH-6.3.4-1.cdh6.3.4.p0.6626826/jars, but that did not help - it failed with a different error.
Does anyone have some experience with running Spark 3 in a CDH 6.3.x cluster? Can you suggest anything to try?
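For reference, the next workaround I was planning to try (untested; the Hive version and parcel path are my assumptions for CDH 6.3) is to point Spark 3 at the cluster's Hive metastore client jars instead of the Hive client it bundles, via conf/spark-defaults.conf:
# Sketch only - assumes CDH 6.3's Hive 2.1.1 and the standard parcel location
spark.sql.hive.metastore.version   2.1.1
spark.sql.hive.metastore.jars      /opt/cloudera/parcels/CDH/lib/hive/lib/*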
Your help is greatly appreciated!
Regards.
Vincent
Tags: CDH-6.3.4, spark-3.0.1
09-03-2020
07:24 AM
Hi experts:
There is a node on which the DataNode process is frequently restarted by supervisord. Other nodes in the cluster with the same hardware and configuration do not show this issue. We are on CDH-5.15.2-1. Could you please advise where to look for the cause? Thank you.
In the log file 'hadoop-cmf-hdfs-DATANODE-compute-1-14.local.log.out', for today we see:
bash-4.1# grep -B 2 "STARTUP_MSG: Starting DataNode" hadoop-cmf-hdfs-DATANODE-compute-1-14.local.log.out
2020-09-03 02:49:08,033 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
--
2020-09-03 03:48:31,912 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
--
2020-09-03 05:25:37,999 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
--
2020-09-03 08:26:25,445 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
--
2020-09-03 08:42:48,882 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
These correspond to what is the supervisord log /var/log/cloudera-scm-agent/supervisord.log:
2020-09-03 02:49:06,297 INFO exited: 64450-hdfs-DATANODE (terminated by SIGKILL; not expected)
2020-09-03 02:49:07,300 INFO spawned: '64450-hdfs-DATANODE' with pid 94527
2020-09-03 02:49:07,300 INFO Increased RLIMIT_MEMLOCK limit to 4294967296
2020-09-03 02:49:27,361 INFO success: 64450-hdfs-DATANODE entered RUNNING state, process has stayed up for > than 20 seconds (startsecs)
2020-09-03 03:48:31,094 INFO exited: 64450-hdfs-DATANODE (terminated by SIGKILL; not expected)
2020-09-03 03:48:31,166 INFO spawned: '64450-hdfs-DATANODE' with pid 107591
2020-09-03 03:48:31,166 INFO Increased RLIMIT_MEMLOCK limit to 4294967296
2020-09-03 03:48:51,368 INFO success: 64450-hdfs-DATANODE entered RUNNING state, process has stayed up for > than 20 seconds (startsecs)
2020-09-03 05:25:36,275 INFO exited: 64450-hdfs-DATANODE (terminated by SIGKILL; not expected)
2020-09-03 05:25:37,277 INFO spawned: '64450-hdfs-DATANODE' with pid 127966
2020-09-03 05:25:37,278 INFO Increased RLIMIT_MEMLOCK limit to 4294967296
2020-09-03 05:25:57,338 INFO success: 64450-hdfs-DATANODE entered RUNNING state, process has stayed up for > than 20 seconds (startsecs)
2020-09-03 08:26:23,687 INFO exited: 64450-hdfs-DATANODE (terminated by SIGKILL; not expected)
2020-09-03 08:26:24,690 INFO spawned: '64450-hdfs-DATANODE' with pid 18960
2020-09-03 08:26:24,690 INFO Increased RLIMIT_MEMLOCK limit to 4294967296
2020-09-03 08:26:44,752 INFO success: 64450-hdfs-DATANODE entered RUNNING state, process has stayed up for > than 20 seconds (startsecs)
2020-09-03 08:42:47,139 INFO exited: 64450-hdfs-DATANODE (terminated by SIGKILL; not expected)
2020-09-03 08:42:48,142 INFO spawned: '64450-hdfs-DATANODE' with pid 22506
2020-09-03 08:42:48,142 INFO Increased RLIMIT_MEMLOCK limit to 4294967296
2020-09-03 08:43:08,205 INFO success: 64450-hdfs-DATANODE entered RUNNING state, process has stayed up for > than 20 seconds (startsecs)
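Since supervisord reports the DataNode was terminated by SIGKILL rather than exiting on its own, one thing I plan to check next (just a guess on my part) is whether the kernel OOM killer is killing the process on that node, for example:
bash-4.1# dmesg | egrep -i 'oom|killed process'
bash-4.1# egrep -i 'oom-killer|killed process' /var/log/messages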
07-12-2020
06:29 PM
Hi Experts, as we know, there was recent news that Redash has joined Databricks. My question is: can we use Redash to do visualization of data stored in a cluster managed with Cloudera CDH software? Thank you!
06-10-2020
12:15 PM
Maybe the approach I am taking is impractical in CDH 6.3.3. What are the recommended ways to use GPUs in CDH 6.3.3? From the release notes, Node Labels is listed as an unsupported YARN feature. Thank you!
06-08-2020
01:07 PM
Hi Experts:
I set up a small test cluster using CM and CDH 6.3.3. There are four data nodes, one with a GPU card and the other three without. The NodeManager seems to identify the GPU card correctly. Running '/bin/nvidia-smi' on that node also reports the correct result.
# for m in data-node1 data-node2 data-node3 data-node4; do echo $m; echo ----------; ssh $m 'grep "total resource" /var/log/hadoop-yarn/hadoop-cmf-yarn-NODEMANAGER-data-node*.log.out | tail -n 1'; done
data-node1
----------
2020-06-08 19:08:40,530 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as data-node1.c.nyu-xeep-eosp-xbmo.internal:8041 with total resource of <memory:11692, vCores:16>
data-node2
----------
2020-06-08 19:08:40,606 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as data-node2.c.nyu-xeep-eosp-xbmo.internal:8041 with total resource of <memory:11692, vCores:16>
data-node3
----------
2020-06-08 19:08:41,756 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as data-node3.c.nyu-xeep-eosp-xbmo.internal:8041 with total resource of <memory:11692, vCores:16>
data-node4
----------
2020-06-08 19:19:58,876 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as data-node4.c.nyu-xeep-eosp-xbmo.internal:8041 with total resource of <memory:11248, vCores:16, yarn.io/gpu: 1>
Below is what I have for "Fair Scheduler XML Advanced Configuration Snippet (Safety Valve)":
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<allocations>
<queue name="root">
<weight>1.0</weight>
<schedulingPolicy>drf</schedulingPolicy>
<aclSubmitApps>*</aclSubmitApps>
<aclAdministerApps> sysadmins</aclAdministerApps>
<queue name="default">
<weight>1.0</weight>
<schedulingPolicy>drf</schedulingPolicy>
</queue>
<queue name="gpu">
<weight>1.0</weight>
<maxResources>vcores=10, memory-mb=10240, yarn.io/gpu=1</maxResources>
<minResources>vcores=1, memory-mb=1024, yarn.io/gpu=1</minResources>
<schedulingPolicy>drf</schedulingPolicy>
</queue>
<queue name="users" type="parent">
<weight>1.0</weight>
<schedulingPolicy>drf</schedulingPolicy>
</queue>
</queue>
<defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>
<queuePlacementPolicy>
<rule name="specified" create="true"/>
<rule name="nestedUserQueue" create="true">
<rule name="default" create="true" queue="users"/>
</rule>
<rule name="default"/>
</queuePlacementPolicy>
</allocations>
Then I followed the example 'Distributed-shell + GPU without Docker' as described on the page below:
https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/UsingGpus.html
But the applications were sent to nodes without any GPU card in both cases:
$ yarn jar /opt/cloudera/parcels/CDH/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar -jar /opt/cloudera/parcels/CDH/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar -shell_command /bin/nvidia-smi -container_resources memory-mb=3072,vcores=1,yarn.io/gpu=1 -num_containers 1
$ yarn jar /opt/cloudera/parcels/CDH/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar -jar /opt/cloudera/parcels/CDH/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar -shell_command /bin/nvidia-smi -container_resources memory-mb=3072,vcores=1,yarn.io/gpu=1 -num_containers 1 -queue root.gpu
Could you advise what else I need to do to make the scheduling work correctly? Thank you!
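For completeness, this is my understanding of the resource-type configuration that also has to be in place for yarn.io/gpu to be a schedulable resource, taken from the Hadoop GPU guide linked above. I am not sure how these map onto Cloudera Manager safety valves, so the snippets below are a sketch, not something I have verified on CDH:
resource-types.xml (ResourceManager and NodeManagers):
<configuration>
  <property>
    <name>yarn.resource-types</name>
    <value>yarn.io/gpu</value>
  </property>
</configuration>
yarn-site.xml (on the GPU node's NodeManager):
<property>
  <name>yarn.nodemanager.resource-plugins</name>
  <value>yarn.io/gpu</value>
</property>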
06-03-2020
07:08 AM
Hi paras, you are very helpful. The error disappears now after making the configuration modification as you suggested. Thank you!
06-02-2020
07:38 AM
Hi Experts:
Following the instructions in https://docs.cloudera.com/documentation/enterprise/6/latest/topics/install_cm_cdh.html, we set up a test CDH 6.3.3 cluster. We also enabled Kerberos for security. The Hive, Impala, and HBase command-line clients all connect and function basically. Most of Hue works too, except the HBase Browser, which throws the error "Api Error: TSocket read 0 bytes". Please see the Hue access.log and HBase Thrift Server log below. What should we check further to resolve this issue? Thank you!
/var/log/hue/access.log
---------------------------
[02/Jun/2020 07:16:11 -0700] INFO 173.2.217.185 xe46 - "POST /hbase/api/getClusters HTTP/1.1" returned in 11ms
[02/Jun/2020 07:16:11 -0700] INFO 173.2.217.185 xe46 - "POST /notebook/api/autocomplete/default HTTP/1.1" returned in 152ms
[02/Jun/2020 07:16:12 -0700] INFO 173.2.217.185 xe46 - "POST /hbase/api/getTableList/HBase HTTP/1.1" returned in 96ms
[02/Jun/2020 07:16:12 -0700] ERROR 173.2.217.185 xe46 - "POST /desktop/log_js_error HTTP/1.1"-- JS ERROR: {"msg":"Uncaught SyntaxError: Unexpected token ':'","url":"https://35.226.68.232:8888/hue/hbase/#HBase","line":2,"column":10, "stack":"SyntaxError: Unexpected token ':'\n at w (https://35.226.68.232:8888/static/desktop/js/bundles/hue/vendors~hue~notebook-bundle-ba716af7db7997b47d29.a4ce11024956.js:37:676)\n at Function.globalEval (https://35.226.68.232:8888/static/desktop/js/bundles/hue/vendors~hue~notebook-bundle-ba716af7db7997b47d29.a4ce11024956.js:37:2584)\n at text script (https://35.226.68.232:8888/static/desktop/js/bundles/hue/vendors~hue~notebook-bundle-ba716af7db7997b47d29.a4ce11024956.js:48:76954)\n at https://35.226.68.232:8888/static/desktop/js/bundles/hue/vendors~hue~notebook-bundle-ba716af7db7997b47d29.a4ce11024956.js:48:73527\n at C (https://35.226.68.232:8888/static/desktop/js/bundles/hue/vendors~hue~notebook-bundle-ba716af7db7997b47d29.a4ce11024956.js:48:73644)\n at XMLHttpRequest.<anonymous> (https://35.226.68.232:8888/static/desktop/js/bundles/hue/vendors~hue~notebook-bundle-ba716af7db7997b47d29.a4ce11024956.js:48:76224)"}
[02/Jun/2020 07:16:12 -0700] INFO 173.2.217.185 xe46 - "POST /desktop/log_js_error HTTP/1.1" returned in 3ms
/var/log/hbase/hbase-cmf-hbase-HBASETHRIFTSERVER-master-node1.c.nyu-xeep-eosp-xbmo.internal.log.out
----------------------------
2020-06-02 14:16:12,002 INFO org.apache.hadoop.hbase.thrift.ThriftServerRunner: Effective user: hue
2020-06-02 14:16:12,007 ERROR org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer: Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got ffffff80
    at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:503)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
    at org.apache.hadoop.hbase.thrift.ThriftServerRunner.lambda$setupServer$0(ThriftServerRunner.java:656)
    at org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer$ClientConnnection.run(TBoundedThreadPoolServer.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2020-06-02 14:16:12,019 INFO org.apache.hadoop.hbase.thrift.ThriftServerRunner: Effective user: hue
2020-06-02 14:16:12,020 ERROR org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer: Thrift error occurred during processing of message. (same TProtocolException and stack trace as above)
2020-06-02 14:16:12,031 INFO org.apache.hadoop.hbase.thrift.ThriftServerRunner: Effective user: hue
2020-06-02 14:16:12,032 ERROR org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer: Thrift error occurred during processing of message. (same TProtocolException and stack trace as above)
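For context, these are the HBase Thrift Server settings I have been looking at (my guesses based on the protocol-mismatch error above, not a confirmed fix), set through the HBase advanced configuration snippet for hbase-site.xml:
<property>
  <name>hbase.regionserver.thrift.http</name>
  <value>true</value>
</property>
<property>
  <name>hbase.thrift.support.proxyuser</name>
  <value>true</value>
</property>
<property>
  <name>hbase.regionserver.thrift.framed</name>
  <value>false</value>
</property>
<property>
  <name>hbase.regionserver.thrift.compact</name>
  <value>false</value>
</property>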
05-27-2020
11:35 AM
Hi Experts: It seems that the Spark Thrift Server is not officially supported in Cloudera CDH. How do people use Tableau to do visualization with data stored in a Cloudera CDH cluster? Can you recommend any other visualization tools? Thank you!
05-19-2020
08:47 AM
Hi paras, resetting the hostnames to long names got me moving forward again. Thank you!
05-18-2020
05:42 PM
The cluster consists of six VM nodes in GCP: 2 master nodes, 1 login node, and 3 data nodes.
05-18-2020
05:40 PM
Hi, following the instructions described on this page: https://docs.cloudera.com/documentation/enterprise/6/latest/topics/installation.html I made good progress up to step 6, Install CDH and Other Software: https://docs.cloudera.com/documentation/enterprise/6/latest/topics/install_software_cm_wizard.html At that point I encountered an error; please see the picture below [screenshot not shown here]. I completely wiped out the databases and retried twice, and I still see the error. Could you advise where to check so I can fix the problem and move forward? Thank you!
05-18-2020
05:29 PM
The download works for me now. Thanks to whoever helped me.
05-15-2020
09:39 AM
Hi, with some experts' help I received a Licenseinfo.zip file. Inside it there are two files:
*-201907-240656_cloudera_license.txt
*-201907-240656_info.txt
The second file contains a login and password:
login: xxxxxxxx
password: yyyyyy
From this page https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cm_6_version_download.html the 6.3.3 Cloudera Manager repo is https://<username>:<password>@archive.cloudera.com/p/cm6/6.3.3/redhat7/yum/cloudera-manager.repo
So I tried the following, and it is not working:
$ wget https://xxxxxxxx:yyyyyy@archive.cloudera.com/p/cm6/6.3.3/redhat7/yum/cloudera-manager.repo
--2020-05-11 12:52:04-- https://xxxxxxxx:*password*@archive.cloudera.com/p/cm6/6.3.3/redhat7/yum/cloudera-manager.repo
Resolving archive.cloudera.com... 199.232.36.167
Connecting to archive.cloudera.com|199.232.36.167|:443... connected.
HTTP request sent, awaiting response... 401 Authentication required <1589208855>
Connecting to archive.cloudera.com|199.232.36.167|:443... connected.
HTTP request sent, awaiting response... 401 Access denied <1589208855>
Authorization failed.
Can somebody shed some light on this? Thanks again!
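One variant I was also going to try (just a guess on my part; special characters in the password might need URL-encoding when embedded directly in the URL) is passing the credentials as wget options instead:
$ wget --user='xxxxxxxx' --ask-password https://archive.cloudera.com/p/cm6/6.3.3/redhat7/yum/cloudera-manager.repo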
05-05-2020
06:27 PM
Thank you @CY Jervis. That's very kind of you.
05-05-2020
11:49 AM
Hi Experts: We would like to download and test Cloudera Manager and CDH 6.3.3. We understand from this page: https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cm_6_version_download.html#cm_6_version_download that a 60-day trial can be enabled to provide access to the full set of Cloudera Enterprise features, and that Cloudera Enterprise can be enabled permanently with the appropriate license. We filled out the contact-sales form to request access so we can download and try 6.3.3. Could someone knowledgeable please help us move this forward? Thank you!
07-19-2019
12:17 PM
The issue is resolved with this step: from the CM web page, click YARN (MR2 Included) -> Actions -> Install YARN MapReduce Framework JARs.
07-19-2019
06:05 AM
Hi experts, I am setting up a small test cluster using CDH 6.2.0. So far HDFS and YARN start up fine, spark-submit runs, and the applications generate results as expected. But teragen does not run; it looks like a MapReduce setup issue. Could you please give some hints? Thank you!
$ hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-mapreduce-examples-3.0.0-cdh6.2.0.jar teragen 10000000 testgen_output
WARNING: Use "yarn jar" to launch YARN applications.
19/07/19 08:57:19 INFO client.RMProxy: Connecting to ResourceManager at c41-12/172.16.2.121:8032
java.io.FileNotFoundException: File does not exist: hdfs://c41-12:8020/user/yarn/mapreduce/mr-framework/3.0.0-cdh6.2.0-mr-framework.tar.gz
    at org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:145)
    at org.apache.hadoop.fs.AbstractFileSystem.resolvePath(AbstractFileSystem.java:488)
    at org.apache.hadoop.fs.FileContext$25.next(FileContext.java:2225)
    at org.apache.hadoop.fs.FileContext$25.next(FileContext.java:2221)
    at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
    at org.apache.hadoop.fs.FileContext.resolve(FileContext.java:2227)
    at org.apache.hadoop.fs.FileContext.resolvePath(FileContext.java:607)
    at org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:460)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:146)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1588)
    at org.apache.hadoop.examples.terasort.TeraGen.run(TeraGen.java:304)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.examples.terasort.TeraGen.main(TeraGen.java:308)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:313)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:227)
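For reference, this is the check I was going to run to confirm whether the MR framework archive named in the exception actually exists in HDFS (path copied from the error message above):
$ hdfs dfs -ls /user/yarn/mapreduce/mr-framework/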
07-01-2019
10:40 AM
Hello Experts, Is there a way to read compressed (.gzip, .zip or other formats) CSV files from a directory with pyspark, to create a dataframe? Thank you!
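To make the question concrete, this is the kind of thing I had in mind (a sketch only; as far as I know Spark reads .gz-compressed CSV transparently, while .zip archives are not handled natively):
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-compressed-csv").getOrCreate()

# Spark decompresses .gz files automatically based on the file extension.
# The directory path below is a placeholder for illustration.
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/csv_dir/*.csv.gz"))
df.show(5)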
06-21-2019
04:59 AM
Hi Experts, Is it possible to configure YARN on a data node to use half of the CPU cores and half of the total memory, and, when necessary (e.g. when we are very busy), modify the YARN configuration to expand to all available resources (CPU cores, memory)? And when not that busy, shrink back to half of the resources? Can YARN be that elastic? Thank you!
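To make the question concrete, these are the NodeManager settings I am thinking of toggling between a "half" and a "full" profile (the property names are standard YARN ones; the values are hypothetical, and whether they can be changed safely and repeatedly on a production node is exactly what I am asking):
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>65536</value> <!-- e.g. half of the node's RAM in the quiet profile; hypothetical value -->
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>16</value> <!-- e.g. half of the node's cores; hypothetical value -->
</property>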
05-23-2019
06:48 PM
Thanks a lot for your reply, Harsh. These sound great. Can you give pointers to learning materials on both methods, e.g. examples, blogs, URLs, or books?
05-18-2019
05:43 AM
We have ten million image and video files and are looking for efficient ways to store them in Hadoop (HDFS ...) and analyze them with tools available in the Hadoop ecosystem. I understand HDFS prefers big files, and these image files are small, under ten megabytes each. Please advise. Thanks very much!
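To make the question concrete, one approach I have read about (I am not sure it is the recommended one) is packing the small files into Hadoop Archives so HDFS sees fewer, larger objects, roughly like:
$ hadoop archive -archiveName images-batch1.har -p /data/raw/images /data/archived   # paths are hypothetical
$ hadoop fs -ls har:///data/archived/images-batch1.har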
04-29-2019
08:02 AM
Hi Wilfred, Without node labels, is it possible to have a few nodes reserved for some users' exclusive usage? Thanks Vincent
04-29-2019
07:56 AM
It's very nice to know about the upcoming updates. So with the feature in place, we would be able to assign some nodes exclusively to certain users (e.g. user userA, application appX) for their processing requirements. The questions that naturally follow are about data storage locations: can other users' applications still read from and write to the assigned nodes? Will appX output files be written only to the assigned nodes? Will appX be allowed to read input file blocks from all nodes in the cluster? I know this is asking a lot. Thanks a lot!
04-25-2019
10:41 AM
I doubt that, but I am checking here anyway. Our system is on CDH-5.15.2. The resource manager and job scheduler is YARN:
$ yarn version
Hadoop 2.6.0-cdh5.15.2
Subversion http://github.com/cloudera/hadoop -r c97bcbf0cba923467d45f5519b1953f436c64f12
Compiled by jenkins on 2018-11-13T13:53Z
Compiled with protoc 2.5.0
From source with checksum 9d2d5b887383c7d4b811372f867c6440
This command was run using /opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/jars/hadoop-common-2.6.0-cdh5.15.2.jar
There are two users who want an isolated environment to run some experiments. I wonder if we can reserve two nodes for their use. Any suggestions and discussion are very welcome. Best Regards, Vincent