Member since: 08-16-2016
Posts: 642
Kudos Received: 131
Solutions: 68

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3458 | 10-13-2017 09:42 PM |
| | 6230 | 09-14-2017 11:15 AM |
| | 3184 | 09-13-2017 10:35 PM |
| | 5115 | 09-13-2017 10:25 PM |
| | 5762 | 09-13-2017 10:05 PM |
02-13-2017
11:07 PM
Why does the search have KB and Community as separate options? If I had an issue and was trying to see whether others had already encountered and fixed it, I would want to search both.
02-13-2017
11:06 PM
2 Kudos
I think some encouragement to use tags would benefit the community overall. Honestly, I always skip over the topics and head straight to the list of most recent questions. De-emphasizing the topics and increasing the visibility of tags would help drive tag usage. I see tags as more valuable because their number can grow without producing a huge table of topics, and they make it more straightforward both to categorize a post and to find specific posts. I don't want to say "just be like StackOverflow" (even though it is a really awesome community site), but their use of tags is closer to what I have in mind.
02-13-2017
11:03 PM
In my short time here, I have seen a number of duplicates or questions with similar answers. This could be addressed, as one user mentioned above, by simply removing duplicate posts. A milder option would be to allow a post to be marked as a duplicate (possibly limited to certain groups) and to push duplicates down in search results.
02-13-2017
08:55 PM
1 Kudo
Let's step back: instead of trying to hunt down where it is set on the client side, mark dfs.replication as final in your configs. This will prevent any client from overriding it at run time.

```xml
<property>
  <name>dfs.replication</name>
  <value>2</value>
  <final>true</final>
</property>
```
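If you want to confirm what value the clients actually resolve, a quick sanity check (assuming the host has the HDFS client configs deployed) is something like:

```sh
# Print the replication factor the client configuration resolves to
hdfs getconf -confKey dfs.replication
```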
02-13-2017
08:34 PM
2 Kudos
In HDFS, you tell it which disks to use and it will fill those disks up. There is the ability to set how much space on those disks is reserved for non-DFS data, but that doesn't actually prevent the disks from filling up. The issue at hand is that the smaller disks will fill up faster, so at some point they will stop accepting writes and the cluster will have no way to balance itself out. This causes issues with HDFS replication and block placement, along with hotspotting in MR, Spark, and any other jobs. Say, for instance, that 80% of your jobs primarily operate on the last day's worth of data. At some point you will hit critical mass, where those jobs are running mostly on the same set of nodes.

You could set the reserved non-DFS space to different values using Host Templates in CM. That would at least give you a warning when you are approaching filling up the smaller disks, but at that point the larger disks would have free space that isn't getting used. This is why it is strongly encouraged not to have different hardware. If possible, upgrade the smaller set.

A possible option would be to use heterogeneous storage. With it you can designate pools, so the larger nodes would be in one pool and the smaller in the other. Each ingestion point would need to set which pool it uses, and you can set how many replicas go to each. This is a big architectural change, though, and should be carefully reviewed to see whether it benefits your use case(s) in any way.

So, simply: use the same hardware or you will more than likely run into issues.
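For illustration, this is roughly how you can watch the skew between nodes and, if you do go the heterogeneous-storage route, work with storage policies (the path below is just a placeholder, and policy choice depends on your layout):

```sh
# Per-DataNode capacity, DFS Used, Non DFS Used, and Remaining,
# which makes the imbalance between smaller and larger nodes visible
hdfs dfsadmin -report

# Built-in storage policies can be listed and applied per path, e.g.:
hdfs storagepolicies -listPolicies
hdfs storagepolicies -setStoragePolicy -path /data/ingest -policy One_SSD
```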
02-13-2017
08:28 PM
What libraries or archive files is the job using? It seems to be trying to connect to some URL using a library somewhere and failing to open that file.
02-13-2017
11:40 AM
A failure to deploy the client configs indicates that the CM Agents are not working correctly. What is the status of the agents, and are there any errors in their logs?
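In case it helps, this is roughly how to check the agent on each host (the log path is the CM default and may differ in your install):

```sh
# Check whether the Cloudera Manager agent is running on this host
service cloudera-scm-agent status

# Look at the most recent agent log entries for errors
tail -n 100 /var/log/cloudera-scm-agent/cloudera-scm-agent.log
```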
02-13-2017
11:31 AM
I had this issue when trying to add the Keytrustee parcel to my local repo. I can't recall the exact error, but it was related to the manifest.json file either not being read correctly or not being correct. Anyway, I used the other method in the link provided by @truonala and hosted it temporarily using Python. This worked for me:

```sh
python -m SimpleHTTPServer 8900
```
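For context, the rough sequence I followed was something like this (the parcel directory path is just an example):

```sh
# Serve the directory that contains the parcel files and manifest.json
cd /opt/parcel-repo          # example path; use wherever your parcels live
python -m SimpleHTTPServer 8900

# Then point Cloudera Manager's remote parcel repository URL setting
# at http://<this-host>:8900/ and re-check for new parcels.
```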
02-13-2017
10:46 AM
Disregard my mention of the hdfs-site.xml not being under the Oozie process directory. It was under the yarn-conf sub-directory.
02-13-2017
10:45 AM
OK, on the server running Oozie, run 'ps -ef | grep oozie'. Find the oozie.config.dir value and search that directory for the configuration files. If there is an hdfs-site.xml there, check it for the replication factor. Looking at my own CDH 5 cluster, I see that Oozie is different from other services: its config directory is under /run/cloudera-scm-agent. I don't know if yours will be, since you still didn't have the agent process directory under /var/run. I also don't have an hdfs-site.xml under my Oozie process directory.
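Roughly, the steps I'm describing look like this (the process directory path is an example from my cluster and will differ on yours):

```sh
# Find the Oozie server process and note its oozie.config.dir setting
ps -ef | grep oozie

# Search that directory for a replication setting in any *-site.xml
# (example path from my CDH 5 cluster; substitute your oozie.config.dir)
grep -r "dfs.replication" /run/cloudera-scm-agent/process/*OOZIE*/ 2>/dev/null
```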