Member since: 06-21-2016
Posts: 40
Kudos Received: 1
Solutions: 1

My Accepted Solutions

Title | Views | Posted
---|---|---
 | 552 | 05-11-2017 03:42 PM
06-15-2022
06:26 AM
Operator error: I forgot the balancer is an hdfs subcommand, not a standalone binary in /usr/bin.
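For anyone who lands here, a minimal sketch of the invocation (the threshold value is illustrative, not from the original post):

```bash
# The balancer ships as a subcommand of the hdfs CLI rather than its own binary:
which hdfs                    # typically /usr/bin/hdfs or /usr/hdp/current/hadoop-client/bin/hdfs
hdfs balancer -threshold 10   # rebalance until DataNodes are within 10% of the cluster average
```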
06-14-2022
01:25 PM
We had an HDP 2.5.3.0 cluster with an HDFS balancer CLI command that I cannot find on HDP 3.1.5.0. I am adding new nodes to the 3.1.5 cluster and it appears there is some HDFS balancing going on, albeit slowly. Did the CLI go away with 3.1.5.0, or am I missing something?
Labels:
- HDFS
- Hortonworks Data Platform (HDP)
03-24-2022
11:32 AM
I have a small cluster running HDP 3.1.5.6091 with Ambari 2.7.5.0 that I inherited.
The YARN Timeline Service starts but gets a constant critical alert:
ATSv2 HBase Application The HBase application reported a 'FAILED' state. Check took 2.261s
The hadoop-yarn-timelinereader.log shows
failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused:server.name:17020
I added that port (17020) to the firewall on the server in question and restarted the Timeline Service V2.0 Reader, but I am still getting the same error.
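The firewall change was along these lines (firewalld syntax assumed; the port number comes from the error above):

```bash
# Open the ATSv2 embedded HBase port named in the connection-refused error,
# then confirm something is actually listening on it on that host:
firewall-cmd --permanent --add-port=17020/tcp
firewall-cmd --reload
ss -tlnp | grep 17020   # if nothing listens here, the HBase app itself is down, not the firewall
```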
Any suggestions would be appreciated
02-22-2021
07:53 AM
I am running HDP 2.5.3.0 with Ambari 2.4.2.0.
I have a cluster node/host that died over the weekend, and I am not able to resurrect it.
Through Ambari I am able to put it in maintenance mode.
I would like to delete this node from the cluster. Its role was DataNode and YARN NodeManager.
When choosing the Delete option under Host Actions, I get a warning that the DataNode and NodeManager should be decommissioned first to prevent data loss, but those options are not available via Ambari.
Is there another option for decommissioning or is my only option to delete given the status of the node?
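For reference, the manual decommission path looks roughly like this (a sketch only: the exclude-file locations are the usual HDP defaults, so confirm dfs.hosts.exclude and yarn.resourcemanager.nodes.exclude-path in your configs, and note that a host that is already dead may never report fully decommissioned):

```bash
# Add the dead host to the HDFS and YARN exclude files, then ask the masters to re-read them:
echo "deadnode.example.com" >> /etc/hadoop/conf/dfs.exclude   # hostname is illustrative
hdfs dfsadmin -refreshNodes    # run as the hdfs superuser, against the NameNode

echo "deadnode.example.com" >> /etc/hadoop/conf/yarn.exclude
yarn rmadmin -refreshNodes     # run against the ResourceManager
```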
Thanks.
12-20-2018
11:03 PM
Fixed the problem: one of the file systems was at 100%. I deleted some old user cache data and things are working again.
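A sketch of that cleanup, assuming the usercache path from my other threads; check which filesystem is actually full before deleting anything:

```bash
df -h                                        # find the filesystem at 100%
du -sh /var/hadoop/yarn/local/usercache/*    # see which users' caches hold the space
# Remove cache only for applications that are no longer running
# (<user> and <id> are placeholders):
rm -rf /var/hadoop/yarn/local/usercache/<user>/appcache/application_<id>
```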
12-20-2018
10:33 PM
I installed patches and bounced a node on my cluster, and now the DataNode won't start:

2018-12-20 14:26:07,962 ERROR datanode.DataNode (BPServiceActor.java:run(772)) - Initialization failed for Block pool <registering> (Datanode Uuid fa50d7aa-c305-47bd-9935-6d8f947e0d27) service to m02.pnl.gov/192.168.41.52:8020. Exiting.
java.io.IOException: All specified directories are failed to load.

RedHat 6.10, Ambari 2.5.0.3, Stack 2.7.3.2.6
Labels:
- Apache Hadoop
09-12-2018
09:50 PM
Thanks, this did work for me! Is there a way to configure the hadoop cluster to use a specific installed version of python?
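For later readers: the usual knobs here are the PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON environment variables. A sketch of that approach (paths assumed from my other threads, not confirmed as the exact fix):

```bash
# Pin the interpreter Spark uses on both the driver and the YARN executors.
# Add to /etc/spark/conf/spark-env.sh on every node:
export PYSPARK_PYTHON=/usr/local/bin/python2.7
export PYSPARK_DRIVER_PYTHON=/usr/local/bin/python2.7
```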
09-11-2018
09:50 PM
I have two versions of Python installed (2.6 and 2.7). Spark jobs run through the pyspark shell pick up one version of Python (2.7). Jobs submitted to the cluster via YARN pick up the 2.6 version. How can I get YARN jobs to point at the 2.7 version?
Tags:
- Hadoop Core
- python
04-23-2018
10:28 PM
I lost a drive on one of my data nodes that apparently stored some ambari-metrics collector data. Besides the Hadoop directory structure that was created when the DataNode was restarted, I see a /var/lib/ambari-metrics-collector directory that was created as well. I replaced the drive and everything is back up except the Metrics Collector. I am getting these errors:

22:22:12,598 WARN [main-SendThread(datanode03.foo:61181)] ClientCnxn:1146 - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused

22:22:04,011 WARN [main] DefaultPhoenixDataSource:84 - Unable to connect to HBase store using Phoenix.
java.sql.SQLException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locations
Labels:
- Apache Ambari
03-22-2018
06:06 PM
Rebooted my cluster last week and I am seeing unusually high CPU loads on 3 datanodes. Nothing changed except the reboot. Their load averages are 120-150, while the others have a load of less than 5. Looking at top, it seems to be YARN java processes that are running. All nodes are running Java 1.7.0, HDP 2.5.3, and Ambari 2.4.2: 37 datanodes, all configured the same (Dell R515 servers, two 8-core AMD CPUs, RHEL 6 64-bit, 256GB memory, same hard drives). The 3 nodes in question differ only by CPU: they run the AMD Opteron 4386 versus the AMD Opteron 4284 in all the other datanodes. I have been running this cluster for over a year with no problems. Any ideas as to why this is happening? I have looked at the BIOS settings to see if some were configured differently, but they are all the same.
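In case it helps anyone triage the same symptom, a hedged sketch of pinning down which JVMs are burning the CPU (nothing here is specific to my setup):

```bash
# Identify the hottest java processes, then capture a thread dump from one of them:
top -b -n 1 | head -n 20                       # confirm the top consumers are YARN containers
ps -eo pid,etime,pcpu,args --sort=-pcpu | head -n 10
sudo -u yarn jstack <pid> > /tmp/container.jstack   # <pid> is a placeholder; requires a JDK on the node
```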
Tags:
- cpu
03-07-2018
03:05 PM
My cluster nodes have two versions of Python installed: 2.6.6 in /usr/bin and 2.7.12 in /usr/local/bin. I installed some Python modules for the 2.7.12 version associated with geolocation. When I run a job locally on one of the nodes it runs fine. When it is submitted through YARN I get the following: "ImportError: No module named ipaddress". ipaddress is one of the modules I installed. I suspect YARN is using the 2.6.6 version of Python. How can I determine if this is the case, and if it is, how can I point YARN at the Python in /usr/local/bin? Thanks
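A hedged way to answer the "how can I determine" part, assuming the jobs go through PySpark as mine do (the script path is illustrative):

```bash
cat > /tmp/pyver.py <<'EOF'
# Prints the Python version used by a YARN executor, not just the driver.
import sys
from pyspark import SparkContext

sc = SparkContext()
print(sc.parallelize([0]).map(lambda _: sys.version).collect())
sc.stop()
EOF
spark-submit --master yarn /tmp/pyver.py
```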
Labels:
- Apache YARN
03-01-2018
05:05 PM
NameNode Heap Usage (weekly) alert: I get these periodically. Critical is set to 60%, warning at 50%, and minimum heap is set to 1000M. The variance for this alert is 504MB, which is 66% of the 7560MB average (4536MB is the limit). Is this a concern? If so, what should I be looking at?
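One way to see the actual heap numbers behind the alert is the NameNode's JMX servlet (host name is illustrative; 50070 is the usual non-HA HTTP port on HDP 2.x):

```bash
# Dump current and max heap from the NameNode JVM:
curl -s 'http://namenode.example.com:50070/jmx?qry=java.lang:type=Memory' | python -m json.tool
```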
Labels:
- Apache Hadoop
01-26-2018
05:22 PM
Running Ambari 2.4.2 and HDP 2.5.3.0. It looks like my cluster is configured to use MySQL for the Hive metastore. The log file is being written to /var/run/mysqld; it is 239GB, growing, and close to filling up the /var file system. Can I delete or rotate out the log, and do I need to shut down Hive to do this? Is there a way to reduce the level of logging, or to rotate this log automatically?
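A sketch of an in-place rotation, assuming the growing file turns out to be one of MySQL's own logs (verify which log it is first; nothing here requires stopping Hive):

```bash
# Confirm which logging is enabled and where it writes:
mysql -e "SHOW VARIABLES LIKE '%log%';"
# Rotate the file out from under mysqld, then make it reopen its logs:
mv /var/run/mysqld/mysqld.log /var/run/mysqld/mysqld.log.1   # file name assumed
mysql -e "FLUSH LOGS;"
rm /var/run/mysqld/mysqld.log.1   # once you are sure nothing still needs it
```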
Labels:
- Apache Hive
01-03-2018
06:13 PM
Running a job in Hive writes cache data to /var/hadoop/yarn/local/usercache. This is causing the /var file system to fill up, resulting in NodeManager failures on some nodes and the job hanging. Is it possible to direct the usercache to a different location?
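For anyone searching later: the usercache lives under yarn.nodemanager.local-dirs, so pointing that property at a bigger filesystem moves it. A sketch (paths illustrative; change it through Ambari so it persists, then restart the NodeManagers):

```bash
# yarn-site.xml property controlling where NodeManagers write container data,
# including usercache (shown as a comment; edit via Ambari, not by hand):
#   yarn.nodemanager.local-dirs = /data1/hadoop/yarn/local,/data2/hadoop/yarn/local
# After restarting the NodeManagers, confirm the cache moved:
du -sh /data1/hadoop/yarn/local/usercache 2>/dev/null
```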
Tags:
- Hive
- nodemanager
- YARN

Labels:
- Apache Hive
- Apache YARN
12-21-2017
07:33 PM
Running this in the Ambari Hive 2.0 view. Trying to join two tables into a new table. The query seems to run, then stalls at 100% and returns with an error, but no specific error message is provided in the Hive view. Where can I find the error message, or at least the logs associated with the job ID the Hive view provides?
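Two places worth checking, as a hedged sketch (the log path is the usual HDP default; replace the application ID placeholder with the one the view reports):

```bash
# HiveServer2's own log often has the real failure:
tail -n 200 /var/log/hive/hiveserver2.log
# Aggregated YARN container logs for the job behind the query:
yarn logs -applicationId application_<cluster_ts>_<id>
```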
Labels:
- Apache Ambari
- Apache Hive
09-25-2017
10:05 PM
Adding new nodes to my cluster, I have one where I cannot get the ambari-agent to start. RHEL 6.9, HDP-2.6.0.3, Ambari 2.5.0.3, ambari-agent-2.5.0.3-7.x86_64. I get the following error in the ambari-agent log:

ERROR 2017-09-25 14:59:26,625 Controller.py:502 - Controller thread failed with exception:
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 489, in run
    self.register = Register(self.config)
  File "/usr/lib/python2.6/site-packages/ambari_agent/Register.py", line 35, in __init__
    self.hardware = Hardware(self.config)
  File "/usr/lib/python2.6/site-packages/ambari_agent/Hardware.py", line 52, in __init__
    self.hardware.update(Facter(self.config).facterInfo())
  File "/usr/lib/python2.6/site-packages/ambari_agent/Facter.py", line 571, in facterInfo
    facterInfo = super(FacterLinux, self).facterInfo()
  File "/usr/lib/python2.6/site-packages/ambari_agent/Facter.py", line 248, in facterInfo
    'ipaddress': self.getIpAddress(),
  File "/usr/lib/python2.6/site-packages/ambari_agent/Facter.py", line 81, in getIpAddress
    return socket.gethostbyname(self.getFqdn().lower())
gaierror: [Errno -2] Name or service not known
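Since the traceback dies in socket.gethostbyname() on the agent's own FQDN, a quick sanity check of name resolution on the failing node:

```bash
hostname -f                      # the FQDN the agent will look up
getent hosts "$(hostname -f)"    # does it resolve via /etc/hosts or DNS?
# Same call the agent makes:
python -c 'import socket; print(socket.gethostbyname(socket.getfqdn().lower()))'
```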
Labels:
- Apache Ambari
09-18-2017
09:44 PM
A DataNode service on one of my cluster nodes is down because I lost a hard drive/file system. I have a drive on order, but it may be a couple of days until I have it in hand. What is the impact of continuing to run in this state?
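A hedged way to watch the actual exposure while the node is down (with the default replication factor of 3, blocks from the dead node should re-replicate on their own):

```bash
# Replication health: look for under-replicated, missing, or corrupt blocks
hdfs fsck / | grep -iE 'under-replicated|missing|corrupt'
hdfs dfsadmin -report | head -n 30   # dead node count and remaining capacity
```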
Tags:
- Hadoop Core
- HDFS

Labels:
- Apache Hadoop
07-31-2017
04:59 PM
Running HDP 2.5.3. The root file system filled up on the node that runs the Hive metastore and HiveServer2, and both processes stopped. I cleared up disk space but could not get either process to start.

HiveServer2 error: yarn.exceptions.ApplicationNotFoundException: Application with id "application_1496410945618_25770" doesn't exist in RM
Hive metastore error: transport exception, could not create ServerSocket

I suspect it was a job that didn't finish or die cleanly because of the disk filling up, but there were no jobs running in the YARN RM. I tried stopping all the Hive processes, restarting YARN, and then starting Hive; I still had the same problem. I resorted to stopping all the services on the node where the disk filled up and bouncing the node. This fixed the problem, as HiveServer2 and the Hive metastore now run. Any ideas for a better way to troubleshoot and resolve something like this?
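For the "could not create ServerSocket" part specifically, a hedged check that a stale process isn't still holding the metastore port (9083 is the default Hive metastore port):

```bash
netstat -tlnp | grep 9083   # anything listed here owns the port
# If a dead or hung metastore JVM still holds it, stop that PID
# before asking Ambari to start the metastore again.
```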
Labels:
- Apache Hive
- Apache YARN
- Cloudera Manager
06-12-2017
03:14 PM
Thanks, adding the jar to HIVE_AUX_JARS_PATH in hive-env.sh got the SerDe working in Zeppelin.
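For later readers, the change was along these lines (the jar path is the one from my question below; if you already set HIVE_AUX_JARS_PATH, append rather than overwrite):

```bash
# In hive-env.sh, point HIVE_AUX_JARS_PATH at the HCatalog core jar that provides the JsonSerDe:
export HIVE_AUX_JARS_PATH=/usr/hdp/2.5.3.0-37/hive2/lib/hive-hcatalog-core.jar
```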
05-30-2017
05:18 PM
Trying to get the SerDe working with Zeppelin and/or the spark shell. Running the following in either:

val links = sqlContext.sql("SELECT * FROM test_links LIMIT 10")
links.show()

produces an error:

ERROR hive.log: error in initSerDe: java.lang.ClassNotFoundException Class org.apache.hive.hcatalog.data.JsonSerDe not found
java.lang.ClassNotFoundException: Class org.apache.hive.hcatalog.data.JsonSerDe not found

I tried adding this to /etc/spark/conf/hive-site.xml:

<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/hdp/2.5.3.0-37/hive2/lib/hive-hcatalog-core.jar</value>
</property>

but am still getting the error.
Labels:
- Apache Spark
- Apache Zeppelin
05-19-2017
07:28 PM
I can run the CLI, getting some read-only database errors. I get the scala> prompt and can issue commands (import sys.process._ and "ls -al".!) which work. Trying these same things from the Zeppelin notebook causes the job to hang (constantly in a running 0% state).
05-18-2017
06:55 PM
Running spark or pyspark from the Zeppelin notebook hangs. Looking in the zeppelin-interpreter-spark log, I am seeing the following:

ERROR [2017-05-18 11:51:39,130] ({BoneCP-pool-watch-thread} PoolWatchThread.java[fillConnections]:118) - Error in trying to obtain a connection. Retrying in 7000ms
java.sql.SQLException: A read-only user or a user in a read-only database is not permitted to disable read-only mode on a connection.
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.setReadOnly(Unknown Source)
at com.jolbox.bonecp.ConnectionHandle.setReadOnly(ConnectionHandle.java:1324)
at com.jolbox.bonecp.ConnectionHandle.<init>(ConnectionHandle.java:262)
at com.jolbox.bonecp.PoolWatchThread.fillConnections(PoolWatchThread.java:115)
at com.jolbox.bonecp.PoolWatchThread.run(PoolWatchThread.java:82)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.SQLException: A read-only user or a user in a read-only database is not permitted to disable read-only mode on a connection.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)

I have Oozie installed and have verified the ownership of the oozie-db directory (oozie:hadoop).
Labels:
- Apache Spark
- Apache Zeppelin
05-18-2017
02:11 PM
HDP 2.5.3, Ambari 2.4.2, 18 data nodes, 190TB. HDFS disk usage is at about 92% (~15TB free), with critical alarms or warnings on almost all the data nodes. "Percent DataNodes With Available Space" is alarming as well. Are there best-practice recommendations for setting these thresholds and for managing the percentage of HDFS disk usage? Are there concerns with running HDFS disk usage above a certain percentage?
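A quick snapshot of how evenly the usage is spread, which is what the "Percent DataNodes With Available Space" alert is really about:

```bash
# Per-DataNode utilization from the NameNode's point of view:
hdfs dfsadmin -report | grep -E '^Name:|DFS Used%'
```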
Labels:
- Apache Hadoop
05-11-2017
03:42 PM
I figured out what was wrong. There are two HDFS configuration groups on this cluster, and one is set up for the datanodes. I just needed to add the new servers to that group.
05-11-2017
03:17 PM
HDP 2.5.3. I have a cluster that has 34 datanodes, each with (11) 1.2TB disks for HDFS. I added three new nodes, but these only have (9) 1.2TB disks for HDFS. The new datanodes have been added, but it seems that not all the file systems are seen by HDFS. When I look at one of the older datanodes, in hdfs-site.xml all the file systems (disk1 through disk11) are listed under dfs.datanode.data.dir. On the new nodes, only disk1 through disk6 are listed, even though disk7, disk8, and disk9 are configured and mounted as file systems. Question: how do I get these nodes to recognize the other disks? Can I edit hdfs-site.xml and add them to the list? If so, what are the steps? I don't seem to be able to do this through Ambari.
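For context (the accepted answer above explains why hand-edits don't stick): dfs.datanode.data.dir is just a comma-separated list, managed per Ambari configuration group. A hedged illustration of what the new nodes' group would need:

```bash
# hdfs-site.xml property; the new nodes' configuration group needs all nine
# mounts listed, comma-separated (paths illustrative):
#   dfs.datanode.data.dir = /disk1/hadoop/hdfs/data,...,/disk9/hadoop/hdfs/data
# Quick check on a new node of which mounts exist for the DataNode to use:
ls -d /disk*/hadoop/hdfs/data
```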
Tags:
- Hadoop Core
- HDFS

Labels:
- Apache Hadoop
05-09-2017
05:38 PM
HDP 2.5.3, Spark 1.6.x.2.5, Zeppelin Notebook 0.6.0.2.5. Just installed Zeppelin, trying to run some of the examples. On the Zeppelin notebook server I have Python 2.7 installed (/usr/bin/python). On the other nodes in the cluster I have Python 2.6 (/usr/bin/python) and 2.7 (/usr/local/bin/python) installed. I am running the "Hello World" example in Zeppelin. Most steps run, but some fail with this error:

Exception: Python in worker has different version 2.6 than that in driver 2.7, PySpark cannot run with different minor versions

In the stack trace I can see the node that I assume is the worker. It is not a Spark client, but I created /etc/spark/conf/spark-env.sh with the following:

export PYSPARK_PYTHON=/usr/local/bin/python
export PYSPARK_DRIVER_PYTHON=python

pointing to Python 2.7, the same version that is running on the Zeppelin notebook server. I am still getting the error. I am leery about removing Python 2.6 on the other nodes, but am not sure how to get around this.
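One more knob worth knowing about, flagged as an assumption rather than a confirmed fix: Zeppelin's Spark interpreter has its own zeppelin.pyspark.python setting for the driver side, while executors still read PYSPARK_PYTHON from spark-env.sh on each node:

```bash
# spark-env.sh on every cluster node (executor side); path illustrative:
export PYSPARK_PYTHON=/usr/local/bin/python
# Zeppelin side (Interpreter settings, not a shell variable): set
# zeppelin.pyspark.python to the matching 2.7 interpreter so the driver
# and the workers agree on the minor version.
```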
Labels:
- Apache Spark
- Apache Zeppelin
05-05-2017
07:19 PM
43 nodes in total. I do not have the metrics_whitelist file in /etc/ambari-metrics-collector/conf.
05-05-2017
07:18 PM
Running embedded mode. Added the ports, restarted iptables and ambari-metrics. Still getting spinners for graphs on the Ambari Metrics page and on individual host pages; everything else seems to be OK. I haven't tried blowing away the /var/lib/ambari-metrics directory. No errors in any of the logs.
05-05-2017
02:40 PM
I think my problem is with a firewall. If I stop the firewall on the Ambari server and the ambari-metrics-collector server, things seem OK. Besides 6188, are there any other ports that AMC uses?
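Based only on what appears in the logs on this page (not an exhaustive port list): 6188 is the collector port from this thread, and 61181 shows up as an embedded ZooKeeper port in a ClientCnxn error in another of my threads. On RHEL 6, opening them looks roughly like:

```bash
# Illustrative iptables rules (RHEL 6 syntax) for the AMS ports seen on this page:
iptables -I INPUT -p tcp --dport 6188 -j ACCEPT    # Metrics Collector (timeline) port
iptables -I INPUT -p tcp --dport 61181 -j ACCEPT   # embedded ZooKeeper port from the ClientCnxn error
service iptables save
```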
05-04-2017
06:36 PM
HDP 2.5.3, 37 datanodes. I had 34 data nodes that are identical with regard to CPU, memory, and storage. I recently added 3 data nodes with the same CPU and memory but less storage (9TB as opposed to 11TB on the older nodes). I ran hdfs balancer, which chugged for a while and moved data to the 3 new nodes. The problem is that 3 file systems on each of the 3 new nodes do not seem to be getting data. I have run the balancer from the CLI and the output states the cluster is balanced. The older nodes (with more storage) are at about 50% disk utilization; the 3 new nodes are at about 30% disk utilization.
Labels:
- Apache Hadoop