Member since: 01-08-2014
Posts: 88
Kudos Received: 15
Solutions: 11
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3685 | 10-29-2015 10:12 AM
 | 3275 | 11-27-2014 11:02 AM
 | 4063 | 11-03-2014 01:49 PM
 | 2377 | 09-30-2014 11:26 AM
 | 4737 | 09-21-2014 11:24 AM
03-24-2017
06:59 AM
Apache Phoenix is tied to specific versions of HBase. Because compatibility is maintained across CDH5 releases, HBase 1.2.0-cdh5.10.0 has some API incompatibilities with Apache HBase 1.2.0. Unfortunately, some of these API differences are ones that matter to Phoenix. Generally, this means some small code changes are needed to get Phoenix to build against CDH jars. Note that Phoenix on CDH is not something Cloudera supports. Still, you can take a look at the Cloudera Labs git repo for the kind of changes that are needed. We never made a Phoenix 4.6 build, IIRC, but we did make a Phoenix 4.7 build.
... View more
04-05-2016
06:23 AM
Please open a new discussion thread for your issue. Older solved threads are unlikely to receive an appropriate amount of attention. I'd recommend you post your MapReduce issue over in the batch processing forum. Be sure to include your version of CDH, a complete stack trace, and the command you used to launch the job.
... View more
10-29-2015
10:12 AM
They are. 5.3.8 (Oct 20th) happened after 5.4.7 (Sep 18th). The next release of 5.4 after the 5.3.8 release will have the fix.
... View more
02-03-2015
07:58 AM
1 Kudo
Each file uses a minimum of one block entry (though that block will only be the size of the actual data). So if you are adding 2736 folders each with 200 files that's 2736 * 200 = 547,200 blocks. Do the folders represent some particular partitioning strategy? Can the files within a particular folder be combined into a single larger file? Depending on your source data format, you may be better off looking at something like Kite to handle the dataset management for you.
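If you want to see how many directories and files (and therefore block entries) a given path is contributing, the hdfs count command is a quick check. A minimal sketch, using a hypothetical /data/incoming path and illustrative numbers only:

# hypothetical path; the numbers below are made up for illustration
$ hdfs dfs -count /data/incoming
        2736       547200     9876543210 /data/incoming

The columns are DIR_COUNT, FILE_COUNT, and CONTENT_SIZE. Each of those files costs the NameNode at least one block entry, so fewer, larger files keep that overhead down.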
... View more
11-27-2014
11:02 AM
As Mike previously mentioned, those configuration files don't exist when the cluster is handled by CM. It sounds like the underlying problem might be incorrect host name resolution. Accumulo and Hadoop require forward and reverse DNS to be set up correctly, and you should not have IP addresses in your configuration files. If the problem is incorrect host names, you can check a few things:

1) What does CM think the names of the hosts are? If you go to http://cm.example.com:7180/cmf/hardware/hosts (where "cm.example.com" is the name of your CM host), what is listed in the "name" column? They should all be fully qualified domain names.

2) What does each host think its name is? Log into each of the cluster machines and run the "hostname" command. It should return a fully qualified domain name, and this name should match the one found in the "name" column above.

3) What do the Accumulo processes think the host names are? You can see this by looking inside ZooKeeper. Because ZooKeeper is used to maintain critical information for Accumulo, you should be very careful while dealing with it directly. It's also important to note that this information is deep in the internals of Accumulo; you must not presume it will be the same across versions. Below I'll show an example from a cluster running Accumulo 1.6.0-cdh5.1.0. Connect to ZooKeeper and see what shows up for tablet servers in the /accumulo/%UUID%/tservers node:

$ zookeeper-client -server zoo1.example.com,zoo2.example.com,zoo3.example.com
Connecting to zoo1.example.com,zoo2.example.com,zoo3.example.com
... SNIP ...
2014-11-27 10:50:11,499 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=zoo1.example.com,zoo2.example.com,zoo3.example.com sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@8d80be3
Welcome to ZooKeeper!
2014-11-27 10:50:11,535 [myid:] - INFO [main-SendThread(zoo2.example.com:2181):ClientCnxn$SendThread@975] - Opening socket connection to server zoo2.example.com/10.17.72.3:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
JLine support is enabled
2014-11-27 10:50:11,546 [myid:] - INFO [main-SendThread(zoo2.example.com:2181):ClientCnxn$SendThread@852] - Socket connection established to zoo2.example.com/10.17.72.3:2181, initiating session
2014-11-27 10:50:11,560 [myid:] - INFO [main-SendThread(zoo2.example.com:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server zoo2.example.com/10.17.72.3:2181, sessionid = 0x349ace5c95e63c4, negotiated timeout = 30000
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: zoo1.example.com,zoo2.example.com,zoo3.example.com(CONNECTED) 0] ls /accumulo/e8f3afdf-a59c-4ebd-ae0c-15d47b9dd5e1/
users problems monitor root_tablet hdfs_reservations gc
table_locks namespaces recovery fate tservers tables
next_file tracers config masters bulk_failed_copyq dead
[zk: zoo1.example.com,zoo2.example.com,zoo3.example.com(CONNECTED) 0] ls /accumulo/e8f3afdf-a59c-4ebd-ae0c-15d47b9dd5e1/tservers
[tserver1.example.com:10011,tserver2.example.com:10011,tserver3.example.com:10011,tserver4.example.com:10011,tserver5.example.com:10011]
[zk: zoo1.example.com,zoo2.example.com,zoo3.example.com(CONNECTED) 1]

The UUID in the top-level /accumulo node is the internal id used to track your Accumulo instance. If there are multiple of these, you can find the one for your current cluster by listing all instance information (presuming you have an Accumulo gateway on the node). This utility is also an Accumulo internal, so neither its name, usage, nor output format can be counted on across versions.

$> accumulo org.apache.accumulo.server.util.ListInstances
INFO : Using ZooKeepers zoo1.example.com,zoo2.example.com,zoo3.example.com
Instance Name | Instance ID | Master
---------------------+--------------------------------------+-------------------------------
"accumulo" | e8f3afdf-a59c-4ebd-ae0c-15d47b9dd5e1 |master2.example.com:10010
"dedicated" | 496b74ab-316c-41bc-badb-4b908039f725 |
"dedicatum" | e49b451b-4607-4a0e-9dda-b49dc938080e | 4) Is HDFS confused? Can you use hdfs commands from inside / outside of the cluster? E.g. can you list the root directory? Can the Accumulo user list their home directory or the /accumulo directory?
... View more
11-03-2014
01:49 PM
1 Kudo
Current versions of Spark don't have a spark-assembly jar artifact (see, for example, Maven Central for upstream). The assembly is used internally by distributions when executing Spark. Instead, you should declare a dependency on whichever part of Spark you actually use, e.g. spark-core.
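For example, a minimal pom.xml snippet, assuming you are building against the Spark that ships in CDH 5.2.0 (Spark 1.1.0 on Scala 2.10); the version string is an assumption, so adjust it to whatever your cluster actually runs, per the CDH Maven documentation:

<!-- version is an assumption: match it to your cluster's CDH release -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.1.0-cdh5.2.0</version>
  <scope>provided</scope>
</dependency>

The provided scope keeps the Spark classes out of your application jar, since the cluster supplies them at runtime.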
... View more
11-03-2014
12:36 PM
maven.jenkins.cloudera.com is an internal repository used in our internal build and publishing process. It is currently online but it is not available outside of Cloudera's internal network.
... View more
11-03-2014
11:06 AM
You should only rely on released (i.e. non-SNAPSHOT) versions in your own projects. CDH 5.2.0 was released 14 Oct, so there are no longer SNAPSHOT versions of the artifacts. See the CDH documentation on using maven for info on what the proper version string is for the component you wish to use: CDH 5.2.0 Maven artifact coordinates
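As an illustration, here is a sketch of what a CDH 5.2.0 dependency can look like, using hadoop-client (confirm the exact version string for each component against the page linked above), along with the Cloudera repository definition your pom needs to resolve CDH artifacts:

<!-- repository and version shown as an example; verify against the CDH 5.2.0 Maven coordinates page -->
<repository>
  <id>cloudera</id>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.5.0-cdh5.2.0</version>
</dependency>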
... View more
09-30-2014
11:26 AM
This error occurs because you have configured a minimum required replication rather than a default replication level. Some systems, like Sqoop 2, purposefully set a low replication factor for temporary files that they aren't worried about losing. With a required minimum replication, the NameNode will reject those requests as invalid. The fix is to set the minimum required replication back to 1, by resetting the property dfs.namenode.replication.min.
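A minimal sketch of checking and then confirming the change (the values shown are illustrative, and hdfs getconf reads the local client configuration, so the authoritative change is made in CM under HDFS -> Configuration, or in the NameNode's hdfs-site.xml, followed by a NameNode restart):

# illustrative output; your current override may differ
$ hdfs getconf -confKey dfs.namenode.replication.min
3
# remove the override so the property falls back to 1, restart the NameNode, then re-check
$ hdfs getconf -confKey dfs.namenode.replication.min
1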
... View more
09-22-2014
05:22 PM
Hi! I'd be happy to help you with this new problem. To make things easier for future users, how about we mark my answer for the original thread topic and start a new one for this issue?
... View more
09-21-2014
11:24 AM
1 Kudo
Hi! The problem you are hitting is a known limitation of Accumulo on small clusters. By default Accumulo attempts to use a replication factor of 5 for the metadata table, ignoring the "table.file.replication" setting. Normally, Cloudera Manager does not set a max replication factor, so this only causes under-replication warnings until you either add nodes or manually adjust the replication setting on that table. In your cluster, however, the "dfs.replication.max" setting has been adjusted to match your number of cluster nodes, which is causing Accumulo's attempts to create new files for its internal tables to fail.

Unfortunately, I'm not sure this can be fixed without data loss. However, to recover you should first edit the "dfs.replication.max" setting for HDFS to be >= 5. Then you should adjust the replication on the metadata and root tables to be <= your number of DataNodes. After that it should be safe to lower dfs.replication.max again.

Adjust the replication in the Accumulo shell:

$> config -t accumulo.metadata -s table.file.replication=3
$> config -t accumulo.root -s table.file.replication=3
... View more
09-21-2014
11:08 AM
To make sure we have the same context, I think you're working through the bulk ingest overview's example. Please correct me if that's wrong.

Before running any of the Accumulo examples, you need to do some user setup. None of them should be run as root nor as any of the service principals (accumulo, hdfs, etc.). The user that will run the data generation needs to be able to run MapReduce jobs. See the full docs for instructions on provisioning such a user. In short, ensure they have a user account on all worker nodes and that they have a home directory in HDFS (creating that home directory will require acting as the hdfs superuser).

The user you created above will be used for the data generation step. If you are running on a secure cluster, you will need to authenticate with your Kerberos credentials before submitting the job. Otherwise the generation step only requires an initial local login.

The data loading step requires an Accumulo user. You should create a user via the Accumulo shell. Be sure to replace the instance name, ZooKeeper servers, and user/password given in the ARGS line with ones appropriate for your cluster. This loading should not be done as the Accumulo root user.

Let me know if you have any further problems.
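To make the setup concrete, here is a minimal sketch using a hypothetical user named "alice" on an unsecured cluster; the names and paths are assumptions, and on a Kerberized cluster you would also kinit as alice before submitting the job:

# hypothetical user "alice"; create the OS account on the worker nodes
$ sudo useradd alice
# create an HDFS home directory as the hdfs superuser
$ sudo -u hdfs hdfs dfs -mkdir /user/alice
$ sudo -u hdfs hdfs dfs -chown alice /user/alice
# create an Accumulo user for the loading step, from an admin Accumulo shell
root@instance> createuser alice
root@instance> grant System.CREATE_TABLE -s -u alice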
... View more
06-26-2014
09:58 AM
Nope, no setting. It should *just work*. Something is amiss with your browser, just not sure what. Is this a machine you control yourself or is it managed by an IT group? Can you try copy/pasting the link into your browser address bar instead of clicking on it? Is there another web browser on the machine you could attempt to use? (It's worth noting that the CM5 requirements state the minimum Firefox version is 11, but that shouldn't impact the NameNode UI page and this doesn't feel like a browser compatibility issue.)
... View more
06-26-2014
09:48 AM
Curl works, so that's good news. DNS at least works at the OS level. The current problem appears to be with your web browser, then. What browser is it? Do you know if it's running any kind of filtering add-on?
... View more
06-26-2014
09:33 AM
Is the single system your local workstation? Is it a VM? Check via curl to rule out your browser. Recheck that DNS resolution works for the hostname CM thinks the node has.
... View more
06-26-2014
09:27 AM
1 Kudo
- Can you verify that DNS works from your workstation to resolve the host that is running the NameNode you're trying to look at?

- The message sounds like a firewall issue. Are you sure there isn't a firewall between you and the NameNode machine preventing access?

One way to zero in on the problem is to copy the NameNode UI link from CM and then check it from the machine running the NameNode, to rule out network interference. For example, if the NameNode UI link is http://namenode1.example.com:50070/

[root@namenode1 ~]# curl http://namenode1.example.com:50070/
<meta HTTP-EQUIV="REFRESH" content="0;url=dfshealth.jsp"/>
<html>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<head>
<title>Hadoop Administration</title>
</head>
<body>
<h1>Hadoop Administration</h1>
<ul>
<li><a href="dfshealth.jsp">DFS Health/Status</a></li>
</ul>
</body>
</html>
[root@namenode1 ~]#
... View more
06-26-2014
08:51 AM
Which version of CM are you using? In general, the link is on the HDFS service page, both on the top line and in the "HDFS Summary" section. On CM 5.0.2 the latter is right under the "configured capacity" display and is labeled "namenode web ui" (circled in red in the attached screenshot).
... View more
06-26-2014
08:28 AM
1 Kudo
Does the CM page for the HDFS service report everything as healthy? Can you try browsing to the NameNode UI via the link provided on the CM page for the HDFS service?
... View more
06-25-2014
11:29 PM
I'd recommend just going through the upgrade. CDH4 -> CDH5 is relatively painless, especially if you're doing new development. My guess is it will take you longer to get through patching and building than it takes to upgrade.
... View more
06-25-2014
10:43 AM
Hiya! That ticket is marked as a duplicate of HDFS-3405, which is fixed in CDH5 (5.0.0+). You can see this by looking at the release notes for HDFS in CDH 5.0.0 and searching for HDFS-3405.
... View more
06-16-2014
06:45 AM
1 Kudo
In 1.5.x, the Accumulo scripts rely on HADOOP_PREFIX to load the Hadoop native libraries. They ignore LD_LIBRARY_PATH entirely. On a parcel install you should do something like the example accumulo-env.sh we ship with 1.4.4-cdh4.5.0.

accumulo-env.sh:

if [ -z "$HADOOP_HOME" ]
then
test -z "$HADOOP_PREFIX" && export HADOOP_PREFIX=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_HOME=$HADOOP_PREFIX
else
HADOOP_PREFIX="$HADOOP_HOME"
fi
test -z "$HADOOP_CLIENT_HOME" && export HADOOP_CLIENT_HOME=/opt/cloudera/parcels/CDH/lib/hadoop/client-0.20
test -z "$HADOOP_CONF_DIR" && export HADOOP_CONF_DIR="$HADOOP_PREFIX/etc/hadoop"
test -z "$ZOOKEEPER_HOME" && export ZOOKEEPER_HOME=/opt/cloudera/parcels/CDH/lib/zookeeper accumulo-site.xml: <property>
<name>general.classpaths</name>
<value>
$ACCUMULO_HOME/lib/[^.].*.jar,
$ZOOKEEPER_HOME/zookeeper[^.].*-[0-9].*.jar,
$HADOOP_CONF_DIR,
$HADOOP_CLIENT_HOME/[^.].*-[0-9].*.jar,
$HADOOP_PREFIX/lib/[^.].*.jar,
</value>
<description>Classpaths that accumulo checks for updates and class files.
When using the Security Manager, please remove the ".../target/classes/" values.
</description>
</property> In particular, note that HADOOP_PREFIX points at the lib dir for Hadoop within the CDH parcel, rather than the top level CDH dir. That last line in the classpath for HADOOP_PREFIX/lib is just to get around ACCUMULO-2301 / ACCUMULO-2786. So you could further limit it to just be the jars needed to satisfy the jetty dependency for the Monitor. If you're doing this as a new install, you're better off just running 1.6.0. It updated how native library loading was done to more closely match the expectations of Hadoop (and just general native code loading), so it goes much smoother with parcels.
... View more
03-19-2014
07:06 AM
1 Kudo
That's the problem then. The parcel for Accumulo 1.4.4-cdh4.5.0 relies on the extensibility additions in Cloudera Manager 5 beta 2. Can you update your Cloudera Manager version and try again? While Cloudera Manager 5 is in beta, it can happily install and manage any of our stable CDH releases, including the recommended version for running our latest Accumulo release: CDH 4.5.0.
... View more
03-14-2014
10:16 PM
Hi Johndoe! It's probably worth you posting a top-level question about getting things to work with the latest Quickstart VM. FWIW, I have tested the latest install instructions along with our current release (Accumulo 1.4.4-cdh4.5.0) on the 4.4 Quickstart VM and everything worked fine. Can you post your accumulo-site.xml somewhere that I can review it?
... View more
02-19-2014
11:56 AM
Your Accumulo installation does not have the example configuration directories ($ACCUMULO_HOME/conf/examples). Looking at our docs, I see we tell people to move the conf directory to set up initial configs, probably on the assumption that people won't be rebuilding. I'll file an internal ticket to get that corrected. In the meantime, I'd recommend restoring the $ACCUMULO_HOME/conf directory from the dist tarball.
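If the conf directory is gone entirely, here is a rough sketch of restoring it from the dist tarball. This assumes the accumulo-1.4.3-cdh4.3.0-dist.tar.gz artifact used elsewhere in these threads; the tarball name and the path inside it may differ for your release:

# tarball name and internal path are examples; substitute your release
$ tar -xzf accumulo-1.4.3-cdh4.3.0-dist.tar.gz accumulo-1.4.3-cdh4.3.0/conf
$ cp -r accumulo-1.4.3-cdh4.3.0/conf/* $ACCUMULO_HOME/conf/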
... View more
02-19-2014
11:36 AM
Hi Chris! Can you restore your Accumulo installation to pristine and then test? You should generally try to isolate things so that you can test one issue at a time. One way I do this is to take a snapshot of the VM after I verify any particular major change (i.e. finishing the installation instructions for Accumulo on the QuickStart VM).
... View more
02-17-2014
03:41 PM
The issue here looks like you previously did a build as the cloudera user. That build created all of the target/ directories for Maven with write permissions exclusive to the cloudera user. So now, when the accumulo user attempts to build, it can't write. To fix this for now:

1) Stop Accumulo. The rest of these instructions will delete the set of jars Accumulo needs to run.

2) As the cloudera user, run these commands to clear out the directories:

[cloudera@localhost ~]$ cd /usr/lib/accumulo
[cloudera@localhost accumulo]$ mvn -Dhadoop.profile=2.0 clean
[INFO] Scanning for projects...
<....skip Maven downloading the internet....>
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] accumulo .......................................... SUCCESS [0.745s]
[INFO] cloudtrace ........................................ SUCCESS [0.010s]
[INFO] accumulo-start .................................... SUCCESS [0.005s]
[INFO] accumulo-core ..................................... SUCCESS [0.040s]
[INFO] accumulo-server ................................... SUCCESS [0.035s]
[INFO] accumulo-examples ................................. SUCCESS [0.027s]
[INFO] examples-simple ................................... SUCCESS [0.014s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.345s
[INFO] Finished at: Mon Feb 17 14:51:22 PST 2014
[INFO] Final Memory: 6M/57M
[INFO] ------------------------------------------------------------------------
[cloudera@localhost accumulo]$

3) If that fails, you might have a mix of permissions between the cloudera and accumulo users. Run the clean command from above as root.

4) Repeat the command-line build from before as the accumulo user.

5) Start Accumulo again.

To fix this sustainably, you need to separate building things from deploying them. This is especially true if you want to use a graphical IDE as the cloudera user. To do that, I'd recommend the following.

Place the code you want to work on somewhere the cloudera user can interact with. I generally do this in a subdirectory of the home directory.

[cloudera@localhost ~]$ mkdir -p projects/accumulo
[cloudera@localhost ~]$ tar -C projects/accumulo --strip=1 -xzf accumulo-1.4.3-cdh4.3.0-dist.tar.gz
[cloudera@localhost ~]$ cd projects/accumulo/
[cloudera@localhost accumulo]$ ls -a
. .. bin CHANGES cloudera conf contrib docs lib LICENSE logs NOTICE pom.xml README src test walogs
[cloudera@localhost accumulo]$

Out of habit, I would track whatever changes happen locally:

[cloudera@localhost accumulo]$ git config --global user.name "Sean Busbey"
[cloudera@localhost accumulo]$ git config --global user.email "busbey@cloudera.com"
[cloudera@localhost accumulo]$ git init
Initialized empty Git repository in /home/cloudera/projects/accumulo/.git/
[cloudera@localhost accumulo]$ git commit --allow-empty -m "Initial commit"
[master (root-commit) 6943bc0] Initial commit
[cloudera@localhost accumulo]$ git add *
[cloudera@localhost accumulo]$ git commit -m "Cloudera release 1.4.3-cdh4.3.0"
[master b4a9911] Cloudera release 1.4.3-cdh4.3.0
1551 files changed, 338162 insertions(+), 0 deletions(-)
<.... skip 1551 file paths ....>
[cloudera@localhost accumulo]$ git status
# On branch master
nothing to commit (working directory clean)
[cloudera@localhost accumulo]$ git log
commit b4a99119da3de3f9445b793ebd338dcbb310ace3
Author: Sean Busbey <busbey@cloudera.com>
Date: Mon Feb 17 15:05:50 2014 -0800
Cloudera release 1.4.3-cdh4.3.0
commit 6943bc00904b42c5ee6aec472e963f0188433267
Author: Sean Busbey <busbey@cloudera.com>
Date: Mon Feb 17 15:05:30 2014 -0800
Initial commit
[cloudera@localhost accumulo]$

Build however you're going to build (IDE or command line). As an example, I would build via command-line Maven:

[cloudera@localhost accumulo]$ mvn -Dhadoop.profile=2.0 package
[INFO] Scanning for projects...
[WARNING]
<.... Skip warnings about plugin versions ....>
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] accumulo
[INFO] cloudtrace
[INFO] accumulo-start
[INFO] accumulo-core
[INFO] accumulo-server
[INFO] accumulo-examples
[INFO] examples-simple
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building accumulo 1.4.3-cdh4.3.0
[INFO] ------------------------------------------------------------------------
<.... skip maven downloading the internet ....>
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] accumulo .......................................... SUCCESS [3:11.272s]
[INFO] cloudtrace ........................................ SUCCESS [1:58.342s]
[INFO] accumulo-start .................................... SUCCESS [4:35.473s]
[INFO] accumulo-core ..................................... SUCCESS [59.175s]
[INFO] accumulo-server ................................... SUCCESS [1:53.334s]
[INFO] accumulo-examples ................................. SUCCESS [0.009s]
[INFO] examples-simple ................................... SUCCESS [7.238s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 12:46.212s
[INFO] Finished at: Mon Feb 17 15:21:16 PST 2014
[INFO] Final Memory: 80M/192M
[INFO] ------------------------------------------------------------------------
[cloudera@localhost accumulo]$

When you need to deploy the jars into your Accumulo instance, replace the Accumulo deployment jars with those from the working directory and restart Accumulo (all as the accumulo user):

[cloudera@localhost accumulo]$ sudo -u accumulo rm -f /usr/lib/accumulo/lib/*.jar
[cloudera@localhost accumulo]$ sudo -u accumulo cp lib/*.jar /usr/lib/accumulo/lib/
[cloudera@localhost accumulo]$ sudo su - accumulo
[accumulo@localhost ~]$ $ACCUMULO_HOME/bin/stop-all.sh
^CInvalid password or unable to connect to the master
Press Ctrl-C to cancel now, or force shutdown in 15 seconds
Utilities and unresponsive servers will be shut down in 5 seconds
stopping monitor on localhost.localdomain
stopping tracer on localhost.localdomain
stopping monitor on localhost.localdomain
stopping unresponsive tablet servers (if any) ...
stopping unresponsive tablet servers hard (if any) ...
Cleaning tablet server and logger entries from zookeeper
Cleaning all server entries in zookeeper
[accumulo@localhost ~]$ $ACCUMULO_HOME/bin/start-all.sh
Starting tablet servers and loggers .... done
Starting tablet server on localhost.localdomain
Starting logger on localhost.localdomain
2014-02-17 15:34:41,763 [server.Accumulo] INFO : Attempting to talk to zookeeper
2014-02-17 15:34:42,136 [server.Accumulo] INFO : Zookeeper connected and initialized, attemping to talk to HDFS
2014-02-17 15:34:47,210 [server.Accumulo] INFO : Connected to HDFS
Starting master on localhost.localdomain
Starting garbage collector on localhost.localdomain
Starting monitor on localhost.localdomain
Starting tracer on localhost.localdomain
[accumulo@localhost ~]$ exit
logout
[cloudera@localhost accumulo]$

Note that this process is only needed because you're altering the jars that make up Accumulo itself. Under normal operations, you should not be altering this source nor copying jars into Accumulo's installation. Any jars you need, either for client-side applications or for a custom iterator, should be deployed in their own location and referenced as appropriate from your client or server classpaths.
... View more
02-17-2014
03:21 PM
Yes. Those entries are so you can test changes to the Accumulo code by recompiling and restarting Accumulo (i.e. skipping the step of building jars and deploying them). I generally recommend never including those paths in Accumulo's classpath because:

1) They should never be present in a production deployment, and it's a bad idea to intentionally introduce differences between dev and production (in this case, being packaged in jars).

2) The main code base doesn't build unless you use the package goal anyway, so you'll already have jars ready to go.

3) If you are building directly within your Accumulo installation (which you should not do), the default package goal will load your changes upon restart just as easily, because the package goal will have put the repackaged Accumulo jars into the lib directory.
... View more