Member since: 12-09-2015
Posts: 22
Kudos Received: 2
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2399 | 04-25-2016 05:31 PM
 | 2113 | 12-09-2015 02:40 PM
10-07-2016 10:49 AM
I'm looking for a faster way of getting a row count from a partition. select count(*) does the job, but it takes too long and requires too many resources. I'm trying to use the hive --orcfiledump command to get the same value, finding lines that look like this:

Stripe: offset: 3 data: 18297929 rows: 76771 tail: 338 index: 6535

and adding up the "rows" values. Unfortunately, the total I'm getting is significantly off. In one case, count(*) returns 309250764 (which matches the row count from my reducer), but orcfiledump tells me there are 312247462 rows. Can anyone help me understand the difference?
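For reference, here's a minimal sketch of how I'm tallying the per-stripe counts; the warehouse path is a made-up example:

# dump every ORC file in the partition directory and sum the per-stripe "rows:" values
for f in $(hdfs dfs -ls /user/hive/warehouse/mydb.db/mytable/dt=2016-10-01 | awk '{print $NF}'); do
  hive --orcfiledump "$f"
done | awk '/Stripe:/ { for (i = 1; i <= NF; i++) if ($i == "rows:") sum += $(i + 1) } END { print sum }'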
09-17-2016 09:46 AM
I have dozens of tables with daily partitions, some of which require concatenation after creation, some of which don't. I'm not sure what to expect when I call concatenate on these partitions. Should it produce (bytecount/blocksize) files of just under the blocksize? Should it produce (square root of line count) files of indeterminate size? Is there a way to tune it? Specifically, I'm trying to reduce my small file problem, but I don't want to call concatenate on partitions if it won't actually do anything. Thanks in advance.
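In case it helps frame an answer, this is roughly how I plan to check whether a call changed anything; table, partition, and path names are placeholders:

# file count before
hdfs dfs -count /user/hive/warehouse/mydb.db/mytable/dt=2016-09-16
hive -e "ALTER TABLE mytable PARTITION (dt='2016-09-16') CONCATENATE;"
# file count after; should be smaller if the concatenation merged anything
hdfs dfs -count /user/hive/warehouse/mydb.db/mytable/dt=2016-09-16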
06-23-2016 01:50 PM
Is there a command-line interface through which I can set up, start, and stop services running on a slave node? I could just hit the API endpoints and do everything manually, but then Cloudera Manager would treat those changes as an error state.
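To be concrete, the endpoints I mean are the Cloudera Manager REST API ones. A hedged sketch of stopping a single role; the host, credentials, API version, cluster, service, and role names are all placeholders:

curl -u admin:admin -X POST \
  -H "Content-Type: application/json" \
  -d '{ "items": ["hdfs-DATANODE-example"] }' \
  "http://cm-host:7180/api/v10/clusters/Cluster%201/services/hdfs/roleCommands/stop"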
06-16-2016 09:30 AM
I haven't found a way to do that, but you could drop the jar into the directory specified by the Hive Auxiliary JARs Directory setting in the configuration.
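For example, a sketch with a made-up directory; point the setting at whatever path you choose and restart the Hive services afterwards:

sudo mkdir -p /usr/local/hive-aux-jars
sudo cp my-udf.jar /usr/local/hive-aux-jars/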
06-13-2016 11:53 AM
It looks like this fix didn't make it into 5.7.1:
https://issues.apache.org/jira/browse/HBASE-15152
http://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_fixed_in_57.html
Note that, as a workaround, you can manually add the prefix-tree jars to your classpath.
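For example, a sketch assuming the standard parcel symlink location:

# add the prefix-tree jar to the classpath Hive picks up
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/cloudera/parcels/CDH/jars/hbase-prefix-tree-1.2.0-cdh5.7.0.jar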
05-10-2016 12:58 PM
It looks like it can't find PrefixTreeCodec, which is clearly in <parcels>/CDH/jars/hbase-prefix-tree-1.2.0-cdh5.7.0.jar. Can someone tell me why this warning shows up every time I run a Hive query?
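(For what it's worth, a quick listing confirms the class is in the jar; the absolute parcel path below is an assumption based on the standard install location:)

unzip -l /opt/cloudera/parcels/CDH/jars/hbase-prefix-tree-1.2.0-cdh5.7.0.jar | grep PrefixTreeCodec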
04-28-2016 09:52 AM
1 Kudo
Here's a link to the Quickstart VM for 5.7:
http://www.cloudera.com/downloads/quickstart_vms/5-7.html
Hope that helps.
04-26-2016 05:25 PM
Sounds like you're on the right track. I had the same issue the first time I performed an upgrade. Had to roll it back and try again. Good thing I backed it all up.
04-25-2016 05:31 PM
This issue was corrected by using the sqlite3 command-line interface to run:

alter table search_collection add column owner_id int NULL;

No issues were seen after that.
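For anyone hitting the same error, the full invocation looked roughly like this; the desktop.db path is an assumption and varies by install:

sudo -u hue sqlite3 /var/lib/hue/desktop.db 'alter table search_collection add column owner_id int NULL;'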
04-25-2016 02:00 PM
For reference, the table in the database looks like this:

CREATE TABLE "search_collection" (
    "properties" text NOT NULL,
    "sorting_id" integer NOT NULL,
    "name" varchar(40) NOT NULL,
    "facets_id" integer NOT NULL,
    "enabled" bool NOT NULL,
    "label" varchar(100) NOT NULL,
    "is_core_only" bool NOT NULL,
    "result_id" integer NOT NULL,
    "cores" text NOT NULL,
    "id" integer NOT NULL PRIMARY KEY
);

and the create table command in Hue looks like this:

db.create_table('search_collection', (
    ('properties', self.gf('django.db.models.fields.TextField')(default='{}')),
    ('sorting', self.gf('django.db.models.fields.related.ForeignKey')(to=orm['search.Sorting'])),
    ('name', self.gf('django.db.models.fields.CharField')(max_length=40)),
    ('facets', self.gf('django.db.models.fields.related.ForeignKey')(to=orm['search.Facet'])),
    ('enabled', self.gf('django.db.models.fields.BooleanField')(default=True, blank=True)),
    ('label', self.gf('django.db.models.fields.CharField')(max_length=100)),
    ('is_core_only', self.gf('django.db.models.fields.BooleanField')(default=False, blank=True)),
    ('result', self.gf('django.db.models.fields.related.ForeignKey')(to=orm['search.Result'])),
    ('cores', self.gf('django.db.models.fields.TextField')(default='{}')),
    ('id', self.gf('django.db.models.fields.AutoField')(primary_key=True)),
))

So the field "owner_id" is definitely not there, and it looks like it shouldn't be, so why is it an error? Can I just create an owner_id field with some default value, or will that mess up something worse?
04-25-2016 01:07 PM
I'm following the instructions here:
http://www.cloudera.com/documentation/enterprise/5-5-x/topics/cdh_ig_hue_database.html

When dumping the sqlite database, I get the following error:

sudo -u hue /opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/lib/hue/build/env/bin/hue dumpdata > /tmp/huedump.json
CommandError: Unable to serialize database: no such column: search_collection.owner_id

We started on version 5.3.3, upgraded to 5.4.7, and have now completed an upgrade to 5.7.0. Can anyone tell me how to fix or work around this?
04-25-2016 09:36 AM
Make sure you run "yum clean all" before doing the installation, to clear out stale caches.
04-25-2016 09:29 AM
Recently upgraded from 5.5 to 5.7. When I execute Hive commands from the command line, I now get the following exception. The line it's thrown from calls Logger.getLogger("org.apache.hadoop.hbase");. Can anyone tell me why this happens, and what I can do to make it go away?

log4j:WARN Caught Exception while in Loader.getResource. This may be innocuous.
java.lang.SecurityException: Invalid signature file digest for Manifest main attributes
at sun.security.util.SignatureFileVerifier.processImpl(SignatureFileVerifier.java:286)
at sun.security.util.SignatureFileVerifier.process(SignatureFileVerifier.java:239)
....
at org.apache.log4j.helpers.Loader.getResource(Loader.java:97)
at org.apache.log4j.LogManager.<clinit>(LogManager.java:107)
at org.apache.log4j.Logger.getLogger(Logger.java:104)
at org.apache.hadoop.hbase.util.MapreduceDependencyClasspathTool.main(MapreduceDependencyClasspathTool.java:66)
2016-04-25 16:04:10,850 WARN [main] mapreduce.TableMapReduceUtil: The hbase-prefix-tree module jar containing PrefixTreeCodec is not present. Continuing without it.
04-25-2016 09:23 AM
It looks like you installed Java as hduser instead of as root, which will prevent other users from accessing it. Could you try uninstalling Java and re-installing it as root?
04-25-2016 09:17 AM
Having more recently upgraded from 5.3 to 5.5, I found the upgrade from 5.5 to 5.7 went a lot more smoothly. The hardest part for me was managing the psql database. Can you tell me what part, specifically, you found issues with?
04-14-2016 03:02 PM
1 Kudo
I'm attempting to build a UDF that uses the FasterXML Jackson implementation. Currently it calls functions that don't exist in whichever version it's actually loading. I've noticed that there are five different versions of the Jackson jar in the parcel jars directory. Which one should I target for my project?
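For context, this is how I'm listing the candidates; the parcel path is an assumption based on the standard install location:

ls /opt/cloudera/parcels/CDH/jars/ | grep -i jackson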
Tags: Jackson
01-20-2016 12:21 PM
We perform a lot of distcp jobs (an order of magnitude more than our other types of jobs), and they are overwhelming our JobHistory Server, resulting in GC overhead limit exceeded errors left and right. Is there a way to either not store their history in the history server, or clean them out on a faster schedule than the rest of the jobs?
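One knob that may be relevant is the history server's retention age (mapreduce.jobhistory.max-age-ms), though it ages out all jobs, not just the distcp ones; a hedged example for mapred-site.xml, with an illustrative one-day value in place of the default week:

<property>
  <name>mapreduce.jobhistory.max-age-ms</name>
  <value>86400000</value>
</property>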
12-30-2015 10:49 AM
Thanks, Romain. Not sure how to tag this answer. It terminates the search on the issue, but I don't think I can fairly call it a solution.
12-23-2015 08:02 AM
Everything was installed via Cloudera Manager, with parcels.
12-22-2015 06:02 PM
We're on CentOS, and I'm attempting to upgrade our version of Hue (running on CDH 5.4) to the newer version, so that we can make use of the ZooKeeper browser. Unfortunately, the instructions don't match the actual state of the server. For instance, the first step is to shut down the Hue service:

sudo service hue stop

The server tells me that hue is not a valid service. When I looked for it, it insisted that Hue wasn't an installed package, either. I suspect this has something to do with the parcel install. Are there alternate instructions for bumping the version of Hue without installing a new version of CDH?
12-09-2015 02:40 PM
This issue was resolved by removing the 5.3.6 libraries from the cluster. It seems that Cloudera leaves the old libraries in the path after the new libraries are installed.
12-09-2015 01:28 PM
I just upgraded my cluster from 5.3.6 to 5.4.8, and can no longer access my ORC-formatted tables from Hive. It gives me the following exception. CDH 5.4.8 states that it uses Hive 1.1.0, although this specific error is reported against Hive 1.2, and is correctable by upgrading the Kryo library to the latest version. Has anyone else seen this? Is there a way of fixing it less severe than rolling back my installation?

org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index: 109, Size: 75
Serialization trace:
operatorId (org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator)
childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)