Member since: 01-30-2014
Posts: 25
Kudos Received: 0
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2388 | 08-27-2014 11:57 AM |
| | 4041 | 08-21-2014 05:31 AM |
06-24-2015
03:31 PM
To be complete: yes, you need to use the safety valves to get the coprocessors in the correct order. You also need to set the HFile version to 3, or HBase won't start with these coprocessors. I find that last part odd, because HBase 1.0 should use version 3 by default, per the docs. In any case, use the HBase documentation's sample config as a reference for which setting you need where: http://archive.cloudera.com/cdh5/cdh/5/hbase-1.0.0-cdh5.4.0/book.html#security.example.config
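As a rough sketch, the region-server safety valve entries could look like the snippet below. The coprocessor ordering here (TokenProvider, then AccessController, then VisibilityController last) is my reading of the linked example config, so verify the exact class names and ordering against that page and your CDH version; hfile.format.version is the HFile-version setting mentioned above.

```xml
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.security.token.TokenProvider,org.apache.hadoop.hbase.security.access.AccessController,org.apache.hadoop.hbase.security.visibility.VisibilityController</value>
</property>
<property>
  <name>hfile.format.version</name>
  <value>3</value>
</property>
```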
05-20-2015
02:19 PM
Installing CDH 5.4 with Kerberos security gives me the opportunity to make grants to namespaces etc., but I also want to enable visibility labels, which seem to be disabled by default. The Cloudera documentation only tells me this feature is experimental, not how to enable it. The Apache book shows how to add the proper coprocessors, and it also mentions the proper order of the coprocessors: http://archive.cloudera.com/cdh5/cdh/5/hbase-1.0.0-cdh5.4.0/book.html#security.example.config I tried adding "org.apache.hadoop.hbase.security.visibility.VisibilityController" via Cloudera Manager, but when reviewing the config changes I see that the order is not correct: the VisibilityController is added in front of the (apparently default) AccessController and TokenProvider, which is the incorrect order. Is there any other way to enable this feature, or to maintain the proper order?
Labels:
- Apache HBase
09-02-2014
10:48 AM
To be clear, you will also only get the column families, not the columns within them. You didn't define those at create time either, but just to be complete 🙂
08-27-2014
11:57 AM
To answer my own question: no, this is not possible at this time, since Solr only started supporting nested documents in 4.5 and CDH 5.1 is at 4.4 right now. Even if this becomes available in a future release, the question will be whether it can be easily integrated and used with Kite's morphlines. To get the job done I had to switch to ElasticSearch, which does support nested documents, and used Flume's ElasticSearchSink. Flume's official documentation on ElasticSearch and Avro is lacking, and I had to patch Flume code to get it working with the UTF-8 charset and JSON, but it's working nonetheless. I hope I can move this dataflow to the better-integrated SolrCloud in the future.
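For context, a minimal sketch of what the sink section of the Flume agent config could look like. The agent/sink/channel names and hosts here are made up; the sink type and the hostNames/indexName/serializer properties are from the Flume 1.x ElasticSearchSink documentation, so verify them against your Flume version:

```properties
# hypothetical agent "a1" with an ElasticSearch sink "es1"
a1.sinks.es1.type = elasticsearch
a1.sinks.es1.hostNames = es-host-1:9300,es-host-2:9300
a1.sinks.es1.indexName = flume_index
a1.sinks.es1.indexType = logs
a1.sinks.es1.clusterName = elasticsearch
a1.sinks.es1.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchDynamicSerializer
a1.sinks.es1.channel = c1
```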
08-21-2014
05:45 AM
I have an Avro input source going through a morphline into Solr, for example with the following structure:

{
  "username" : "alex",
  "date" : "21-08-2014",
  "attachments" : [
    {
      "documents" : [
        { "title" : "test", "tags" : [ "a", "b", "c" ] },
        { "optional1" : "test2", "title" : "test2" }
      ],
      "source" : "school"
    }
  ]
}

I can extract with extractAvroPaths, like so:

...
{ extractAvroPaths {
    flatten : true
    paths : {
      /my_user : /username   # this works fine
      /my_attachments : "/attachments[]"
      /my_documents : "/attachments[]/documents[]"
    }
} }
...

The problem is that /my_attachments and /my_documents now contain raw JSON/Avro structures instead of a single field. How would I go about 'unwrapping' these fields so that they are all part of one Solr document, while still retaining the context of the document they belong to?
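Outside of morphlines, the kind of 'unwrapping' I'm after could be sketched in plain Python. The field names and the underscore-prefixing scheme below are my own invention, just to illustrate the desired flat output shape, not anything morphlines actually does:

```python
def flatten(record, prefix=""):
    """Recursively flatten nested dicts/lists into prefixed scalar fields."""
    flat = {}
    if isinstance(record, dict):
        for key, value in record.items():
            flat.update(flatten(value, f"{prefix}{key}_"))
    elif isinstance(record, list):
        for i, item in enumerate(record):
            # keep list position in the field name to retain context
            flat.update(flatten(item, f"{prefix}{i}_"))
    else:
        flat[prefix.rstrip("_")] = record
    return flat

doc = {
    "username": "alex",
    "attachments": [
        {"documents": [{"title": "test"}], "source": "school"},
    ],
}
print(flatten(doc))
# {'username': 'alex', 'attachments_0_documents_0_title': 'test',
#  'attachments_0_source': 'school'}
```

Each nested value ends up as a single scalar field whose name encodes where it came from, which is roughly what a single Solr document per record would need.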
Labels:
- Apache Solr
08-21-2014
05:31 AM
Sorry, too quick on the post trigger: this was just a matter of quoting the mapping: /my_attachments : "/attachments[]" The question of how to map the structure to a Solr index remains, but I will post that in the appropriate section.
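For anyone hitting the same error, a sketch of the corrected paths block (field names taken from my example above): the quotes stop the HOCON parser from treating the trailing [] as a list value.

```
paths : {
  /my_user : /username
  /my_attachments : "/attachments[]"
  /my_documents : "/attachments[]/documents[]"
}
```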
08-21-2014
04:35 AM
I'm having trouble extracting a nested structure from my Avro data:

_attachment_body = [ {
  "username" : "alex",
  "date" : "21-08-2014",
  "attachments" : [
    {
      "documents" : [
        { "title" : "test", "tags" : [ "a", "b", "c" ] },
        { "optional1" : "test2", "title" : "test2" }
      ],
      "context" : "school"
    }
  ]
} ]

Extracting the paths with:

{ extractAvroPaths {
    flatten : true
    paths : {
      /my_user : /username   # this works fine
      # all three of these result in the same error message, with different parameters
      /my_attachments : /attachments[]
      /my_documents : /attachments[]/documents[]
      /my_contexts : /attachments[]/documents[]/context
    }
} }

results in the following error message:

com.typesafe.config.ConfigException$WrongType: morph-solr.conf: 30: Cannot concatenate object or list with a non-object-or-list, ConfigString("/my_attachments") and SimpleConfigList([]) are not compatible.

Eventually I would like to map the fields to a Solr index. So if it's possible to extract the nested structures, the follow-up question would be how to map those to a Solr schema, but let's take it one step at a time 🙂
Labels:
- Apache Solr
04-17-2014
12:05 PM
I'm wondering if there is ever a reason for Solr to live at the root of a ZooKeeper install. Shouldn't it always be in some path inside '/'? In that case, --zk being '/' would indicate a problem, either in the configuration or a user mistake, and something you could alert on or even refuse to run. Adding the prompt on --force would be a great step, and I see the use of the -y option.
04-16-2014
10:35 AM
Yes, we manage the cluster with CM. Reading your reply, I'm now sure the new edge node we added did not get a 'deploy client config', so it was missing the proper settings. Not knowing this at the time, solrctl did not work as expected (without the proper client configs). I remember manually adding the ZooKeeper hosts to the solrctl command, most likely without the required /solr root, resulting in the wipe of ZooKeeper's '/'. Thanks for clearing this up. Still, for a Solr CLI tool to default back to the '/' of the entire quorum without any notice, and to clear it with a --force, is pretty scary and not what you expect as an end user of a Solr-specific tool. Thanks for filing the reports, Rob
04-03-2014
12:00 PM
I had an 'interesting' experience setting up Cloudera Search as an addition to a not-too-shabby HBase cluster. Problems started when I created a collection with a trailing '/', which is apparently not allowed. In hindsight I now know that this created an item in the overseer queue which could not be processed, blocking all further requests and showing up in the logs as the overseer being in a loop. Not knowing this yet, I tried 'solrctl init', which did not work. After reading the warnings that this could mess up any previous Solr state, which we didn't have, I continued with "solrctl init --force". I was more than a little surprised to see the entire /hbase entry in ZooKeeper wiped clean and all of HBase in a state of panic, having lost its entire administration. Reverting to ZooKeeper snapshots got my HBase back up and running, but I'm still baffled:
1. How could this have happened?
2. If this is even a remote possibility with this command, I would recommend adding some extra red flags to the documentation recommending this option.
I'm running CDH 4.5 with Solr 1.1.