
Avro type information shown with indexed fields when trying out Cloudera Search using QuickStart VM.



New Contributor

I'm trying out the Cloudera QuickStart VM and found it pretty straightforward to run the basic WordCount M/R example and Hive queries on sample CSVs. I was most eager to try out Cloudera Search.

 

I followed the steps from the blog here. The ~/datasets/batch-tweets.sh script seemed to run fine: the MapReduceIndexerTool job took 3 to 4 minutes and the jobs appeared to succeed. I could see what looks like a Lucene index in HDFS under /solr/batch_tweets/core_node1/data/index. So far so good. I then fired up the Hue Solr Search tool and tried customizing how search results are formatted. This partially works, but each field in a result is preceded by what looks like Avro type information. For example, if the template is {{text}} {{user_name}}, the results preview shows the following:

org.apache.avro.util.Utf8:tweet text 10782 org.apache.avro.util.Utf8:fake user10782

 

I also tried using avro-tools to read the sample data that the batch-tweets script pulls in for indexing:

java -jar ~/avro-tools-1.7.3.jar tojson /usr/share/doc/search-1.0.0/examples/test-documents/sample-statuses-20120906-141433-medium.avro | less

 

The avro files seemed to read just fine.

 

Is it possible that there's been some change to the QuickStart VM since the blog was posted last summer? Any suggestions welcome.

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Avro type information shown with indexed fields when trying out Cloudera Search using QuickStart

Expert Contributor
Make sure to run search-1.1.0.

4 REPLIES 4

Re: Avro type information shown with indexed fields when trying out Cloudera Search using QuickStart

New Contributor

A small additional piece of information is that by exploring the contents of the SOLR index via the Solr Admin web UI I can see that certain fields do indeed seem to be indexed with "org.apache.avro.util.Utf8:" prefix on the original strings. The fields in question are:

  • user_screen_name
  • user_location
  • text
  • user_name
  • source

From the batch-tweets.sh script I can see how it invokes the MapReduceIndexerTool, pointing it at the batch_tweets_indir location in HDFS (which contains the input data in Avro format). As far as I understand, the morphline may be key to processing the input data in HDFS and passing it on to the indexer. Does anybody know if that's a good place to dig further, or should I look into the source code for MapReduceIndexerTool?
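
To confirm exactly which fields are affected without clicking through the Solr Admin UI, a small script can scan documents exported from the index. This is only a minimal sketch: it assumes each document is already available as a plain Python dict (for example, parsed from a Solr JSON response), and the function names and the sample document are hypothetical, made up for illustration. It does not fix the index itself; it only reports the prefixed fields and strips the prefix for display.

```python
# Prefix seen in the indexed values, as reported in the results preview.
AVRO_UTF8_PREFIX = "org.apache.avro.util.Utf8:"


def find_prefixed_fields(doc, prefix=AVRO_UTF8_PREFIX):
    """Return the sorted names of fields whose string value starts with prefix."""
    return sorted(
        name
        for name, value in doc.items()
        if isinstance(value, str) and value.startswith(prefix)
    )


def strip_prefix(doc, prefix=AVRO_UTF8_PREFIX):
    """Return a copy of doc with the prefix removed from affected string values."""
    return {
        name: (
            value[len(prefix):]
            if isinstance(value, str) and value.startswith(prefix)
            else value
        )
        for name, value in doc.items()
    }


# Hypothetical document shaped like the values shown in the results preview.
sample = {
    "id": 10782,
    "text": "org.apache.avro.util.Utf8:tweet text 10782",
    "user_name": "org.apache.avro.util.Utf8:fake user10782",
}

print(find_prefixed_fields(sample))
print(strip_prefix(sample)["text"])
```

Stripping the prefix on the client side is at best a stopgap for a demo; the stored values in the index remain unchanged until the data is re-indexed correctly.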

Re: Avro type information shown with indexed fields when trying out Cloudera Search using QuickStart

Expert Contributor
I think this has been fixed in more recent versions of Cloudera Search.


Re: Avro type information shown with indexed fields when trying out Cloudera Search using QuickStart

New Contributor

That's good to know. I believe I have the most recent QuickStart VM (4.4.0-1). Are updated versions of the VMs made available regularly? Or do you know if this is something that can be "patched" within the VM? (I'd like to demo something based on the search functionality, with a view to requesting that our enterprise cluster (Cloudera, though I'm not sure which CDH version yet) have Search enabled.)

 

Re: Avro type information shown with indexed fields when trying out Cloudera Search using QuickStart

Expert Contributor
Make sure to run search-1.1.0.
