Pig to Elasticsearch getting ERROR

I am trying to store the output of a Pig script in an Elasticsearch index.

However, the job fails with a "String index out of range: -1" exception.

Expected output:

(google_1473682742_265278445560,{(Thu Apr 12 17:38:47 +0000 2012,190494185374220289,190494185374220289,google اااااح الاجواء بتاعت سكس حااااارر منو الفحل اللي يبي اسوي له  فولو يسوي رتويت,<a href="http://blackberry.com/twitter" rel="nofollow">Twitter for BlackBerry®</a>,[hashtags#[],user_mentions#[],urls#[]],false,,0,false,),(Thu Apr 12 17:38:47 +0000 2012,190494185382608899,190494185382608899,kpit 味も素っ気もない人間とは…。,<a href="http://tapbots.com/tweetbot" rel="nofollow">Tweetbot for iOS</a>,[hashtags#[],user_mentions#[],urls#[]],false,,0,false,)})

describe output:

output: {pattern: chararray,tweets: {(lowertweets::created_at: chararray,lowertweets::id: chararray,lowertweets::id_str: chararray,lowertweets::text: chararray,lowertweets::source: chararray,lowertweets::entities: map[chararray],lowertweets::favorited: boolean,lowertweets::favorite_count: long,lowertweets::retweet_count: long,lowertweets::retweeted: boolean,lowertweets::place: map[chararray])}}

Script:

STORE A INTO 'google_1473673952_265276863360/tweets' USING org.elasticsearch.hadoop.pig.EsStorage('es.nodes = ip:9200');

Curl script:

curl -XPUT 'http://hostname:9200/google_1473673952_265276863360/_mapping/tweets' -d '
{
  "tweets" : {
    "properties" : {
      "pattern" : {"type" : "string", "store" : true},
      "created_at" : {"type" : "string", "store" : true},
      "id" : {"type" : "string", "store" : true},
      "id_str" : {"type" : "string", "store" : true},
      "text" : {"type" : "string", "store" : true},
      "source" : {"type" : "string", "store" : true},
      "entities" : {"type" : "string", "store" : true},
      "favorited" : {"type" : "boolean", "store" : true},
      "favorite_count" : {"type" : "string", "store" : true},
      "retweet_count" : {"type" : "string", "store" : true},
      "retweeted" : {"type" : "boolean", "store" : true},
      "place" : {"type" : "string", "store" : true}
    }
  }
}'

Error:

java.lang.Exception: java.io.IOException: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.io.IOException: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:479)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:442)

I tried changing the data types in the curl mapping, but that didn't work for me.

Any help would be appreciated.

Re: Pig to Elasticsearch getting ERROR

@Mohan V

I believe the problem may be that your mapping defines some of the data elements as string, while they are nested element types in Pig. For example, look at entities:

lowertweets::entities: map[chararray]

In your template you have this:

"entities" : {"type" : "string", "store" : true }

So Elasticsearch expects entities to be a string field, not a nested object field. The same is true for place:

lowertweets::place: map[chararray]
"place" : {"type" : "string", "store" : true }

You may want to look at the Pig section of the elasticsearch-hadoop documentation: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/pig.html

In particular, see this setting:

es.mapping.pig.tuple.use.field.names true

And this bug may be relevant:

https://issues.apache.org/jira/browse/PIG-3646
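
To make the mismatch described above easier to spot, here is a minimal Python sketch that compares Pig field types with the types declared in an Elasticsearch mapping. The helper find_mismatches and the PIG_TO_ES table are hypothetical illustrations, not part of Pig or elasticsearch-hadoop. The field types are copied from the describe output and the curl mapping in the question.

```python
# Hypothetical helper: compare Pig field types with Elasticsearch mapping types.
# The table below is an illustrative assumption, not the connector's own logic.
PIG_TO_ES = {
    "chararray": {"string"},
    "boolean": {"boolean"},
    "long": {"long"},
    "map": {"object", "nested"},
}


def find_mismatches(pig_schema, es_properties):
    """Return (field, pig_type, es_type) for each field whose ES type looks incompatible."""
    mismatches = []
    for field, pig_type in pig_schema.items():
        es_def = es_properties.get(field, {})
        # An entry with sub-"properties" and no explicit type is treated as an object.
        es_type = es_def.get("type", "object" if "properties" in es_def else None)
        if es_type is None:
            continue  # field is not mapped explicitly, so there is nothing to compare
        if es_type not in PIG_TO_ES.get(pig_type, set()):
            mismatches.append((field, pig_type, es_type))
    return mismatches


# A few field types taken from the describe output in the question.
pig_schema = {
    "created_at": "chararray",
    "text": "chararray",
    "entities": "map",
    "favorited": "boolean",
    "place": "map",
}

# The matching entries from the curl mapping in the question.
es_properties = {
    "created_at": {"type": "string", "store": True},
    "text": {"type": "string", "store": True},
    "entities": {"type": "string", "store": True},
    "favorited": {"type": "boolean", "store": True},
    "place": {"type": "string", "store": True},
}

mismatches = find_mismatches(pig_schema, es_properties)
for field, pig_type, es_type in mismatches:
    print(f"{field}: Pig {pig_type} vs Elasticsearch {es_type}")
```

With these inputs the sketch reports entities and place, the two map fields that the mapping declares as string.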

Re: Pig to Elasticsearch getting ERROR

Thanks for your reply, Michael Young.

I tried what you suggested, but I am still getting the same issue.

Here is what I tried:

STORE A INTO 'google_1473673952_265276863360/tweets' USING org.elasticsearch.hadoop.pig.EsStorage('es.nodes = ip:9200','es.mapping.pig.tuple.use.field.names = true');

Error:

Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.substring(String.java:1967)

Do I need to change the curl script? If so, how do I declare a map or array data type in the Elasticsearch mapping?

Please advise, as I am completely new to this.

Re: Pig to Elasticsearch getting ERROR

@Mohan V

Yes, you need to change the template you are passing to Elasticsearch with the curl command. Here is the documentation for nested objects: https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-mapping.html

Here is an example of a nested definition:

{
  "mappings": {
    "blogpost": {
      "properties": {
        "comments": {
          "type": "nested", 
          "properties": {
            "name":    { "type": "string"  },
            "comment": { "type": "string"  },
            "age":     { "type": "short"   },
            "stars":   { "type": "short"   },
            "date":    { "type": "date"    }
          }
        }
      }
    }
  }
}

For the nested objects, entities and place in your case, you need to specify type "nested" instead of "string". Then add a "properties" node that describes the fields expected inside each object. With the example mapping above, you might see data that looks like:

{
  "blogpost" : {
    "comments" : {
      "name" : "Bob",
      "comment" : "This is my comment",
      "age": 43,
      "stars" : 5,
      "date" : "20160914"
    }
  }
}
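
Applied to the tweet fields from the question, that advice could be sketched as follows. This is a minimal Python sketch under stated assumptions, not a confirmed fix: the helper names string_field and build_tweets_mapping are hypothetical, the sub-fields of entities are taken from the sample record (hashtags, user_mentions, urls) with "string" as an assumed type, and place is left with empty properties because the thread does not show its keys.

```python
import json


def string_field():
    # Mirrors the {"type": "string", "store": true} style used in the question's mapping.
    return {"type": "string", "store": True}


def build_tweets_mapping():
    """Build a mapping body for the tweets type with entities and place as nested objects."""
    entities = {
        "type": "nested",
        "properties": {
            # Key names come from the sample record; the "string" type is an assumption.
            "hashtags": {"type": "string"},
            "user_mentions": {"type": "string"},
            "urls": {"type": "string"},
        },
    }
    # The thread does not show which keys place holds, so its properties stay empty.
    place = {"type": "nested", "properties": {}}
    return {
        "tweets": {
            "properties": {
                "pattern": string_field(),
                "created_at": string_field(),
                "id": string_field(),
                "id_str": string_field(),
                "text": string_field(),
                "source": string_field(),
                "entities": entities,
                "favorited": {"type": "boolean", "store": True},
                "favorite_count": {"type": "long", "store": True},
                "retweet_count": {"type": "long", "store": True},
                "retweeted": {"type": "boolean", "store": True},
                "place": place,
            }
        }
    }


mapping = build_tweets_mapping()
body = json.dumps(mapping, indent=2)
print(body)  # JSON text that could be sent as the body of the curl -XPUT request
```

Building the body in code and serializing it with json.dumps guarantees well-formed JSON, which avoids the stray quotes and unbalanced braces that are easy to introduce when editing a long curl command by hand.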

Re: Pig to Elasticsearch getting ERROR

I tried what you suggested, Michael Young.

Curl script:

curl -XPUT 'http://hostname:9200/google_1473673952_265276863360/_mapping/tweets' -d '{  "tweets" : {     "properties" : {  "comments": {  "type": "nested",  "properties": {  "pattern" : {"type" : "string", "store" : true},  "created_at" : {"type" : "string", "store" : true },  "id" : {"type" : "string", "store" : true },  "id_str" : {"type" : "string", "store" : true },  "text" : {"type" : "string", "store" : true },  "source" : {"type" : "string", "store" : true },  "entities" : {"type" : "string", "store" : true },  "favorited" : {"type" : "boolean", "store" : true },  "favorite_count" : {"type" : "long", "store" : true },  "retweet_count" : {"type" : "long", "store" : true },  "retweeted" : {"type" : "boolean", "store" : true },  "place" : {"type" : "string", "store" : true }  }  }  }  }}'

When I tried this, I got the same error, i.e. the string index out of bounds exception.

I then changed the curl script and tried this:

curl -XPUT 'http://hostname:9200/google_1473673952_265276863360/_mapping/tweets' -d '
{
  "tweets" : {
    "properties" : {
      "comments" : {
        "type" : "nested",
        "properties" : {
          "pattern" : {"type" : "string", "store" : true},
          "created_at" : {"type" : "string", "store" : true},
          "id" : {"type" : "string", "store" : true},
          "id_str" : {"type" : "string", "store" : true},
          "text" : {"type" : "string", "store" : true},
          "source" : {"type" : "string", "store" : true},
          "entities" : {
            "properties" : {
              "type" : "nested",
              "properties" : {
                "urls" : {"type" : "string"},
                "hashtags" : {"type" : "string"},
                "user_mentions" : {"type" : "string"},
                "symbols" : {"type" : "string"}
              }
            }
          },
          "favorited" : {"type" : "boolean", "store" : true},
          "favorite_count" : {"type" : "long", "store" : true},
          "retweet_count" : {"type" : "long", "store" : true},
          "retweeted" : {"type" : "boolean", "store" : true},
          "place" : {
            "properties" : {
              "comments" : {
                "type" : "nested",
                "properties" : {}
              }
            }
          }
        }
      }
    }
  }
}'

But I am unable to apply this mapping. I get the following error:

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"mapper [comments.entities] of different type, current_type [string], merged_type [ObjectMapper]"}],"type":"illegal_argument_exception","reason":"mapper [comments.entities] of different type, current_type [string], merged_type [ObjectMapper]"},"status":400}

Please tell me what I am missing.

Re: Pig to Elasticsearch getting ERROR

@Mohan V

I believe you have an issue with your template. You have an extra "properties" level under entities:

"entities" :{
  "properties" : {
     "type": "nested",
     "properties": { 

You should instead have:

"entities" :{
  "type": "nested",
  "properties": { 

You can also try it without "type" : "nested" like this:

"entities" :{
  "properties": { 
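
The extra "properties" level described above can be caught before the mapping is sent. The following is a minimal Python sketch, assuming the mapping has been loaded into a dict; find_misplaced_keys is a hypothetical helper, not an Elasticsearch API. It relies on one rule: every entry directly under a "properties" block should itself be a field definition (a dict), so a bare value such as "type": "nested" at that level is one level too deep.

```python
def find_misplaced_keys(properties, path=""):
    """Return dotted paths under "properties" whose value is not a field definition."""
    problems = []
    for name, definition in properties.items():
        field_path = f"{path}.{name}" if path else name
        if not isinstance(definition, dict):
            # For example "type": "nested" written inside "properties" by mistake.
            problems.append(field_path)
            continue
        # Recurse into the sub-fields of this field, if it declares any.
        problems.extend(find_misplaced_keys(definition.get("properties", {}), field_path))
    return problems


# The entities block as posted, with the extra "properties" level.
broken = {
    "entities": {
        "properties": {
            "type": "nested",
            "properties": {
                "urls": {"type": "string"},
                "hashtags": {"type": "string"},
            },
        }
    }
}

# The corrected shape suggested in the reply.
fixed = {
    "entities": {
        "type": "nested",
        "properties": {
            "urls": {"type": "string"},
            "hashtags": {"type": "string"},
        },
    }
}

print(find_misplaced_keys(broken))
print(find_misplaced_keys(fixed))
```

The check flags entities.type in the posted block and nothing in the corrected one. Separately, the error quoted earlier in the thread reports current_type [string], which suggests the index may already hold an entities field mapped as string from the previous attempt. That is an assumption the thread does not confirm, but if it holds, the corrected mapping may need to be applied to a fresh index.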

Re: Pig to Elasticsearch getting ERROR

Thank you very much for your reply, Michael Young.

I have tried everything you suggested, but no luck.

I also tried mapping just a single attribute, pattern, from the output below:

describe a;

a: {pattern: chararray,tweets: {(lowertweets::created_at: chararray,lowertweets::id: chararray,lowertweets::id_str: chararray,lowertweets::text: chararray,lowertweets::source: chararray,lowertweets::entities: map[chararray],lowertweets::favorited: boolean,lowertweets::favorite_count: long,lowertweets::retweet_count: long,lowertweets::retweeted: boolean,lowertweets::place: map[chararray])}}

The index and mapping were created without any issues. But when I try to store the data using this Pig script:

STORE A INTO 'google_1473673952_265276863360/tweets' USING org.elasticsearch.hadoop.pig.EsStorage('es.nodes = hostname:9200','es.mapping.pig.tuple.use.field.names = true');

I get the same issue again:

Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.substring(String.java:1967)

Please help me.

Re: Pig to Elasticsearch getting ERROR

@Mohan V

The error message you are seeing is coming from Pig, correct? What do the Elasticsearch logs indicate is happening on that side?

Looking at your initial tweet example, I wonder if the problem may be related to mixing left-to-right and right-to-left text. I can't say that I've seen it with your particular example before, but mixed-direction text is known to cause issues.
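
One way to test that hypothesis is to separate records that contain right-to-left characters and store the two groups independently. The following is a minimal Python sketch using the standard unicodedata module. The helper contains_rtl is hypothetical, the first sample string is made up for illustration, and whether right-to-left text actually triggers the exception is not established in this thread.

```python
import unicodedata

# Bidirectional classes for right-to-left letters (Hebrew-like "R", Arabic-like "AL").
RTL_CLASSES = {"R", "AL"}


def contains_rtl(text):
    """Return True if any character in text belongs to a right-to-left class."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


samples = [
    "google مرحبا",                    # made-up sample with Arabic letters
    "kpit 味も素っ気もない人間とは…。",  # Japanese text from the question's sample output
    "plain ascii text",
]

flags = [contains_rtl(s) for s in samples]
for text, flag in zip(samples, flags):
    print(flag, text)
```

If the job succeeds on the records where the check is False and fails only on the others, that would point toward the text direction; if both groups fail, the cause is more likely elsewhere.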
