Issue while storing streaming data into a Druid datastore

Contributor

Hi all, I want to load data into Druid from a Kafka topic. I have created the ingestion specification file in JSON format, and I have already ingested the data into the Kafka broker in CSV format.

Sample data:

timestamp,open,high,low,close,volume 
2018-07-20 05:08:00,1990.8000,1991.5500,1990.8000,1991.0000,1321 
2018-07-20 05:07:00,1991.1000,1991.1500,1990.6000,1991.0500,2387 
2018-07-20 05:06:00,1991.0000,1991.3000,1991.0000,1991.1000,1776 
2018-07-20 05:05:00,1991.7500,1991.8000,1990.5000,1991.0500,5988 
2018-07-20 05:04:00,1991.9500,1992.0000,1991.7500,1991.7500,1646 
2018-07-20 05:03:00,1992.0000,1992.0500,1991.8500,1991.9500,3272 

Now, I want to push this data into a Druid datasource named "stockexchange".

supervisor.json

[ { "dataSource" : [ { "spec" : { "dataSchema" : { "granularitySpec" : { "queryGranularity" : "none", "type" : "uniform", "segmentGranularity" : "hour" }, "dataSource" : "stockexchange", "parser" : { "type" : "string", "parseSpec" : { "format" : "csv", "timestampSpec" : { "format" : "auto", "column" : "timestamp" }, "columns" : ["timestamp","open","high","low","close","volume"], "dimensionsSpec" : { "dimensions" : ["open","high","low","close","volume"] } } }, }, "ioConfig" : { "type" : "realtime" }, "tuningConfig" : { "type" : "realtime", "intermediatePersistPeriod" : "PT10M", "windowPeriod" : "PT10M", "maxRowsInMemory" : 75000 } }, "properties" : { "task.partitions" : "2", "task.replicants" : "2", "topicPattern" : "stockexchange.*", "topicPattern.priority" : "1" } } ], "properties" : { "zookeeper.connect" : "ip-xxx-xx-xxxx-xx.ec2.internal:2181", "zookeeper.timeout" : "PT20S", "druid.selectors.indexing.serviceName" : "druid/overlord", "druid.discovery.curator.path" : "/druid/discovery", "kafka.zookeeper.connect" : "ip-xxx-xx-xxxx-xx.ec2.internal:2181", "kafka.group.id" : "xxxx-xxxxx-xxxx", "consumer.numThreads" : "2", "commit.periodMillis" : "15000", "reportDropsAsExceptions" : "false" } } ] 

But when I submit the spec to Druid using the following curl command, it throws the following error:

Command:

curl -X POST -H 'Content-Type: application/json' -d @supervisor.json http://ec2-xxx-xx-xxxx-xxxx.compute-1.amazonaws.com:8090/druid/indexer/v1/supervisor 

Error:

{"error":"Unexpected token (START_OBJECT), expected VALUE_STRING: need JSON String that contains type id (for subtype of io.druid.indexing.overlord.supervisor.SupervisorSpec)\n at [Source: HttpInputOverHTTP@1f0c750b; line: 1, column: 2]"}

I have searched for this error but couldn't find a solution.

Kindly help me solve this error.

Regards,

Jay.


Re: Issue while storing streaming data into a Druid datastore

Expert Contributor

@JAy PaTel

It looks like your JSON isn't properly formatted. Try running it through an online validator like this one before you submit it to Druid: https://jsonformatter.curiousconcept.com/
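
You can also check the file locally before posting it; a quick sketch, assuming python (or jq) is available on the host:

# prints the parsed JSON on success, or the line/column of the first syntax error
python -m json.tool supervisor.json

# the same check with jq, if it is installed
jq . supervisor.json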

I've identified the issue as an extra comma in your spec; here's the corrected JSON:

[  
   {  
      "dataSource":[  
         {  
            "spec":{  
               "dataSchema":{  
                  "granularitySpec":{  
                     "queryGranularity":"none",
                     "type":"uniform",
                     "segmentGranularity":"hour"
                  },
                  "dataSource":"stockexchange",
                  "parser":{  
                     "type":"string",
                     "parseSpec":{  
                        "format":"csv",
                        "timestampSpec":{  
                           "format":"auto",
                           "column":"timestamp"
                        },
                        "columns":[  
                           "timestamp",
                           "open",
                           "high",
                           "low",
                           "close",
                           "volume"
                        ],
                        "dimensionsSpec":{  
                           "dimensions":[  
                              "open",
                              "high",
                              "low",
                              "close",
                              "volume"
                           ]
                        }
                     }
                  }
               },
               "ioConfig":{  
                  "type":"realtime"
               },
               "tuningConfig":{  
                  "type":"realtime",
                  "intermediatePersistPeriod":"PT10M",
                  "windowPeriod":"PT10M",
                  "maxRowsInMemory":75000
               }
            },
            "properties":{  
               "task.partitions":"2",
               "task.replicants":"2",
               "topicPattern":"stockexchange.*",
               "topicPattern.priority":"1"
            }
         }
      ],
      "properties":{  
         "zookeeper.connect":"ip-xxx-xx-xxxx-xx.ec2.internal:2181",
         "zookeeper.timeout":"PT20S",
         "druid.selectors.indexing.serviceName":"druid/overlord",
         "druid.discovery.curator.path":"/druid/discovery",
         "kafka.zookeeper.connect":"ip-xxx-xx-xxxx-xx.ec2.internal:2181",
         "kafka.group.id":"xxxx-xxxxx-xxxx",
         "consumer.numThreads":"2",
         "commit.periodMillis":"15000",
         "reportDropsAsExceptions":"false"
      }
   }
]
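
With the extra comma removed, you can resubmit using the same curl command from your post; this should at least get you past the JSON parsing error:

curl -X POST -H 'Content-Type: application/json' -d @supervisor.json http://ec2-xxx-xx-xxxx-xxxx.compute-1.amazonaws.com:8090/druid/indexer/v1/supervisor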

Re: Issue while storing streaming data into a Druid datastore

New Contributor

@JAy PaTel

In addition to anarasimham's answer, I think you have to specify the "type" property at the start of your JSON string, like this:

{  "type" : "index_hadoop",  
   "spec" : {  
	"dataSchema" : {....

On this page you can find the following instructions:

property | description                                           | required?
type     | The task type, this should always be "index_hadoop". | yes
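
Note that "index_hadoop" is the task type for batch Hadoop indexing jobs. Since you are POSTing to the /druid/indexer/v1/supervisor endpoint to read from Kafka, the type id the overlord expects there would be "kafka", from the Kafka indexing service extension. Here's a rough, untested sketch along those lines, reusing the dataSchema from this thread (the topic name, bootstrap.servers address, and the task settings at the bottom are placeholders you would need to adjust):

{
   "type":"kafka",
   "dataSchema":{
      "dataSource":"stockexchange",
      "parser":{
         "type":"string",
         "parseSpec":{
            "format":"csv",
            "timestampSpec":{ "format":"auto", "column":"timestamp" },
            "columns":["timestamp","open","high","low","close","volume"],
            "dimensionsSpec":{ "dimensions":["open","high","low","close","volume"] }
         }
      },
      "granularitySpec":{
         "type":"uniform",
         "segmentGranularity":"hour",
         "queryGranularity":"none"
      }
   },
   "tuningConfig":{
      "type":"kafka",
      "maxRowsInMemory":75000
   },
   "ioConfig":{
      "topic":"stockexchange",
      "consumerProperties":{ "bootstrap.servers":"ip-xxx-xx-xxxx-xx.ec2.internal:9092" },
      "taskCount":1,
      "replicas":1,
      "taskDuration":"PT1H"
   }
}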

Hope this helps,

Regards,

Michael