
Issue while storing streaming data into Druid Datastore


Hi all, I want to load data into Druid from a Kafka topic. I have created the ingestion specification file in JSON format, and the data is already available in the Kafka broker in CSV format.

Sample data:

timestamp,open,high,low,close,volume 
2018-07-20 05:08:00,1990.8000,1991.5500,1990.8000,1991.0000,1321 
2018-07-20 05:07:00,1991.1000,1991.1500,1990.6000,1991.0500,2387 
2018-07-20 05:06:00,1991.0000,1991.3000,1991.0000,1991.1000,1776 
2018-07-20 05:05:00,1991.7500,1991.8000,1990.5000,1991.0500,5988 
2018-07-20 05:04:00,1991.9500,1992.0000,1991.7500,1991.7500,1646 
2018-07-20 05:03:00,1992.0000,1992.0500,1991.8500,1991.9500,3272 

Now, I want to push this data into a Druid datasource named "stockexchange".

supervisor.json

[ { "dataSource" : [ { "spec" : { "dataSchema" : { "granularitySpec" : { "queryGranularity" : "none", "type" : "uniform", "segmentGranularity" : "hour" }, "dataSource" : "stockexchange", "parser" : { "type" : "string", "parseSpec" : { "format" : "csv", "timestampSpec" : { "format" : "auto", "column" : "timestamp" }, "columns" : ["timestamp","open","high","low","close","volume"], "dimensionsSpec" : { "dimensions" : ["open","high","low","close","volume"] } } }, }, "ioConfig" : { "type" : "realtime" }, "tuningConfig" : { "type" : "realtime", "intermediatePersistPeriod" : "PT10M", "windowPeriod" : "PT10M", "maxRowsInMemory" : 75000 } }, "properties" : { "task.partitions" : "2", "task.replicants" : "2", "topicPattern" : "stockexchange.*", "topicPattern.priority" : "1" } } ], "properties" : { "zookeeper.connect" : "ip-xxx-xx-xxxx-xx.ec2.internal:2181", "zookeeper.timeout" : "PT20S", "druid.selectors.indexing.serviceName" : "druid/overlord", "druid.discovery.curator.path" : "/druid/discovery", "kafka.zookeeper.connect" : "ip-xxx-xx-xxxx-xx.ec2.internal:2181", "kafka.group.id" : "xxxx-xxxxx-xxxx", "consumer.numThreads" : "2", "commit.periodMillis" : "15000", "reportDropsAsExceptions" : "false" } } ] 

But when I post this spec to Druid using the following curl command, it throws the following error:

Command:

curl -X POST -H 'Content-Type: application/json' -d @supervisor.json http://ec2-xxx-xx-xxxx-xxxx.compute-1.amazonaws.com:8090/druid/indexer/v1/supervisor 

Error:

{"error":"Unexpected token (START_OBJECT), expected VALUE_STRING: need JSON String that contains type id (for subtype of io.druid.indexing.overlord.supervisor.SupervisorSpec)\n at [Source: HttpInputOverHTTP@1f0c750b; line: 1, column: 2]"}

I have searched for this error, but I couldn't find a solution.

Please help me to solve this error.

Regards,

Jay.


Re: Issue while storing streaming data into Druid Datastore


@JAy PaTel

It looks like you don't have properly formatted JSON. Try running your JSON through an online validator like this one before you submit to Druid: https://jsonformatter.curiousconcept.com/
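If you'd rather validate locally, Python's built-in json.tool module (assuming Python is available on your machine, which is a guess on my part) will point at the offending line and column:

# Prints the parsed JSON on success, or a parse error locating the stray character
python -m json.tool supervisor.json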

I've identified the issue: an extra comma in your spec. Here's the corrected JSON:

[  
   {  
      "dataSource":[  
         {  
            "spec":{  
               "dataSchema":{  
                  "granularitySpec":{  
                     "queryGranularity":"none",
                     "type":"uniform",
                     "segmentGranularity":"hour"
                  },
                  "dataSource":"stockexchange",
                  "parser":{  
                     "type":"string",
                     "parseSpec":{  
                        "format":"csv",
                        "timestampSpec":{  
                           "format":"auto",
                           "column":"timestamp"
                        },
                        "columns":[  
                           "timestamp",
                           "open",
                           "high",
                           "low",
                           "close",
                           "volume"
                        ],
                        "dimensionsSpec":{  
                           "dimensions":[  
                              "open",
                              "high",
                              "low",
                              "close",
                              "volume"
                           ]
                        }
                     }
                  }
               },
               "ioConfig":{  
                  "type":"realtime"
               },
               "tuningConfig":{  
                  "type":"realtime",
                  "intermediatePersistPeriod":"PT10M",
                  "windowPeriod":"PT10M",
                  "maxRowsInMemory":75000
               }
            },
            "properties":{  
               "task.partitions":"2",
               "task.replicants":"2",
               "topicPattern":"stockexchange.*",
               "topicPattern.priority":"1"
            }
         }
      ],
      "properties":{  
         "zookeeper.connect":"ip-xxx-xx-xxxx-xx.ec2.internal:2181",
         "zookeeper.timeout":"PT20S",
         "druid.selectors.indexing.serviceName":"druid/overlord",
         "druid.discovery.curator.path":"/druid/discovery",
         "kafka.zookeeper.connect":"ip-xxx-xx-xxxx-xx.ec2.internal:2181",
         "kafka.group.id":"xxxx-xxxxx-xxxx",
         "consumer.numThreads":"2",
         "commit.periodMillis":"15000",
         "reportDropsAsExceptions":"false"
      }
   }
]
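
Once the file validates, you can resubmit it with the same curl command you used before:

curl -X POST -H 'Content-Type: application/json' -d @supervisor.json http://ec2-xxx-xx-xxxx-xxxx.compute-1.amazonaws.com:8090/druid/indexer/v1/supervisor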

Re: Issue while storing streaming data into Druid Datastore

@JAy PaTel

In addition to anarasimham's answer, I think you have to specify the "type" property at the start of your JSON string, like this:

{  "type" : "index_hadoop",  
   "spec" : {  
	"dataSchema" : {....

On this page you can find the following instructions:

property    description                                             required?
type        The task type, this should always be "index_hadoop".   yes
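
That table is from the Hadoop batch ingestion docs, though. Since you are posting to the /druid/indexer/v1/supervisor endpoint, the Kafka indexing service expects a supervisor spec whose top-level "type" is "kafka". Here is a minimal sketch built from your dataSchema; it assumes the druid-kafka-indexing-service extension is loaded, and the topic name and bootstrap.servers value are placeholders you will need to adjust:

{
   "type" : "kafka",
   "dataSchema" : {
      "dataSource" : "stockexchange",
      "parser" : {
         "type" : "string",
         "parseSpec" : {
            "format" : "csv",
            "timestampSpec" : { "column" : "timestamp", "format" : "auto" },
            "columns" : ["timestamp","open","high","low","close","volume"],
            "dimensionsSpec" : {
               "dimensions" : ["open","high","low","close","volume"]
            }
         }
      },
      "metricsSpec" : [],
      "granularitySpec" : {
         "type" : "uniform",
         "segmentGranularity" : "hour",
         "queryGranularity" : "none"
      }
   },
   "tuningConfig" : {
      "type" : "kafka",
      "maxRowsInMemory" : 75000
   },
   "ioConfig" : {
      "topic" : "stockexchange",
      "consumerProperties" : {
         "bootstrap.servers" : "ip-xxx-xx-xxxx-xx.ec2.internal:9092"
      },
      "taskCount" : 1,
      "replicas" : 2,
      "taskDuration" : "PT1H"
   }
}

Note that your CSV parseSpec carries over unchanged; only the wrapper around it (the top-level "type", "ioConfig", and "tuningConfig") differs from the realtime-style spec you started with.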

Hope this helps,

Regards,

Michael