Druid indexing csv is extremely slow

While trying to index a CSV file into Druid, I found that indexing is extremely slow, even though the file contains only 2 lines.

The csv consists of 2 lines:

1,0,90.95,"P",385,"MA",2018-05-08 
1,0,91,,6000,"MN",2018-05-08

The indexing JSON config file:

{
  "type" : "index_hadoop",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "hkexsales",
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "csv",
          "timestampSpec" : { "column" : "data_date", "format" : "iso" },
          "columns" : ["stockcode","seq","price","flag","quantity","session","data_date"],
          "dimensionsSpec" : {
            "dimensions": ["stockcode","seq","price","flag","quantity","session"],
            "dimensionExclusions" : [],
            "spatialDimensions" : []
          }
        }
      },
      "metricsSpec" : [ { "type" : "count", "name" : "count" } ],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "DAY",
        "queryGranularity" : "NONE",
        "intervals" : [ "2013-08-31/2020-09-01" ]
      }
    },
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "/data/HkexDayQuot/output/Sales/d180508e_1525922483177_skipheader2.htm"
      }
    },
    "tuningConfig" : { "type" : "hadoop" }
  }
}
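
As a quick sanity check outside Druid, the two sample rows can be mapped onto the seven names in the spec's "columns" list to confirm the field count and the data_date format. This is only a local Python sketch and does not reproduce Druid's own CSV parser; the names COLUMNS, SAMPLE and parse_rows are made up for illustration.

```python
import csv
import io
from datetime import date

# Column order copied from the "columns" list in the indexing spec.
COLUMNS = ["stockcode", "seq", "price", "flag", "quantity", "session", "data_date"]

# The two sample rows from the post (illustrative copy, not read from HDFS).
SAMPLE = '1,0,90.95,"P",385,"MA",2018-05-08\n1,0,91,,6000,"MN",2018-05-08\n'


def parse_rows(text):
    """Map each CSV line onto COLUMNS and check that data_date parses."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) != len(COLUMNS):
            raise ValueError(
                f"expected {len(COLUMNS)} fields, got {len(fields)}: {fields}"
            )
        row = dict(zip(COLUMNS, fields))
        # Raises ValueError if the value is not a YYYY-MM-DD date.
        date.fromisoformat(row["data_date"].strip())
        rows.append(row)
    return rows


rows = parse_rows(SAMPLE)
for row in rows:
    print(row)
```

If both rows parse here, the column list and the timestamp column are at least consistent with the sample data; the empty "flag" in the second row comes through as an empty string.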



The reported "Duration Spent" is 8906764, which is unacceptable for a two-line file.
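
For reference, the granularitySpec in the spec declares a multi-year interval with DAY segment granularity, while both sample rows fall on a single day. The Python sketch below only does the date arithmetic on that interval string, assuming the end of the interval is exclusive; whether the width of the interval contributes to the runtime is not established here. The names INTERVAL and day_buckets are made up for illustration.

```python
from datetime import date

# Interval copied from granularitySpec.intervals in the indexing spec.
INTERVAL = "2013-08-31/2020-09-01"

# Date shared by both sample rows.
DATA_DATE = date(2018, 5, 8)


def day_buckets(interval):
    """Count whole days in a start/end interval, treating the end as exclusive."""
    start_text, end_text = interval.split("/")
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text)
    return (end - start).days


n_days = day_buckets(INTERVAL)
print(n_days)  # 2558
```

So the declared interval covers 2558 days, of which the sample data touches one.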

Please advise.

1 REPLY

Do you have any solution for this problem?