Druid indexing csv is extremely slow

While trying to index a two-line CSV file into Druid, I found that the indexing task is extremely slow.

The CSV consists of two lines:

1,0,90.95,"P",385,"MA",2018-05-08 
1,0,91,,6000,"MN",2018-05-08

The indexing JSON config file:

{
  "type" : "index_hadoop",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "hkexsales",
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "csv",
          "timestampSpec" : {
            "column" : "data_date",
            "format" : "iso"
          },
          "columns" : ["stockcode","seq","price","flag","quantity","session","data_date"],
          "dimensionsSpec" : {
            "dimensions": ["stockcode","seq","price","flag","quantity","session"],
            "dimensionExclusions" : [],
            "spatialDimensions" : []
          }
        }
      },
      "metricsSpec" : [
        {
          "type" : "count",
          "name" : "count"
        }
      ],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "DAY",
        "queryGranularity" : "NONE",
        "intervals" : [ "2013-08-31/2020-09-01" ]
      }
    },
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "/data/HkexDayQuot/output/Sales/d180508e_1525922483177_skipheader2.htm"
      }
    },
    "tuningConfig" : {
      "type" : "hadoop"
    }
  }
}
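
For reference, here is a small local sanity check of the sample rows against the spec, offered as a sketch rather than a diagnosis. It uses only the Python standard library and does not call Druid. The helper names (parse_rows, day_buckets) are made up for illustration, and the assumption that trailing whitespace in a field should be stripped is mine, not something the spec states. It checks that each row has one field per entry in "columns", that data_date parses as an ISO date, and how many DAY buckets the configured interval covers compared with the dates actually present in the data.

```python
import csv
import io
from datetime import date

# Column list and interval copied from the indexing spec above.
COLUMNS = ["stockcode", "seq", "price", "flag", "quantity", "session", "data_date"]
INTERVAL = "2013-08-31/2020-09-01"

# The two sample rows from the post (the first has a trailing space).
SAMPLE = '1,0,90.95,"P",385,"MA",2018-05-08 \n1,0,91,,6000,"MN",2018-05-08\n'


def parse_rows(text):
    """Hypothetical helper: map each CSV line onto COLUMNS and parse data_date."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
        # Assumption: surrounding whitespace is not meaningful, so strip it.
        row = dict(zip(COLUMNS, (f.strip() for f in fields)))
        row["data_date"] = date.fromisoformat(row["data_date"])
        rows.append(row)
    return rows


def day_buckets(interval):
    """Hypothetical helper: number of whole days in a 'start/end' interval."""
    start, end = (date.fromisoformat(part) for part in interval.split("/"))
    return (end - start).days


rows = parse_rows(SAMPLE)
total_days = day_buckets(INTERVAL)
days_with_data = len({row["data_date"] for row in rows})

print(f"rows parsed: {len(rows)}")
print(f"DAY buckets covered by the interval: {total_days}")
print(f"distinct dates in the data: {days_with_data}")
```

Both rows parse cleanly, the interval spans 2558 days, and the data contains a single date (2018-05-08). Whether the width of the interval contributes to the runtime is not established here; the check only shows that the input rows themselves match the column list.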



The task reports a "Duration Spent" of 8906764, which is unacceptable for a two-line file.

Please advise.
