Member since: 08-14-2017
Posts: 8
Kudos Received: 0
Solutions: 0
10-23-2017
05:10 AM
Hi @Slim Given that this dataset is already loaded into Hive, and the Hive table will be updated occasionally*, what are my chances of using Druid to index this data and Superset to visualise it (without replicating the data in Druid)? Would you recommend this approach? *Will Druid automatically update its indexes when data is added to Hive?
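From what I've read so far (and I may be wrong), a Druid-backed Hive table does not pick up new Hive rows on its own, and new data would have to be pushed in explicitly, something like this sketch (table and column names are invented):

```sql
-- Hypothetical re-indexing step: push newly arrived Hive rows into the
-- Druid-backed table. This would need to be scheduled, e.g. after each
-- batch load into the source table.
INSERT INTO TABLE clickstream_druid
SELECT
  CAST(event_time AS timestamp) AS `__time`,
  user_id,
  event_type,
  page_url
FROM clickstream
WHERE event_date = '2017-10-23';
```

Is that the recommended pattern, or is there a way to have the indexes refresh automatically?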
10-13-2017
06:42 AM
Thanks a lot for the quick reply. Let me do a setup with a smaller data set and get back to you with some questions. 😄
10-12-2017
05:42 AM
Referring to this article https://hortonworks.com/blog/apache-hive-druid-part-1-3/ by @Carter Shanklin: we have a 12TB+ (growing ~8GB per day) dataset of clickstream data (user events) in Hive. The use case is to run OLAP queries across the dataset, for now mostly GROUP BY. How will this combination perform on a dataset of this size? Also, how production-ready is the combination?
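For reference, this is the kind of setup and query I have in mind, based on the linked article (a sketch only; column and table names are made up):

```sql
-- Druid-backed table created from the existing Hive clickstream table.
-- `__time` is the timestamp column Druid requires; the granularity
-- properties control segment layout and rollup.
CREATE TABLE clickstream_druid
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES (
  "druid.segment.granularity" = "DAY",
  "druid.query.granularity" = "HOUR"
)
AS SELECT
  CAST(event_time AS timestamp) AS `__time`,
  user_id,
  event_type,
  page_url
FROM clickstream;

-- A typical OLAP query we would run against it:
SELECT event_type, COUNT(*) AS events
FROM clickstream_druid
GROUP BY event_type;
```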
Labels: Apache Hive
09-21-2017
07:22 AM
@Sriharsha Chintalapani I am getting the same error; a cluster update didn't solve the issue. I'm trying to send plain CSV (roshan,22) to Streamline via NiFi (which converts the CSV to Avro): Kafka > NiFi > Kafka. From Storm I'm getting an error similar to the one above: com.hortonworks.registries.schemaregistry.serde.SerDesException: Unknown protocol id [114] received while deserializing the payload at com.hortonworks.registries.schemaregistry.serdes.avro.AvroSnapsh My flow is as follows: one branch goes directly to a Kafka topic, and the other serializes the data and publishes it to Kafka. After running data through PublishKafkaRecord, the output simply removes the comma (roshan,22 turns into roshan22) and the above-mentioned error appears in Storm. I'm really new to this stack; any help would be appreciated. Avro schema:

{
  "type": "record",
  "name": "user",
  "fields": [
    { "name": "name", "type": "string", "default": null },
    { "name": "age", "type": "string", "default": null }
  ]
}
Attached configurations: UpdateAttribute processor, PublishKafkaRecord processor, CSVReader controller service, AvroRecordSetWriter controller service.
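One observation that might help others hitting this: as far as I can tell, the registry deserializer reads the first byte of the Kafka message as the protocol id, and 114 happens to be the ASCII code for 'r', the first character of my test record "roshan,22". If that reading is right, the consumer was handed the raw CSV text rather than registry-framed Avro. A quick sanity check (the helper function is mine, not part of any library):

```python
def looks_like_raw_text(reported_protocol_id: int, record: str) -> bool:
    """Return True when the 'unknown protocol id' from the registry
    deserializer equals the ASCII code of the record's first character,
    suggesting the payload is plain text, not registry-framed Avro."""
    return reported_protocol_id == ord(record[0])

print(looks_like_raw_text(114, "roshan,22"))  # -> True: 114 == ord('r')
```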
09-20-2017
05:56 AM
@mkalyanpur CSVReader 1.2.0.3.0.1.1-5 & AvroRecordSetWriter 1.2.0.3.0.1.1-5 are configured as follows, and my Avro schema in the registry is similar to this, with a bunch of additional string fields:

{
  "type": "record",
  "name": "tracking_sdk_event",
  "fields": [
    { "name": "timeStamp", "type": "long", "default": null },
    { "name": "isoTime", "type": "string", "default": null }
  ]
}

@Bryan Bende After changing the "Schema Write Strategy" to "Hortonworks Content Encoded Schema Reference", I'm getting an error with the timeStamp field. I have attached an image of it.
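An aside that may or may not be the cause of the timeStamp error: per the Avro specification, a field whose default is null must be declared as a union with "null" as the first branch; a bare "long" with "default": null is not a valid combination. A corrected field would look like this (sketched with the stdlib json module only, since I have not yet verified this against the registry):

```python
import json

# Per the Avro spec, a null default requires a ["null", <type>] union,
# with "null" listed first.
corrected_field = {
    "name": "timeStamp",
    "type": ["null", "long"],
    "default": None,  # serializes to JSON null
}

print(json.dumps(corrected_field))
```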
09-19-2017
06:22 AM
I'm trying to upgrade an existing visualization solution (Kafka > Flink > Druid > Superset) to work with HWX SAM & Schema Registry. Currently NiFi works as an HTTP proxy to collect events and push them to Kafka; I'm trying to convert the events (CSV) to Avro at this stage and push them to Kafka so that SAM can consume them. The output of SplitContent is something similar to "abc,def,ghi,jkl,,". I'm getting this error in the Storm UI: com.hortonworks.registries.schemaregistry.serde.SerDesException: Unknown protocol id [49] received while deserializing the payload at com.hortonworks.registries.schemaregistry.serdes.avro.AvroSnapsho Is there something I should pay closer attention to when processing CSV? Any troubleshooting recommendations?
Labels: Apache Storm