Member since: 07-05-2018
Posts: 13
Kudos Received: 1
Solutions: 0
11-27-2018
02:55 PM
1 Kudo
@heta desai You can load data into Druid in a few ways:
- Native batch: Druid loads data directly from S3, HTTP, NFS, or other networked storage.
- Hadoop: Druid launches Hadoop MapReduce jobs to load data files.
- Kafka indexing service: Druid reads directly from Kafka.
- Tranquility: you use Tranquility, a client-side library, to push individual records into Druid.
More information here: http://druid.io/docs/latest/ingestion/index.html
Example for Hadoop batch ingestion: suppose you have an example JSON file (pageviews.json) like this:
{"time": "2015-09-01T00:00:00Z", "url": "/foo/bar", "user": "alice", "latencyMs": 32}
{"time": "2015-09-01T01:00:00Z", "url": "/", "user": "bob", "latencyMs": 11}
{"time": "2015-09-01T01:30:00Z", "url": "/foo/bar", "user": "bob", "latencyMs": 45}
Then you have to build a JSON file (my-index-task.json), like the sketch above, which explains how to load the data (see the attachments for the exact version). Then put the example JSON file into HDFS. Finally, send an HTTP request to your Druid overlord server to start the task:
curl -X 'POST' -H 'Content-Type:application/json' -d @my-index-task.json druidoverlordserver:8090/druid/indexer/v1/task
If you want to query the stored data, you can also do this with HTTP requests. Choose the right query type for your operation. More information here: http://druid.io/docs/latest/querying/querying
Here is an example of a timeseries query (query-file.json); note that the dataSource must match the datasource name defined in your ingestion spec:
{
"queryType": "timeseries",
"dataSource": "pageviews.json",
"granularity": {"type": "duration", "duration": 3600000},
"descending": "true",
"dimensions" : ["user"],
"aggregations": [
{
"type": "count",
"name": "totalCount"
}],
"intervals": [ "2015-09-01T00:00:00.000Z/2015-09-02T00:00:00.000Z" ]
}
Then send the HTTP request to your Druid broker server:
curl -X POST 'druidbrokerserver:8082/druid/v2/?pretty' -H 'Content-Type:application/json' -d @query-file.json
It is also possible to use the Hive SQL layer to query data stored in Druid via SQL statements. To do that, you have to enable HiveServer2 Interactive. Then you can create an external table with the DruidStorageHandler and refer to the existing Druid datasource:
CREATE EXTERNAL TABLE druid_wikiticker
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "wikiticker");
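Once the external table exists, you can query it from Hive (for example via beeline). A minimal example query, assuming the standard wikiticker sample data with a user dimension (the column names are assumptions):
SELECT `user`, COUNT(*) AS edits
FROM druid_wikiticker
GROUP BY `user`
ORDER BY edits DESC
LIMIT 10;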
I hope this helps. Please go to http://druid.io/ for more information. Regards, Michael
11-27-2018
06:39 AM
@Nikhil OK, I understand what you mean. I think you are right; this solution only works for Spark applications.
11-26-2018
10:57 AM
Look here: http://www.hammerlab.org/2015/02/27/monitoring-spark-with-graphite-and-grafana/
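The linked article is about monitoring Spark with Graphite and Grafana. For reference, Spark's Graphite metrics sink is typically enabled in metrics.properties roughly like this (host, port, and prefix below are placeholders/assumptions):
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite.example.com
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds
*.sink.graphite.prefix=spark
# optionally also export JVM metrics per component
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource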
11-13-2018
06:32 PM
@Joshua Adeleke Unfortunately, Phoenix (HBase) is not yet integrated into YARN by default (HDP 2.6.5 or lower; for 3.x I don't know). So the resources of HBase workloads are managed outside of YARN. That is the reason why it is recommended that the YARN property yarn.nodemanager.resource.memory-mb should not be set to all of your cluster resources: there should be enough room left for HBase workloads and things like the operating system (a rough sizing sketch follows below). However, there are some workarounds to manage Phoenix/HBase resources.
Manage in YARN:
- Use Spark with the Phoenix JDBC connector. Spark runs on YARN, but I think it is a little bit difficult to run interactive queries that way.
- Use Apache Slider. Slider "slides" existing long-running services like Apache HBase onto YARN (I haven't used it yet, so I don't know how well it works).
- Use Hoya: https://de.hortonworks.com/blog/introducing-hoya-hbase-on-yarn/
Manage outside of YARN:
- Use cgroups in Linux.
More resources: https://de.slideshare.net/Hadoop_Summit/multitenant-multicluster-and-multicontainer-apache-hbase-deployments
Regards, Michael
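To illustrate the headroom recommendation above, here is a rough sizing sketch (the numbers are assumptions, not from the original post): on a 128 GB worker node you might leave about 24 GB for the HBase RegionServer heap and 8 GB for the operating system, so YARN gets roughly 96 GB:
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>98304</value> <!-- 96 GB for YARN containers; ~32 GB kept free for HBase and the OS -->
</property>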
11-06-2018
03:45 PM
We had the same error. In addition to the answer above, we specifically changed the following properties (a sketch of the hbase-site.xml entries follows below):
In hbase-site.xml:
- hbase.regionserver.handler.count from 30 to 40
- phoenix.regionserver.index.handler.count from 30 to 40
In hbase-env, add -XX:ParallelGCThreads=8 (at export HBASE_REGIONSERVER_OPTS).
Additionally, you can increase hbase.ipc.server.max.callqueue.size (default 1 GB).
Regards, Michael
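As a sketch, the changes above would look roughly like this (the values are simply the ones mentioned in the post; the hbase-env line assumes you append to the existing export):
In hbase-site.xml:
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>40</value>
</property>
<property>
  <name>phoenix.regionserver.index.handler.count</name>
  <value>40</value>
</property>
In hbase-env:
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:ParallelGCThreads=8"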
10-25-2018
07:55 AM
You can run major compaction manually by running the following commands: hbase shell
major_compact 'TABLE_NAME'
You can also configure compaction to run automatically by adding these properties in hbase-site.xml (a sketch with example values follows the list):
hbase.regionserver.compaction.enabled
hbase.hregion.majorcompaction
hbase.hregion.majorcompaction.jitter
hbase.hstore.compactionThreshold
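As a sketch, these properties could look like this in hbase-site.xml (the values below are the usual defaults and are only illustrative, not recommendations from the original post):
<property>
  <name>hbase.regionserver.compaction.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>604800000</value> <!-- interval between automatic major compactions, in ms (7 days) -->
</property>
<property>
  <name>hbase.hregion.majorcompaction.jitter</name>
  <value>0.50</value> <!-- randomizes the start time by up to +/- 50% of the interval -->
</property>
<property>
  <name>hbase.hstore.compactionThreshold</name>
  <value>3</value> <!-- number of StoreFiles in a Store that triggers a compaction -->
</property>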
You can find more information here: https://hbase.apache.org/book.html#_enabling But be careful: only run a major compaction if all regions are assigned; no region should be in RIT (Region in Transition). Major compaction is also a heavyweight operation, so you should run it when the cluster load is low. You can monitor the compaction in the HBase Master UI. Regards, Michael
09-28-2018
06:32 AM
@Roberto Ayuso Here you can find some important properties to improve the performance of Oozie; a few commonly tuned ones are sketched below as examples. Maybe that will help you. Regards, Michael
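The original list is not reproduced here, so as a hedged example only, these oozie-site.xml settings are among the ones commonly tuned for Oozie server throughput (the values are illustrative, not recommendations from the original post):
<property>
  <name>oozie.service.CallableQueueService.threads</name>
  <value>50</value> <!-- worker threads for processing workflows/coordinators; the default is 10 -->
</property>
<property>
  <name>oozie.service.CallableQueueService.callable.concurrency</name>
  <value>10</value> <!-- max concurrent callables of the same type; the default is 3 -->
</property>
<property>
  <name>oozie.service.CallableQueueService.queue.size</name>
  <value>10000</value> <!-- maximum callable queue size; 10000 is the default -->
</property>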
09-20-2018
01:14 PM
@Sowmya K The Druid components coordinator, router, overlord, and broker work closely together; each of them has its own task. You can find an overview here. Mainly they coordinate the Druid jobs, so I recommend that you put them together on a master node. Superset is an extra component; you can put it on another master node or on the same one, that does not matter. Druid also needs the Druid Historical and Druid MiddleManager components; these you should install on the data nodes. Here is an answer to a similar question to yours with some more information: https://community.hortonworks.com/questions/140030/druid-installation.html If you are interested in an architecture overview of Druid, I can recommend slide five of this presentation: https://www.slideshare.net/Hadoop_Summit/interactive-analytics-at-scale-in-apache-hive-using-druid-80145456 I hope I could help you. Regards, Michael
09-14-2018
03:06 AM
Hello, we want to implement HA for Livy. We use Livy a lot in combination with Zeppelin. I already installed a second Livy for Spark2 server via Ambari. In Zeppelin there is a property zeppelin.livy.url, which contains the URL of the Livy server. Now with HA we have two running Livy servers. How can I set both URLs of the Livy servers in that property, so that we have an automatic failover when one server crashes? Is that possible? I already tried to use the delimiters ',' and ';' between the URLs. For example:
zeppelin.livy.url=http://livyserver1:8999,http://livyserver2:8999
Regards, Michael
Labels: Apache Zeppelin
07-24-2018
05:59 AM
Hello, if I kill an Oozie coordinator with the command below, the workflows of the coordinator which are running at that moment also get killed.
oozie job -oozie http://<ooziehost>:<port>/oozie -kill xxxxxxx-xxxxxxxxxxxxxx-oozie-oozi-C
Is there a way to kill a coordinator and let the coordinator workflows which are in RUNNING state at that time run to the end? Background: the workflows are ELT processes, including receiving messages from IBM MQ, running Sqoop jobs, and transforming the data. The error handling if a workflow suddenly gets killed is quite difficult. Thanks, Michael
Labels: Apache Oozie
07-23-2018
02:15 PM
@JAy PaTel In addition to the answer from anarasimham, I think you have to specify the type property at the start of your JSON string, like this:
{
  "type" : "index_hadoop",
  "spec" : {
    "dataSchema" : {....
On this page the documentation describes the type property as follows: "The task type, this should always be "index_hadoop"." (required: yes). Hope this helps. Regards, Michael
07-11-2018
11:23 AM
Hello, my question is: which users are in the "public" group of Ranger? Is this group a wildcard only for all service users in the cluster (e.g. hive, spark, ...) or is it a wildcard for all users in the cluster, including users you created yourself? In this documentation I found the part: What is the "public" group? Where did it come from? This is an internal group that serves as a wildcard for all system users. For me it is not completely clear what "system" users means in this context. Thanks, Michael
Labels: Apache Ranger
07-09-2018
07:05 AM
Hello, do you manage the permissions for the HBase tables with Ranger? We have the problem that our group mapping does not work in the Ranger HBase policies; we have to add the users manually to the Ranger HBase policy so that they have permissions for the HBase tables. Maybe you have the same issue after enabling Kerberos authentication. Another link that might help is this one: https://community.hortonworks.com/content/supportkb/49037/phoenix-sqlline-query-on-larger-data-set-fails-wit.html It is not the same error, but maybe it gives you some ideas about which configuration must be changed. I hope something of this helps. Regards, Michael