Member since 10-01-2015
3933 Posts
1150 Kudos Received
374 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3659 | 05-03-2017 05:13 PM |
| | 3015 | 05-02-2017 08:38 AM |
| | 3278 | 05-02-2017 08:13 AM |
| | 3221 | 04-10-2017 10:51 PM |
| | 1684 | 03-28-2017 02:27 AM |
02-15-2016
12:09 AM
@Brenden Cobb since this is an after-effect, I'd open a ticket with support. In my experience, I would back up all configs on every node, then restart the Ambari agent on one node at a time, because the agent advertises the node's current config to the Ambari server. Once you confirm everything is restored for that node, you can move on to the next node.
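The node-by-node procedure above could be scripted roughly like this. It is only a sketch: `backup_configs`, `restart_agent`, and `is_restored` are hypothetical callables you would write yourself (for example, thin wrappers around ssh and `ambari-agent restart`); they are not part of any Ambari API.

```python
from typing import Callable, Iterable, List


def rolling_agent_restart(
    nodes: Iterable[str],
    backup_configs: Callable[[str], None],
    restart_agent: Callable[[str], None],
    is_restored: Callable[[str], bool],
) -> List[str]:
    """Back up every node first, then restart agents one node at a time.

    All three callables are hypothetical hooks supplied by the caller.
    Returns the nodes that were restarted and verified; stops at the
    first node that does not come back healthy.
    """
    node_list = list(nodes)

    # Step 1: back up configs on every node before touching any agent.
    for node in node_list:
        backup_configs(node)

    # Step 2: restart one agent at a time and verify before moving on.
    done: List[str] = []
    for node in node_list:
        restart_agent(node)
        if not is_restored(node):
            break  # stop here and investigate before touching more nodes
        done.append(node)
    return done
```

Stopping at the first node that fails verification keeps the damage contained to one host while you investigate.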
02-15-2016
12:06 AM
@Pedro Gandola HDP ships with a 10 GB region size by default. Having more regions, on the order of 100-200 per RegionServer, is recommended. If your region size is 30 GB and you end up with fewer regions than that, consider reducing it. How many nodes do you have? The balancer will handle data locality until major compaction happens; I wouldn't mess with that. How often do you expect to apply config changes and do rolling restarts? You can increase the time between RegionServer restarts to minimize impact. You can increase the replication factor, but that may be overkill. You can also enable read replicas, so that read-only copies of your regions are available for better data availability.
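To see how region size drives the region count per RegionServer, here is a back-of-the-envelope sketch. The cluster numbers (6000 GB of data, 4 RegionServers) are made up for illustration, and the calculation assumes data is spread evenly and ignores compression and in-progress splits.

```python
import math


def regions_per_server(total_data_gb: float, region_size_gb: float, region_servers: int) -> float:
    """Rough average number of regions each RegionServer would host."""
    if region_size_gb <= 0 or region_servers <= 0:
        raise ValueError("region size and server count must be positive")
    # Every started chunk of region_size_gb needs its own region.
    total_regions = math.ceil(total_data_gb / region_size_gb)
    return total_regions / region_servers


# Hypothetical cluster: 6000 GB of data across 4 RegionServers.
for size_gb in (10, 30):
    print(size_gb, regions_per_server(6000, size_gb, 4))
```

With these made-up numbers, a 10 GB region size lands inside the 100-200 range, while 30 GB falls well below it.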
02-14-2016
11:48 PM
@Paul Boal use this guide to work with Hive UDFs in Spark: http://hortonworks.com/hadoop-tutorial/apache-spark-1-4-1-technical-preview-with-hdp/ And here's an example of invoking the CSV SerDe: https://community.hortonworks.com/content/kbentry/8313/apache-hive-csv-serde-example.html
02-14-2016
05:45 PM
@Andrea Squizzato It's a JVM program and Windows is supported. Here's the admin guide: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html
02-14-2016
02:11 PM
@vshukla @Ram Sriharsha
02-14-2016
01:16 PM
3 Kudos
@Pedro Gandola splitting occurs when your regions grow to the max size (hbase.hregion.max.filesize) as defined in your hbase-site.xml, see http://hbase.apache.org/book.html#disable.splitting. When you run a major compaction, data locality is restored. On a busy system, run major compactions in off-peak hours. The balancer distributes regions across the cluster and runs every 5 minutes by default; do not turn it off. You can implement your own balancer and replace the default StochasticLoadBalancer class, but that's not recommended unless you know what you're doing. Another option is to enable read replicas, so essentially you're duplicating data on a different region server. The secondary replicas are read-only and maximize your data availability. All in all, it's more art than science, and you need to experiment with many HBase properties to get the best result.
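If you want to check what your cluster is actually configured with, a small script can pull the properties out of hbase-site.xml. This is a minimal sketch using only the Python standard library; the XML below is an inline sample with illustrative values (10 GiB max file size, 300000 ms balancer period), not your real file, and `read_property` is a helper name I made up.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; in practice you would read your own hbase-site.xml.
SAMPLE_HBASE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>10737418240</value>
  </property>
  <property>
    <name>hbase.balancer.period</name>
    <value>300000</value>
  </property>
</configuration>
"""


def read_property(xml_text, name):
    """Return the value of a Hadoop-style <property>, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


max_bytes = int(read_property(SAMPLE_HBASE_SITE, "hbase.hregion.max.filesize"))
period_ms = int(read_property(SAMPLE_HBASE_SITE, "hbase.balancer.period"))
print("max region size (GiB):", max_bytes / 1024**3)
print("balancer period (min):", period_ms / 60000)
```

A property that is missing from the file returns None here, which means the cluster falls back to the HBase default for that setting.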
02-14-2016
12:48 PM
@Revathy Mourouguessane the spooling directory source is good when you want to watch a directory for new files. The syslog source listens on a port. So if your logs land in a directory, you would use the spooling directory source. To write to HDFS, you would use the HDFS sink. Once you've mastered Flume, check out Apache NiFi.
02-14-2016
12:40 PM
@Jim Fratzke what does your datanode log say?
02-13-2016
02:19 PM
@Zaher Mahdhi your question has many answers, so I suggest you read our cluster planning guide: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_cluster-planning-guide/content/ch_hardware-recommendations_chapter.html