Outliers Detection in Hadoop

Hi guys, I'm doing a Big Data Analytics project and I am at the outlier detection phase. I only have experience using SAS, where I usually use histogram charts to detect outliers. In Hadoop, which component is used to identify these records? Do you normally use Pig or Hive? Or do you just use a tool outside Hadoop, like Python or Java? Many thanks!

1 REPLY

Re: Outliers Detection in Hadoop

Expert Contributor

Hello Pedro,

HDP doesn't really provide visualization capabilities, except through Apache Zeppelin via Spark. If you get your data set into HDFS, Hive, or HBase, you can either connect directly to the data via a JDBC/ODBC driver or extract a set (depending on performance) and use any visualization tool to visualize it. Here are a few options.

SAS has a Hadoop connector that allows you to use SAS reporting and analytics.

http://www.sas.com/en_th/software/data-management/access-hadoop.html

Apache Zeppelin provides notebook capabilities and visualization for Spark.

https://zeppelin.apache.org/

There are also a number of partner technologies that provide analytics against Hive, particularly AtScale.

http://www.atscale.com/
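
If you go the "extract a set" route and want a quick numeric check alongside the histogram, something like the following might be a starting point. It is only a minimal sketch using Tukey's IQR rule (a common rule of thumb, not anything specific to HDP), the function name `iqr_outliers` is hypothetical, and the sample values are made up for illustration. It assumes the extracted column fits in memory as a plain Python list.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Flag values outside Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper for illustration; k=1.5 is the conventional default.
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


# Made-up sample standing in for a column extracted from Hive or HDFS.
sample = [10, 12, 11, 13, 12, 11, 10, 14, 13, 95]
lower, upper, outliers = iqr_outliers(sample)
print(f"fences: [{lower}, {upper}]  outliers: {outliers}")
```

For data too large to extract, the same fences could in principle be computed inside the cluster first (for example from approximate quartiles) and only the flagged rows pulled out for inspection.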