Hi guys, I'm doing a Big Data Analytics project and I'm in the outlier-detection phase. I only have experience with SAS, where I usually use histogram charts to detect outliers. Which Hadoop component is used to identify these records? Would you normally use Pig or Hive, or just a tool outside Hadoop like Python or Java? Many thanks!
HDP doesn't really provide visualization capabilities, except through Apache Zeppelin via Spark. Once your data set is in HDFS, Hive, or HBase, you can either connect directly to the data via a JDBC/ODBC driver or extract a subset (depending on performance) and use any visualization tool to explore it. Here are a few options:
SAS has a Hadoop connector that allows you to use SAS reporting and analytics.
Apache Zeppelin provides notebook capabilities and visualization for Spark.
And a number of partner technologies provide analytics against Hive, particularly AtScale.
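Since you mention possibly using Python outside Hadoop: once you've extracted a sample of a numeric column (e.g. via a Hive query), a common histogram-free alternative is the interquartile-range (IQR) rule. This is a minimal sketch using only the Python standard library; the sample values are illustrative, not from any real data set:

```python
import statistics

def iqr_outliers(values):
    """Return the values falling outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    # quantiles() with n=4 returns the three quartile cut points
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lo or v > hi]

# Illustrative sample, as if pulled from a Hive/HDFS extract
sample = [10, 12, 11, 13, 12, 11, 95, 12, 10, 11]
print(iqr_outliers(sample))  # the value 95 is flagged as an outlier
```

The same 1.5*IQR cutoff can be pushed down into a Hive query (using `percentile_approx`) if the data is too large to extract, so only the flagged records leave the cluster.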