Member since: 11-21-2017
Posts: 6
Kudos Received: 1
Solutions: 0
01-06-2018 10:02 AM
Hi, I am migrating my application from Hadoop 1.0.3 to Hadoop 2.2.0, and the Maven build had hadoop-core marked as a dependency. Since hadoop-core is not published for Hadoop 2.2.0, I tried replacing it with hadoop-client and hadoop-common, but I am still getting this error for ant.filter. Can anybody please suggest which artifact to use?

Previous config:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.0.3</version>
</dependency>
New config:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.2.0</version>
</dependency>

Error:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project event: Compilation failure: Compilation failure:
[ERROR] /opt/teamcity/buildAgent/work/c670ebea1992ec2f/event/src/main/java/com/intel/event/EventContext.java:[27,36] package org.apache.tools.ant.filters does not exist
[ERROR] /opt/teamcity/buildAgent/work/c670ebea1992ec2f/event/src/main/java/com/intel/event/EventContext.java:[27,36] package org.apache.tools.ant.filters does not exist
[ERROR] /opt/teamcity/buildAgent/work/c670ebea1992ec2f/event/src/main/java/com/intel/event/EventContext.java:[180,59] cannot find symbol
[ERROR] symbol: class StringInputStream
[ERROR] location: class com.intel.event.EventContext

Thank you,
Hari
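For what it's worth, my understanding is that the org.apache.tools.ant.filters package (which provides StringInputStream) comes from Apache Ant rather than from any Hadoop artifact, so it was probably being pulled in transitively by hadoop-core. I am experimenting with declaring Ant explicitly alongside hadoop-client; the version below is just a guess on my part:

<dependency>
    <groupId>org.apache.ant</groupId>
    <artifactId>ant</artifactId>
    <version>1.9.4</version>
</dependency>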
01-05-2018 10:53 AM
1 Kudo
Hi, I have some questions about Hadoop cluster data node failover:

What happens if the link between the name node and a data node (or between two data nodes) goes down while the Hadoop cluster is processing data?
Does the Hadoop cluster have anything out of the box (OOTB) to recover from this problem?
What happens if a data node goes down while the Hadoop cluster is processing data?

Another question is about Hadoop cluster hardware configuration. Let's say we will use our Hadoop cluster to process 100 GB of log files each day: how many data nodes do we need to set up, and what hardware configuration (e.g. CPU, RAM, hard disk) should each data node have?

Thank you,
Hari
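To make my own assumptions explicit, here is the rough storage arithmetic I have in mind (assuming the default HDFS replication factor of 3, and ignoring compression and intermediate job output):

100 GB/day of logs x 3 replicas = 300 GB/day of raw disk consumed across the cluster

What I mainly want to understand is how to go from that kind of estimate to a node count and a per-node CPU/RAM/disk sizing.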
11-25-2017 11:48 AM
Hi, how do I connect to Hadoop from a Java program? Here are a few details: I am taking input from the user in an HTML form and using JSP to process the form data. I want to connect to Hadoop to fetch some data based on the form inputs. How can I connect to Hadoop using Java in this case?

Thanks,
Hari
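Here is roughly what I have in mind so far: a minimal sketch that reads a file from HDFS using the FileSystem API. The namenode host/port and the idea of passing the path straight from the form are just placeholders of mine, not a working setup.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsLookup {
    // Reads a text file from HDFS and returns its contents as a String.
    public static String readFile(String pathFromForm) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder: point this at your cluster's fs.defaultFS
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020");
        FileSystem fs = FileSystem.get(conf);
        StringBuilder out = new StringBuilder();
        try (FSDataInputStream in = fs.open(new Path(pathFromForm));
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        return out.toString();
    }
}

Is calling something like this from the JSP/servlet layer the right approach, or is there a better way?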
11-17-2017 10:25 AM
Hi,
This is kind of a naive question, but I am new to the NoSQL paradigm and don't know much about it. If somebody could help me clearly understand the difference between HBase and Hadoop, or give me some pointers that might help me understand the difference, I would appreciate it.
So far I have done some research, and according to my understanding, Hadoop provides a framework to work with raw chunks of data (files) in HDFS, while HBase is a database engine on top of Hadoop that works with structured data rather than raw data chunks. HBase provides a logical layer on top of HDFS, much as SQL does. Is that correct?
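To make my mental model concrete, this is roughly how I picture the two access patterns in Java. The file path, table name, and column names below are made up by me purely for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HdfsVsHbase {
    public static void main(String[] args) throws Exception {
        // Hadoop/HDFS view: open a raw file and read its bytes sequentially.
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf);
             FSDataInputStream in = fs.open(new Path("/data/raw/events.log"))) {
            System.out.println("First byte of the raw file: " + in.read());
        }

        // HBase view: random read of one row/column by key from a table.
        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = connection.getTable(TableName.valueOf("events"))) {
            Get get = new Get(Bytes.toBytes("row-key-123"));
            Result result = table.get(get);
            byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("payload"));
            System.out.println("Cell value: " + (value == null ? "null" : Bytes.toString(value)));
        }
    }
}

Does that distinction, streaming raw files versus keyed random access over the same underlying storage, capture it?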
Please feel free to correct me.
Thanks,
Hari
11-07-2017 06:08 AM
Hi, MapReduce is a computing framework; HBase has nothing to do with it. That said, you can efficiently fetch data from HBase by writing MapReduce jobs. Alternatively, you can write sequential programs using the HBase APIs, such as the Java client, to put or fetch data. But we use Hadoop, HBase, etc. to deal with gigantic amounts of data, so normal sequential programs don't make much sense there: they would be highly inefficient when your data gets huge.

Coming back to the first part of your question: Hadoop is basically two things, a distributed file system (HDFS) plus a computation or processing framework (MapReduce). Like any other file system, HDFS provides storage, but in a fault-tolerant manner with high throughput and a lower risk of data loss (because of replication). Being a file system, though, HDFS lacks random read and write access. This is where HBase comes into the picture. It is a distributed, scalable big data store modelled after Google's BigTable, and it stores data as key/value pairs.

Coming to Hive: it provides data warehousing facilities on top of an existing Hadoop cluster, along with a SQL-like interface that makes your work easier if you are coming from a SQL background. You can create tables in Hive and store data there, and you can even map your existing HBase tables to Hive and operate on them.

Pig, meanwhile, is basically a dataflow language that allows us to process enormous amounts of data very easily and quickly. Pig has two parts: the Pig interpreter and the language, Pig Latin. You write Pig scripts in Pig Latin and process them with the Pig interpreter. Pig makes our life a lot easier, since writing MapReduce jobs by hand is not always easy; in fact, in some cases it can really become a pain.

Thanks,
Hari
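To give a feel for the MapReduce side of Hadoop, here is a minimal word-count job written against the Java MapReduce API. The class names and input/output paths are just illustrative, not from any particular project.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: emit (word, 1) for every whitespace-separated token in the input.
    public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reduce phase: sum the counts for each word.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

You would package this as a jar and submit it with something like "hadoop jar wordcount.jar WordCount /input /output". Classic Hive queries and Pig scripts ultimately compile down to jobs of this general shape, which is why they feel so much more convenient than writing the MapReduce code yourself.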