Member since 01-03-2018 · 11 Posts · 0 Kudos Received · 0 Solutions
10-03-2018
11:49 AM
I am new to Data Science and Big Data frameworks. Let's say I have an input dataset in CSV. From what I found on Google and other resources about the daily job of a Data Analyst or Data Scientist: once the user gets a dataset, they first manipulate it with the help of the Python pandas library, which includes data cleaning and related tasks. Then they visualize the data using matplotlib and other techniques, and they can write machine learning algorithms to get predictions for some criteria. All of the above workflows can be summarized as data analysis and prediction. On the other hand, I found Pydoop (a Python framework for Hadoop) for operations like storage, processing, etc. I am a bit confused: where exactly does Pydoop stand in the data analysis workflow mentioned above? Please guide me.
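The load/clean/analyze workflow described in the question can be sketched in a few lines. Pandas is the usual tool for this, but the sketch below uses only the standard-library `csv` module so it is self-contained; the CSV content and column names are purely hypothetical.

```python
import csv
import io

# Hypothetical CSV input standing in for the dataset mentioned above.
raw = """city,sales
Delhi,100
Mumbai,
Chennai,300
"""

# Step 1: load the data (with pandas this would be pd.read_csv).
rows = list(csv.DictReader(io.StringIO(raw)))

# Step 2: data cleaning -- drop rows with a missing sales value.
clean = [r for r in rows if r["sales"]]

# Step 3: a simple analysis -- average sales across the cleaned rows.
avg_sales = sum(float(r["sales"]) for r in clean) / len(clean)
print(avg_sales)  # 200.0
```

Pydoop sits at a different layer: it lets Python code talk to HDFS and run MapReduce jobs, so it becomes relevant when the dataset is too large for a single machine, not as a replacement for the pandas-style steps above.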
04-18-2018
04:10 AM
Hi @Harald Berghoff, thanks for the information.
04-17-2018
09:51 AM
Hello All, My understanding was that Spark is an alternative to Hadoop. However, when trying to install Spark, the installation page asks for an existing Hadoop installation, and I'm not able to find anything that clarifies that relationship. Secondly, Spark apparently has good connectivity to Cassandra and Hive, and both have SQL-style interfaces. However, Spark has its own SQL. Why would one use Cassandra/Hive instead of Spark's native SQL, assuming this is a brand-new project with no existing installation? Help me out. Thanks
Labels:
- Apache Hadoop
- Apache Spark
03-13-2018
05:29 AM
I am new to Azure development. I have to select a database in Azure to store big data, so I have to finalize the data storage now. I would strongly like to understand why Hadoop (non-Microsoft) is offered inside Azure (I assume there are some strong reasons): 1) Can the available Microsoft Azure storage (e.g. blobs) not perform like Hadoop? 2) Is there something that cannot be achieved in Azure but can be achieved in Hadoop? 3) Performance? Lots of questions like this are coming to me. Please provide clear ideas on this. Regards,
Labels:
- Apache Hadoop
01-10-2018
01:16 PM
I wanted to switch from Hadoop 1.2.1 to Hadoop 2.2. In my project I'm using Maven, and it can handle
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>1.2.1</version>
</dependency>
without any problems. However, changing the version to 2.2 does not work, as that version is not available in the central Maven repository.
Any ideas how I can include Hadoop 2.2 in my Maven-ized project? Regards, Sarahjohn
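For what it's worth, the monolithic `hadoop-core` artifact was split up in the Hadoop 2.x line, so searching for `hadoop-core:2.2` in Maven Central finds nothing. The usual client-side dependency for 2.x is `hadoop-client`; a sketch (the exact version you need may differ):

```xml
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.2.0</version>
</dependency>
```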
Tags:
- hadoop
- Hadoop Core
Labels:
- Apache Hadoop
12-07-2017
06:45 AM
After installing Hadoop, when I try to run start-dfs.sh it shows the following error message.
I have searched a lot and found that the WARN appears because I am using 64-bit Ubuntu while Hadoop is compiled against 32-bit, so that part is not an issue to work on.
But the "Incorrect configuration" message is something I am worried about, and I am also not able to start the primary and secondary namenodes.
sameer@sameer-Compaq-610:~$ start-dfs.sh
15/07/27 07:47:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
Starting namenodes on []
localhost: ssh: connect to host localhost port 22: Connection refused
localhost: ssh: connect to host localhost port 22: Connection refused
Starting secondary namenodes [0.0.0.0]
0.0.0.0: ssh: connect to host 0.0.0.0 port 22: Connection refused
15/07/27 07:47:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
My current configuration:
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/sameer/mydata/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/sameer/mydata/hdfs/datanode</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.default.name </name>
<value> hdfs://localhost:9000 </value>
</property>
</configuration>
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Please point out what I am doing wrong, in the configuration or somewhere else. Thanks, Nicolewells
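One commonly reported cause of the "Incorrect configuration: namenode address ... is not configured" message is stray whitespace inside the core-site.xml property (note the spaces in `fs.default.name ` and around the value above), which prevents Hadoop from parsing the namenode address. A cleaned-up core-site.xml using the newer `fs.defaultFS` key would look like this (a sketch of one likely fix, not a guaranteed one; the ssh "Connection refused" errors additionally suggest no SSH server is listening on localhost):

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```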
Labels:
- Apache Hadoop
10-03-2017
11:03 AM
I understand Splunk Hadoop Connect is a free app and the Hunk license depends on the number of TaskTrackers. We have Splunk Enterprise in our organisation, and the goal is to perform analytics on Hadoop data and send archived data from indexes to Hadoop. I can achieve this via both Splunk Hadoop Connect and Hunk, but my doubt is: what's the difference between these two w.r.t. licensing, other than the bidirectional data movement that Hadoop Connect provides? And if I get the Splunk Hadoop Connect app, what parameters will the licensing depend on?
Labels:
- Apache Hadoop