
Hortonworks running very slowly

Expert Contributor

Hello,

I'm new to Hadoop and I've deployed the sandbox into a VM with 32GB of RAM.

However, Hive queries (and everything else) run very, very slowly.

Could this be caused by the VM?
Also, I only have a single node, not a multi-node cluster. Can this considerably degrade performance?

Many thanks in advance.

Best regards

3 REPLIES

Cloudera Employee

It could be many things.

1. What volume of data is under consideration in the Hive queries?

2. What file format is the data stored in?

3. How was the data prepared and loaded (sorting, partitioning, etc.)?

4. etc.

There isn't enough information in your question for anyone to give a single answer that will help you. You may need to explore a bit and provide more details.

Yes, a single node has limitations. It isn't that single-node mode intentionally degrades performance; the system is designed to scale through parallelism, so with only one node you limit the software's ability to scale (if scaling is what your workload needs).

The Sandbox is meant for tutorials and exploring simple capabilities on small data. If you want to try the actual HDP software on real data, you can install a small multi-node cluster using the HDP installation processes documented at docs.hortonworks.com.
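To make point 3 above (data preparation and partitioning) concrete, here is a minimal sketch in plain Python, with no Hive required. All file names, column names, and the helper function are made up for illustration. It mimics how Hive lays out a table partitioned by a `dt` column (one subdirectory per partition value) and shows why a query filtered on the partition column touches far fewer files:

```python
import csv
import os
import tempfile

# Hypothetical dataset: tweet topics partitioned by date, the way Hive lays
# out a partitioned table (one subdirectory per partition value).
rows = [
    ("2019-06-24", "hadoop", 10),
    ("2019-06-24", "hive", 7),
    ("2019-06-25", "spark", 12),
]

base = tempfile.mkdtemp()
for dt, topic, count in rows:
    part_dir = os.path.join(base, f"dt={dt}")
    os.makedirs(part_dir, exist_ok=True)
    with open(os.path.join(part_dir, "part-0000.csv"), "a", newline="") as f:
        csv.writer(f).writerow([topic, count])

def files_scanned(where_dt=None):
    """Return the data files a query would have to read.

    With no filter on the partition column, every file is scanned; with a
    filter, only the matching partition's directory is touched. This is the
    "partition pruning" that makes well-partitioned Hive tables fast.
    """
    scanned = []
    for part in os.listdir(base):
        if where_dt is not None and part != f"dt={where_dt}":
            continue  # pruned: directory skipped without opening any file
        part_dir = os.path.join(base, part)
        scanned.extend(os.path.join(part_dir, f) for f in os.listdir(part_dir))
    return scanned

print(len(files_scanned()))                       # full scan: 2 files
print(len(files_scanned(where_dt="2019-06-25")))  # pruned: 1 file
```

On a couple of small JSON files the difference is invisible, which is part of the point: partitioning and columnar formats (ORC, Parquet) pay off at volumes a single Sandbox node never sees.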

Expert Contributor

@David Kaiser

Many thanks for your quick answer.
It's a small volume of data: just a couple of JSON files (tweets captured by Flume) stored on HDFS, with some Hive queries on top.

Many thanks once more.
Kind regards

New Contributor

I am new to the HDP Sandbox and also find it quite slow.

Using the example CSV file from the Getting Started tutorial (https://hortonworks.com/tutorial/hadoop-tutorial-getting-started-with-hdp/), the following cell takes 9 seconds to execute.


%spark2
val geoLocationDataFrame = spark.read.format("csv").option("header", "true").load("hdfs:///tmp/data/geolocation.csv")
geoLocationDataFrame.createOrReplaceTempView("geolocation")

Took 9 sec. Last updated by anonymous at June 25 2019, 3:02:48 PM.


I would expect loading a small CSV file to take only a few ms.
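That expectation is roughly right for the raw file I/O. As a hedged local comparison (plain Python, a mocked-up CSV of similar size; the real geolocation.csv columns may differ), parsing a few thousand CSV rows takes milliseconds:

```python
import csv
import io
import time

# Build an in-memory CSV roughly the size of the tutorial's geolocation.csv
# (a few thousand rows; column names here are invented for the mock).
header = "truckid,driverid,event,latitude,longitude\n"
body = "".join(f"A{i},D{i},normal,38.44,-122.71\n" for i in range(8000))
data = header + body

start = time.perf_counter()
reader = csv.DictReader(io.StringIO(data))
rows = list(reader)
elapsed = time.perf_counter() - start

print(f"parsed {len(rows)} rows in {elapsed * 1000:.1f} ms")
```

So most of the 9 seconds is plausibly framework overhead rather than reading the file: the first %spark2 paragraph in a fresh Zeppelin notebook also pays for starting the Spark interpreter and creating the session. Re-running the same cell a second time is a quick way to separate that warm-up cost from the actual read cost.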


My VM setup is based on VMware with 10 GB of RAM and 4 cores at 3.2 GHz.


Is there a benchmark with reference numbers for expected execution times, or some sort of profiling tool to easily find bottlenecks?



