Mapreduce doesn't successfully do INSERT / CREATE TABLE from existing table operations.

Explorer

I created the tables below in beeline (Hive), and that worked quickly.

-- Movies table
CREATE EXTERNAL TABLE movies (
  movieId INT,
  title STRING,
  genres STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar" = "\"",
  "escapeChar" = "\\"
)
STORED AS TEXTFILE
LOCATION '/user/hive/warehouse/movielens/movies'
TBLPROPERTIES ("skip.header.line.count"="1");

-- Ratings table
CREATE EXTERNAL TABLE ratings (
  userId INT,
  movieId INT,
  rating DOUBLE,
  rating_timestamp BIGINT
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar" = "\"",
  "escapeChar" = "\\"
)
STORED AS TEXTFILE
LOCATION '/user/hive/warehouse/movielens/ratings'
TBLPROPERTIES ("skip.header.line.count"="1");

 

 

 

I am attempting:

CREATE TABLE avg_movie_ratings AS
SELECT movieId, AVG(rating) AS avg_rating
FROM ratings
GROUP BY movieId;

 

 

 

This starts a MapReduce job, which gets stuck.
Hadoop and Hive are both running.

However, the URL to track the job, http://anushkahp14:8088/proxy/application_1716189650320_0005/, returns ERR_CONNECTION_REFUSED.

Please help.

4 REPLIES

Expert Contributor

@adsejnf, Welcome to Cloudera community!

Do you see any issues in the Hive logs?

Or try checking the application logs via CLI:

» yarn logs -applicationId <application ID> -appOwner <AppOwner>

Explorer

@tj2007, Thanks!

No logs were recorded:

anushkakundu@AnushkaHP14:~$ /opt/hadoop/bin/yarn logs -applicationId application_1716374810626_0001 -appOwner anushkakundu
2024-05-22 16:29:07,059 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8032
Can not find the logs for the application: application_1716374810626_0001 with the appOwner: anushkakundu

Kindly look at what my ResourceManager UI at localhost:8088 shows:

[screenshot of the ResourceManager UI: adsejnf_0-1716375414545.png]

It says:

Log Aggregation Status: NOT_START

Could that be the issue?

Super Collaborator

The diagnostics message in YARN RM UI indicates that the application has been added to the scheduler but has not yet been activated. The message provides details about the reason for skipping the ApplicationMaster (AM) assignment. Let's break down the components of the message for a better understanding:

Diagnostic Message Breakdown

  1. Application is added to the scheduler and is not yet activated.

    • This indicates that the application is recognized by the scheduler but hasn't started the process of resource allocation and execution.
  2. Skipping AM assignment as cluster resource is empty.

    • The ApplicationMaster (AM) assignment is skipped because there are no available resources in the cluster to fulfill the request.
  3. Details:

    • Provides additional information about the resource request and limits.
  4. AM Partition = <DEFAULT_PARTITION>;

    • AM Partition: The partition in which the AM is supposed to run. In this case, it's the <DEFAULT_PARTITION>, which typically means the default resource pool for the cluster.
  5. AM Resource Request = <memory:2048, vCores:1>;

    • AM Resource Request: The resources requested for the ApplicationMaster. Here, it requests 2048 MB of memory and 1 core.
  6. Queue Resource Limit for AM = <memory:0, vCores:0>;

    • Queue Resource Limit for AM: The maximum resources allocated for ApplicationMasters in the queue. In this case, it shows <memory:0, vCores:0>, indicating that there are no resources currently allocated for AMs in the queue.
  7. User AM Resource Limit of the queue

    • User AM Resource Limit of the queue: This part of the message is truncated, but it generally refers to the per-user resource limits within the queue. This would typically indicate the maximum resources a single user's applications can consume within the queue.
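
To make the breakdown above concrete, here is a minimal Python sketch that pulls the two resource fields out of the diagnostic text. The `DIAGNOSTICS` string is reassembled from the fragments quoted above, so its exact spacing and punctuation are an assumption, and the truncated "User AM Resource Limit" part is left out. The `parse_resource` helper is hypothetical, not part of any Hadoop tooling.

```python
import re

# Diagnostic text reassembled from the fragments quoted in the breakdown
# (assumption: the exact spacing and punctuation may differ in the real UI).
DIAGNOSTICS = (
    "Application is added to the scheduler and is not yet activated. "
    "Skipping AM assignment as cluster resource is empty. Details: "
    "AM Partition = <DEFAULT_PARTITION>; "
    "AM Resource Request = <memory:2048, vCores:1>; "
    "Queue Resource Limit for AM = <memory:0, vCores:0>;"
)

# Matches "<memory:N, vCores:M>" and also tolerates "Cores" without the "v".
RESOURCE = re.compile(r"<memory:(\d+),\s*v?[Cc]ores:(\d+)>")


def parse_resource(label, text):
    """Return (memory_mb, vcores) for the field named `label`, or None."""
    match = re.search(re.escape(label) + r"\s*=\s*" + RESOURCE.pattern, text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


request = parse_resource("AM Resource Request", DIAGNOSTICS)
limit = parse_resource("Queue Resource Limit for AM", DIAGNOSTICS)

print("AM request:", request)            # (2048, 1)
print("Queue AM limit:", limit)          # (0, 0)
print("Limit is zero:", limit == (0, 0)) # True
```

The point of the comparison is that a 2048 MB request is being held against a limit of zero, which is why the application stays in the ACCEPTED state.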

Explanation

The diagnostic message suggests that:

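  • Resource Scarcity: The cluster currently has no resources to assign to the ApplicationMaster. This could be due to the cluster being fully utilized, to the specific queue not having sufficient resources allocated or available, or, on a small or single-node setup, to no healthy NodeManager being registered with the ResourceManager, in which case the cluster's total resource is zero.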
  • Queue Limits: The specific queue the application belongs to has its AM resource limits set to zero (<memory:0, vCores:0>), which means no resources are allocated for ApplicationMasters in this queue at the moment.
  • Activation Pending: The application is added to the scheduler, but activation (resource assignment and start) is pending due to the lack of available resources.

Possible Causes and Solutions

  1. Cluster Resource Constraints:

    • The cluster might be fully utilized, leaving no available resources for new ApplicationMasters.
    • Solution: Monitor and manage cluster resources. Consider scaling the cluster or optimizing the current workload.
  2. Queue Configuration Issues:

    • The queue configuration might have stringent limits or no resources allocated for ApplicationMasters.
    • Solution: Review and adjust the queue configurations in the capacity-scheduler.xml or equivalent configuration file to ensure there are sufficient resources for AMs.
  3. User Resource Limits:

    • The user might have reached their resource quota in the queue.
    • Solution: Check the per-user resource limits and adjust them if necessary to allow more resource allocation.
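
As a rough illustration of why the queue's AM limit can show as zero, the sketch below models the limit as cluster memory × queue capacity × `yarn.scheduler.capacity.maximum-am-resource-percent` (a Capacity Scheduler property whose default is 0.1). This is a deliberately simplified model, not the scheduler's actual calculation, and the function name and the 8192 MB figure are made up for the example.

```python
def queue_am_limit_mb(cluster_mb, queue_capacity, max_am_percent=0.1):
    """Simplified model of the memory a queue may devote to ApplicationMasters.

    cluster_mb      total memory reported by registered NodeManagers, in MB
    queue_capacity  the queue's share of the cluster as a fraction (1.0 = 100%)
    max_am_percent  yarn.scheduler.capacity.maximum-am-resource-percent
    """
    return int(cluster_mb * queue_capacity * max_am_percent)


# No NodeManager registered: total cluster memory is 0, so the limit is 0
# regardless of how the queue is configured.
print(queue_am_limit_mb(0, 1.0))          # 0

# Hypothetical single node offering 8192 MB, default 10% AM share.
print(queue_am_limit_mb(8192, 1.0))       # 819

# Same node with the AM share raised to 50%.
print(queue_am_limit_mb(8192, 1.0, 0.5))  # 4096
```

Under this model, raising the AM percentage in capacity-scheduler.xml only helps when the cluster reports some resources to begin with. A limit of `<memory:0, vCores:0>` suggests checking the cluster's total resources first.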

Steps to Diagnose Further

  1. Check Cluster Resource Utilization:

    • Use the ResourceManager web UI or CLI to check the current resource utilization of the cluster.
  2. Review Queue Configurations:

    • Inspect the queue configurations, particularly the settings for ApplicationMaster resource limits.
  3. Inspect Application Logs:

    • Look at the application logs for any additional diagnostics or error messages.
  4. Consult YARN ResourceManager Logs:

    • The ResourceManager logs can provide more context about why resources are unavailable or why the AM assignment is being skipped.
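
The first diagnostic step can also be scripted. The sketch below reads the ResourceManager's cluster metrics REST endpoint (`/ws/v1/cluster/metrics`) and reports whether any capacity is registered. It assumes the ResourceManager web address is `http://localhost:8088`, as in the question. The helper names and the sample values are hypothetical.

```python
import json
from urllib.request import Request, urlopen


def fetch_cluster_metrics(rm_url="http://localhost:8088"):
    """Fetch the clusterMetrics object from the ResourceManager REST API."""
    request = Request(rm_url + "/ws/v1/cluster/metrics",
                      headers={"Accept": "application/json"})
    with urlopen(request, timeout=5) as response:
        return json.load(response)["clusterMetrics"]


def summarize(metrics):
    """Turn a clusterMetrics dict into a one-line verdict on capacity."""
    nodes = metrics.get("activeNodes", 0)
    total_mb = metrics.get("totalMB", 0)
    if nodes == 0 or total_mb == 0:
        return "no usable capacity: %d active node(s), %d MB total" % (nodes, total_mb)
    return "%d active node(s), %d MB total, %d MB available" % (
        nodes, total_mb, metrics.get("availableMB", 0))


# Made-up sample standing in for a cluster with no registered NodeManager.
sample = {"activeNodes": 0, "totalMB": 0, "availableMB": 0}
print(summarize(sample))

# Against a live ResourceManager:
#   print(summarize(fetch_cluster_metrics()))
```

If the live call reports zero active nodes, the next place to look would be the NodeManager process and its log rather than the queue configuration.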

By understanding and addressing the issues highlighted in this diagnostic message, you can ensure that your YARN applications get the necessary resources to run effectively.

 




Community Manager

@adsejnf Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.


Regards,

Diana Torres,
Community Moderator

