Exercise 2: Hive data load not completing

New Contributor

Hi,


I'm new here and trying to complete the Cloudera Live guide. I'm following the steps required to load the data from access.log.2, but the task doesn't complete. It's not giving me any errors, and I'm not sure where to look for log files.


The code I'm using looks like this:

CREATE EXTERNAL TABLE intermediate_access_logs (
    ip STRING,
    date STRING,
    method STRING,
    url STRING,
    http_version STRING,
    code1 STRING,
    code2 STRING,
    dash STRING,
    user_agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
    'input.regex' = '([^ ]*) - - \\[([^\\]]*)\\] "([^\ ]*) ([^\ ]*) ([^\ ]*)" (\\d*) (\\d*) "([^"]*)" "([^"]*)"',
    'output.format.string' = "%1$$s %2$$s %3$$s %4$$s %5$$s %6$$s %7$$s %8$$s %9$$s")
LOCATION '/user/hive/warehouse/original_access_logs';

CREATE EXTERNAL TABLE tokenized_access_logs (
    ip STRING,
    date STRING,
    method STRING,
    url STRING,
    http_version STRING,
    code1 STRING,
    code2 STRING,
    dash STRING,
    user_agent STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/warehouse/tokenized_access_logs';
ADD JAR /usr/lib/hive/lib/hive-contrib.jar;
INSERT OVERWRITE TABLE tokenized_access_logs SELECT * FROM intermediate_access_logs;

I'm sure I'm missing something silly here.


Kind Regards,

Gee

6 REPLIES

Re: Exercise 2: Hive data load not completing

Master Collaborator
Could you detail the environment you're running this in? The use of
/usr/lib/hive/lib/hive-contrib.jar suggests this is the QuickStart VM
version rather than the 4-node Cloudera Live cluster version. Is that
right, or is it something else?

You're entering this query in Hue, correct? There should be a frame at the
bottom of the window that shows the output of the job. Do you see any logs
at all there? If so, where is it getting stuck?

Re: Exercise 2: Hive data load not completing

New Contributor

Hi,

 

Yes, I am using the QuickStart VM: 2 CPUs, 8 GB of RAM, Express edition.

 

Yes, I'm using Hue, and I can only see the following message with the spinning wheel next to it:

"There are currently no logs to visualize."

 

That is the only message.

Re: Exercise 2: Hive data load not completing

Master Collaborator
Memory configuration can get a bit complex once you're running CM with very
little memory. The VM gets tested in that configuration by, among other
things, running that exact same job, but perhaps something random has pushed
it over the edge and something's getting blocked. The simplest fix might be
to run the VM with a little more memory and, in Cloudera Manager, to stop any
services you're not using or that aren't required for what you're doing. You
might also consider starting with a new instance of the VM and not moving to
CM if you can't run it with more memory.

If none of those possibilities are an option or they don't work, the next
thing I'd suspect is YARN's configuration. You should be able to find some
information about the jobs in the Job Browser in Hue, or in the various Web
UIs for YARN in the browser. Any information there may be useful to figure
out exactly what's stalled. In CM, increasing some of the following
configurations may be helpful. All of them are set about as low as they can
go and still have all the examples work reliably (at least in our testing),
and setting them too high will cause other problems in the VM. Increasing
them by 50% or so may allow the job to run and not cause too many other
issues:

io_sort_mb
mapreduce_client_java_heapsize
mapreduce_map_java_opts_max_heap
mapreduce_map_memory_mb
mapreduce_reduce_java_opts_max_heap
mapreduce_reduce_memory_mb
yarn_app_mapreduce_am_max_heap
yarn_app_mapreduce_am_resource_mb
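
To make the "50% or so" arithmetic concrete, here is a rough sketch. The starting values are made-up placeholders in MiB, not the VM's actual defaults, and the helper name scale_settings is just for illustration. Read the real values (and their units) from Cloudera Manager before changing anything.

```python
import math

# Hypothetical current values, all expressed in MiB for illustration only.
# These are NOT the QuickStart VM's real defaults; look them up in CM.
current = {
    "io_sort_mb": 64,
    "mapreduce_client_java_heapsize": 256,
    "mapreduce_map_java_opts_max_heap": 256,
    "mapreduce_map_memory_mb": 512,
    "mapreduce_reduce_java_opts_max_heap": 256,
    "mapreduce_reduce_memory_mb": 512,
    "yarn_app_mapreduce_am_max_heap": 256,
    "yarn_app_mapreduce_am_resource_mb": 512,
}


def scale_settings(settings, factor=1.5):
    """Return a copy of the settings with every value scaled and rounded up."""
    return {name: math.ceil(value * factor) for name, value in settings.items()}


proposed = scale_settings(current)

# Print a before/after list to copy into CM by hand.
for name, old_value in current.items():
    print(f"{name}: {old_value} -> {proposed[name]}")
```

Scaling every setting by the same factor keeps each heap size in the same proportion to its container size as before, which seems like the safest way to try this on a small VM.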

Re: Exercise 2: Hive data load not completing

New Contributor

I had exactly the same problem while running a VMware VM with 8196 MB of memory. When I switched to 10 GB+ it worked, so I guess 8 GB is a bit optimistic.

 

I followed the tutorial and did little else. I restarted the tutorial 3 times before I changed the memory setting, and none of the attempts worked.

Re: Exercise 2: Hive data load not completing

New Contributor

Sorry for the delay, and thank you for all your assistance. I can confirm that the memory increase did solve the issue.

 

Thanks,

Geouff

Re: Exercise 2: Hive data load not completing

New Contributor

Hi Team, I am new to Cloudera and I hope I will get help from the community.

I also came across the same issue. Can anyone tell me where you increased the memory?

I have 12 GB in my laptop.

Do I have to change settings anywhere to resolve this issue?

Please help, I am stuck with a few exceptions:

"AnalysisException: Could not resolve table reference: 'tokenized_access_logs'"
"AnalysisException: Could not resolve table reference: 'intermediate_access_logs'"
These exceptions came up while running the code in Hue.
It seems the tables did not get created properly. I saw these messages when I checked for the tables: "Could not load: tokenized_access_logs", "Could not load: intermediate_access_logs"