CDH 5.5.0 - Getting Started: tables not visible though the install job completed

Explorer

I had an issue with CDH 5.4 hanging. I preferred to remove it and re-install the latest version.

I downloaded and installed CDH 5.5 and tried to use the Getting Started.

After several tries (and with help from the Community forums), I was able to complete the install correctly.

 

After /user/hive/warehouse/categories was finally created correctly, I went to the Hive and Impala editors and tried to refresh the tables after entering the INVALIDATE METADATA command.

 

For some reason, I am unable to see the tables, though I can see the Parquet files (with non-zero sizes) for each of the tables (customers, departments, etc.).

 

Please let me know if I am missing anything.

 

Thanks

1 ACCEPTED SOLUTION

Guru
To answer your other question though, I wouldn't expect a different data format to make a difference here. There's enough competition for memory on the system that Hive is constantly doing garbage collection, and that shouldn't have anything to do with what format Sqoop is using for the data.


11 REPLIES

Explorer

To the experts:

 

This was the original command I used (after cleaning up the files):

sqoop import-all-tables \
  -m 3 \
  --connect jdbc:mysql://208.113.123.213:3306/retail_db \
  --username=retail_dba \
  --password=cloudera \
  --compression-codec=snappy \
  --as-parquetfile \
  --warehouse-dir=/user/hive/warehouse \
  --hive-import


Should I redo it (after deleting the related folders) using:

sqoop import-all-tables \
  -m 3 \
  --connect jdbc:mysql://208.113.123.213:3306/retail_db \
  --username=retail_dba \
  --password=cloudera \
  --compression-codec=snappy \
  --as-avrodatafile \
  --warehouse-dir=/user/hive/warehouse \
  --hive-import

Guru
Can you post the output of your Sqoop job? I'm wondering if there were errors when it was doing the --hive-import part. There are two stages: writing the files in the new data format to HDFS, and then defining the schema for the tables in Hive's Metastore. It sounds like that second stage failed...
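As a rough check (just a sketch - it assumes the single-node QuickStart-style setup where these CLIs are on the path), you can compare what stage 1 wrote to HDFS with what stage 2 registered in the metastore:

hdfs dfs -ls /user/hive/warehouse        # stage 1: data files Sqoop wrote
hive -e 'SHOW TABLES;'                   # stage 2: table definitions in the Hive Metastore
impala-shell -q 'INVALIDATE METADATA'    # make Impala re-read the metastore
impala-shell -q 'SHOW TABLES'

If the directories are there but SHOW TABLES comes back empty, that points at the schema-definition stage.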

Explorer

Sean

Should I look in the log files under /var/log/hive/?

 

Let me know.

 

Here is the file listing, though, for the successful process (for /user/hive/warehouse/products):

 

Name                                          Size     User      Group       Permissions  Date
                                                       cloudera  supergroup  drwxr-xr-x   December 03, 2015 09:47 PM
.                                                      cloudera  supergroup  drwxr-xr-x   December 03, 2015 09:47 PM
.metadata                                              cloudera  supergroup  drwxr-xr-x   December 03, 2015 09:47 PM
.signals                                               cloudera  supergroup  drwxr-xr-x   December 03, 2015 09:47 PM
6f7ab0da-3cbf-40ee-a74a-d73683c68c91.parquet  43.8 KB  cloudera  supergroup  -rw-r--r--   December 03, 2015 09:47 PM

From the hive-metastore.log file:

 

2015-12-04 10:12:55,165 WARN  [org.apache.hadoop.hive.common.JvmPauseMonitor$Monitor@33267d64]: common.JvmPauseMonitor (JvmPauseMonitor.java:run(188)) - Detected pause in JVM or host machine (eg GC): pause of approximately 12002ms
No GCs detected
2015-12-04 10:13:41,510 WARN  [org.apache.hadoop.hive.common.JvmPauseMonitor$Monitor@33267d64]: common.JvmPauseMonitor (JvmPauseMonitor.java:run(188)) - Detected pause in JVM or host machine (eg GC): pause of approximately 13720ms
No GCs detected

 

 

 

Guru
I was referring to the output of your Sqoop command - it is printed to the
terminal, not written to a log file. However, the log snippets you did
post do indicate a potential problem: if Hive was pausing too much for
garbage collection, then Sqoop might have given up / timed out when doing
the import. You may not have enough memory for the services to run well.
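If you do re-run it, one way (a sketch based on your original command; -m 1 here is just to reduce concurrent load, your original used -m 3) to capture the terminal output so you can post it is to tee it to a file:

sqoop import-all-tables \
  -m 1 \
  --connect jdbc:mysql://208.113.123.213:3306/retail_db \
  --username=retail_dba \
  --password=cloudera \
  --compression-codec=snappy \
  --as-parquetfile \
  --warehouse-dir=/user/hive/warehouse \
  --hive-import 2>&1 | tee sqoop-import.log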

Explorer

Sean

 

What is the recommended memory?

I have used 3, 4, and 6 GB on CDH 5.4, and even up to 8 GB.

 

Can I try the Sqoop job with the Avro option instead of the Parquet option?

 

Let me know.

 

 

Guru
Well there are a lot of variables so a simple "minimum requirement" is a
tough number to give. The tutorial was originally written for a 4-node
cluster with 16 GB of RAM per node, and that's a little bit small for the
master node. The QuickStart VM has a version of the tutorial with a smaller
dataset. You can get away with 4 GB (but this includes the graphical
desktop, so let's say 3 GB for a server) if you don't use Cloudera Manager
and manage everything yourself (note that this is pretty complex). If you
use the "Cloudera Express" option for Cloudera Manager, 8 GB is the
absolute minimum, and if you're going to try out "Cloudera Enterprise" you
need at least 10 GB. But the number of nodes, exactly which services you're
running, exactly what else is going on on the machines, etc. all affect
this.
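To see how much headroom the machine actually has while the services are running (a quick sketch using standard Linux tools, nothing CDH-specific):

free -h                   # total / used / available RAM
top -b -n 1 | head -20    # snapshot of which processes are using it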

Guru
To answer your other question though, I wouldn't expect a different data format to make a difference here. There's enough competition for memory on the system that Hive is constantly doing garbage collection, and that shouldn't have anything to do with what format Sqoop is using for the data.
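A quick way to see how often the metastore is pausing (a sketch - the path assumes the default CDH log location you mentioned earlier):

grep "JvmPauseMonitor" /var/log/hive/hive-metastore.log | tail -20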

Explorer

Sean

I understand.

I am trying to emphasize here that the job did finally run without delay or failure. It did put 6 or so tables in the directory.

But the job didn't show those tables in the Hive or Impala editor.

I tried to add the tables using the file option, but the Parquet format didn't seem to convert well enough for the tables to show up.

Bottom line: the Sqoop job put the files where they should be, but I just don't see them showing up in the editor (the database does not reflect that those tables have been created).

That is why I am asking whether there are other ways to get the tables to show.

 

If I need CM or CM Enterprise, I will choose another laptop. Right now I have a regular laptop for my learning and demo purposes.

 

 

Guru
Remember that schema and data are two separate things in Hadoop. The files
in the directory are simply data files. For tables to show up in Hive or
Impala, you have to import or define the schema for those tables in the
Hive Metastore. I believe the reason you're not seeing the tables is that
the logs you posted show Hive constantly struggling with garbage
collection. My guess is that Sqoop tried to import the schema into Hive but
timed out - but I can't know for sure unless you can post the text output
of the Sqoop command.
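If the Sqoop output confirms that the Hive import step failed but the data
files themselves are fine, one possible workaround (just a sketch - the
table name and file path come from the listing you posted, and you would
repeat this for each table) is to define the schema directly from one of
the Parquet files in Impala:

impala-shell -q "CREATE EXTERNAL TABLE products
  LIKE PARQUET '/user/hive/warehouse/products/6f7ab0da-3cbf-40ee-a74a-d73683c68c91.parquet'
  STORED AS PARQUET
  LOCATION '/user/hive/warehouse/products'"
impala-shell -q "SHOW TABLES"

Impala infers the column definitions from the Parquet file's own schema, so
you don't have to type them out by hand.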

To be clear - are you running a QuickStart VM? I'm a little unclear on
exactly what your environment is.