Getting Started: Tutorial 1



After I launched the Sqoop job and checked the files in the categories directory, I see only 2 files instead of 4.


[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/categories
Found 2 items
-rw-r--r-- 1 cloudera hive 0 2017-02-06 10:34 /user/hive/warehouse/categories/_SUCCESS
-rw-r--r-- 1 cloudera hive 1344 2017-02-06 10:34 /user/hive/warehouse/categories/part-m-00000.avro
[cloudera@quickstart ~]$


Can anyone tell me whether this is normal or whether something went wrong? If something went wrong, what was it and how do I fix it?





Hi Geoffrey,


That’s normal on a single-node setup. In the tutorial, there’s a note underneath the screenshot that shows the output of hadoop fs -ls /user/hive/warehouse/categories:


Note: The number of .parquet files shown will be equal to the number of mappers used by Sqoop. On a single-node you will just see one, but larger clusters will have a greater number of files.


I do notice that your file extension is .avro, although the commands in the tutorial import the data as Parquet files. Either way, you should still see just one data file on a single-node setup, which here is the single .avro file.
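
If you want to double-check the "one file per mapper" rule against your own output, here is a rough sketch that counts the part-m-* data files in a listing. The listing is pasted from your post so the snippet runs on its own; the variable names listing and part_count are just illustrative, and on a live cluster you would pipe the real hadoop fs -ls output into grep instead.

```shell
# Listing copied from the question above (assumption: on a real cluster
# you would use the output of
#   hadoop fs -ls /user/hive/warehouse/categories
# instead of this pasted text).
listing='Found 2 items
-rw-r--r-- 1 cloudera hive 0 2017-02-06 10:34 /user/hive/warehouse/categories/_SUCCESS
-rw-r--r-- 1 cloudera hive 1344 2017-02-06 10:34 /user/hive/warehouse/categories/part-m-00000.avro'

# Count the map-output files (part-m-NNNNN.<ext>). The _SUCCESS entry is
# only a job-completion marker, so it is not counted as data.
part_count=$(printf '%s\n' "$listing" | grep -c '/part-m-[0-9]*\.')

echo "data files (one per mapper): $part_count"
```

For your listing this reports one data file, which matches a single mapper. As far as I know, the mapper count is controlled by Sqoop's -m (--num-mappers) option, so a different value there would change how many part-m files you get.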