New Contributor
Posts: 1
Registered: 11-05-2014

Truncated Dataset

OK, I downloaded the VM and uploaded a public dataset in CSV format (the CMS data from data dot gov). The file is just under 2GB with about 10 million rows. Imported the file - OK.


Created a table using HUE pointing to the CSV file and things look good EXCEPT it only pulls in 300K rows. Is "Big Data" limited to just 300K rows and 64MB?


FWIW, I used the same data file and it imported fine into SQL Server in just a few seconds with all rows and no errors, so the source file appears fine.


Could there be some magical config that makes the VM work with, well, data sets with more than 300K rows?



Posts: 1,826
Kudos: 406
Solutions: 292
Registered: 07-31-2013

Re: Truncated Dataset

This would be unexpected. Can you check the size of the imported file? Also, what query or check are you running to determine that you can only see 300K rows out of the total imported?
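
For a baseline to compare against, something like the following could be run against the original CSV on the local machine. It is only a rough sketch: `summarize_csv` and its `path` argument are names made up for this example, and it assumes the file is UTF-8 text that Python's standard `csv` module can parse. It reports the file size in bytes, the number of physical lines, and the number of CSV records; the last two differ when a quoted field contains a line break.

```python
import csv
import os


def summarize_csv(path):
    """Return (size_in_bytes, physical_lines, csv_records) for a local CSV file.

    Hypothetical helper for illustration; not part of any Hadoop or Hue tooling.
    """
    size_in_bytes = os.path.getsize(path)

    # Count physical lines by reading the raw bytes line by line.
    with open(path, "rb") as f:
        physical_lines = sum(1 for _ in f)

    # Count logical CSV records; a quoted field may span several physical lines.
    with open(path, newline="", encoding="utf-8", errors="replace") as f:
        csv_records = sum(1 for _ in csv.reader(f))

    return size_in_bytes, physical_lines, csv_records
```

If the byte count matches the size of the file as stored in the VM but the table still shows far fewer rows, the upload itself is probably fine and the row check is the next thing to look at. Note that the record count includes the header row, if there is one.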