05-19-2017 06:05 AM
I'm seeing the following behavior when running some queries in CDH 5.9.1 (Note: I am _not_ the owner of the cluster, and I have no flexibility over which CDH version is used).
I have a simple partitioned table, with about 7 fields, using 3 of them for partitioning. One of those fields (used for partitioning) is "date", which is the date of certain exchanges that took place.
I want to see if I have anything stored in that table for specific dates. So I'm running the following query from beeline:
$> SELECT DISTINCT date FROM my_table WHERE date = "2016_12_15";
1 row selected ( 102.635 seconds )
Now, judging from this output, I assume that there is content for that date. So, I issue the following query:
$> SELECT * FROM my_table WHERE date = "2016_12_15" LIMIT 5;
id col2 col3 date
No rows selected (0.312 seconds)
Given the result of the 1st query I would expect to get some results back in the 2nd query, but as you can see this is not the case.
When I check HDFS directly, the partition directory does not seem to exist:
$ hadoop fs -ls /mydb/my_table/date=2016_12_15
ls: `/mydb/my_table/date=2016_12_15': No such file or directory
I then take it a step further and issue the following:
$> SELECT DISTINCT "foo" FROM my_table WHERE date = "2016_12_15";
1 row selected ( 133.86 seconds )
As far as I can tell, this is buggy behavior, and it is the first time I have come across it. I may have missed a documentation entry describing it, so any pointers would be appreciated.
If I can do anything more to help resolve the issue, let me know.
What am I getting wrong?
05-21-2017 10:18 PM
What are the permissions on that folder in HDFS (owner and mode, i.e. what chown and chmod would set)?
Do you have the right permissions to read, write and execute there?
Which user are you using to run the beeline CLI?
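Something along these lines should answer those questions. It is only a rough sketch: the table path is taken from the post above and is an assumption about where the table really lives, and the user that Hive acts as may differ from your OS user if impersonation or Kerberos is involved. Also, since the table has three partition columns, the date= directory may sit below another partition level rather than directly under the table directory, so the last command searches recursively.

```shell
# Assumed table location, copied from the original post; adjust as needed.
TABLE_DIR=/mydb/my_table

# OS user running beeline (the effective Hive user may be different).
whoami

# Owner, group and mode of the table directory itself
# (-d lists the directory entry rather than its contents).
hadoop fs -ls -d "$TABLE_DIR"

# Look for the partition directory at any nesting level under the table.
hadoop fs -ls -R "$TABLE_DIR" | grep 'date=2016_12_15'
```

If the last command prints nothing, there is no directory for that date anywhere under the assumed table path.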