I created SequenceFiles using the PySpark code below:
path = '/data/seq_test2'
rdd = sc.parallelize([(1, "a1"), (2, "a2"), (3, "a3")])
rdd.saveAsSequenceFile(path)
Then I created an Impala table:
CREATE EXTERNAL TABLE seq_test2 (
  key_column STRING,
  value_column STRING
)
STORED AS SEQUENCEFILE
LOCATION '/data/seq_test2';
The query "select * from seq_test2" then returns a1, a2, a3 in key_column and NULL in value_column, but I expected to see 1, 2, 3 in key_column and a1, a2, a3 in value_column.
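One possible explanation, which I have not confirmed, is that the table reader ignores the key of each SequenceFile record and builds the whole row from the value by splitting it on the field delimiter. That would match the result: the value "a1" lands in the first column and the second column has nothing to read. If that assumption holds, a workaround might be to pack both fields into the value. Below is a minimal sketch of that idea. The helper names to_delimited_record and save_for_impala are hypothetical, and the Ctrl-A (\x01) delimiter is an assumption about the table's default, since no ROW FORMAT clause was given.

```python
# Assumed default field delimiter (Ctrl-A) for a table with no ROW FORMAT clause.
FIELD_DELIMITER = "\x01"

SAMPLE_PAIRS = [(1, "a1"), (2, "a2"), (3, "a3")]


def to_delimited_record(pair):
    """Keep the original key, but put both fields into the value as one delimited row."""
    key, value = pair
    return (key, FIELD_DELIMITER.join([str(key), str(value)]))


def save_for_impala(sc, path):
    """Write the sample pairs so that each record's value holds the full row.

    `sc` is an existing SparkContext (for example, the one the pyspark shell provides).
    """
    rdd = sc.parallelize(SAMPLE_PAIRS)
    rdd.map(to_delimited_record).saveAsSequenceFile(path)
```

With this, save_for_impala(sc, '/data/seq_test2') would write records whose values are "1\x01a1", "2\x01a2", and "3\x01a3". The output directory must not already exist, and whether the table then shows the expected columns depends on the assumption above being right.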
How do I fix it?
Thank you.