Created 10-13-2023 11:39 PM
I created SequenceFiles using the PySpark code below.
path='/data/seq_test2'
rdd = sc.parallelize([(1, "a1"), (2, "a2"), (3, "a3")])
rdd.saveAsSequenceFile(path)
Then I created an Impala table.
CREATE EXTERNAL TABLE seq_test2
(key_column STRING,
value_column STRING )
STORED AS SEQUENCEFILE
LOCATION '/data/seq_test2'
The query "select * from seq_test2" then returns a1, a2, a3 in key_column and NULL in value_column, but I expected 1, 2, 3 in key_column and a1, a2, a3 in value_column.
How do I fix it?
Thank you.
Created 11-03-2023 11:10 AM
Hello Seaport,
It sounds like the SequenceFile data written by PySpark is not being read correctly by the Impala table.
Can you test a few things to see where the discrepancy is coming from:
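One thing that may be worth testing first (a sketch under assumptions, not a confirmed fix): the output described in the question is consistent with how Hive-style SequenceFile tables behave, where the reader ignores the SequenceFile key and parses only the value as a delimited text row. If that also holds for Impala here, the key 1 is dropped and the value "a1" becomes the first column, which leaves the second column NULL. The helper name to_value_rows, the "\x01" delimiter (Hive's default field separator), and the path /data/seq_test3 are assumptions made for illustration.

```python
# Assumption: the table reader ignores the SequenceFile key and splits only the
# value on the field delimiter. "\x01" (Ctrl-A) is Hive's default delimiter.
FIELD_DELIM = "\x01"


def to_value_rows(pairs, delim=FIELD_DELIM):
    """Pack both fields into the value so they both survive as table columns."""
    # The key is kept as a string only for reference; under the assumption
    # above it would not be read by the table.
    return [(str(k), delim.join([str(k), str(v)])) for k, v in pairs]


rows = to_value_rows([(1, "a1"), (2, "a2"), (3, "a3")])

# With a live SparkContext `sc` (not created here), the write would be:
# sc.parallelize(rows).saveAsSequenceFile('/data/seq_test3')  # hypothetical path
```

If this is the cause, a table defined like seq_test2 but pointing at the new location should show 1, 2, 3 in key_column and a1, a2, a3 in value_column. A table declared with a different field delimiter would need a matching delim value.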
Created 11-07-2023 05:40 PM
@Seaport Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. If you are still experiencing the issue, can you provide the information @ezerihun has requested? Thanks.
Regards,
Diana Torres,