How to read hexadecimal escape sequences from Spark using the SHC connector

New Contributor

In HBase, I have a column qualifier that holds data like this:

ReportV10\x00\x00\x00\x00\x02\x02\x02

When I read this table from Spark using the SHC connector, I get junk characters in the result. Below is the code I am using to read the HBase table:

catalog='''{
"table":{"namespace":"db1","name":"tb1"},
"rowkey":"key",
"columns":{
"rowkey":{"cf":"rowkey","col":"key","type":"string"},
"nf_hh0":{"cf":"nf","col":"hh0","type":"string"}
}
}'''
df=spark.read.option("catalog",catalog).format("org.apache.spark.sql.execution.datasources.hbase").load()

df.show(1,False)

+--------------------+--------------+
|rowkey              |nf_hh0        |
+--------------------+--------------+
|26273707950926220...|ReportV10��   |
+--------------------+--------------+

Spark version: 2.3.2.3.1.0.319-3

HBase version: 2.0.2.3.1.0.319-3

Python version: 2.7.5

Question: Is there any way to read those hexadecimal escape sequences as they are into a DataFrame?

Community Manager

Hi @ayukus0705, welcome to our community! To help you get the best possible answer, I have tagged our Spark experts @RangaReddy and @Babasaheb, who may be able to assist you further.

Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.



Regards,

Vidya Sargur,
Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:

Master Collaborator

Hi @ayukus0705 

The nf_hh0 column data appears to be stored in a format other than string. Reading this data with a string data type may lead to the issue above.

To resolve this issue, you can either change the data type of the column to match the actual data format, or convert the data to a string format.
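To make the point above concrete, here is a minimal sketch in plain Python 3 (no Spark needed) of what may be happening. It assumes the cell holds exactly the 16 bytes shown in the question and that a "string" column is produced by decoding those bytes as UTF-8; both are assumptions, not something confirmed in this thread.

```python
# Assumption: the cell holds these 16 raw bytes. The \xNN notation in the
# question is how non-printable bytes are usually displayed, not literal text.
raw = b"ReportV10\x00\x00\x00\x00\x02\x02\x02"

# Assumption: a "string"-typed column is the result of a UTF-8 decode.
text = raw.decode("utf-8")

# Bytes below 0x80 map one-to-one to code points, so the decode is lossless here.
round_trip_ok = text.encode("utf-8") == raw

# The trailing characters are control characters (NUL, STX) with no glyph,
# which a terminal may render as junk.
non_printable = [c for c in text if not c.isprintable()]

print(len(raw), len(text), round_trip_ok, len(non_printable))
print(repr(text))  # repr() shows the control characters as escapes again
```

If this matches your data, the value in the DataFrame is intact and only its display is affected. If the column contained bytes of 0x80 or above that are not valid UTF-8, a string read could lose information, and a binary read would be the safer route.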

 

 

New Contributor

Hi @RangaReddy 

Thanks for looking into my question.

"Change the data type of the column to match the actual data format": I tried passing binary as the type in the catalog, but had no luck.

"Convert the data to a string format": this would mean modifying the data in HBase, which is not practical for us. The data size is also somewhere around 50-60 TB.

I am looking for an option where we can read those hexadecimal escape sequences (i.e., ReportV10\x00\x00\x00\x00\x02\x02\x02) directly, as they are, into my Spark DataFrame.
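One way this could be approached without touching the data in HBase is to re-escape the value on the Spark side after reading it. The sketch below is only an illustration: `hbase_escape` and `add_escaped_column` are hypothetical helper names, the escaping rule is modeled on how HBase's `Bytes.toStringBinary` displays bytes (printable ASCII kept, everything else as \xNN), and it is written for Python 3, so Python 2.7 would need a `unicode` check instead of `str`. It also assumes the string read did not already replace invalid bytes, in which case the original bytes cannot be recovered this way.

```python
def hbase_escape(value):
    """Render a value with non-printable bytes shown as \\xNN escapes.

    Accepts str (as read through a "string" catalog type) or bytes/bytearray
    (as a binary-typed column would deliver, if that read works for you).
    """
    if value is None:
        return None
    if isinstance(value, str):
        # Assumption: the string was produced by a UTF-8 decode of the cell.
        value = value.encode("utf-8")
    return "".join(
        # Keep printable ASCII except the backslash (byte 92); escape the rest.
        chr(b) if 32 <= b <= 126 and b != 92 else "\\x%02X" % b
        for b in bytearray(value)
    )


def add_escaped_column(df, src, dst):
    """Hypothetical helper: add column `dst` holding the escaped form of `src`."""
    # Imported here so hbase_escape can be tried without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    return df.withColumn(dst, udf(hbase_escape, StringType())(df[src]))
```

With the DataFrame from the question, usage might look like `add_escaped_column(df, "nf_hh0", "nf_hh0_escaped").show(1, False)`. A Python UDF adds serialization overhead, which matters at 50-60 TB; if a plain hex dump is enough, Spark's built-in `hex` function in `pyspark.sql.functions` may be a cheaper alternative.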

Let me know if you need further clarification or information; we can set up a meeting to discuss this.

Regards,

Ayush

 

Community Manager

@ayukus0705, did the response help resolve your query? If it did, kindly mark the relevant reply as the solution, as it will help others find the answer more easily in the future.



Regards,

Vidya Sargur,
Community Manager

