Support Questions

jijy · ‎05-08-2023

I have issue in importing the data from dataframe converted to csv then uploading it into hive but its not loading properly .

My procedure:

1st I took a Data frame from database and converted into a csv ,which has 343 columns and 24 lakhs rows

2nd I took the csv file to hive and I loaded the data to hive using load data code to table which i created directly by connect the hive to same database .

this is what ,I am doing.

In this case , my issue is for some rows it taking proper values but for some is null or 0.

then i took a sample of 5 rows and I checked manually then i find out in csv file some rows there are some extra comma .so I manually removed and tried ,it worked but this cant be happening in real-time .

so pls help me on this by giving some suggestion.

VidyaSargur · ‎05-08-2023

@jijy, Welcome to our community! To help you get the best possible answer, I have tagged our Hive experts @smruti, @asish, @Asok, @tjangid who may be able to assist you further.

Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.

Regards,

Vidya Sargur,
Community Manager

Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:
Community Guidelines
How to use the forum

tj2007 · ‎05-08-2023

Hello @jijy,

could you please share your create table statement and some sample data?

Regards

ggangadharan · ‎06-06-2023

Once the data has been read from database, you don't need to write the same data to file (i.e. CSV ) . Instead you can write directly into hive table using DataFrame API's. Once the Data has been loaded you query the same from hive.

df.write.mode(SaveMode.Overwrite).saveAsTable("hive_records")

Ref - https://spark.apache.org/docs/2.4.7/sql-data-sources-hive-tables.html

Sample Code Snippet

df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://<server name>:5432/<DBNAME>") \
    .option("dbtable", "\"<SourceTableName>\"") \
    .option("user", "<Username>") \
    .option("password", "<Password>") \
    .option("driver", "org.postgresql.Driver") \
    .load()

df.write.mode('overwrite').saveAsTable("<TargetTableName>")


From hive 

INFO  : Compiling command(queryId=hive_20230607042851_fa703b79-d6e0-4a4c-936c-efa21ec00a10): select count(*) from TBLS_POSTGRES
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20230607042851_fa703b79-d6e0-4a4c-936c-efa21ec00a10); Time taken: 0.591 seconds
INFO  : Executing command(queryId=hive_20230607042851_fa703b79-d6e0-4a4c-936c-efa21ec00a10): select count(*) from TBLS_POSTGRES
.
.
.
+------+
| _c0  |
+------+
| 122  |
+------+

Support Questions

Regarding data import into hive from csv