Support Questions

jijy · 05-08-2023

I am having an issue importing data into Hive. I convert a DataFrame to CSV and then upload it into Hive, but it does not load properly.

My procedure:

1st, I took a DataFrame from the database and converted it into a CSV file, which has 343 columns and 24 lakh (2.4 million) rows.

2nd, I took the CSV file to Hive and loaded it with a LOAD DATA statement into a table that I created directly by connecting Hive to the same database.

This is what I am doing.

My issue is that some rows get the proper values, but for others the values are null or 0.

I then took a sample of 5 rows and checked them manually. I found that some rows in the CSV file contain extra commas. I removed them by hand and tried again, and it worked, but this cannot be done manually in real time.

So please help me with some suggestions.

VidyaSargur · 05-08-2023

@jijy, Welcome to our community! To help you get the best possible answer, I have tagged our Hive experts @smruti, @asish, @Asok, @tjangid who may be able to assist you further.

Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.

Regards,

Vidya Sargur,
Community Manager


tj2007 · 05-08-2023

Hello @jijy,

Could you please share your CREATE TABLE statement and some sample data?

Regards

ggangadharan · 06-06-2023

Once the data has been read from the database, you don't need to write it to a file (i.e. CSV). Instead, you can write it directly into a Hive table using the DataFrame API. Once the data has been loaded, you can query it from Hive.

df.write.mode(SaveMode.Overwrite).saveAsTable("hive_records")

Ref - https://spark.apache.org/docs/2.4.7/sql-data-sources-hive-tables.html

Sample Code Snippet

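from pyspark.sql import SparkSession

# Hive support is needed so that saveAsTable registers the table in the Hive metastore
spark = SparkSession.builder \
    .enableHiveSupport() \
    .getOrCreate()

# Read the source table over JDBC (the <...> placeholders must be filled in)
df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://<server name>:5432/<DBNAME>") \
    .option("dbtable", "\"<SourceTableName>\"") \
    .option("user", "<Username>") \
    .option("password", "<Password>") \
    .option("driver", "org.postgresql.Driver") \
    .load()

# Write the DataFrame straight into a Hive table, replacing any existing data
df.write.mode('overwrite').saveAsTable("<TargetTableName>")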


From Hive:

INFO  : Compiling command(queryId=hive_20230607042851_fa703b79-d6e0-4a4c-936c-efa21ec00a10): select count(*) from TBLS_POSTGRES
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20230607042851_fa703b79-d6e0-4a4c-936c-efa21ec00a10); Time taken: 0.591 seconds
INFO  : Executing command(queryId=hive_20230607042851_fa703b79-d6e0-4a4c-936c-efa21ec00a10): select count(*) from TBLS_POSTGRES
.
.
.
+------+
| _c0  |
+------+
| 122  |
+------+
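
If the CSV step has to stay, the "extra commas" described in the question are most likely field values that themselves contain a comma and were written without quotes. The following is only a minimal sketch using Python's standard csv module. The sample rows and the helper name to_csv are made up for illustration and are not taken from the original data. It shows that a plain split on "," miscounts the columns, while a quote-aware reader recovers them when the file is written with quoting.

```python
import csv
import io

# Hypothetical sample rows; the second field of the first row contains a comma.
rows = [
    ["1", "Chennai, India", "250"],
    ["2", "Pune", "300"],
]

def to_csv(rows, quoting):
    """Serialize rows to CSV text with the given quoting policy."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=quoting, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

# Quote every field so embedded commas stay inside their field.
quoted_text = to_csv(rows, csv.QUOTE_ALL)

# A naive split on "," ignores the quotes and sees an extra column in row 1.
naive_counts = [len(line.split(",")) for line in quoted_text.splitlines()]

# A quote-aware parser recovers the original three fields per row.
parsed = list(csv.reader(io.StringIO(quoted_text)))
parsed_counts = [len(r) for r in parsed]

print(naive_counts)
print(parsed_counts)
```

On the Hive side, a table using the default delimited text format splits on every comma regardless of quotes, so a quoted file would usually need a quote-aware SerDe such as org.apache.hadoop.hive.serde2.OpenCSVSerde. Note that this SerDe reads all columns as strings, so casts may be needed afterwards.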
