About jijy

ggangadharan · ‎06-06-2023

Once the data has been read from the database, you don't need to write the same data to a file (e.g. CSV) first. Instead, you can write it directly into a Hive table using the DataFrame API. Once the data has been loaded, you can query it from Hive.

    df.write.mode(SaveMode.Overwrite).saveAsTable("hive_records")

Ref - https://spark.apache.org/docs/2.4.7/sql-data-sources-hive-tables.html

Sample Code Snippet

    df = spark.read \
        .format("jdbc") \
        .option("url", "jdbc:postgresql://<server name>:5432/<DBNAME>") \
        .option("dbtable", "\"<SourceTableName>\"") \
        .option("user", "<Username>") \
        .option("password", "<Password>") \
        .option("driver", "org.postgresql.Driver") \
        .load()

    df.write.mode('overwrite').saveAsTable("<TargetTableName>")

From Hive

    INFO : Compiling command(queryId=hive_20230607042851_fa703b79-d6e0-4a4c-936c-efa21ec00a10): select count(*) from TBLS_POSTGRES
    INFO : Semantic Analysis Completed (retrial = false)
    INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
    INFO : Completed compiling command(queryId=hive_20230607042851_fa703b79-d6e0-4a4c-936c-efa21ec00a10); Time taken: 0.591 seconds
    INFO : Executing command(queryId=hive_20230607042851_fa703b79-d6e0-4a4c-936c-efa21ec00a10): select count(*) from TBLS_POSTGRES
    . . .
    +------+
    | _c0  |
    +------+
    | 122  |
    +------+
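For anyone who wants to reuse the JDBC read and saveAsTable write described above, here is a minimal sketch that wraps the two steps in functions. The names jdbc_options and copy_table_to_hive are hypothetical helpers, not part of Spark, and the sketch assumes the session was created with Hive support enabled and that the PostgreSQL JDBC driver is on the classpath.

```python
def jdbc_options(host, database, table, user, password, port=5432):
    """Build the option map for spark.read.format("jdbc") (hypothetical helper).

    The table name is wrapped in double quotes, as in the post above,
    so PostgreSQL keeps its exact case.
    """
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbtable": f'"{table}"',
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }


def copy_table_to_hive(spark, options, target_table):
    """Read one table over JDBC and overwrite a Hive table with it.

    `spark` is assumed to be a SparkSession built with enableHiveSupport().
    Returns the row count of the target table as a quick sanity check.
    """
    # Read the source table straight into a DataFrame; no CSV file in between.
    df = spark.read.format("jdbc").options(**options).load()

    # Overwrite (or create) the target table in the metastore.
    df.write.mode("overwrite").saveAsTable(target_table)

    # Re-read the saved table and count its rows.
    return spark.table(target_table).count()
```

To use it, create the session with SparkSession.builder.enableHiveSupport().getOrCreate(), build the options with your own host, database and credentials, and pass both to copy_table_to_hive. The returned count can be compared with the result of select count(*) run from Hive.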

Last Visited	‎05-08-2023 03:47 AM

Member Since	‎05-08-2023 12:05 AM
Posts	1

Cloudera Community

Re: Regarding data import into hive from csv