Member since: 11-23-2021
Posts: 13
Kudos Received: 0
Solutions: 0
08-30-2022
10:05 PM
It is an internal (managed) table. The data in Hive is fine: I can select/update/delete it through OPENQUERY from SQL Server, and I can query it from DBeaver.
08-30-2022
06:45 PM
Hello @RangaReddy, I am running on Hortonworks, and the Hive table is in ORC format. What do you mean by Hive catalog or in-memory catalog?
07-20-2022
12:33 AM
Hello all,
I cannot read data from a Hive ORC table into a dataframe. If anyone knows what is wrong, could you help me fix it? Below is my script:
from pyspark import SparkContext, SparkConf
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession
from pyspark.sql import HiveContext, SQLContext

spark = SparkSession.builder.appName("Testing....").enableHiveSupport().getOrCreate()
hive_context = HiveContext(spark)
sqlContext = SQLContext(spark)

df_pgw = hive_context.sql("select * from orc_table")

Console output:
Hive Session ID = 79c9e6c0-1649-41dc-9aea-493c0f62d046
22/07/20 11:50:52 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
22/07/20 11:50:56 WARN HiveMetastoreCatalog: Unable to infer schema for table orc_table from file format ORC (inference mode: INFER_AND_SAVE). Using metastore schema.
df_pgw.show()
=> no data is returned
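A minimal diagnostic sketch (not part of the original post): SparkSession replaces the deprecated HiveContext in Spark 2.x, and the warehouse path below is only an assumption, so take the real one from DESCRIBE FORMATTED.

# Sketch only: check what the metastore reports for the table, then try reading
# the ORC files directly from the table's LOCATION to see whether Spark can
# read any rows at all.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orc-check").enableHiveSupport().getOrCreate()

spark.sql("describe formatted orc_table").show(100, truncate=False)

# Replace this path with the LOCATION reported above (the path here is an assumption)
df_files = spark.read.orc("/warehouse/tablespace/managed/hive/orc_table")
df_files.show(5)

If the direct file read returns rows while the metastore-based read does not, the issue is likely on the table-definition side (for example a transactional/ACID managed table, which Spark cannot read directly without extra tooling).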
Thanks,
Labels:
- Apache Hive
- Apache Spark
07-14-2022
03:36 AM
Hello everyone, I am new to Spark processing. I need your help with a problem when transforming data from flat files into a Hive ORC table. Below is my process flow using pyspark:
1 - Use pyspark to load the flat files into a dataframe
2 - Transform the data in the dataframe and insert it into a Hive table (Parquet)
3 - Insert the data from the Hive table (Parquet) into the ORC table
Steps 1 and 2 are fast, but step 3 is too slow because it uses a lot of memory, and sometimes it gets stuck and cannot continue. Please advise and recommend a better flow. Thanks.
Here is the sample code:

-- loading.py
import pyspark
from pyspark import SparkContext, SparkConf
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession
from pyspark.sql import HiveContext, SQLContext
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
from pyspark.sql.types import ArrayType, DoubleType, BooleanType
from pyspark.sql.functions import input_file_name, col, array_contains

spark = SparkSession.builder.appName("testing..").enableHiveSupport().getOrCreate()

df_schema = StructType([
    StructField("col1", StringType(), True),
    StructField("col2", StringType(), True),
    StructField("col3", StringType(), True),
    StructField("col4", StringType(), True),
    StructField("col5", StringType(), True),
    StructField("filename", StringType(), True),
    StructField("YEARKEY", StringType(), True),
    StructField("MONTHKEY", StringType(), True),
    StructField("DAYKEY", StringType(), True)
])

dsCSV = spark.read.format("csv").options(header='False', delimiter=';').schema(df_schema).load("/user/test/processing/data/out").withColumn("filename", input_file_name())
dsCSV.registerTempTable("cdr_data")

df_insert = spark.sql("select * from cdr_data")
df_insert.write.option("compression", "snappy").mode('append').format('parquet').partitionBy("yearkey", "monthkey", "daykey").saveAsTable('landing.test_loading')

dsCSV.unpersist()
dsCSV.unpersist(True)
df_insert.unpersist()
df_insert.unpersist(True)

-- cdr_hivesql.sh
v_history_records="insert into staging.test_loading select * from landing.test_loading"
echo "====================>>>`date +%Y%m%d%H%M%S`<<<====================="
echo ""
echo $v_history_records
hive -e "$v_history_records;"

Note:
-- landing.test_loading (Parquet format)
-- staging.test_loading (ORC format)
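For reference, a minimal sketch of one possible alternative (an assumption, not a tested recommendation): have Spark write the ORC table directly and skip both the intermediate Parquet table and the separate hive -e insert. Table, path, and column names are taken from the post above; whether staging.test_loading can be appended to as a Spark-managed ORC table is an assumption.

# Sketch only: load the CSV files and append straight into the ORC table,
# removing step 3 (the hive -e insert) entirely.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType
from pyspark.sql.functions import input_file_name

spark = SparkSession.builder.appName("direct-orc-load").enableHiveSupport().getOrCreate()

# Same columns as df_schema in loading.py above
df_schema = StructType([StructField(c, StringType(), True) for c in
    ["col1", "col2", "col3", "col4", "col5", "filename", "YEARKEY", "MONTHKEY", "DAYKEY"]])

df = (spark.read.format("csv")
      .options(header='False', delimiter=';')
      .schema(df_schema)
      .load("/user/test/processing/data/out")
      .withColumn("filename", input_file_name()))

(df.write
   .mode('append')
   .format('orc')                                   # write ORC directly instead of Parquet
   .option("compression", "snappy")
   .partitionBy("yearkey", "monthkey", "daykey")
   .saveAsTable('staging.test_loading'))            # assumes Spark may manage/append to this table

If the existing staging.test_loading is a Hive-managed ACID table, Spark may not be able to append to it directly; in that case writing ORC files to an external location and adding the partitions in Hive would be the usual workaround.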
Labels:
- Apache Ambari
- Apache Hive
- Apache Spark
04-06-2022
06:32 PM
Hello @steven-matison, could you provide the value of each of those properties?
04-05-2022
02:54 AM
Hello Team,
I have the scenario below and need your help deciding which processors to use in Apache NiFi.
I want to pull files from an FTP server in near real time and put them into HDFS, but after pulling the files I want to move them to another path on the same FTP server, or rename them (e.g. to a .tmp extension).
Please help me to design this data flow.
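A minimal flow sketch, assuming NiFi's standard FTP processors (processor and property names from memory, so please verify them in your NiFi version):

ListFTP -> FetchFTP -> PutHDFS

with FetchFTP's Completion Strategy set to Move File and its Move Destination Directory pointing at the archive path on the same FTP server (or Delete File if the source copies are not needed). ListFTP keeps state about already-listed files, so new files are picked up in near real time without re-pulling old ones.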
Thanks,
Regards,
Labels:
- Apache NiFi
- HDFS
01-22-2022
11:44 PM
Hi, do you mean it exists in the Hive CDH source?
01-20-2022
07:57 PM
Hello everyone, I want to use the Hive class library "org.apache.hadoop.hive.ql.udf". Could you tell me where I can download it? Regards,
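A hedged pointer rather than a definite answer: as far as I know, the org.apache.hadoop.hive.ql.udf classes ship inside the hive-exec jar, so there is no separate download. A quick way to locate the jar on a cluster node (the path below is an assumption based on the HDP layout; adjust for your distribution):

# Sketch only: look for the hive-exec jar that bundles org.apache.hadoop.hive.ql.udf
ls /usr/hdp/current/hive-client/lib/hive-exec-*.jar
# Alternatively, pull org.apache.hive:hive-exec from Maven Central for development.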
Labels:
- Apache Hadoop
- Apache Hive