Member since: 07-14-2017
Posts: 99
Kudos Received: 5
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1420 | 09-05-2018 09:58 AM |
| | 1903 | 07-31-2018 12:59 PM |
| | 1406 | 01-15-2018 12:07 PM |
| | 1312 | 11-23-2017 04:19 PM |
02-18-2021
09:19 AM
Hi everybody, I am trying the following approach to write streaming data into a Hive table:

    import logging
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import explode
    from pyspark.sql.functions import split
    from pyspark.streaming.kafka import KafkaUtils
    import datetime
    from pyspark.sql.functions import lit, unix_timestamp
    from os.path import *
    from pyspark import Row

    warehouseLocation = abspath("spark-warehouse")

    spark = (SparkSession.builder
             .appName("spark_streaming")
             .config("spark.sql.warehouse.dir", warehouseLocation)
             .enableHiveSupport()
             .getOrCreate())

    kafka = "kafka"
    offsets = "earliest"
    servers = "server_1:port,server_2:port"
    security_protocol = "SSL"
    keystore_location = "keystore"
    keystore_password = "keystore_password"
    kafka_topic = "kafka_topic"
    checkpoint_location = "/checkpoint/location"

    def hiveInsert(df, batchId):
        df.createOrReplaceTempView("updates")
        spark.sql("insert into hive_db.hive_table select value, time_stamp from updates")

    df = (spark.readStream.format(kafka)
          .option("startingOffsets", offsets)
          .option("kafka.bootstrap.servers", servers)
          .option("kafka.security.protocol", security_protocol)
          .option("kafka.ssl.keystore.location", keystore_location)
          .option("kafka.ssl.keystore.password", keystore_password)
          .option("subscribe", kafka_topic)
          .load()
          .selectExpr("CAST(value AS STRING)")
          .select('value')
          .withColumn('time_stamp', lit(datetime.datetime.now().strftime('%Y%m%d%H%M'))))

    query = df.writeStream.foreachBatch(hiveInsert).start()
    query.awaitTermination()

The above code is not working. Any pointers would be of great help!
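Not part of the original post, but for comparison, a minimal sketch of how I would wire the same foreachBatch sink, assuming Spark 2.4+ and an existing table hive_db.hive_table with columns (value, time_stamp). The differences are that the checkpoint_location defined above is actually passed to writeStream, and the batch DataFrame is written directly with insertInto instead of going through a temp view (the function name hive_insert is just for this sketch):

    # Sketch only: assumes hive_db.hive_table already exists with columns (value, time_stamp)
    def hive_insert(batch_df, batch_id):
        # append the micro-batch straight into the Hive table
        batch_df.select("value", "time_stamp").write.mode("append").insertInto("hive_db.hive_table")

    query = (df.writeStream
               .option("checkpointLocation", checkpoint_location)  # use the checkpoint dir defined above
               .foreachBatch(hive_insert)
               .start())
    query.awaitTermination()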
Labels:
- Apache Spark
08-21-2019
10:29 AM
Hi,
I am trying to match multiple values in a string using Hive regexp and am looking for an optimal solution.
I want to match "first" and "1.11" in the string below.
Table name is t, column name is col:
This string is the first string with two decimals 1.11 and 2.22 with a special char / and some more extra string.
The query I was using:
    select * from t where t.col regexp '(?=.*first)(?=.*1.11)'
Could you please help me.
Thank you
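Not from the original thread, but here is a sketch of one way to express the same filter without lookaheads, assuming the table t and column col above and a SparkSession named spark with Hive support; the two conditions are simply ANDed and the dot in 1.11 is escaped so it matches a literal decimal point:

    # Sketch only: equivalent filter written against Hive via Spark SQL
    result = spark.sql(r"""
        SELECT *
        FROM t
        WHERE col RLIKE 'first'
          AND col RLIKE '1\\.11'
    """)
    result.show(truncate=False)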
Labels:
- Apache Hive
02-07-2019
04:04 PM
@Shu can you please help me?
02-06-2019
10:29 AM
Hi All, I have a string:
some text with an ip 111.111.111.111 and a decimal 11.2323232 and some text here and then an int 1 and then some HTTP/1.1 with a 503 request and then another ip 222.222.222.222 and some imaginary 999.999.999.999
I want to output all the ip addresses comma separated. I tried the below:
    select regexp_replace(regexp_replace(String,'[^(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})]',' '),'\\s+',',');
+------------------------------------------------------------------------+--+
| _c0 |
+------------------------------------------------------------------------+--+
| ,111.111.111.111,11.2323232,1,1.1,503,222.222.222.222,999.999.999.999 |
+------------------------------------------------------------------------+--+
Expected output is: 111.111.111.111,222.222.222.222,999.999.999
Could you please help me.
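Not part of the original post, but just to illustrate the pattern outside Hive, a sketch in plain Python that pulls out the dotted quads and joins them with commas (the sample string s is copied from above):

    import re

    s = ("some text with an ip 111.111.111.111 and a decimal 11.2323232 and some text here "
         "and then an int 1 and then some HTTP/1.1 with a 503 request and then another ip "
         "222.222.222.222 and some imaginary 999.999.999.999")

    # Four 1-3 digit groups separated by dots; the decimal, the int and HTTP/1.1
    # have fewer than three dots, so they are not matched.
    ips = re.findall(r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b', s)
    print(",".join(ips))
    # 111.111.111.111,222.222.222.222,999.999.999.999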
Labels:
- Apache Hive
10-16-2018
01:35 PM
@sadapa you can never insert a "?" into a column which has datatype int, because "?" is not a number, and Hive knows it. I am not sure why you want to do that, but if you still want to convert a "?" into a number that you can change later, you can try ascii().
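Not part of the original reply, but as a quick illustration of the ascii() suggestion via Spark SQL (63 is simply the character code of '?', so it can be stored in an int column and mapped back later):

    # Sketch only: ascii() returns the numeric code of the first character
    spark.sql("SELECT ascii('?') AS code").show()
    # +----+
    # |code|
    # +----+
    # |  63|
    # +----+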
10-02-2018
01:33 PM
@Shu Thank you
10-02-2018
11:40 AM
@Carlton Patterson
    myresults.coalesce(1).write.format('csv').save("/tmp/myresults.csv", header='true')
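As a small follow-up sketch (not from the original reply), the same write with the header passed through option(); note that coalesce(1) funnels the whole result through a single task, so it is only sensible for small outputs:

    (myresults.coalesce(1)
        .write.format('csv')
        .option('header', 'true')
        .save('/tmp/myresults.csv'))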
10-01-2018
01:14 PM
Hi All, I need help to get the below result. I have two tables.
Table name: match
+-----------------------------------+----------------+--+
| hint | remarks |
+-----------------------------------+----------------+--+
| 1.1.1.1 | ip |
| 123456789 | contact |
| http://123123123123123123.some_n | url |
+-----------------------------------+----------------+--+
Table name: t1
+-------------------------------------------------------------------------------+-------------------+--+
| t1.text | t1.b |
+-------------------------------------------------------------------------------+-------------------+--+
| This ip is found 1.1.1.1 and is matched with match | table name match |
| This ip is found 1.1.1.2 and is matched with match | table name match |
| This contact is found 123456789 and is matched with match | table name match |
| This contact is found 123456789123456789 and is matched with match | table name match |
| This url is found http://123456789123456789.some_n and is matched with match | table name match |
+-------------------------------------------------------------------------------+-------------------+--+
I want to search for the values of the hint column of the match table within the text column of the t1 table and get the complete text column values. So basically I want to do a query like:
    select t1.text from t1 join match where t1.text contains (any value in match.hint);
It would be helpful if this can be done in Hive, but I can live with PySpark, so PySpark help is also welcome.
P.S.: table t1 is a big table and match is a small table with limited values (say 1500). Thank you
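Not from the original post, but since PySpark was explicitly welcome, here is a minimal sketch of the "contains any hint" join, assuming a SparkSession named spark that can read both Hive tables; the small match table is broadcast and a left-semi join keeps each t1 row whose text contains at least one hint:

    from pyspark.sql import functions as F

    t1 = spark.table("t1").alias("t1")
    m = spark.table("match").alias("m")

    # Broadcast the small lookup table and keep t1 rows whose text contains any hint value
    result = t1.join(
        F.broadcast(m),
        F.expr("t1.text LIKE concat('%', m.hint, '%')"),
        "left_semi"
    )
    result.select("text").show(truncate=False)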
Labels:
- Apache Hive
09-18-2018
01:12 PM
@Andy LoPresto That's a nice idea, but I don't have the leverage to use ExecuteScript or ExecuteStreamCommand, as there are no scripts/programs (including awk) available to me, and getting them is out of my hands, so I am looking for a solution within my flow. Thank you
09-18-2018
01:09 PM
@Shu
1. Sample data: Every value is present in attributes (i.e. every flowfile is parsed and the values in the flowfile are assigned to attributes). There are multiple flowfiles with the same value (user_name) in the attributes. Example:
flowfile1 attributes:
user_name: mark, file_in: 2018-09-18 15:00:00, file_out: 2018-09-18 15:01:00
user_name: michelle, file_in: 2018-09-18 15:00:02, file_out: 2018-09-18 15:01:01
user_name: mark, file_in: 2018-09-18 15:00:05, file_out: 2018-09-18 15:01:01
flowfile2 attributes:
user_name: mark, file_in: 2018-09-18 15:01:00, file_out: 2018-09-18 15:01:10
user_name: stella, file_in: 2018-09-18 15:01:12, file_out: 2018-09-18 15:01:21
2. I want to count all the flowfiles that have a given user_name (in the above example, the count of mark is 3 across both flowfiles).
3. The schema of the flowfile is just the above 3 fields, which are assigned to attributes.
Thank you