Member since 07-14-2017
- 99 Posts
- 5 Kudos Received
- 4 Solutions
        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1885 | 09-05-2018 09:58 AM |
|  | 2509 | 07-31-2018 12:59 PM |
|  | 1972 | 01-15-2018 12:07 PM |
|  | 1720 | 11-23-2017 04:19 PM |
02-18-2021 09:19 AM
Hi everybody, I am trying the following approach to write data into a Hive table.

```python
import datetime
from os.path import abspath

from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

warehouseLocation = abspath("spark-warehouse")
spark = (SparkSession.builder
         .appName("spark_streaming")
         .config("spark.sql.warehouse.dir", warehouseLocation)
         .enableHiveSupport()
         .getOrCreate())

kafka = "kafka"
offsets = "earliest"
servers = "server_1:port,server_2:port"
security_protocol = "SSL"
keystore_location = "keystore"
keystore_password = "keystore_password"
kafka_topic = "kafka_topic"
checkpoint_location = "/checkpoint/location"

# Insert each micro-batch into the Hive table via a temp view
def hiveInsert(df, batchId):
    df.createOrReplaceTempView("updates")
    spark.sql("insert into hive_db.hive_table select value, time_stamp from updates")

df = (spark.readStream.format(kafka)
      .option("startingOffsets", offsets)
      .option("kafka.bootstrap.servers", servers)
      .option("kafka.security.protocol", security_protocol)
      .option("kafka.ssl.keystore.location", keystore_location)
      .option("kafka.ssl.keystore.password", keystore_password)
      .option("subscribe", kafka_topic)
      .load()
      .selectExpr("CAST(value AS STRING)")
      .withColumn('time_stamp', lit(datetime.datetime.now().strftime('%Y%m%d%H%M'))))

query = df.writeStream.foreachBatch(hiveInsert).start()
query.awaitTermination()
```

The above code is not working. Any pointers are of great help!
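For reference, here is a minimal foreachBatch sketch (an assumption-laden sketch, not a confirmed fix) that appends each micro-batch to the table with the DataFrame API and also passes the checkpoint location, which the code above defines but never hands to the writer even though structured streaming needs it for a restartable query. The table and path names are the question's placeholders:

```python
# Hedged sketch: append each micro-batch by position into an existing
# Hive table instead of going through a temp view. "hive_db.hive_table"
# and checkpoint_location are placeholders from the question.
def hive_insert(batch_df, batch_id):
    batch_df.write.mode("append").insertInto("hive_db.hive_table")

query = (df.writeStream
         .foreachBatch(hive_insert)
         .option("checkpointLocation", checkpoint_location)
         .start())
query.awaitTermination()
```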
Labels: Apache Spark
08-21-2019 10:29 AM
Hi,

I am trying to match multiple values in a string using Hive regexp and I am looking for an optimal solution.

I want to match "first" and "1.11" in the below.

Column name is col:

This string is the first string with two decimals 1.11 and 2.22 with a special char / and some more extra string.

Table name is t.

The query I was using:

```sql
select * from t where t.col regexp '(?=.*first)(?=.*1.11)'
```

Could you please help me?
Thank you
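One hedged variation worth trying (a sketch via spark.sql, keeping Python as elsewhere on this page; table and column names are from the question): escape the dot so "1.11" is matched literally, or drop the lookaheads entirely and AND two RLIKE conditions.

```python
# Hedged sketch: match rows containing both "first" and the literal "1.11".
# Two ANDed RLIKE conditions avoid lookaheads; the escaped dot keeps
# "1.11" from also matching strings like "1x11".
rows = spark.sql(r"""
    select *
    from t
    where col rlike 'first'
      and col rlike '1\\.11'
""")
rows.show(truncate=False)
```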
Labels: Apache Hive
02-07-2019 04:04 PM
@Shu, can you please help me?
02-06-2019 10:29 AM
Hi All,

I have a string:

```
some text with an ip 111.111.111.111 and a decimal 11.2323232 and some text here and then an int 1 and then some HTTP/1.1 with a 503 request and then another ip 222.222.222.222 and some imaginary 999.999.999.999
```

I want to output all the IP addresses, comma separated. I tried the below:

```sql
select regexp_replace(regexp_replace(String,'[^(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})]',' '),'\\s+',',');
```

```
+------------------------------------------------------------------------+--+
|                                  _c0                                   |
+------------------------------------------------------------------------+--+
| ,111.111.111.111,11.2323232,1,1.1,503,222.222.222.222,999.999.999.999  |
+------------------------------------------------------------------------+--+
```

Expected output is: 111.111.111.111,222.222.222.222,999.999.999.999

Could you please help me?
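A side note on why the attempt misbehaves: a character class like [^(...)] negates single characters, not the whole IP pattern, so it cannot isolate complete addresses. A hedged sketch of the extract-then-join idea in plain Python (the sample string is from the question; inside Hive this would need a UDF, or regexp_extract_all on engines that ship it):

```python
import re

s = ("some text with an ip 111.111.111.111 and a decimal 11.2323232 and some "
     "text here and then an int 1 and then some HTTP/1.1 with a 503 request "
     "and then another ip 222.222.222.222 and some imaginary 999.999.999.999")

# Four dot-separated runs of 1-3 digits; the word boundaries stop partial
# hits such as the "1.1" in HTTP/1.1, the lone 503, or 11.2323232.
ip_pattern = r'\b\d{1,3}(?:\.\d{1,3}){3}\b'
print(','.join(re.findall(ip_pattern, s)))
# -> 111.111.111.111,222.222.222.222,999.999.999.999
```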
Labels: Apache Hive
10-16-2018 01:35 PM
@sadapa you can never insert a "?" into a column whose datatype is int, because "?" is not a number and Hive knows it. I am not sure why you want to do that, but if you still want to turn a "?" into a number that you can change back later, you can try ascii().
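For illustration, a small hedged example of that conversion (ascii() returns the numeric code of the first character of its argument, so '?' becomes a storable int):

```python
# ascii('?') yields 63, an int that fits the column and can be mapped
# back to '?' later by whoever consumes the data.
spark.sql("select ascii('?') as placeholder").show()
# +-----------+
# |placeholder|
# +-----------+
# |         63|
# +-----------+
```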
10-02-2018 01:33 PM
@Shu Thank you
10-02-2018 11:40 AM
@Carlton Patterson

```python
myresults.coalesce(1).write.format('csv').save("/tmp/myresults.csv", header='true')
```
10-01-2018 01:14 PM
Hi All,

I need help to get the below result. I have two tables.

Table name: match

```
+-----------------------------------+----------------+--+
|            hint                   | remarks        |
+-----------------------------------+----------------+--+
| 1.1.1.1                           | ip             |
| 123456789                         | contact        |
| http://123123123123123123.some_n  | url            |
+-----------------------------------+----------------+--+
```

Table name: t1

```
+-------------------------------------------------------------------------------+-------------------+--+
|                                     t1.text                                   |       t1.b        |
+-------------------------------------------------------------------------------+-------------------+--+
| This ip is found 1.1.1.1 and is matched with match                            | table name match  |
| This ip is found 1.1.1.2 and is matched with match                            | table name match  |
| This contact is found 123456789 and is matched with match                     | table name match  |
| This contact is found 123456789123456789 and is matched with match            | table name match  |
| This url is found http://123456789123456789.some_n and is matched with match  | table name match  |
+-------------------------------------------------------------------------------+-------------------+--+
```

I want to search the hint column of the match table in the text column of the t1 table and get the complete text column values. So basically I want to do a query like:

```sql
select t1.text from t1 join match where t1.text contains (any value in match.hint);
```

It will be helpful if this can be done in Hive, or I can live with PySpark, so PySpark help is also welcome.

P.S.: table t1 is a big table and match is a small table with limited values (say 1500).

Thank you
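Since match is tiny, one hedged way to express "text contains any hint" is a broadcast non-equi join in PySpark (a sketch assuming both tables are visible to the session as t1 and match):

```python
from pyspark.sql import functions as F

t1_df = spark.table("t1")
# match has only ~1500 rows, so broadcasting it keeps the join map-side.
match_df = F.broadcast(spark.table("match"))

# Keep every t1 row whose text contains at least one hint value.
result = (t1_df
          .join(match_df, t1_df["text"].contains(match_df["hint"]), "inner")
          .select(t1_df["text"])
          .distinct())
result.show(truncate=False)
```

The same idea in Hive SQL would be a join with a condition like instr(t1.text, match.hint) > 0, letting the optimizer map-join the small table.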
Labels: Apache Hive
09-18-2018 01:12 PM
@Andy LoPresto That's a nice idea, but I don't have the leverage to use ExecuteScript or ExecuteStreamCommand, as there are no scripts/programs (including awk) waiting for me, and getting them is out of my hands, so I am looking for a solution within my flow. Thank you
09-18-2018 01:09 PM
@Shu

1. Sample data: every value is present in attributes (i.e., every flowfile is parsed and the values in the flowfile are assigned to attributes). There are multiple flowfiles with the same user_name value in their attributes. For example:

```
flowfile1 attributes:
user_name: mark, file_in: 2018-09-18 15:00:00, file_out: 2018-09-18 15:01:00
user_name: michelle, file_in: 2018-09-18 15:00:02, file_out: 2018-09-18 15:01:01
user_name: mark, file_in: 2018-09-18 15:00:05, file_out: 2018-09-18 15:01:01

flowfile2 attributes:
user_name: mark, file_in: 2018-09-18 15:01:00, file_out: 2018-09-18 15:01:10
user_name: stella, file_in: 2018-09-18 15:01:12, file_out: 2018-09-18 15:01:21
```

2. I want to count all the flowfiles that have a given user_name (in the above example, the count for mark is 3 across both flowfiles).

3. The schema of the flowfile is just the above 3 fields, which are assigned to attributes.

Thank you