Member since: 12-21-2016
Posts: 83
Kudos Received: 5
Solutions: 2

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 31118 | 02-08-2017 05:56 AM
 | 4152 | 01-02-2017 11:05 PM
12-20-2020
04:38 PM
How to add a new column to an existing Parquet table, and how to update it?
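For reference, a minimal sketch of the statements involved, assuming a hypothetical Parquet-backed table named `sales`; note that an in-place UPDATE generally requires an ACID (transactional) table, so plain Parquet tables are usually repopulated with an INSERT OVERWRITE instead:

```sql
-- Add the column; existing Parquet files simply return NULL for it
ALTER TABLE sales ADD COLUMNS (discount_pct DOUBLE);

-- Plain (non-ACID) Parquet tables cannot be updated in place, so the usual
-- approach is to rewrite the data with the new column populated
INSERT OVERWRITE TABLE sales
SELECT order_id, amount, 0.0 AS discount_pct
FROM sales;
```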
Labels:
- Apache Hive
04-24-2020
09:32 AM
Traceback (most recent call last):
  File "consumer.py", line 8, in <module>
    consumer = KafkaConsumer('test',
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/kafka/consumer/group.py", line 355, in __init__
    self._client = KafkaClient(metrics=self._metrics, **self.config)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/kafka/client_async.py", line 242, in __init__
    self.config['api_version'] = self.check_version(timeout=check_timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/kafka/client_async.py", line 907, in check_version
    version = conn.check_version(timeout=remaining, strict=strict, topics=list(self.config['bootstrap_topics_filter']))
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/kafka/conn.py", line 1228, in check_version
    if not self.connect_blocking(timeout_at - time.time()):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/kafka/conn.py", line 337, in connect_blocking
    self.connect()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/kafka/conn.py", line 426, in connect
    if self._try_handshake():
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/kafka/conn.py", line 505, in _try_handshake
    self._sock.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1108)

I am getting the above error after running the program. Any inputs?
04-23-2020
10:35 AM
WebHDFS is disabled for our cluster. Are there any other options?
04-21-2020
04:41 PM
Hi,
I am trying to connect to a Kerberized Kafka cluster from my local machine using Python, but I am unable to connect with the configuration below. Could anyone guide me? Your help is appreciated.
consumer = KafkaConsumer('test',
    bootstrap_servers='XXX:1234',
    #client_id= kafka-python- + __version__,
    request_timeout_ms=30000,
    connections_max_idle_ms=9 * 60 * 1000,
    reconnect_backoff_ms=50,
    reconnect_backoff_max_ms=1000,
    max_in_flight_requests_per_connection=5,
    receive_buffer_bytes=None,
    send_buffer_bytes=None,
    #socket_options= [(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)],
    sock_chunk_bytes=4096,  # undocumented experimental option
    sock_chunk_buffer_count=1000,  # undocumented experimental option
    retry_backoff_ms=100,
    metadata_max_age_ms=300000,
    security_protocol='SASL_SSL',
    ssl_context=None,
    ssl_check_hostname=True,
    ssl_cafile=None,
    ssl_certfile=None,
    ssl_keyfile=None,
    ssl_password=None,
    ssl_crlfile=None,
    api_version=None,
    api_version_auto_timeout_ms=2000,
    #selector=selectors.DefaultSelector,
    sasl_mechanism='GSSAPI',
    #sasl_plain_username= None,
    #sasl_plain_password='XXX',
    sasl_kerberos_service_name='XXX',
    # metrics configs
    metric_reporters=[],
    metrics_num_samples=2,
    metrics_sample_window_ms=30000)
for msg in consumer:
    print(msg)
Please guide; your help is appreciated.
Thanks
Labels:
- Apache Kafka
- Kerberos
04-20-2020
04:37 PM
Hi,
I am trying to connect to and authenticate against a Kerberized cluster using a Python program and read HDFS files. Could anyone help me achieve this?
Your help is appreciated.
Thanks
04-20-2020
03:17 PM
Hi, I am trying to connect from my local machine to a Kerberized Kafka cluster through Python as a client. Could you please let me know all the properties to include along with the bootstrap server?

consumer = KafkaConsumer('test',
    bootstrap_servers='XXX.ORG:XXXX',
    #client_id= kafka-python- + __version__,
    request_timeout_ms=30000,
    connections_max_idle_ms=9 * 60 * 1000,
    reconnect_backoff_ms=50,
    reconnect_backoff_max_ms=1000,
    max_in_flight_requests_per_connection=5,
    receive_buffer_bytes=None,
    send_buffer_bytes=None,
    #socket_options= [(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)],
    sock_chunk_bytes=4096,  # undocumented experimental option
    sock_chunk_buffer_count=1000,  # undocumented experimental option
    retry_backoff_ms=100,
    metadata_max_age_ms=300000,
    security_protocol='SASL_SSL',
    ssl_context=None,
    ssl_check_hostname=True,
    ssl_cafile=None,
    ssl_certfile=None,
    ssl_keyfile=None,
    ssl_password=None,
    ssl_crlfile=None,
    api_version=None,
    api_version_auto_timeout_ms=2000,
    #selector=selectors.DefaultSelector,
    sasl_mechanism='GSSAPI',
    #sasl_plain_username= None,
    #sasl_plain_password='XXXX',
    sasl_kerberos_service_name='XXXX',
    # metrics configs
    metric_reporters=[],
    metrics_num_samples=2,
    metrics_sample_window_ms=30000)

Your help is appreciated. Thanks
04-20-2020
03:07 PM
Hi All, I am trying to connect from my local machine to a Kafka cluster (Kerberized cluster) through Python. Can anyone help with which properties to specify in the krb5.conf file, and any other required properties? Your help is appreciated.
12-12-2019
04:10 PM
We even ran MSCK REPAIR TABLE, but still no luck. Any other options?
12-12-2019
02:17 PM
I am unable to create an external Hive table after manually deleting the table's underlying files at its HDFS location.
When a DESCRIBE statement is issued, it returns the table's description, but when a SELECT is performed on the table, we get "table doesn't exist". So we issued a DROP statement.
After issuing the DROP statement, we tried to create the table again, but we get "table already exists". Do we need to manually delete it from the Hive metastore, or is there any way to forcefully re-create the table? Please let me know.
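For context, a rough sketch of the sequence described above, with a hypothetical table name (`ext_events`) and columns:

```sql
DESCRIBE ext_events;                          -- still returns the table definition
SELECT * FROM ext_events LIMIT 1;             -- fails: table does not exist
DROP TABLE IF EXISTS ext_events;              -- issued to clean up
CREATE EXTERNAL TABLE ext_events (id INT, payload STRING)
LOCATION '/data/ext_events';                  -- fails: table already exists
```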
Labels:
- Apache Hive
08-17-2017
10:41 AM
I am getting an error when I try to check whether an HDFS directory exists. I am trying to check it through an Oozie fs action in a decision node, and below is the code. Appreciate any help on this.

</action>
<decision name="deleteFrompraveenPostCondition">
    <switch>
        <case to="Export">
            ${fs:exists(/dev/praveen/test/*)}
        </case>
        <default to="statusLog"/>
    </switch>
</decision>

ERROR:
Encountered "/", expected one of [<INTEGER_LITERAL>, <FLOATING_POINT_LITERAL>, <STRING_LITERAL>, "true", "false", "null", "(", ")", "-", "not", "!", "empty", <IDENTIFIER>]
Labels:
- Apache Oozie
07-21-2017
07:31 PM
Hive - I would like to calculate the percentage of a column's values, and based on that percentage I would like to load the data into another table (if the percentage of 'n' is less than 20%); otherwise, do not load it.

colA
y
y
y
n
------------------
Output (this is what I am expecting):
y 80%   n 20%
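A rough sketch of one way to compute this, assuming a hypothetical source table `src` with column `colA` and a hypothetical target table `target`:

```sql
-- Percentage of each distinct value of colA
SELECT colA,
       COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS pct
FROM src
GROUP BY colA;

-- Conditional load: rows are inserted only when 'n' is under 20% of all rows
INSERT INTO TABLE target
SELECT s.colA
FROM src s
CROSS JOIN (
    SELECT SUM(CASE WHEN colA = 'n' THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS n_pct
    FROM src
) p
WHERE p.n_pct < 20;
```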
Labels:
- Apache Hive
05-17-2017
06:22 AM
Can you elaborate with more details? I am facing the same issue; when I checked, I see java-json.jar in the Oozie shared lib path. However, I don't see it in the sqoop-client/lib path on the gateway.
04-27-2017
06:24 PM
Replication is for DataNode failure. When a human deletes the data, the data is lost wherever it resides, no matter how many nodes it is on; it is moved into the trash, and if needed we can get it back within a certain interval of time.
04-25-2017
06:07 PM
Thanks, and yes, I can re-write it; however, I am looking for options to get it back, if there are any. When I drop the table, the commit to the metastore happens immediately, which might be why the Hive table schema cannot be recovered. Any other alternative options?
04-25-2017
05:31 PM
I have a Hive external table, and unfortunately the schema of the table got dropped and I want to get the schema back. Is there any way to recover it? I do understand that HDFS is a file system; however, I'd like to see if there are any possibilities.
- Tags:
- Hadoop Core
- HDFS
- hiverserver2
Labels:
- Apache Hadoop
- Apache Hive
03-10-2017
06:41 PM
Thanks. That is encryption at the folder level; however, I am looking at field-level encryption rather than encrypting the entire file. I know Ranger has this feature, but it only helps with Hive column-level encryption when I query the data; when I look at the raw file, I can still see the sensitive data.
03-09-2017
09:55 PM
I have a requirement where I need to encrypt certain sensitive data before landing/ingestion into Hadoop. I just want to understand how Hadoop processes this kind of encrypted data (be it in Hive, Pig, or any MapReduce job). Do we need to write specific programs to read these encrypted files in Hadoop, or do we need to set any parameters on the Hive table or Pig session to read this kind of encrypted data? Any ideas, thoughts, or suggestions?
- Tags:
- Data Processing
- Mapreduce
- Pig
- Sqoop
Labels:
- Apache Hadoop
- Apache Pig
- Apache Sqoop
02-16-2017
03:15 AM
I found a solution: this kind of data can be exported to any RDBMS in UTF-8 (or any other character set) by specifying the character set after the database/host name in the connection string.
02-15-2017
11:40 PM
Yes, it is displaying the special characters in a readable format after adding the serialization encoding property. However, while exporting the data to Teradata with a Sqoop statement using a connection manager, I am getting non-readable characters in Teradata. Attached is the screenshot (teradat.png). I suspect Sqoop is not recognizing the special characters correctly, or do I need to use any specific Teradata jars while exporting the data? I have attached the ingested data (after-ingestion-data-into-hadoop.png) and the data shown in Hive after adding the encoding property (after-adding-encoding-to-hive-table.png); the same data does not look the same in Teradata. I would like to see the same characters in Teradata as well. Any help appreciated.
02-14-2017
07:47 PM
I have a requirement to handle a file which contains special characters (trademark symbols, non-UTF characters, and so on).
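For reference, a rough sketch of the serialization encoding property referred to in the follow-up post above, assuming a hypothetical table name `raw_feed` and that the source file uses an encoding such as ISO-8859-1:

```sql
-- Tell the SerDe which character set the underlying data files use
ALTER TABLE raw_feed
SET SERDEPROPERTIES ('serialization.encoding' = 'ISO-8859-1');
```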
Labels:
- Apache Hive
02-11-2017
12:21 AM
Could you let me know how to handle it if the data is not in quotes? Below is an example:

column1 | column2
first|second|last

In the above example, first|second is actually one column. Could you let me know how to handle the case where the data is not in quotes and the delimiter is part of the data? Any suggestion or help is appreciated.
02-09-2017
03:03 AM
Thanks, and yeah, the OpenCSV SerDe will do it. However, I am checking whether there are any other alternatives.
02-08-2017
06:35 AM
In Hive, one of my columns contains a pipe ('|') as part of its data; however, while exporting data from this table, we need to use the pipe ('|') as the delimiter between fields. How do we handle delimiters that are part of the data when creating the flat file from the Hive table?
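For illustration, one rough way to work around this when producing the extract, assuming a hypothetical table `orders` with a free-text column `notes` that may contain pipes; the embedded delimiter is simply replaced before the pipe-delimited file is written:

```sql
-- Write a pipe-delimited extract, first substituting any embedded '|'
-- in the data with '/' so the field count stays consistent
INSERT OVERWRITE DIRECTORY '/tmp/orders_export'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
SELECT order_id,
       translate(notes, '|', '/') AS notes
FROM orders;
```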
- Tags:
- Data Processing
Labels:
- Apache Hive
02-08-2017
06:04 AM
Just curious: why doesn't Sqoop allow us to create an external table while sqooping data from an RDBMS?
Labels:
- Apache Sqoop
02-08-2017
05:56 AM
2 Kudos
Here is the workaround which I have implemented. For an external table, if you are trying to drop a partition and would also like to delete its data, this can be achieved as below:
1. Alter the external table to be an internal (managed) table, by changing the table properties to EXTERNAL=FALSE.
2. Drop the partitions. When you drop the partitions, the data belonging to them is also deleted, because the table is now a managed table.
3. Alter the table back to EXTERNAL=TRUE.
By doing this, there is more control over what we are deleting, and we drop the partitions rather than using a hadoop rm command.
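In SQL terms, the workaround above looks roughly like this (hypothetical table and partition names):

```sql
-- 1. Temporarily convert the external table to a managed table
ALTER TABLE my_ext_table SET TBLPROPERTIES ('EXTERNAL' = 'FALSE');

-- 2. Drop the partition; as a managed table, the underlying data is deleted too
ALTER TABLE my_ext_table DROP IF EXISTS PARTITION (partdate = '2017-01-31');

-- 3. Convert it back to an external table
ALTER TABLE my_ext_table SET TBLPROPERTIES ('EXTERNAL' = 'TRUE');
```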
02-08-2017
05:47 AM
In a flat file, I have certain keywords which are sensitive, and I would like to identify these sensitive keywords row by row. The keywords could appear in any column of the flat file. Appreciate any help; either Hive or Pig is fine.
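For illustration, a rough Hive sketch, assuming a hypothetical table `flat_file` with string columns `col1`..`col3` and an example keyword list:

```sql
-- Flag rows in which any column contains one of the sensitive keywords
SELECT t.*
FROM flat_file t
WHERE concat_ws('|', t.col1, t.col2, t.col3) RLIKE 'ssn|passport|credit card';
```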
- Tags:
- Data Processing
- Pig
Labels:
- Apache Pig
02-07-2017
10:44 PM
In Hive, one of my columns contains a pipe ('|') as part of its data; however, while exporting data from this table, we need to use the pipe ('|') as the delimiter between fields. How do we handle delimiters that are part of the data when creating the flat file from the Hive table?
Labels:
- Apache Hive
01-27-2017
12:07 AM
While trying to drop a partition based on a date range, I was unable to achieve it. Below is what I am trying to do:

alter table X drop partition (partdate <= (select max(partdate) from Y));

I am getting an error while executing the above query: cannot recognize input near '(' 'select' 'max' in constant. It looks like the ALTER statement accepts only a constant value rather than a sub-query. Basically: alter table ... drop partition (partdate <= date), where the date needs to be fetched from another table. Any help is appreciated.
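For illustration, a rough two-step workaround given that DROP PARTITION only accepts constants; the cutoff value and variable name below are just examples:

```sql
-- Step 1: fetch the cutoff date from Y (run separately and capture the result)
SELECT max(partdate) FROM Y;

-- Step 2: pass the captured value back in as a constant, e.g. via a hivevar:
--   hive --hivevar cutoff='2017-01-26' -f drop_old_partitions.hql
ALTER TABLE X DROP IF EXISTS PARTITION (partdate <= '${hivevar:cutoff}');
```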
Labels:
- Apache Hive
01-26-2017
06:06 AM
I would like to create a Hive table over XML files.
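For illustration, a rough sketch of one commonly referenced approach, assuming the third-party hivexmlserde library is available on the cluster; the table name, columns, XPaths, and file layout below are hypothetical:

```sql
-- Assumes the hivexmlserde jar has been added to the session, e.g.:
-- ADD JAR /path/to/hivexmlserde.jar;
CREATE EXTERNAL TABLE xml_records (
    id   STRING,
    name STRING
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
    'column.xpath.id'   = '/record/id/text()',
    'column.xpath.name' = '/record/name/text()'
)
STORED AS
    INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/data/xml_records'
TBLPROPERTIES (
    'xmlinput.start' = '<record',
    'xmlinput.end'   = '</record>'
);
```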
- Tags:
- Data Processing
- Hive
Labels:
- Apache Hive