Member since: 08-01-2017
Posts: 65
Kudos Received: 3
Solutions: 4
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 26265 | 01-22-2018 10:19 AM |
 | 3026 | 01-22-2018 10:18 AM |
 | 2841 | 07-05-2017 02:33 PM |
 | 3252 | 05-26-2017 09:01 AM |
01-22-2018
10:19 AM
1 Kudo
Used impyla. Works like a charm 🙂
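For anyone landing here with the same pyhive error, this is roughly what the impyla route can look like. It is only a sketch: the host and port are taken from the pyhive snippet in my question, `auth_mechanism="PLAIN"` is an assumption about what matches a HiveServer2 set to "NONE" authentication, and the helper names `connect_hiveserver2` and `fetch_rows` are made up for illustration.

```python
def connect_hiveserver2(host, port=10000):
    """Open a HiveServer2 connection through impyla."""
    # Imported lazily so fetch_rows below can be used without impyla installed.
    from impala.dbapi import connect

    # Assumption: auth_mechanism="PLAIN" matches a server configured with
    # "NONE" authentication; adjust it to your own setup.
    return connect(host=host, port=port, auth_mechanism="PLAIN")


def fetch_rows(cursor, query, limit=5):
    """Run a query on any DB-API cursor and return up to `limit` rows as dicts."""
    cursor.execute(query)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchmany(limit)]


# Usage against the sandbox (needs a reachable HiveServer2):
# conn = connect_hiveserver2("192.168.1.11")
# rows = fetch_rows(conn.cursor(), "SELECT * FROM my_table")
```

`fetch_rows` only relies on the standard DB-API cursor methods, so it does not depend on impyla itself. To get a pandas DataFrame, impyla also ships `impala.util.as_pandas(cursor)`.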
01-22-2018
10:18 AM
It was actually a problem in the Twitter JSON. When we get a tweet which is actually a retweet, Flume truncates it. Problem solved 🙂
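The row logged in my 09-28-2017 post shows the pattern: the top-level text ends in "https…" while retweeted_status.text holds the complete tweet. A minimal sketch of picking the complete text once a tweet is parsed into a dict (the function name `full_tweet_text` is hypothetical, and the sample keeps only the two relevant fields from that logged row):

```python
def full_tweet_text(tweet):
    """Return the complete text of a tweet parsed from JSON into a dict.

    For a retweet, prefer retweeted_status.text over the top-level text,
    which may end in an ellipsis.
    """
    retweet = tweet.get("retweeted_status") or {}
    return retweet.get("text") or tweet.get("text", "")


# Fields copied from the row logged in the 09-28-2017 post (other fields omitted).
sample = {
    "text": "RT @sportingfanspt: Mãe de @Cristiano, Dolores Aveiro no seu perfil "
            "de Instagram: \"Vim apoiar o meu @Sporting_CP\" #UCL #DiaDeSporting https…",
    "retweeted_status": {
        "text": "Mãe de @Cristiano, Dolores Aveiro no seu perfil de Instagram: "
                "\"Vim apoiar o meu @Sporting_CP\" #UCL #DiaDeSporting https://t.co/GnPPkt2CYG",
    },
}

print(full_tweet_text(sample))
```

On the Hive side the same idea would presumably be COALESCE(retweeted_status.text, text) against the tweets table, though I have not verified that on this cluster.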
01-12-2018
05:40 PM
@PY Paul-Arnaud, Many thanks for your quick answer. Unfortunately that's not a datatype that HiveQL recognizes... 😞
01-12-2018
04:12 PM
Hello friends, I'm working with a Hive table that holds Twitter data ingested through Flume / Oozie. The problem is that Hive is truncating the tweet text field. Can anybody please help me solve this issue? Here's the table:
CREATE EXTERNAL TABLE tweets (
id bigint,
created_at string,
source STRING,
favorited BOOLEAN,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>,
retweet_count:INT>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
lang string,
retweet_count int,
text string,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>
)
PARTITIONED BY (datehour int)
LOCATION
'hdfs://192.168.1.11:8020/user/flume/tweets'
Labels: Apache Hive
11-20-2017
04:05 PM
I'm trying to read a table stored in Hive (Hortonworks) to collect some Twitter data for a machine learning project. I'm using pyhive, since pyhs2 does not support Python 3.6. Here's my code:
import sys

import pandas as pd
from pyhive import hive

# Open a HiveServer2 connection without SASL
conn = hive.Connection(host='192.168.1.11', port=10000, auth='NOSASL')
# Load the whole table into a DataFrame
df = pd.read_sql("SELECT * FROM my_table", conn)
print(sys.getsizeof(df))
df.head()
When I run it I get this error:
Traceback (most recent call last):
File "C:\Users\PWST112\Desktop\import.py", line 44, in <module>
conn = hive.Connection(host='192.168.1.11', port=10000, auth='NOSASL')
File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site-packages\pyhive\hive.py", line 164, in __init__
response = self._client.OpenSession(open_session_req)
File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site-packages\TCLIService\TCLIService.py", line 187, in OpenSession
return self.recv_OpenSession()
File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site-packages\TCLIService\TCLIService.py", line 199, in recv_OpenSession
(fname, mtype, rseqid) = iprot.readMessageBegin()
File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site-packages\thrift\protocol\TBinaryProtocol.py", line 148, in readMessageBegin
name = self.trans.readAll(sz)
File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site-packages\thrift\transport\TTransport.py", line 60, in readAll
chunk = self.read(sz - have)
File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site-packages\thrift\transport\TTransport.py", line 161, in read
self.__rbuf = BufferIO(self.__trans.read(max(sz, self.__rbuf_size)))
File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site-packages\thrift\transport\TSocket.py", line 132, in read
message='TSocket read 0 bytes')
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
[Finished in 0.3s]
Here is the pip list:
beautifulsoup4 (4.6.0)
bleach (2.0.0)
colorama (0.3.9)
cycler (0.10.0)
decorator (4.0.11)
entrypoints (0.2.3)
ez-setup (0.9)
future (0.16.0)
html5lib (0.999999999)
impala (0.2)
ipykernel (4.6.1)
ipython (6.1.0)
ipython-genutils (0.2.0)
ipywidgets (6.0.0)
jedi (0.10.2)
Jinja2 (2.9.6)
jsonschema (2.6.0)
jupyter (1.0.0)
jupyter-client (5.1.0)
jupyter-console (5.1.0)
jupyter-core (4.3.0)
konlpy (0.4.4)
MarkupSafe (1.0)
matplotlib (2.0.2)
mistune (0.7.4)
nbconvert (5.2.1)
nbformat (4.3.0)
nltk (3.2.4)
notebook (5.0.0)
numpy (1.13.1+mkl)
pandas (0.20.3)
pandocfilters (1.4.1)
pickleshare (0.7.4)
pip (9.0.1)
prompt-toolkit (1.0.14)
pure-sasl (0.4.0)
Pygments (2.2.0)
PyHive (0.5.0)
pyhs2 (0.6.0)
pyparsing (2.2.0)
python-dateutil (2.6.0)
pytz (2017.2)
pyzmq (16.0.2)
qtconsole (4.3.0)
sasl (0.2.1)
scikit-learn (0.18.2)
scipy (0.19.1)
setuptools (28.8.0)
simplegeneric (0.8.1)
six (1.10.0)
testpath (0.3.1)
thrift (0.10.0)
thrift-sasl (0.3.0)
tornado (4.5.1)
traitlets (4.3.2)
wcwidth (0.1.7)
webencodings (0.5.1)
wheel (0.30.0)
widgetsnbextension (2.0.0)
Can somebody help? I have my sandbox configured for "NONE" authentication, since the NOSASL option is not available. Best regards
Labels: Apache Hive
10-04-2017
09:58 AM
@Dan Zaratsian I'm still stuck on this problem. The query only crashes when I select two or more columns including the text column. If I leave out the text column, or query it alone, it works. Do you have any suggestions, please? Many thanks in advance.
09-28-2017
03:35 PM
@Dan Zaratsian, Hope you're doing great. After a fresh install of HDP 2.6.1.0 I tried to run the query. First I created the external tweets table:
ADD JAR /tmp/json-serde-1.3.8-jar-with-dependencies.jar;
CREATE EXTERNAL TABLE tweets (
id bigint,
created_at string,
source STRING,
favorited BOOLEAN,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>,
retweet_count:INT>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
lang string,
retweet_count int,
text string,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>
)
PARTITIONED BY (datehour int)
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( "ignore.malformed.json" = "true")
LOCATION
'hdfs://192.168.1.11:8020/user/flume/tweets'
The Hive query gives this error on HiveServer2:
2017-09-28 15:22:39,577 ERROR [HiveServer2-Background-Pool: Thread-1161]: SessionState (SessionState.java:printError(993)) - Status: Failed
2017-09-28 15:22:39,578 ERROR [HiveServer2-Background-Pool: Thread-1161]: SessionState (SessionState.java:printError(993)) - Vertex failed, vertexName=Map 1, vertexId=vertex_1506521964877_0091_1_00, diagnostics=[Task failed, taskId=task_1506521964877_0091_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"id":913124962290069505,"created_at":"Wed Sep 27 19:35:30 +0000 2017","source":"<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>","retweeted_status":{"text":"Mãe de @Cristiano, Dolores Aveiro no seu perfil de Instagram: \"Vim apoiar o meu @Sporting_CP\" #UCL #DiaDeSporting https://t.co/GnPPkt2CYG","user":{"screen_name":"sportingfanspt","name":"SPORTING FANS"},"retweet_count":43},"lang":"pt","retweet_count":0,"text":"RT @sportingfanspt: Mãe de @Cristiano, Dolores Aveiro no seu perfil de Instagram: \"Vim apoiar o meu @Sporting_CP\" #UCL #DiaDeSporting https…","user":{"screen_name":"ladraodoapito","name":"Arbitro com Voucher","friends_count":568,"followers_count":319,"statuses_count":1668,"verified":false,"utc_offset":3600,"time_zone":"Lisbon"},"datehour":2017092720}
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"id":913124962290069505,"created_at":"Wed Sep 27 19:35:30 +0000 2017","source":"<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>","retweeted_status":{"text":"Mãe de @Cristiano, Dolores Aveiro no seu perfil de Instagram: \"Vim apoiar o meu @Sporting_CP\" #UCL #DiaDeSporting https://t.co/GnPPkt2CYG","user":{"screen_name":"sportingfanspt","name":"SPORTING FANS"},"retweet_count":43},"lang":"pt","retweet_count":0,"text":"RT @sportingfanspt: Mãe de @Cristiano, Dolores Aveiro no seu perfil de Instagram: \"Vim apoiar o meu @Sporting_CP\" #UCL #DiaDeSporting https…","user":{"screen_name":"ladraodoapito","name":"Arbitro com Voucher","friends_count":568,"followers_count":319,"statuses_count":1668,"verified":false,"utc_offset":3600,"time_zone":"Lisbon"},"datehour":2017092720}
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"id":913124962290069505,"created_at":"Wed Sep 27 19:35:30 +0000 2017","source":"<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>","retweeted_status":{"text":"Mãe de @Cristiano, Dolores Aveiro no seu perfil de Instagram: \"Vim apoiar o meu @Sporting_CP\" #UCL #DiaDeSporting https://t.co/GnPPkt2CYG","user":{"screen_name":"sportingfanspt","name":"SPORTING FANS"},"retweet_count":43},"lang":"pt","retweet_count":0,"text":"RT @sportingfanspt: Mãe de @Cristiano, Dolores Aveiro no seu perfil de Instagram: \"Vim apoiar o meu @Sporting_CP\" #UCL #DiaDeSporting https…","user":{"screen_name":"ladraodoapito","name":"Arbitro com Voucher","friends_count":568,"followers_count":319,"statuses_count":1668,"verified":false,"utc_offset":3600,"time_zone":"Lisbon"},"datehour":2017092720}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:565)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
... 17 more
Can you help? I only get this error when I query the text column.
09-04-2017
08:34 AM
@Sindhu Many thanks for your answer. It didn't work, but it apparently worked with the database check disabled. It completed the upgrade, at least. Many thanks!
08-28-2017
12:46 PM
Dear @Sagar Shimpi I've followed your article but when I hit restart I get this error: Can you help? Best regards
08-25-2017
02:18 PM
@Eric Periard have you managed to find a solution for this? I've been stuck on the same error for over a week. Many thanks in advance. Best regards