Member since: 08-01-2017
Posts: 65
Kudos Received: 2
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 18222 | 01-22-2018 10:19 AM
 | 1627 | 01-22-2018 10:18 AM
 | 2113 | 07-05-2017 02:33 PM
 | 1940 | 05-26-2017 09:01 AM
02-20-2018
05:43 PM
@David Kaiser Many thanks for your quick answer. It's a small volume of data, just a couple of JSON files captured by Flume (Twitter), stored on HDFS and queried with Hive. Many thanks once more. Kind regards
02-20-2018
05:14 PM
Hello, I'm new to Hadoop and I've deployed the sandbox into a VM with 32GB of RAM. However, Hive queries and everything else run very, very slowly. Could it be the VM? Also, I only have a single node, not a multi-node setup... can this degrade performance considerably? Many thanks in advance. Best regards
Labels:
- Apache Ambari
- Apache Hadoop
- Apache Hive
02-20-2018
01:37 AM
@csguna Many thanks for your answer. The problem is that I cannot convert the CSV file with structs to proper JSON using df.to_json. Can you please help? Many thanks in advance. Kind regards
02-14-2018
09:09 AM
I have a table in Hive which I've downloaded into Pandas. I've edited a complete column and now I want to put it back into Hive. The problem is that this table has some arrays, so I can't use OpenCSV, which converts all columns to strings. Here's an example of a row:

956527303395246080,1,Thu Jan 25 14:00:55 +0000 2018,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>",False,,"{""urls"":[],""user_mentions"":[{""screen_name"":""librofm"",""name"":""Libro.fm""}],""hashtags"":[{""text"":""FireAndFury""}]}",en,0,"In an attack on my mental health, I’m listening to #FireAndFury via @librofm","{""screen_name"":""maryruthless"",""name"":"":sparkles: Vincent :sparkles:"",""friends_count"":680,""followers_count"":226,""statuses_count"":3981,""verified"":false,""utc_offset"":-18000,""time_zone"":""Eastern Time (US & Canada)""}",2018012515

And the Hive table:

CREATE EXTERNAL TABLE test (
  id bigint,
  sentiment int,
  created_at string,
  source STRING,
  favorited BOOLEAN,
  retweeted_status STRUCT<
    text:STRING,
    user:STRUCT<screen_name:STRING,name:STRING>,
    retweet_count:INT>,
  entities STRUCT<
    urls:ARRAY<STRUCT<expanded_url:STRING>>,
    user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
    hashtags:ARRAY<STRUCT<text:STRING>>>,
  lang string,
  retweet_count int,
  text string,
  user STRUCT<
    screen_name:STRING,
    name:STRING,
    friends_count:INT,
    followers_count:INT,
    statuses_count:INT,
    verified:BOOLEAN,
    utc_offset:INT,
    time_zone:STRING>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE

I've thought about converting it to JSON or XML... is this a good idea? Can anyone please help? Many thanks in advance.
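Since the table already uses the openx JsonSerDe, writing the edited DataFrame back as newline-delimited JSON should work. A minimal sketch, assuming the struct/array columns survived the CSV round trip as JSON strings (the file names are hypothetical):

import json
import pandas as pd

df = pd.read_csv("tweets.csv")  # hypothetical export of the edited table

# parse the struct/array columns (stored as JSON strings) back into objects
for col in ("retweeted_status", "entities", "user"):
    df[col] = df[col].apply(lambda v: json.loads(v) if isinstance(v, str) else None)

# one JSON object per line, which is exactly what the openx JsonSerDe reads
df.to_json("tweets.json", orient="records", lines=True, force_ascii=False)

The resulting file can then be copied into the table's HDFS location with hdfs dfs -put.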
Labels:
- Apache Hive
01-22-2018
10:19 AM
Used impyla. Works like a charm 🙂
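For reference, a minimal impyla sketch of this kind of connection (host, port and credentials are assumptions carried over from the pyhive attempt further down this page):

from impala.dbapi import connect
import pandas as pd

# hypothetical parameters; with HiveServer2 authentication set to NONE,
# PLAIN SASL with an arbitrary username is usually what matches
conn = connect(host='192.168.1.11', port=10000,
               auth_mechanism='PLAIN', user='admin')
df = pd.read_sql("SELECT * FROM my_table", conn)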
01-22-2018
10:18 AM
It was actually a problem in the Twitter JSON. When we get a tweet which is actually a retweet, Flume truncates it. Problem solved 🙂
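A quick way to spot such truncated records is to validate each line of the Flume output as JSON. A minimal sketch (the file path is an assumption, passed as a command-line argument):

import json
import sys

# scan a Flume output file for truncated or malformed JSON lines
with open(sys.argv[1], encoding="utf-8") as f:
    for lineno, line in enumerate(f, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            print(f"line {lineno}: malformed JSON ({exc})")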
01-12-2018
05:40 PM
@PY Paul-Arnaud, Many thanks for your quick answer. Unfortunately, that's not a datatype that HiveQL recognizes... 😞
01-12-2018
04:12 PM
Hello friends, I'm working with a Hive table which holds Twitter data fetched via Flume / Oozie. The problem is that Hive is truncating the tweet text field... Can anybody please help me solve this issue? Here's the table:

CREATE EXTERNAL TABLE tweets (
id bigint,
created_at string,
source STRING,
favorited BOOLEAN,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>,
retweet_count:INT>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
lang string,
retweet_count int,
text string,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>
)
PARTITIONED BY (datehour int)
LOCATION
'hdfs://192.168.1.11:8020/user/flume/tweets'
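One way to narrow down where the truncation happens is to compare the raw text field in the Flume files against what the table returns. A sketch, assuming a file copied locally with hdfs dfs -get (the file name is hypothetical):

import json

# print the id and text length of each tweet in a raw Flume file, to
# compare against SELECT id, length(text) FROM tweets in Hive
with open("FlumeData.1504271777639", encoding="utf-8") as f:  # hypothetical name
    for line in f:
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines
        print(tweet.get("id"), len(tweet.get("text", "")))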
Tags:
- Hive
Labels:
- Apache Hive
11-20-2017
04:05 PM
I'm trying to read a table located in Hive (Hortonworks), holding some Twitter data collected for a machine learning project, using pyhive, since pyhs2 is not supported by Python 3.6. Here's my code:

from pyhive import hive
conn = hive.Connection(host='192.168.1.11', port=10000, auth='NOSASL')
import pandas as pd
import sys
df = pd.read_sql("SELECT * FROM my_table", conn)
print(sys.getsizeof(df))
df.head()

When I run it I get this error:

Traceback (most recent call last):
File "C:\Users\PWST112\Desktop\import.py", line 44, in <module>
conn = hive.Connection(host='192.168.1.11', port=10000, auth='NOSASL')
File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site- packages\pyhive\hive.py", line 164, in __init__
response = self._client.OpenSession(open_session_req)
File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site- packages\TCLIService\TCLIService.py", line 187, in OpenSession
return self.recv_OpenSession()
File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site-packages\TCLIService\TCLIService.py", line 199, in recv_OpenSession
(fname, mtype, rseqid) = iprot.readMessageBegin()
File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site-packages\thrift\protocol\TBinaryProtocol.py", line 148, in readMessageBegin
name = self.trans.readAll(sz)
File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site-packages\thrift\transport\TTransport.py", line 60, in readAll
chunk = self.read(sz - have)
File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site-packages\thrift\transport\TTransport.py", line 161, in read
self.__rbuf = BufferIO(self.__trans.read(max(sz, self.__rbuf_size)))
File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site-packages\thrift\transport\TSocket.py", line 132, in read
message='TSocket read 0 bytes')
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
[Finished in 0.3s]

Here is the pip list:

beautifulsoup4 (4.6.0)
bleach (2.0.0)
colorama (0.3.9)
cycler (0.10.0)
decorator (4.0.11)
entrypoints (0.2.3)
ez-setup (0.9)
future (0.16.0)
html5lib (0.999999999)
impala (0.2)
ipykernel (4.6.1)
ipython (6.1.0)
ipython-genutils (0.2.0)
ipywidgets (6.0.0)
jedi (0.10.2)
Jinja2 (2.9.6)
jsonschema (2.6.0)
jupyter (1.0.0)
jupyter-client (5.1.0)
jupyter-console (5.1.0)
jupyter-core (4.3.0)
konlpy (0.4.4)
MarkupSafe (1.0)
matplotlib (2.0.2)
mistune (0.7.4)
nbconvert (5.2.1)
nbformat (4.3.0)
nltk (3.2.4)
notebook (5.0.0)
numpy (1.13.1+mkl)
pandas (0.20.3)
pandocfilters (1.4.1)
pickleshare (0.7.4)
pip (9.0.1)
prompt-toolkit (1.0.14)
pure-sasl (0.4.0)
Pygments (2.2.0)
PyHive (0.5.0)
pyhs2 (0.6.0)
pyparsing (2.2.0)
python-dateutil (2.6.0)
pytz (2017.2)
pyzmq (16.0.2)
qtconsole (4.3.0)
sasl (0.2.1)
scikit-learn (0.18.2)
scipy (0.19.1)
setuptools (28.8.0)
simplegeneric (0.8.1)
six (1.10.0)
testpath (0.3.1)
thrift (0.10.0)
thrift-sasl (0.3.0)
tornado (4.5.1)
traitlets (4.3.2)
wcwidth (0.1.7)
webencodings (0.5.1)
wheel (0.30.0)
widgetsnbextension (2.0.0)

Can somebody help? I have my sandbox configured for "NONE" authentication, since the NOSASL option is not available. Best regards
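That server setting points at the mismatch: with hive.server2.authentication=NONE, HiveServer2 expects PLAIN SASL, while auth='NOSASL' makes the client skip SASL entirely, which typically ends in "TSocket read 0 bytes". A sketch of the matching pyhive call (the username is an assumption; any non-empty value should do):

from pyhive import hive
import pandas as pd

# auth='NONE' uses PLAIN SASL with an arbitrary username, matching a
# HiveServer2 whose authentication is left at NONE
conn = hive.Connection(host='192.168.1.11', port=10000,
                       username='admin', auth='NONE')
df = pd.read_sql("SELECT * FROM my_table", conn)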
Labels:
- Apache Hive
10-06-2017
02:06 AM
Still stuck at this point... can anyone please help?
10-04-2017
09:58 AM
@Dan Zaratsian I'm still stuck on this problem... the query only crashes when I query two or more columns together with the text column. If I don't query the text column, or query it on its own, it works... Do you have any suggestions, please? Many thanks in advance.
10-04-2017
09:56 AM
@Sindhu I'm still stuck on this problem... the query only crashes when I query two or more columns together with the text column. If I don't query the text column, or query it on its own, it works... Do you have any suggestions, please? Many thanks in advance.
09-29-2017
02:58 PM
The problem was a lack of permissions... I deactivated permissions on Hive and the problem is solved.
09-28-2017
03:35 PM
@Dan Zaratsian, Hope you're doing great. After a fresh install of HDP 2.6.1.0 I've tried to run the query. First I created the external tweets table:

ADD JAR /tmp/json-serde-1.3.8-jar-with-dependencies.jar;
CREATE EXTERNAL TABLE tweets (
id bigint,
created_at string,
source STRING,
favorited BOOLEAN,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>,
retweet_count:INT>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
lang string,
retweet_count int,
text string,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>
)
PARTITIONED BY (datehour int)
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( "ignore.malformed.json" = "true")
LOCATION
'hdfs://192.168.1.11:8020/user/flume/tweets'

The Hive query gives this error on hiveserver2:

2017-09-28 15:22:39,577 ERROR [HiveServer2-Background-Pool: Thread-1161]: SessionState (SessionState.java:printError(993)) - Status: Failed
2017-09-28 15:22:39,578 ERROR [HiveServer2-Background-Pool: Thread-1161]: SessionState (SessionState.java:printError(993)) - Vertex failed, vertexName=Map 1, vertexId=vertex_1506521964877_0091_1_00, diagnostics=[Task failed, taskId=task_1506521964877_0091_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"id":913124962290069505,"created_at":"Wed Sep 27 19:35:30 +0000 2017","source":"<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>","retweeted_status":{"text":"Mãe de @Cristiano, Dolores Aveiro no seu perfil de Instagram: \"Vim apoiar o meu @Sporting_CP\" #UCL #DiaDeSporting https://t.co/GnPPkt2CYG","user":{"screen_name":"sportingfanspt","name":"SPORTING FANS"},"retweet_count":43},"lang":"pt","retweet_count":0,"text":"RT @sportingfanspt: Mãe de @Cristiano, Dolores Aveiro no seu perfil de Instagram: \"Vim apoiar o meu @Sporting_CP\" #UCL #DiaDeSporting https…","user":{"screen_name":"ladraodoapito","name":"Arbitro com Voucher","friends_count":568,"followers_count":319,"statuses_count":1668,"verified":false,"utc_offset":3600,"time_zone":"Lisbon"},"datehour":2017092720}
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"id":913124962290069505,"created_at":"Wed Sep 27 19:35:30 +0000 2017","source":"<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>","retweeted_status":{"text":"Mãe de @Cristiano, Dolores Aveiro no seu perfil de Instagram: \"Vim apoiar o meu @Sporting_CP\" #UCL #DiaDeSporting https://t.co/GnPPkt2CYG","user":{"screen_name":"sportingfanspt","name":"SPORTING FANS"},"retweet_count":43},"lang":"pt","retweet_count":0,"text":"RT @sportingfanspt: Mãe de @Cristiano, Dolores Aveiro no seu perfil de Instagram: \"Vim apoiar o meu @Sporting_CP\" #UCL #DiaDeSporting https…","user":{"screen_name":"ladraodoapito","name":"Arbitro com Voucher","friends_count":568,"followers_count":319,"statuses_count":1668,"verified":false,"utc_offset":3600,"time_zone":"Lisbon"},"datehour":2017092720}
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"id":913124962290069505,"created_at":"Wed Sep 27 19:35:30 +0000 2017","source":"<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>","retweeted_status":{"text":"Mãe de @Cristiano, Dolores Aveiro no seu perfil de Instagram: \"Vim apoiar o meu @Sporting_CP\" #UCL #DiaDeSporting https://t.co/GnPPkt2CYG","user":{"screen_name":"sportingfanspt","name":"SPORTING FANS"},"retweet_count":43},"lang":"pt","retweet_count":0,"text":"RT @sportingfanspt: Mãe de @Cristiano, Dolores Aveiro no seu perfil de Instagram: \"Vim apoiar o meu @Sporting_CP\" #UCL #DiaDeSporting https…","user":{"screen_name":"ladraodoapito","name":"Arbitro com Voucher","friends_count":568,"followers_count":319,"statuses_count":1668,"verified":false,"utc_offset":3600,"time_zone":"Lisbon"},"datehour":2017092720}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:565)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
... 17 more

Can you help? I'm only getting this error when I query the text column....
09-27-2017
09:56 AM
I've done a fresh install of HDP 2.6.1.0 and I'm trying to run the same query I ran on 2.5:

ADD JAR hdfs://192.168.1.11:8020/user/admin/oozie-workflows/lib/json-serde-1.3.8-jar-with-dependencies.jar;
SELECT
t.retweeted_screen_name,
sum(retweets) AS total_retweets,
count(*) AS tweet_count
FROM (SELECT
retweeted_status.user.screen_name as retweeted_screen_name,
retweeted_status.text,
max(retweeted_status.retweet_count) as retweets
FROM tweets
GROUP BY retweeted_status.user.screen_name,
retweeted_status.text) t
GROUP BY t.retweeted_screen_name
ORDER BY total_retweets DESC
LIMIT 10;

The tweets table is an external table with a few hundred rows... The problem is that the query runs forever... Can you please help? Many thanks in advance.
Labels:
- Apache Hive
- Apache Tez
- Apache YARN
09-04-2017
09:37 AM
@Sindhu Many thanks for your answer... the hiveserver2 logs are clean... Which logs can I check to tackle the problem? I'll post the YARN log here, which I think is clean: yarn.txt. Also, here is the Oozie log:

2017-09-04 09:07:01,107 INFO ActionStartXCommand:520 - SERVER[sandbox.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[hive-add-partition-wf] JOB[0000065-170901161649746-oozie-oozi-W] ACTION[0000065-170901161649746-oozie-oozi-W@:start:] Start action [0000065-170901161649746-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2017-09-04 09:07:01,107 INFO ActionStartXCommand:520 - SERVER[sandbox.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[hive-add-partition-wf] JOB[0000065-170901161649746-oozie-oozi-W] ACTION[0000065-170901161649746-oozie-oozi-W@:start:] [***0000065-170901161649746-oozie-oozi-W@:start:***]Action status=DONE
2017-09-04 09:07:01,107 INFO ActionStartXCommand:520 - SERVER[sandbox.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[hive-add-partition-wf] JOB[0000065-170901161649746-oozie-oozi-W] ACTION[0000065-170901161649746-oozie-oozi-W@:start:] [***0000065-170901161649746-oozie-oozi-W@:start:***]Action updated in DB!
2017-09-04 09:07:01,139 INFO WorkflowNotificationXCommand:520 - SERVER[sandbox.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000065-170901161649746-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000065-170901161649746-oozie-oozi-W
2017-09-04 09:07:01,139 INFO WorkflowNotificationXCommand:520 - SERVER[sandbox.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000065-170901161649746-oozie-oozi-W] ACTION[0000065-170901161649746-oozie-oozi-W@:start:] No Notification URL is defined. Therefore nothing to notify for job 0000065-170901161649746-oozie-oozi-W@:start:
2017-09-04 09:07:01,154 INFO ActionStartXCommand:520 - SERVER[sandbox.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[hive-add-partition-wf] JOB[0000065-170901161649746-oozie-oozi-W] ACTION[0000065-170901161649746-oozie-oozi-W@hive-add-partition] Start action [0000065-170901161649746-oozie-oozi-W@hive-add-partition] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2017-09-04 09:07:03,202 INFO HiveActionExecutor:520 - SERVER[sandbox.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[hive-add-partition-wf] JOB[0000065-170901161649746-oozie-oozi-W] ACTION[0000065-170901161649746-oozie-oozi-W@hive-add-partition] Trying to get job [job_1504271777639_0005], attempt [1]
2017-09-04 09:07:03,216 INFO HiveActionExecutor:520 - SERVER[sandbox.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[hive-add-partition-wf] JOB[0000065-170901161649746-oozie-oozi-W] ACTION[0000065-170901161649746-oozie-oozi-W@hive-add-partition] checking action, hadoop job ID [job_1504271777639_0005] status [RUNNING]
2017-09-04 09:07:03,217 INFO ActionStartXCommand:520 - SERVER[sandbox.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[hive-add-partition-wf] JOB[0000065-170901161649746-oozie-oozi-W] ACTION[0000065-170901161649746-oozie-oozi-W@hive-add-partition] [***0000065-170901161649746-oozie-oozi-W@hive-add-partition***]Action status=RUNNING
2017-09-04 09:07:03,217 INFO ActionStartXCommand:520 - SERVER[sandbox.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[hive-add-partition-wf] JOB[0000065-170901161649746-oozie-oozi-W] ACTION[0000065-170901161649746-oozie-oozi-W@hive-add-partition] [***0000065-170901161649746-oozie-oozi-W@hive-add-partition***]Action updated in DB!
2017-09-04 09:07:03,221 INFO WorkflowNotificationXCommand:520 - SERVER[sandbox.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000065-170901161649746-oozie-oozi-W] ACTION[0000065-170901161649746-oozie-oozi-W@hive-add-partition] No Notification URL is defined. Therefore nothing to notify for job 0000065-170901161649746-oozie-oozi-W@hive-add-partition
2017-09-04 09:17:12,662 INFO HiveActionExecutor:520 - SERVER[sandbox.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[hive-add-partition-wf] JOB[0000065-170901161649746-oozie-oozi-W] ACTION[0000065-170901161649746-oozie-oozi-W@hive-add-partition] Trying to get job [job_1504271777639_0005], attempt [1]
2017-09-04 09:17:12,701 INFO HiveActionExecutor:520 - SERVER[sandbox.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[hive-add-partition-wf] JOB[0000065-170901161649746-oozie-oozi-W] ACTION[0000065-170901161649746-oozie-oozi-W@hive-add-partition] checking action, hadoop job ID [job_1504271777639_0005] status [RUNNING]
2017-09-04 09:27:24,642 WARN ResumeXCommand:523 - SERVER[sandbox.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[hive-add-partition-wf] JOB[0000065-170901161649746-oozie-oozi-W] ACTION[] E1100: Command precondition does not hold before execution, [workflow's status is RUNNING is not SUSPENDED], Error Code: E1100
2017-09-04 09:28:12,685 INFO HiveActionExecutor:520 - SERVER[sandbox.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[hive-add-partition-wf] JOB[0000065-170901161649746-oozie-oozi-W] ACTION[0000065-170901161649746-oozie-oozi-W@hive-add-partition] Trying to get job [job_1504271777639_0005], attempt [1]
2017-09-04 09:28:12,704 INFO HiveActionExecutor:520 - SERVER[sandbox.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[hive-add-partition-wf] JOB[0000065-170901161649746-oozie-oozi-W] ACTION[0000065-170901161649746-oozie-oozi-W@hive-add-partition] checking action, hadoop job ID [job_1504271777639_0005] status [RUNNING]
2017-09-04 09:38:24,684 WARN ResumeXCommand:523 - SERVER[sandbox.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[hive-add-partition-wf] JOB[0000065-170901161649746-oozie-oozi-W] ACTION[] E1100: Command precondition does not hold before execution, [workflow's status is RUNNING is not SUSPENDED], Error Code: E1100
Best regards
09-04-2017
08:34 AM
@Sindhu Many thanks for your answer... it didn't work, but it apparently succeeded with the database check disabled. It completed the upgrade, at least. Many thanks!
09-01-2017
04:52 PM
I've installed a fresh HDP 2.6.1.0 and lost my previous configuration files. I tried to configure the Twitter pipeline but I'm stuck at the Oozie workflow. When I launch the coordinator job to run every 60 minutes, it hangs in the RUNNING state forever. It simply doesn't do anything. This also happened in the past when I changed some memory configurations, so I suspect that must be it... I've tuned my memory configurations with the Python script, so I guess everything is OK... Can someone give me some hints to resolve this issue? Many thanks in advance. Best regards
Labels:
- Apache Hive
- Apache Oozie
08-28-2017
12:46 PM
Dear @Sagar Shimpi, I've followed your article, but when I hit restart I get this error: Can you help? Best regards
08-25-2017
02:18 PM
@Eric Periard have you managed to find any solution for this? I've been stuck on the same error for over a week. Many thanks in advance. Best regards
08-25-2017
12:12 PM
Hello guys, I'm trying to upgrade from HDP 2.4.0 to HDP 2.6.1. I updated Ambari to 2.5 and all went OK: all services are green and all service checks run smoothly. When I do the express upgrade to HDP 2.6.1 it errors out in the last part. I've followed this article, but when I restart the server it gives this error: Is there any workaround for this? Many thanks in advance. Best regards
Labels:
- Hortonworks Data Platform (HDP)
08-24-2017
12:30 PM
Hey guys, I'm hitting the same problem on HDP 2.6.1.0. I've followed your article, @Sagar Shimpi, but when I restart the server with ambari-server restart, I get this: I attach the ambari-server.log: ambari-server.txt Can you please help? Many thanks in advance. Best regards
08-11-2017
09:04 AM
Hello @Dan Zaratsian, I'm currently on HDP 2.6.1.0. I followed the instructions in the link you gave me and used the JSON SerDe 1.3.7 with dependencies (couldn't find the 1.1.4). I created the table and used this JSON captured from Flume: flumedata.zip. I ran the query and the same error persists:

java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1502379039867_0005_1_00, diagnostics=[Task failed, taskId=task_1502379039867_0005_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:370)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:164)
... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20003]: An error occurred when trying to close the Operator running your custom script.
at org.apache.hadoop.hive.ql.exec.ScriptOperator.close(ScriptOperator.java:560)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:634)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:634)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:634)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:346)
... 15 more
], TaskAttempt 1 failed, TaskAttempt 2 failed and TaskAttempt 3 failed, each with the identical stack trace], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1502379039867_0005_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
08-04-2017
08:43 AM
Hello @Dan Zaratsian, and many thanks once more. Sorry, you're right, I'd forgotten to select the best answer. I can run queries on the Hive table with this SerDe, except when I query the text column. For instance, if I use your script with the id and lang columns it runs smoothly. If I query the text column, it gives an error. I'll try to update HDP to version 2.6.1.0 and then let you know. Best regards
08-01-2017
09:15 AM
I'm trying to run a Python UDF in Hive to do some sentiment analysis on Twitter data captured with Flume. My Twitter table code:

CREATE EXTERNAL TABLE tweets (
id bigint,
created_at string,
source STRING,
favorited BOOLEAN,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>,
retweet_count:INT>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
lang string,
retweet_count int,
text string,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>
)
PARTITIONED BY (datehour int)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 'hdfs://192.168.0.73:8020/user/flume/tweets'

My Python code:

import hashlib
import sys
for line in sys.stdin:
    line = line.strip()
    (lang, text) = line.split('\t')
    positive = set(["love", "good", "great", "happy", "cool", "best", "awesome", "nice", "helpful", "enjoyed"])
    negative = set(["hate", "bad", "stupid", "terrible", "unhappy"])
    words = text.split()
    word_count = len(words)
    positive_matches = [1 for word in words if word in positive]
    negative_matches = [-1 for word in words if word in negative]
    st = sum(positive_matches) + sum(negative_matches)
    if st > 0:
        print('\t'.join([lang, text, 'positive', str(word_count)]))
    elif st < 0:
        print('\t'.join([lang, text, 'negative', str(word_count)]))
    else:
        print('\t'.join([lang, text, 'neutral', str(word_count)]))

And finally my Hive query:

ADD JAR /tmp/json-serde-1.3.9-SNAPSHOT-jar-with-dependencies.jar;
ADD FILE /tmp/my_py_udf.py;
SELECT
TRANSFORM (lang, text)
USING 'python my_py_udf.py'
AS (lang, text, sentiment, word_count)
FROM tweets

With this query I get "error while closing operators". If I use only one variable in the Python UDF, the query runs successfully if I use: text = line.replace('\n',' ') Could it be the SerDe interacting with the split('\t')? Can anyone please help? I've been stuck on this for the past 10 days...
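The later replies in this thread trace the crash to split('\t') when a row doesn't contain exactly one tab (tweets can contain tabs and newlines inside the text). A defensive sketch of the same UDF under that assumption:

import sys

positive = set(["love", "good", "great", "happy", "cool", "best", "awesome", "nice", "helpful", "enjoyed"])
negative = set(["hate", "bad", "stupid", "terrible", "unhappy"])

for line in sys.stdin:
    # split on the first tab only, so tabs inside the tweet text cannot
    # change the field count and raise ValueError on unpacking
    fields = line.rstrip('\n').split('\t', 1)
    if len(fields) != 2:
        continue  # skip malformed rows instead of crashing the operator
    lang, text = fields
    text = text.replace('\t', ' ').replace('\n', ' ')
    words = text.split()
    st = sum(1 for w in words if w in positive) - sum(1 for w in words if w in negative)
    sentiment = 'positive' if st > 0 else 'negative' if st < 0 else 'neutral'
    print('\t'.join([lang, text, sentiment, str(len(words))]))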
Labels:
- Apache Hive
08-01-2017
09:00 AM
@Dan Zaratsian I don't see how updating HDP can solve my problem. I think the problem may be with the SerDe used when creating the table. I'm currently using this:

ADD JAR hdfs://192.168.0.73:8020/user/admin/oozie-workflows/lib/json-serde-1.3.8-jar-with-dependencies.jar;
CREATE EXTERNAL TABLE tweets (
id bigint,
created_at string,
source STRING,
favorited BOOLEAN,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>,
retweet_count:INT>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
lang string,
retweet_count int,
text string,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>
)
PARTITIONED BY (datehour int)
ROW FORMAT SERDE
'org.apache.hive.hcatalog.data.JsonSerDe'
WITH SERDEPROPERTIES ( "ignore.malformed.json" = "true")
LOCATION
'hdfs://192.168.0.73:8020/user/flume/tweets'
07-31-2017
04:14 PM
@Dan Zaratsian After some days I've reached the conclusion that the problem must be in the JSON SerDe, because when I upload your table into Hive it works OK. I'm currently using json-serde-1.3.8-jar-with-dependencies.jar... Many thanks in advance. Best regards
07-27-2017
09:25 AM
@Sindhu
No problem, here it is. The table is stored in Hive (via Flume and Oozie), so I don't know how I can send it... The code seems to be OK and runs well locally...

import sys
import hashlib
for line in sys.stdin:
    line = line.strip()
    line = line.replace('\n',' ')
    lang, text = line.split('\t',maxsplit=2)
    positive = set(["love", "good", "great", "happy", "cool", "best", "awesome", "nice", "helpful", "enjoyed"])
    negative = set(["hate", "bad", "stupid", "terrible", "unhappy"])
    words = text.split()
    word_count = len(words)
    positive_matches = [1 for word in words if word in positive]
    negative_matches = [-1 for word in words if word in negative]
    st = sum(positive_matches) + sum(negative_matches)
    if st > 0:
        print('\t'.join([lang, text, 'positive', str(word_count)]))
    elif st < 0:
        print('\t'.join([lang, text, 'negative', str(word_count)]))
    else:
        print('\t'.join([lang, text, 'neutral', str(word_count)]))
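A quick local harness can reproduce the failure mode Hive sees, assuming the code above is saved as my_py_udf.py (the name used in the ADD FILE step earlier on this page). The input rows are made up; the second one carries a tab inside the text field, so split('\t',maxsplit=2) yields three fields and the two-name unpacking raises ValueError:

import subprocess

# feed the UDF tab-separated rows the way Hive TRANSFORM would
rows = b"en\tI love this, best day\nen\tbad\ttab inside the text\n"
result = subprocess.run(["python", "my_py_udf.py"], input=rows,
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
print(result.stdout.decode())
print(result.stderr.decode())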
07-26-2017
08:54 AM
@Sindhu
Many thanks for your help. Found the error:

createdAt, screenName, text = line.replace('\n',' ').split('\t')

It only works when I have only one variable. With more than one, it crashes. Is there any alternative to split('\t')?
07-25-2017
01:31 PM
@Dan Zaratsian Found the error:

createdAt, screenName, text = line.replace('\n',' ').split('\t')

It only works when I have only one variable. With more than one, it crashes. Is there any alternative to split('\t')?