Member since
02-04-2016
189
Posts
70
Kudos Received
9
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3656 | 07-12-2018 01:58 PM |
| | 7671 | 03-08-2018 10:44 AM |
| | 3618 | 06-24-2017 11:18 AM |
| | 23041 | 02-10-2017 04:54 PM |
| | 2218 | 01-19-2017 01:41 PM |
05-28-2019
06:08 PM
It took me a while to look in /var/log/messages, but I found a ton of ntpd errors. It turns out that our nodes were having issues getting out to the servers they were configured to use for sync. I switched all the configurations to use a local on-premises server and restarted everything. I'm hoping that will be the full solution to our issue.
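In case it helps anyone else, the change was essentially this on each node (the server name below is a placeholder for our internal NTP host):
# /etc/ntp.conf - replaced the external pool entries with our local server
server ntp.corp.example.com iburst
# restart ntpd and confirm the peer is reachable
systemctl restart ntpd
ntpq -p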
05-26-2019
11:52 AM
Thanks @Geoffrey Shelton Okot. Just to clarify, we corrected all the hosts files and re-started all the services. I have a hunch that there is some HBase data somewhere that is now corrupt because it is associated with the incorrect FQDN. But I wouldn't expect Hive to have any relationship to HBase. Does ZooKeeper use HBase for record keeping?
05-25-2019
12:03 PM
Hello, we've recently been seeing some weird behavior from our cluster. Things will work well for a day or two, and then Hive server and several region servers will go offline. When I dig into the logs, they all reference ZooKeeper:
2019-05-24 20:12:15,108 ERROR nodes.PersistentEphemeralNode (PersistentEphemeralNode.java:deleteNode(323)) - Deleting node: /hiveserver2/serverUri=<servername>:10010;version=1.2.1000.2.6.1.0-129;sequence=0000000187
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hiveserver2/serverUri=<servername>:10010;version=1.2.1000.2.6.1.0-129;sequence=0000000187
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:239)
at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:234)
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
at org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:230)
at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:215)
at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:42)
at org.apache.curator.framework.recipes.nodes.PersistentEphemeralNode.deleteNode(PersistentEphemeralNode.java:315)
at org.apache.curator.framework.recipes.nodes.PersistentEphemeralNode.close(PersistentEphemeralNode.java:274)
at org.apache.hive.service.server.HiveServer2$DeRegisterWatcher.process(HiveServer2.java:334)
at org.apache.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:61)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2019-05-24 20:12:15,110 ERROR server.HiveServer2 (HiveServer2.java:process(338)) - Failed to close the persistent ephemeral znode
However, when I look in the ZooKeeper logs, I don't see anything. If I re-start the failed services, they will run for several hours, and then the process repeats.
We haven't changed any settings on the cluster, BUT, 2 things have changed recently:
1 - A couple weeks ago, some IT guys made a mistake and accidentally changed the /etc/hosts files. We fixed this and re-started everything on the cluster.
2 - Those changes in (1) were part of some major network changes, and we seem to have a lot more latency.
With all of that said, I really need some help figuring this out. Could it be stale HBase WAL files somewhere? Could that cause Hive server to fail? Is there a ZooKeeper timeout setting I can change to help? Any tips would be much appreciated.
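(On that last question, these are the timeout settings I've found so far, if I'm reading the docs right - the values below are just examples, not what we're currently running:)
# zoo.cfg (ZooKeeper server): upper bound on the session timeout clients can negotiate
maxSessionTimeout=120000
# hbase-site.xml: session timeout the HBase region servers request
zookeeper.session.timeout=120000
# hive-site.xml: session timeout I believe HiveServer2 uses for its znode registration
hive.zookeeper.session.timeout=120000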
Labels:
- Apache HBase
- Apache Hive
10-11-2018
05:50 PM
I'm following the instructions here, but I still can't seem to get it working. I'm pointing to the same driver with the same driver class name. However, my URL is a bit different: I'm trying to connect to an AWS Aurora instance.
If I use a connection string like this:
jdbc:mysql://my-url.us-east-1.rds.amazonaws.com:1433;databaseName=my_db_name
I get the error below. If I remove the port and db name, I get an error that says "Unable to execute SQL ... due to org.apache.commons.dbcp.SQLNestedException: Cannot create PoolableConnectionFactory (Communications link failure". Any ideas?
ExecuteSQL[id=df4d1531-3056-1f5a-9d32-fa30462c23ba] Unable to execute SQL select query <query> for StandardFlowFileRecord[uuid=7d70ed35-ae97-47e8-a860-0a2fa75fa2ef,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1538684794331-5, container=default, section=5], offset=939153, length=967],offset=0,name=properties.json,size=967] due to org.apache.commons.dbcp.SQLNestedException: Cannot create PoolableConnectionFactory (Cannot load connection class because of underlying exception: 'java.lang.NumberFormatException: For input string: "1433;databaseName=my_db_name"'.); routing to failure: org.apache.nifi.processor.exception.ProcessException: org.apache.commons.dbcp.SQLNestedException...
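(For reference, the MySQL Connector/J URL format is slash-separated rather than the ;databaseName= style used by SQL Server drivers, so maybe that's related - something like the line below, where 3306 is just the MySQL default port, not necessarily what my Aurora endpoint listens on:)
jdbc:mysql://my-url.us-east-1.rds.amazonaws.com:3306/my_db_name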
07-12-2018
01:58 PM
I was able to get this to work by using the insertInto() function, rather than the saveAsTable() function.
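Roughly, the write ended up looking like this (a sketch, not the exact job - the dynamic-partition SETs are what I needed in my environment and may not apply everywhere):
// Sketch: same incrementalKeyed DataFrame as in my question, written with insertInto()
// instead of saveAsTable(). insertInto() matches columns by position, so the partition
// column time_of_event_day has to be the last column in the select.
hiveContext.sql("SET hive.exec.dynamic.partition=true")
hiveContext.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

incrementalKeyed
  .repartition($"time_of_event_day")   // keep each partition's rows together
  .write
  .mode("append")
  .insertInto(outputDBName + "." + outputTableName + "_keyed")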
07-12-2018
10:29 AM
Thanks, @hmatta. Printing schema for sqlDFProdDedup:
root
|-- time_of_event_day: date (nullable = true)
|-- endpoint_id: integer (nullable = true)
...
|-- time_of_event: integer (nullable = true)
...
|-- source_file_name: string (nullable = true)
Printing schema for deviceData:
root
...
|-- endpoint_id: integer (nullable = true)
|-- source_file_name: string (nullable = true)
...
|-- start_dt_unix: long (nullable = true)
|-- end_dt_unix: long (nullable = true)
Printing schema for incrementalKeyed (result of joining 2 sets above):
root
|-- source_file_name: string (nullable = true)
|-- ingest_timestamp: timestamp (nullable = false)
...
|-- endpoint_id: integer (nullable = true)
...
|-- time_of_event: integer (nullable = true)
...
|-- time_of_event_day: date (nullable = true)
07-11-2018
06:38 PM
I have a Hive table (in the Glue metastore in AWS) like this:
CREATE EXTERNAL TABLE `events_keyed`(
`source_file_name` string,
`ingest_timestamp` timestamp,
...
`time_of_event` int
...)
PARTITIONED BY (
`time_of_event_day` date)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'my_location'
TBLPROPERTIES (
'PARQUET.COMPRESSION'='SNAPPY',
'transient_lastDdlTime'='1531187782')
I want to append data to it from Spark:
val deviceData = hiveContext.table(deviceDataDBName + "." + deviceDataTableName)
val incrementalKeyed = sqlDFProdDedup.join(broadcast(deviceData),
$"prod_clean.endpoint_id" === $"$deviceDataTableName.endpoint_id"
&& $"prod_clean.time_of_event" >= $"$deviceDataTableName.start_dt_unix"
&& $"prod_clean.time_of_event" <= coalesce($"$deviceDataTableName.end_dt_unix"),
"inner")
.select(
$"prod_clean.source_file_name",
$"prod_clean.ingest_timestamp",
...
$"prod_clean.time_of_event",
...
$"prod_clean.time_of_event_day"
)
// this shows good data:
incrementalKeyed.show(20, false)
incrementalKeyed.repartition($"time_of_event_day")
.write
.partitionBy("time_of_event_day")
.format("hive")
.mode("append")
.saveAsTable(outputDBName + "." + outputTableName + "_keyed")
But this gives me a failure:
Exception encountered reading prod data:
org.apache.spark.SparkException: Requested partitioning does not match the events_keyed table:
Requested partitions:
Table partitions: time_of_event_day
What am I doing wrong? How can I accomplish the append operation I'm trying to get?
Labels:
- Apache Hive
- Apache Spark
06-14-2018
10:46 AM
I'm sorry, @shaleen somani - this was over a year ago and I don't remember the details anymore. My guess is that our active and standby NameNodes had failed over for some reason. I've found that when this happens, things continue to "work", but not quite right, and it can be hard to pin down. You can use the hdfs haadmin utility to check the status. Good luck!
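For example, something like this (nn1/nn2 are placeholders - the real IDs come from dfs.ha.namenodes.<nameservice> in hdfs-site.xml):
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2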
05-24-2018
12:04 PM
Thanks, Matt. My issue was firewall-related. I'm all set now. Thanks for your help!
05-23-2018
08:27 PM
Thanks @Matt Clarke You must be back from the NSA days 🙂 Your message is helpful, but I'm still not able to access the NiFi UI from my laptop's browser. Here's what I've got:
I have a RHEL 7.5 server running in EC2, in a VPC. It's running NiFi 1.6.0 using all vanilla settings. I can access the server using NoMachine and interact with NiFi in the browser directly on the machine. I added a Security Group rule to open port 8080.
As you said, the logs list about 4 different URLs - they are all different IPs associated with the machine. But none of them work from my laptop (which is in the VPC via VPN). I also tried setting the nifi.web.http.host value, and I also tried changing to a different port (restarting after each change). I even tried setting the Security Group to allow "all traffic" from "everywhere", so I don't think ports are the issue. (Interestingly, if I set the nifi.web.http.host value, I am no longer able to access NiFi in the browser on the host machine using 'localhost'.)
So... any other ideas? I'm feeling a little stuck...
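For reference, this is the kind of nifi.properties setup I've been experimenting with (example values only - 0.0.0.0 should make Jetty bind every interface, whereas a specific hostname binds only that one, which I suspect is why 'localhost' stops working when I set it):
# nifi.properties
nifi.web.http.host=0.0.0.0
nifi.web.http.port=8080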