Member since
02-04-2016
189
Posts
70
Kudos Received
9
Solutions
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 1974 | 07-12-2018 01:58 PM |
| | 4467 | 03-08-2018 10:44 AM |
| | 1453 | 06-24-2017 11:18 AM |
| | 16528 | 02-10-2017 04:54 PM |
| | 1272 | 01-19-2017 01:41 PM |
05-28-2019
06:08 PM
It took me a while to look in /var/log/messages, but I found a ton of ntpd errors. It turns out that our nodes were having trouble reaching the external servers they were configured to use for time sync. I switched all the configurations to use a local on-premises NTP server and restarted everything. I'm hoping that will be the full solution to our issue.
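For reference, the change was roughly this on each node (the hostname below is a placeholder for our internal NTP server, not the real name):
# /etc/ntp.conf - replace the external pool/server entries with the local server
server ntp.internal.example.com iburst
# then restart and verify:
systemctl restart ntpd
ntpq -p    # the local server should eventually get a '*' marking it as the selected peer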
... View more
05-26-2019
11:52 AM
Thanks @Geoffrey Shelton Okot. Just to clarify, we corrected all the hosts files and restarted all the services. I have a hunch that there is some HBase data somewhere that is now corrupt because it is associated with the incorrect FQDN. But I wouldn't expect Hive to have any relationship to HBase. Does ZooKeeper use HBase for record keeping?
... View more
05-25-2019
12:03 PM
Hello, We've recently been seeing some weird behavior from our cluster. Things will work well for a day or two, and then Hive server and several region servers will go offline. When I dig into the logs, they all reference ZooKeeper:
2019-05-24 20:12:15,108 ERROR nodes.PersistentEphemeralNode (PersistentEphemeralNode.java:deleteNode(323)) - Deleting node: /hiveserver2/serverUri=<servername>:10010;version=1.2.1000.2.6.1.0-129;sequence=0000000187
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hiveserver2/serverUri=<servername>:10010;version=1.2.1000.2.6.1.0-129;sequence=0000000187
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:239)
at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:234)
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
at org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:230)
at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:215)
at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:42)
at org.apache.curator.framework.recipes.nodes.PersistentEphemeralNode.deleteNode(PersistentEphemeralNode.java:315)
at org.apache.curator.framework.recipes.nodes.PersistentEphemeralNode.close(PersistentEphemeralNode.java:274)
at org.apache.hive.service.server.HiveServer2$DeRegisterWatcher.process(HiveServer2.java:334)
at org.apache.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:61)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2019-05-24 20:12:15,110 ERROR server.HiveServer2 (HiveServer2.java:process(338)) - Failed to close the persistent ephemeral znode
However, when I look in the ZooKeeper logs, I don't see anything. If I restart the failed services, they will run for several hours, and then the process repeats.
We haven't changed any settings on the cluster, BUT two things have changed recently:
1 - A couple weeks ago, some IT guys made a mistake and accidentally changed the /etc/hosts files. We fixed this and restarted everything on the cluster.
2 - Those changes in (1) were part of some major network changes, and we seem to have a lot more latency.
With all of that said, I really need some help figuring this out. Could it be stale HBase WAL files somewhere? Could that cause Hive server to fail? Is there a ZooKeeper timeout setting I can change to help? Any tips would be much appreciated.
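For completeness, this is the kind of quick check I can run from the affected nodes (zk-host is a placeholder for one of our ZooKeeper servers):
ntpq -p                          # is the node actually syncing time? look for a '*' peer
echo ruok | nc zk-host 2181      # ZooKeeper four-letter-word check; should answer 'imok'
echo stat | nc zk-host 2181      # shows client connections and negotiated session timeouts
ping -c 5 zk-host                # rough look at latency to the quorum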
... View more
Labels: Apache HBase, Apache Hive
05-10-2019
06:50 PM
Here's a weird scenario I'm trying to understand: We have an on-premises cluster running HDP 2.6.x, and we haven't changed any of the settings or hardware in a long time. Suddenly, when we try to open the "hive" CLI from a data node or an edge node, it regularly fails. No error - it just hangs. This happens when there is NOTHING else going on on the cluster: no queries, no full queues, no running applications, and HDFS is only about 70% full. When I look at the application manager, it creates an application that sits in "accepted". If I drill into it, I see this:
"Application is Activated, waiting for resources to be assigned for AM. Last Node which was processed for the application : server.name:45454 ( Partition : [], Total resource : <memory:193024, vCores:16>, Available resource : <memory:193024, vCores:16> ). Details : AM Partition = <DEFAULT_PARTITION> ; Partition Resource = <memory:2702336, vCores:224> ; Queue's Absolute capacity = 100.0 % ; Queue's Absolute used capacity = 4.187192 % ; Queue's Absolute max capacity = 100.0 % ; "
This just started this week, and it's really inconsistent. We think there must be some kind of weird networking issue going on behind the scenes - we're at the mercy of IT to know what might have changed there. But I would really appreciate some help troubleshooting.
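For reference, a minimal set of checks from an edge node looks like this (standard YARN CLI; 'default' is just a placeholder queue name):
yarn node -list -all                          # are all NodeManagers RUNNING, or are some LOST/UNHEALTHY?
yarn application -list -appStates ACCEPTED    # how many apps are stuck waiting for an AM container
yarn queue -status default                    # capacity / used capacity for the queue the CLI submits to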
... View more
10-11-2018
05:50 PM
I'm following the instructions here, but I still can't seem to get it working. I'm pointing to the same driver with the same driver class name. However, my URL is a bit different. I'm trying to connect to an AWS Aurora instance. If I use a connection string like this:
jdbc:mysql://my-url.us-east-1.rds.amazonaws.com:1433;databaseName=my_db_name
I get the error below. If I remove the port and db name, I get an error that says "Unable to execute SQL ... due to org.apache.commons.dbcp.SQLNestedException: Cannot create PoolableConnectionFactory (Communications link failure". Any ideas?
ExecuteSQL[id=df4d1531-3056-1f5a-9d32-fa30462c23ba] Unable to execute SQL select query <query> for StandardFlowFileRecord[uuid=7d70ed35-ae97-47e8-a860-0a2fa75fa2ef,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1538684794331-5, container=default, section=5], offset=939153, length=967],offset=0,name=properties.json,size=967] due to org.apache.commons.dbcp.SQLNestedException: Cannot create PoolableConnectionFactory (Cannot load connection class because of underlying exception: 'java.lang.NumberFormatException: For input string: "1433;databaseName=my_db_name"'.); routing to failure: org.apache.nifi.processor.exception.ProcessException: org.apache.commons.dbcp.SQLNestedException...
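Looking at the NumberFormatException again, the driver seems to be treating everything after the colon as the port number, which makes me think the ";databaseName=" part is SQL Server syntax rather than MySQL. The MySQL/Aurora-style form would look roughly like this (port and names are placeholders):
jdbc:mysql://my-url.us-east-1.rds.amazonaws.com:3306/my_db_name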
... View more
07-12-2018
01:58 PM
I was able to get this to work by using the insertInto() function, rather than the saveAsTable() function.
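Roughly, the working version looked like this (a sketch against the same variables as in my original question):
// sketch: insertInto() uses the table's existing partitioning, resolves columns by
// position (partition column last in the select), and can't be combined with partitionBy()
incrementalKeyed
  .write
  .mode("append")
  .insertInto(outputDBName + "." + outputTableName + "_keyed")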
... View more
07-12-2018
10:29 AM
Thanks @hmatta. Printing schema for sqlDFProdDedup:
root
|-- time_of_event_day: date (nullable = true)
|-- endpoint_id: integer (nullable = true)
...
|-- time_of_event: integer (nullable = true)
...
|-- source_file_name: string (nullable = true)
Printing schema for deviceData:
root
...
|-- endpoint_id: integer (nullable = true)
|-- source_file_name: string (nullable = true)
...
|-- start_dt_unix: long (nullable = true)
|-- end_dt_unix: long (nullable = true)
Printing schema for incrementalKeyed (result of joining 2 sets above):
root
|-- source_file_name: string (nullable = true)
|-- ingest_timestamp: timestamp (nullable = false)
...
|-- endpoint_id: integer (nullable = true)
...
|-- time_of_event: integer (nullable = true)
...
|-- time_of_event_day: date (nullable = true)
... View more
07-11-2018
06:38 PM
I have a hive table (in the glue metastore in AWS) like this:
CREATE EXTERNAL TABLE `events_keyed`(
`source_file_name` string,
`ingest_timestamp` timestamp,
...
`time_of_event` int
...)
PARTITIONED BY (
`time_of_event_day` date)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'my_location'
TBLPROPERTIES (
'PARQUET.COMPRESSION'='SNAPPY',
'transient_lastDdlTime'='1531187782')
I want to append data to it from spark:
val deviceData = hiveContext.table(deviceDataDBName + "." + deviceDataTableName)
val incrementalKeyed = sqlDFProdDedup.join(broadcast(deviceData),
$"prod_clean.endpoint_id" === $"$deviceDataTableName.endpoint_id"
&& $"prod_clean.time_of_event" >= $"$deviceDataTableName.start_dt_unix"
&& $"prod_clean.time_of_event" <= coalesce($"$deviceDataTableName.end_dt_unix"),
"inner")
.select(
$"prod_clean.source_file_name",
$"prod_clean.ingest_timestamp",
...
$"prod_clean.time_of_event",
...
$"prod_clean.time_of_event_day"
)
// this shows good data:
incrementalKeyed.show(20, false)
incrementalKeyed.repartition($"time_of_event_day")
.write
.partitionBy("time_of_event_day")
.format("hive")
.mode("append")
.saveAsTable(outputDBName + "." + outputTableName + "_keyed")
But this gives me a failure:
Exception encountered reading prod data:
org.apache.spark.SparkException: Requested partitioning does not match the events_keyed table:
Requested partitions:
Table partitions: time_of_event_day
What am I doing wrong? How can I accomplish the append operation I'm trying to get?
... View more
Labels: Apache Hive, Apache Spark
06-14-2018
10:46 AM
I'm sorry, @shaleen somani - this was over a year ago and I don't remember the details any more. My guess is that our primary and secondary name nodes had failed over for some reason. I've found that when this happens, things continue to "work", but not quite right and it can be hard to pin down. You can use the hdfs haadmin utility to check the status. Good luck!
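For reference, checking the HA state looks like this (nn1/nn2 are placeholders for the NameNode IDs defined in hdfs-site.xml):
hdfs haadmin -getServiceState nn1    # prints "active" or "standby"
hdfs haadmin -getServiceState nn2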
... View more
06-11-2018
04:49 PM
I'm having some trouble optimizing a query and hoping someone can see something I'm missing. Basically, I have a series of statements that follow the pattern below. I create a table, then populate it using a UDF. Then I create a partitioned copy of the table and copy the data into it (TEZ won't allow me to combine the UDF step and the partitioning step).
Here's my big issue: On my big insert statement (INSERT INTO TABLE ${hiveconf:dbName}.${hiveconf:prod_table_name}_keyed) I see all the heavy lifting happen (242 mappers, then 1 mapper, then 1 reducer). But then the actual writing of the data to disk (the final 10 reducers for the 10 buckets) takes longer than everything else combined.
The reason I added the buckets is because the next step, where I partition the results, has a similar issue - a very slow, single reducer. So I was hoping that by forcing the data into 10 buckets, I could get 10 reducers and it would run faster. But regardless, I'm getting a horrible bottleneck at the end of the query that has to do with IO latency. Can anyone suggest a way to improve this? Thanks!
CREATE external TABLE ${hiveconf:dbName}.${hiveconf:prod_table_name}_keyed
(
source_file_name string,
ingest_timestamp timestamp,
...
20 other columns
...
)
CLUSTERED BY (sample_point) INTO 10 BUCKETS
stored as parquet
LOCATION '${hiveconf:incremental_data_path}_keyed_temp'
TBLPROPERTIES ('PARQUET.COMPRESSION'='SNAPPY');
CREATE TEMPORARY FUNCTION func_1 as 'com.do.stuff' USING JAR '/home/hadoop/my-jar';
CREATE TEMPORARY FUNCTION func_2 as 'com.do.other.stuff' USING JAR '/home/hadoop/may-jar';
INSERT INTO TABLE ${hiveconf:dbName}.${hiveconf:prod_table_name}_keyed
select
x.source_file_name,
x.ingest_timestamp,
...
other columns
...
from
(
select
s.source_file_name,
s.ingest_timestamp,
...
other columns
...
func_1(input_column) as unit_of_measure,
func_2(input_columns2) as something_else
from
(
select
er.source_file_name,
er.ingest_timestamp,
...
other columns
...
from ${hiveconf:dbName}.${hiveconf:prod_table_name} er
inner join ${hiveconf:db_name}.${hiveconf:table_name} edd
where <clause>
distribute by something
sort by something_else asc
) s
) x ;
CREATE external TABLE ${hiveconf:dbName}.${hiveconf:prod_table_name}_keyed_partitioned
(
source_file_name string,
ingest_timestamp timestamp,
...
other columns
...
)
PARTITIONED BY (sample_point_day date)
stored as parquet
LOCATION '${hiveconf:incremental_data_path}_keyed'
TBLPROPERTIES ('PARQUET.COMPRESSION'='SNAPPY');
insert into table ${hiveconf:dbName}.${hiveconf:prod_table_name}_keyed_partitioned
PARTITION (sample_point_day)
select
r.source_file_name,
r.ingest_timestamp,
r.other_columns
from ${hiveconf:dbName}.${hiveconf:prod_table_name}_keyed r;
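One note on that last insert: since no static partition value is given, it relies on Hive's dynamic partitioning being enabled. Those settings look like this (standard Hive properties; the reducer sizing value is only an illustration of how I've been trying to coax more reducers out of the final stage, not a recommendation):
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.reducers.bytes.per.reducer=134217728;  -- roughly 128 MB per reducer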
... View more
Labels: Apache Hive, Apache Tez
06-07-2018
10:48 AM
Agreed! It certainly appears to be a bug.
... View more
06-06-2018
07:07 PM
Suppose I have some data in s3:
s3://my_bucket/my_path/to/my/data/myfile.txt
And suppose I use a ListS3 processor with the bucket and pass "my_path/to/my/data/" as the prefix. I will get TWO flow files: "s3://my_bucket/my_path/to/my/data/myfile.txt" and "s3://my_bucket/my_path/to/my/data/", even though the latter is just a partial key that doesn't represent an object. How can I tune my settings to only get the entry for "myfile.txt"? Thanks in advance!
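The workaround I'm considering, sketched below, is a RouteOnAttribute processor right after ListS3 that drops the folder-placeholder entries - this assumes the listed key ends up in the standard filename attribute:
# RouteOnAttribute dynamic property (sketch): route real objects to a 'files' relationship
files = ${filename:endsWith('/'):not()}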
... View more
Labels: Apache NiFi
05-24-2018
12:04 PM
Thanks Matt, My issue was firewall related. I'm all set now. Thanks for your help!
... View more
05-23-2018
08:27 PM
Thanks @Matt Clarke You must be back from the NSA days 🙂 Your message is helpful, but I'm still not able to access the UI from the browser on my laptop. Here's what I've got:
I have a RHEL 7.5 server running in EC2, in a VPC. It's running Nifi 1.6.0 using all vanilla settings. I can access the server using NoMachine and interact with Nifi in the browser directly on the machine. I added a Security Group rule to open port 8080. As you said, the logs list about 4 different URLs - they are all different IPs associated with the machine. But none of them work from my laptop (which is in the VPC via VPN).
I also tried setting the nifi.web.http.host value, and I also tried changing to a different port (restarting after each change). I even tried setting the Security Group to allow "all traffic" from "everywhere", so I don't think ports are the issue. (Interestingly, if I set the nifi.web.http.host value, I am no longer able to access Nifi in the browser on the host machine using 'localhost'.)
So... any other ideas? I'm feeling a little stuck...
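For reference, this is the nifi.properties change I'm experimenting with (a sketch - my understanding is that nifi.web.http.host binds only that one interface, which would explain why localhost stops working when it's set to a specific IP; 0.0.0.0 should bind all interfaces):
# nifi.properties (sketch)
nifi.web.http.host=0.0.0.0
nifi.web.http.port=8080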
... View more
05-23-2018
07:38 PM
Thanks @Matt Clarke My mistake on the response - I clicked "reply" but apparently managed to type in the wrong box... I have one followup question since you seem to know Nifi - Simply opening access to port 8080 on the Nifi server doesn't appear to be sufficient for making it accessible to other computers on the same network. I've been looking for some instructions, and everything I've found points to setting up HTTPS, certificates, keys, etc. (like this https://bryanbende.com/development/2016/08/17/apache-nifi-1-0-0-authorization-and-multi-tenancy) Is that the only option? For reference, this is running in a VPC and only machines with VPN access can see the server at all. Thanks!
... View more
05-23-2018
04:48 PM
Thanks @Matt Clarke So I'm thinking I'll open up the port so that different devs can access the flow through the browser (it's all protected by VPN) and utilize process groups to help isolate distinct pieces. Does that sound like a good plan?
... View more
05-23-2018
04:06 PM
I would like to introduce Nifi as a tool for controlling a top-level work flow, but I want it to be something that my whole team can access and maintain, and I'm wondering about best practices in this context.
For example, we currently have a single Nifi instance with a single flow on a shared server. So anyone on the team can RDP to the server and see/edit the flow at localhost:8080 - but only one person at a time.
But what if we want multiple flows and the ability for multiple devs to have access at the same time? At a high level, it looks like we could run multiple instances of Nifi and just have a record somewhere that localhost:8080 is prod and localhost:8090 is dev, or something like that. But that still doesn't allow admin A to work on prod and admin B to work on dev at the same time. They would have to make changes on separate machines and then deploy the XML.
Even if we opened up the ports so that Nifi is accessible through the browser on a remote machine, how does it work if 2 devs are editing at the same time? Is that ok as long as they are in separate process groups? I'm trying to understand the options and best practices for this scenario. Thanks!
... View more
Labels: Apache NiFi
04-20-2018
05:01 PM
Suppose you have some scala code with a "timestamp" variable:
val timestamp = 1234567
How can I pass that to a "where" or "filter" clause on a dataframe? I want something like this, but this doesn't work:
val dfCorruptData = dfClean.where("sample_point <= 1388534400 or sample_point >= $timestamp")
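A minimal sketch of the two forms I'm comparing (the plain string literal above never interpolates $timestamp, so it needs either the s prefix or the column API):
// string interpolation: note the s prefix
val dfCorruptData = dfClean.where(s"sample_point <= 1388534400 or sample_point >= $timestamp")
// or the equivalent with the column API
import org.apache.spark.sql.functions.col
val dfCorruptData2 = dfClean.where(col("sample_point") <= 1388534400 || col("sample_point") >= timestamp)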
... View more
Labels: Apache Spark
04-10-2018
08:31 PM
Here's what I ended up with (callUDF and input_file_name come from org.apache.spark.sql.functions):
import org.apache.spark.sql.functions.{callUDF, input_file_name}
spark.udf.register("getOnlyFileName", (fullPath: String) => fullPath.split("/").last)
val df2 = df1.withColumn("source_file_name2", callUDF("getOnlyFileName", input_file_name()))
... View more
04-10-2018
05:19 PM
Thanks @Amol Thacker One quick followup: do you know what the syntax would be to strip the path from the file name? So, convert /my/path/to/my/file.txt to file.txt. I'm new to Scala and struggling w/ syntax...
... View more
04-10-2018
03:21 PM
I'm using Scala to read data from S3, and then perform some analysis on it. Suppose that in /path/to/my/data, there are 4 "chunks": a.parquet, b.parquet, c.parquet, and d.parquet. In my results, I want one of the columns to show which chunk the data came from. Is that possible, and if so, how?
val df = spark.read.parquet("s3://path/to/my/data")
val frame = spark.sql( s""" SELECT some things... """);
... View more
Labels: Apache Spark
03-08-2018
10:44 AM
I wrestled with the Java and hdfs3 options mentioned above, but getting either of them to run on EMR was pretty painful and not very bootstrap-script friendly. Finally, I figured out how to do this with Spark. It's lightning fast, and super simple:
import org.apache.hadoop.fs.{FileSystem, Path}
import java.util.UUID.randomUUID
val hdfs = FileSystem.get(sc.hadoopConfiguration)
val files = hdfs.listStatus(new Path(args(0)))
val originalPath = files.map(_.getPath())
println("Will now list files in " + args(0) + "...")
for(i <- originalPath.indices)
{
val id = randomUUID().toString;
println("Will move " + originalPath(i) + " to " + id);
hdfs.rename(originalPath(i), originalPath(i).suffix("." + id))
}
... View more
03-01-2018
08:42 PM
Thanks @Matt Foley, The insight that renaming via the Java API is so much faster is especially interesting. I'll investigate that further!
... View more
03-01-2018
01:28 PM
#!/bin/bash
scriptname=`basename "$0"`
echo ""
echo "Running $scriptname $@..."
echo " (Usage: $scriptname <path_to_data>)"
echo ""
if [ "$#" -ne 1 ]
then
echo "Wrong number of arguments. Expected 1 but got $#"
exit 1;
fi
SECONDS=0
HDFS_PATH="$1"
for partition_name in `hdfs dfs -ls $HDFS_PATH`
do
if [[ $partition_name == $HDFS_PATH* ]]
then
echo "Looping through $partition_name"
for chunk_name in `hdfs dfs -ls $partition_name`
do
if [[ $chunk_name == $partition_name* ]]
then
UUID=$(uuidgen)
UUID=${UUID^^}
echo "Will rename $chunk_name to $partition_name/$UUID"
# hdfs dfs -mv $chunk_name "$partition_name/$UUID"
fi
done
fi
done
duration=$SECONDS
echo "Exiting after $duration"
exit 0;
... View more
03-01-2018
01:27 PM
Thanks @Matt Foley A few clarifications for whatever they're worth:
"You don't say what application you're using to pull and clean the new data on HDFS." - I pull the data using s3-dist-cp, and then project a table over it with Hive, and run Spark and Hive queries for ETL. It's not an HDFS thing - I suppose it's technically a Hive thing. Still, I have the same question - can I control how Hive names the chunks?
"Am I correct in assuming that is using S3 commands to rename the file after uploading to S3? If so, you could try renaming the files on HDFS before uploading to S3." - No. I'm actually renaming on HDFS before pushing to S3. And yes, "hdfs dfs -mv ..." does take about 3 seconds per file. I can prove it if you're interested. I'll attach my script for reference.
Regarding your last comment - I do understand how S3 works. I do NOT know of an s3-dist-cp option to force a prefix or naming convention on the individual chunks. For example, if I have a bunch of data representing a table at /data/my/table in HDFS, I can push that to any prefix in S3, but I don't know how to specify that each chunk under /data/my/table should be renamed. I can push the chunks INDIVIDUALLY and control the name, but then my app is no longer scalable - the length of time increases linearly with the size of the data, regardless of the size of the cluster. That's why I'm trying to leverage s3-dist-cp - it's the only way I have found to push data from a cluster to S3 in a scalable way.
... View more
02-27-2018
09:21 PM
Here's my scenario: I have an S3 bucket full of partitioned production data:
data_day=01-01-2017/000000_0
data_day=01-01-2017/000000_1
data_day=01-02-2017/000000_0
data_day=01-02-2017/000000_1
... etc
I spin up an EMR cluster, pull down some dirty data, and clean it up, including de-duplicating it against the prod data. Now, on my cluster, in HDFS, I have maybe:
data_day=01-01-2017/000000_0
data_day=01-02-2017/000000_0
This represents new data. I know that I can create a table, point the 'location' at the bucket described above, and do an "insert into" or an "insert overwrite", but this is very slow - it will use one reducer that will copy ALL the new data. Instead, I want to use s3-dist-cp, which will update the data much more quickly. However, my 000000_0 chunks will overwrite the old ones. I have a script that renames the chunks (000000_0 -> BCF704E2-B8A7-4F71-8747-A68AD52E50B7), but it takes about 3 seconds per partition, which is over an hour.
So, here's my question: is there an HDFS setting to change the way the chunks are named? For example, can I force the chunks to be named using the date or a GUID? Thanks in advance
... View more
Labels: Apache Hadoop
01-04-2018
11:54 AM
Hey everyone, I have a somewhat similar question, which I posted here: https://community.hortonworks.com/questions/155681/how-to-defragment-hdfs-data.html I would really appreciate any ideas. cc @Lester Martin @Jagatheesh Ramakrishnan @rbiswas
... View more
01-03-2018
07:59 PM
Suppose a scenario with a Hive table that is partitioned by day ("day=2017-12-12"). Suppose some process pushes data to the file store behind this table (new data under "day=2017-12-12" and "day=2017-12-13", etc). The "msck repair table" command updates the metastore to recognize all the new "chunks", and the data correctly shows up in queries. But suppose these chunks are mostly very small - is there a simple command to consolidate these? So instead of 100 small files under a partition, I get 2 well-sized ones, etc. I recognize that I can create a copy of the table and accomplish this, but that seems pretty clumsy. Is there some kind of hdfs command to "defrag" the data? FWIW, I'm using EMR with data in S3. Thanks in advance.
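For context, the closest thing I've found so far is rewriting each partition in place with Hive's small-file merge settings turned on - sketched below with placeholder table/column names. My understanding is that Hive stages the result before swapping the partition, so reading and overwriting the same partition should be safe, but I'd still love to hear about something less clumsy:
SET hive.merge.mapfiles=true;
SET hive.merge.tezfiles=true;
SET hive.merge.smallfiles.avgsize=134217728;   -- aim for roughly 128 MB files
INSERT OVERWRITE TABLE my_table PARTITION (day='2017-12-12')
SELECT col1, col2 FROM my_table WHERE day='2017-12-12';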
... View more
Labels: Apache Hive
11-11-2017
11:42 AM
I have a similar question. In my case, I need to connect to Hive using a SAS tool that only provides me with the following fields: Host(s), Port, Database. And then there is a tool to add "server side properties", which creates a list of key/value pairs. Can anyone tell me what server side properties I can use to force this connection to always use a specific queue? Or, a way to associate this connection with a user and associate that user with a key/value pair?
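In case it helps frame the question, these are the kinds of key/value pairs I'm guessing at - tez.queue.name and mapreduce.job.queuename are the standard queue properties, but I don't know whether the SAS tool actually passes them through, and my_queue is just a placeholder:
tez.queue.name=my_queue
mapreduce.job.queuename=my_queue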
... View more