Member since: 08-16-2016
Posts: 642
Kudos Received: 131
Solutions: 68

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3978 | 10-13-2017 09:42 PM |
| | 7477 | 09-14-2017 11:15 AM |
| | 3799 | 09-13-2017 10:35 PM |
| | 6042 | 09-13-2017 10:25 PM |
| | 6606 | 09-13-2017 10:05 PM |
02-21-2017
08:36 AM
Try this out. I think you need to use the schematool from the version you are trying to upgrade to, but you can only do that if the parcel for it has already been pushed to all of the nodes. This may help you get the exact Hive version needed for the next command:

schematool -dbType mysql -info

This will generate the upgrade scripts that the wizard is running. What I would do is create a copy of your metastore DB and then manually run the scripts to see exactly where it hits the error:

schematool -dbType mysql -upgradeSchemaFrom <old-hive-version> -dryRun

It just hit me: the DB's charset was changed but not the tables'. Tables keep the original charset they were created with. Once you have found the problematic table(s), you can change them with this command:

ALTER TABLE tbl_name [[DEFAULT] CHARACTER SET charset_name] [COLLATE collation_name]

I tried to find anything from Cloudera on whether you should be making a smaller jump, as that is sometimes the case, but I couldn't find anything. You could try that as well: see if you can upgrade to something lower like CDH 5.5.x first and then go to CDH 5.10.x.
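A minimal sketch of that dry-run workflow, assuming a MySQL metastore database named `metastore`, a scratch copy named `metastore_copy`, root credentials, and 0.13.0 as the starting version; all of those are placeholders. Note that schematool reads its connection settings from hive-site.xml, so point it at the copy before the dry run:

```bash
# Dump the live metastore and load it into a scratch copy (names are placeholders).
mysqldump -u root -p metastore > metastore_backup.sql
mysql -u root -p -e "CREATE DATABASE metastore_copy"
mysql -u root -p metastore_copy < metastore_backup.sql

# Report the schema version currently recorded in the metastore.
schematool -dbType mysql -info

# Walk the upgrade scripts without committing any changes
# (0.13.0 stands in for your actual old Hive version).
schematool -dbType mysql -upgradeSchemaFrom 0.13.0 -dryRun

# List each table's charset/collation to spot the ones still on the old charset.
mysql -u root -p -e "SELECT table_name, table_collation FROM information_schema.tables WHERE table_schema = 'metastore_copy'"
```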
02-21-2017
08:13 AM
What are you trying to achieve exactly? Do you just want to be able to search through the logs for key phrases? Do you want all basic users to be able to search the raw logs? Are you trying to hunt down problematic jobs?
02-21-2017
12:38 AM
Can you run this query in Hive w/o the driver? I believe the first two queries do not require MR job(s), and it is possible for them to work w/o a proper user setup. The last query will require MR, and that requires that the user exists. What user are you connecting as when using the ODBC driver? Does that user exist in the Docker quickstart container?
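For illustration only (the original queries aren't shown here), this is the kind of split to test with Beeline on the quickstart VM; the host, user, and the sample_07 table name are assumptions:

```bash
# A plain fetch is served by Hive's fetch task with no MR job, so it can
# succeed even when the connecting user isn't set up on the cluster.
beeline -u jdbc:hive2://quickstart.cloudera:10000 -n cloudera \
  -e "SELECT * FROM sample_07 LIMIT 10"

# An aggregate launches a MapReduce job that runs as the connecting user;
# a missing user account typically surfaces here.
beeline -u jdbc:hive2://quickstart.cloudera:10000 -n cloudera \
  -e "SELECT COUNT(*) FROM sample_07"
```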
02-21-2017
12:26 AM
I am assuming that log aggregation was turned off because it doesn't trigger until a job completes, which makes it useless for long-running/streaming jobs. I recommend turning it back on and using yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds to have the logs collected on a regular basis while the job is still running. Solr/ES is really good for the counters/metrics and could be used for the raw logs as well.
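Once aggregation is on, the collected logs can be pulled without digging through the NodeManagers; the application ID below is a placeholder, and /tmp/logs is the default value of yarn.nodemanager.remote-app-log-dir:

```bash
# Fetch the aggregated logs for one application (the ID is a placeholder).
yarn logs -applicationId application_1487650000000_0042 > app_logs.txt

# Aggregated logs land in HDFS under the remote app-log directory,
# /tmp/logs by default (set via yarn.nodemanager.remote-app-log-dir).
hdfs dfs -ls /tmp/logs
```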
02-21-2017
12:20 AM
I have seen this error on the HDP side: I had created the databases with the utf8 charset and the scripts did not like that. On CDH I still use the utf8 charset. It seems that you have MySQL and the utf8 charset is in use. My fix was to change them to latin1, but I was working with fresh databases and a fresh install.
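A hedged sketch for checking and switching the database default, assuming a MySQL metastore database named `metastore` (the name and credentials are placeholders); on anything but a fresh install, back up first:

```bash
# Show the database's default charset/collation (the name is a placeholder).
mysql -u root -p -e "SELECT default_character_set_name, default_collation_name
  FROM information_schema.schemata WHERE schema_name = 'metastore'"

# Switch the database default to latin1. This only affects tables created
# afterwards; existing tables keep the charset they were created with.
mysql -u root -p -e "ALTER DATABASE metastore DEFAULT CHARACTER SET latin1"
```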
02-20-2017
11:08 PM
If I recall correctly, edit logs just fill up to a certain size and then roll over to the next segment. In CM, the NameNode metrics for Transactions, Edit Log Syncs, and Average Edit Log Sync Time would be better. Not sure if these are exposed by default.
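Those counters are also exposed on the NameNode's JMX endpoint, so you can check them without going through CM; the host below is a placeholder, and 50070 is the default NameNode HTTP port:

```bash
# Query the NameNodeActivity bean, which includes TransactionsNumOps/AvgTime
# and SyncsNumOps/AvgTime (edit log syncs and their average latency).
curl -s 'http://namenode-host:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeActivity'
```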
02-20-2017
07:40 PM
1 Kudo
That is probably the source of the spike in edits being written to the JNs. You could try to address it to reduce the impact.
02-20-2017
12:25 PM
The du switch gets the size for the given directory. The first number is the single-replica size and the second number is the size at the full replication factor. The UI and even CM do a different calculation, and it is annoying, as it isn't what I would call accurate. In the last few days I saw a JIRA related to how non-DFS and reserved space are used in that calculation. I don't have the current formula in front of me, but it is different. It is obvious when you tally up the space used (including non-DFS), the space unused, and even the percentages: it will never equal 100%, and it will never equate to your raw disk availability. I may get this wrong, but it is related to the amount you have reserved for non-DFS data. That lops off part of the configured capacity, but then the system also uses it to calculate the non-DFS used in a weird way that always reports more used than there actually is.
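To make the two numbers concrete, a quick illustration; the path, the output values, and the replication factor of 3 are all assumptions:

```bash
# -du prints two numbers per entry: the raw (single-replica) size, then the
# space consumed across all replicas (raw size x replication factor).
hdfs dfs -du /user/hive/warehouse
# Example output with replication factor 3 (values are illustrative):
#   1024  3072  /user/hive/warehouse/sales
```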
02-18-2017
10:33 AM
No, typically worker nodes just run the processes that do the work: DataNode, Impala daemon, NodeManager. In theory you could put it there and have it on the OS disk (not on any HDFS disks), but you will eventually run into contention between the OS, the logs, and the edits. That may be tolerable if you have a small cluster. My minimum for a production cluster and/or HA is three large physical servers for the master roles. The DBs (although I prefer to have the HMS DB on the master nodes as well), gateway roles, and CM can all be on VMs. Where is your third ZK instance? That one will also have IO contention issues on a VM or on a DataNode.
02-18-2017
10:21 AM
I do think that you need to move the JN to the same or similar hardware to what the others are on. You don't need to check the contents or the files themselves. Since it is happening every few seconds, it is just lagging behind and then catching up. So if you want to run any real loads on the cluster, it needs to be moved to better hardware.