Member since: 08-16-2016
Posts: 642
Kudos Received: 131
Solutions: 68

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3978 | 10-13-2017 09:42 PM |
| | 7477 | 09-14-2017 11:15 AM |
| | 3799 | 09-13-2017 10:35 PM |
| | 6042 | 09-13-2017 10:25 PM |
| | 6606 | 09-13-2017 10:05 PM |
02-21-2017
08:36 AM
Try this out. I think you need to use the schematool from the version you are trying to upgrade to, but you can only do that if the parcel for it has already been pushed to all of the nodes. This may help you get the exact Hive version needed for the next command:

schematool -dbType mysql -info

This will generate the upgrade scripts that the wizard is running. What I would do is create a copy of your metastore DB and then manually run the scripts to see exactly where it hits the error:

schematool -dbType mysql -upgradeSchemaFrom <old-hive-version> -dryRun

It just hit me: the DB's charset was changed but not the tables'. Tables keep the original charset they were created with. Once you have found the problematic table(s), you can change them with this command:

ALTER TABLE tbl_name [[DEFAULT] CHARACTER SET charset_name] [COLLATE collation_name]

I tried to find anything from Cloudera on whether you should be making a smaller jump, as that is sometimes the case, but I couldn't find anything. You could try that as well: see if you can upgrade to something lower like CDH 5.5.x first and then go to CDH 5.10.x.
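A minimal sketch of that dry-run workflow, assuming a MySQL metastore database named `metastore`, a scratch copy named `metastore_copy`, root credentials, and 0.13.0 as the starting version; all of those are placeholders. Note that schematool reads its connection settings from hive-site.xml, so point it at the copy before the dry run:

```bash
# Dump the live metastore and load it into a scratch copy (names are placeholders).
mysqldump -u root -p metastore > metastore_backup.sql
mysql -u root -p -e "CREATE DATABASE metastore_copy"
mysql -u root -p metastore_copy < metastore_backup.sql

# Report the schema version currently recorded in the metastore.
schematool -dbType mysql -info

# Walk the upgrade scripts without committing any changes
# (0.13.0 stands in for your actual old Hive version).
schematool -dbType mysql -upgradeSchemaFrom 0.13.0 -dryRun

# List each table's charset/collation to spot the ones still on the old charset.
mysql -u root -p -e "SELECT table_name, table_collation FROM information_schema.tables WHERE table_schema = 'metastore_copy'"
```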
02-21-2017
08:13 AM
What are you trying to achieve exactly? Do you just want to be able to search through the logs for key phrases? Do you want all basic users to be able to search the raw logs? Are you trying to hunt down problematic jobs?
02-21-2017
12:38 AM
Can you run this query in Hive w/o the driver? I believe the first two queries do not require MR job(s), and it is possible for them to work w/o a proper user setup. The last query will require MR, and that requires that the user exists. What user are you connecting as when using the ODBC driver? Does that user exist in the Docker quickstart container?
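For illustration only (the original queries aren't shown here), this is the kind of split to test with Beeline on the quickstart VM; the host, user, and the sample_07 table name are assumptions:

```bash
# A plain fetch is served by Hive's fetch task with no MR job, so it can
# succeed even when the connecting user isn't set up on the cluster.
beeline -u jdbc:hive2://quickstart.cloudera:10000 -n cloudera \
  -e "SELECT * FROM sample_07 LIMIT 10"

# An aggregate launches a MapReduce job that runs as the connecting user;
# a missing user account typically surfaces here.
beeline -u jdbc:hive2://quickstart.cloudera:10000 -n cloudera \
  -e "SELECT COUNT(*) FROM sample_07"
```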
02-21-2017
12:26 AM
I am assuming that log aggregation was turned off because it doesn't trigger until a job completes, which makes it useless for long-running/streaming jobs. I recommend turning it back on and using yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds to have the logs collected on a regular basis while the job is still running. Solr/ES is really good for the counters/metrics and could be used for the raw logs as well.
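Once aggregation is on, the collected logs can be pulled without digging through the NodeManagers; the application ID below is a placeholder, and /tmp/logs is the default value of yarn.nodemanager.remote-app-log-dir:

```bash
# Fetch the aggregated logs for one application (the ID is a placeholder).
yarn logs -applicationId application_1487650000000_0042 > app_logs.txt

# Aggregated logs land in HDFS under the remote app-log directory,
# /tmp/logs by default (set via yarn.nodemanager.remote-app-log-dir).
hdfs dfs -ls /tmp/logs
```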
02-21-2017
12:20 AM
I have seen this error on the HDP side: I had created the databases with the utf8 charset and the scripts did not like that. On CDH I still use the utf8 charset. It seems that you have MySQL and the utf8 charset is in use. My fix was to change them to latin1, but I was working with fresh databases and a fresh install.
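A hedged sketch for checking and switching the database default, assuming a MySQL metastore database named `metastore` (the name and credentials are placeholders); on anything but a fresh install, back up first:

```bash
# Show the database's default charset/collation (the name is a placeholder).
mysql -u root -p -e "SELECT default_character_set_name, default_collation_name
  FROM information_schema.schemata WHERE schema_name = 'metastore'"

# Switch the database default to latin1. This only affects tables created
# afterwards; existing tables keep the charset they were created with.
mysql -u root -p -e "ALTER DATABASE metastore DEFAULT CHARACTER SET latin1"
```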
02-20-2017
11:08 PM
If I recall correctly, edit logs just fill up to a certain size and then roll over to the next segment. In CM, the NameNode metrics for Transactions, Edit Log Syncs, and Average Edit Log Sync Time would be better. Not sure if these are exposed by default.
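Those counters are also exposed on the NameNode's JMX endpoint, so you can check them without going through CM; the host below is a placeholder, and 50070 is the default NameNode HTTP port:

```bash
# Query the NameNodeActivity bean, which includes TransactionsNumOps/AvgTime
# and SyncsNumOps/AvgTime (edit log syncs and their average latency).
curl -s 'http://namenode-host:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeActivity'
```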
02-20-2017
07:40 PM
1 Kudo
That is probably the source of the spike in edits being written to the JNs. You could try to address it to reduce the impact.
02-20-2017
12:25 PM
The du switch gets the size for the given directory. The first number is the single-replica size and the second number is the size at the full replication factor. The UI and even CM do a different calculation, and it is annoying, as it isn't what I would call accurate. In the last few days I saw a JIRA related to how non-DFS and reserved space are used in that calculation. I don't have the current formula in front of me, but it is different. It is obvious when you tally up the space used (including non-DFS), the space unused, and even the percentages: it will never equal 100%, and it will never equate to your raw disk availability. I may get this wrong, but it is related to the amount you have reserved for non-DFS data. That lops off part of the configured capacity, but then the system also uses it to calculate the non-DFS used in a weird way that always reports more used than there actually is.
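To make the two numbers concrete, a quick illustration; the path, the output values, and the replication factor of 3 are all assumptions:

```bash
# -du prints two numbers per entry: the raw (single-replica) size, then the
# space consumed across all replicas (raw size x replication factor).
hdfs dfs -du /user/hive/warehouse
# Example output with replication factor 3 (values are illustrative):
#   1024  3072  /user/hive/warehouse/sales
```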
02-18-2017
10:33 AM
No, typically worker nodes just run the processes that do the work: DataNode, Impala daemon, NodeManager. In theory you could put it there and have it on the OS disk (not on any HDFS disks), but you will eventually run into contention between the OS, the logs, and the edits. That may be tolerable if you have a small cluster. My minimum for a production cluster and/or HA is three large physical servers for the master roles. The DBs (although I prefer to have the HMS DB on the master nodes as well), gateway roles, and CM can all be on VMs. Where is your third ZK instance? That one will also have IO contention issues on a VM or on a DataNode.
02-18-2017
10:21 AM
I do think that you need to move the JN to the same or similar hardware to what the others are on. You don't need to check the contents or the files themselves. Since it is happening every few seconds, it is just lagging behind and then catching up. So if you want to run any real loads on the cluster, it needs to be moved to better hardware.