Member since: 07-16-2015
Posts: 177
Kudos Received: 28
Solutions: 19
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 14170 | 11-14-2017 01:11 AM
 | 60613 | 11-03-2017 06:53 AM
 | 4321 | 11-03-2017 06:18 AM
 | 13554 | 09-12-2017 05:51 AM
 | 1991 | 09-08-2017 02:50 AM
04-11-2017
07:05 AM
1 Kudo
Several ideas:
- I'm not sure you have to specify the path of the root znode where Solr is installed for each host. You should append it only once, at the end of the quorum string: "host1:port,host2:port,host3:port/solr". (I think this is your issue.)
- Is Solr actually installed under the /solr znode?
- What is the size of the current "clusterstate.json" znode? (Does it exist?)
- Is the content of your current "clusterstate.json" valid?
- Is ZooKeeper hosted on the 3 nodes you specified? (Can you ping these hosts?)
Good luck!
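For reference, a quick way to check the chroot from a shell. This is only a sketch: the hostnames and port are placeholders, and the `zookeeper-client` wrapper name assumes a CDH-style install (plain `zkCli.sh` otherwise). Note the connection is made without the chroot, so you can inspect it:

```
# Connect to the quorum WITHOUT the /solr chroot, then inspect it.
# host1/host2/host3 and port 2181 are illustrative placeholders.
zookeeper-client -server host1:2181,host2:2181,host3:2181
# then, inside the ZooKeeper shell:
#   ls /solr
#   get /solr/clusterstate.json
```

If `ls /solr` fails, the chroot does not exist and Solr was bootstrapped elsewhere (or not at all).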
04-07-2017
08:45 AM
1 Kudo
At least one of the containers was killed because it requested more memory than your ApplicationMaster allowed. You need to identify whether it was a mapper or a reducer, and then tweak the YARN configuration a little to increase the memory available to mappers and/or reducers. 1 GB is rather "small" in my opinion. regards, mathieu
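As an illustration, the container sizes can be raised for a single job at submit time. The property names below are the standard MRv2 ones; the jar name, driver class, paths, and values are placeholders to adapt to your job:

```
# Illustrative values: 2 GB containers for mappers, 4 GB for reducers,
# with JVM heaps set to roughly 80% of each container size.
hadoop jar my-job.jar com.example.MyDriver \
  -D mapreduce.map.memory.mb=2048 \
  -D mapreduce.map.java.opts=-Xmx1638m \
  -D mapreduce.reduce.memory.mb=4096 \
  -D mapreduce.reduce.java.opts=-Xmx3276m \
  input/ output/
```

The same properties can also be set cluster-wide in mapred-site.xml (or via Cloudera Manager) rather than per job.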
04-07-2017
07:57 AM
2 Kudos
Ok, I could reproduce the issue quickly. Two things:
- 1: the MapReduceIndexerTool generates metadata about the file. These "fields" are presented to Solr: https://www.cloudera.com/documentation/enterprise/5-6-x/topics/search_metadata.html So if you need these fields, you need to add them to the schema of the collection.
- 2: if you don't need these fields, then you have to delete them before presenting the row to Solr. For this, the "sanitizeUnknownSolrFields" command is the right way to go, but you misplaced it in your morphline. It should be inside the commands [] array, after the readCSV command (and of course before the loadSolr command). Something like this:
commands : [
  {
    readCSV {
      ...
    }
  }
  {
    sanitizeUnknownSolrFields {
      ...
    }
  }
  {
    loadSolr {
      ...
    }
  }
]
04-07-2017
06:43 AM
Could you share the following things (if you can)?
- the collection name and the schema.xml of the collection
- the indexer configuration used (indexer_def.xml?)
- the morphline configuration used
- the command line used to launch the batch indexing
- a small CSV sample?
This might enable me to pinpoint the issue (or to tell you that everything looks fine to me). regards, mathieu
04-07-2017
03:03 AM
Is HiveServer2 up and running? If yes, is it listening on port 10000? (By default it should be.) This is the service beeline tries to connect to.
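A minimal sketch of how to check this from a shell (the hostname is a placeholder for your HiveServer2 host):

```
# Is anything listening on the default HiveServer2 port?
nc -z hs2-host.example.com 10000 && echo "port 10000 open"

# Then try the actual connection with beeline:
beeline -u "jdbc:hive2://hs2-host.example.com:10000/default"
```

If the port check fails, look at the HiveServer2 process and its logs before debugging beeline itself.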
04-07-2017
02:44 AM
Is HBase working? Given what you reported (ERROR: The node /hbase is not in ZooKeeper.), I have the feeling that your HBase cluster is not working properly. You should fix that first.
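A quick way to check both sides (the quorum host is a placeholder; `zookeeper-client` is the CDH wrapper around zkCli.sh):

```
# Does the /hbase znode exist in ZooKeeper at all?
zookeeper-client -server zk-host.example.com:2181 ls /hbase

# Basic HBase health check: number of live/dead region servers.
echo "status" | hbase shell
```

If `ls /hbase` fails, HBase either never started against this ZooKeeper ensemble or is configured with a different zookeeper.znode.parent.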
04-07-2017
01:44 AM
The error says that you are trying to push a field named "file_length" into Solr, and Solr doesn't know about that field. Either you made an error in the schema of the collection and you need to fix the field name, or this field really does not exist in Solr and you should not push it (in that case, you can use the sanitizeUnknownSolrFields command if you are using a morphline).
03-24-2017
05:48 AM
I do think this is a defect; I'm not sure how Cloudera will see it. But to be fair, this particular way of inserting data into a table (with the VALUES syntax) is pretty much limited to small-scale testing.
03-23-2017
02:51 AM
Well, I personally tend to think this is a small, overlooked use case. I mean, this particular query syntax does some "weird" things in Hive under the hood (it creates a table and reads it back), and Sentry does not seem to expect that.
03-23-2017
02:46 AM
I did encounter such situations from time to time in the past (not recently). Each time, it was because at least one of the Solr servers was busy doing something else. Check that each Solr server is healthy and that none of the other collections has a health defect.