Member since: 02-09-2016
Posts: 559
Kudos Received: 422
Solutions: 98
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 2106 | 03-02-2018 01:19 AM |
 | 3425 | 03-02-2018 01:04 AM |
 | 2338 | 08-02-2017 05:40 PM |
 | 2334 | 07-17-2017 05:35 PM |
 | 1693 | 07-10-2017 02:49 PM |
04-21-2017
08:11 PM
@Stefan Schuster Can you confirm that Zeppelin is actually running? Are you accessing Ambari via "http://sandbox.hortonworks.com:8080"? If not, have you added sandbox.hortonworks.com to your local computer's hosts file? Assuming you are running the Sandbox on your local computer, you can also try "http://localhost:9995" to see if that works for you.
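A quick way to check from your local machine (a sketch; the port and hostname assume the Sandbox defaults):

# Check whether the Zeppelin UI responds on the default Sandbox port:
curl -I http://localhost:9995

# If you want to use the sandbox.hortonworks.com name, it must resolve
# locally. On Windows, add this line to
# C:\Windows\System32\drivers\etc\hosts:
#   127.0.0.1   sandbox.hortonworks.com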
04-13-2017
01:19 PM
@Kelvin Tong Based on the screenshots you are providing, you are attempting to push the file to HDFS using the command line within the Sandbox itself. However, you are specifying a file path that is local to the computer running the VirtualBox Sandbox VM. That won't work: the Sandbox has no way of knowing how to access "C:\". You must first push the file to the Sandbox using WinSCP. Then you can use the hdfs dfs -put command with a local directory within the Sandbox (something like /root/<my filename>).
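A sketch of the two steps, using a hypothetical file name (WinSCP's GUI works equally well for the first step; scp is shown for brevity):

# Step 1: from your Windows machine, copy the file into the Sandbox:
scp C:\data\myfile.csv root@sandbox.hortonworks.com:/root/

# Step 2: inside the Sandbox, push the local copy into HDFS:
hdfs dfs -put /root/myfile.csv /user/root/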
03-29-2017
11:51 PM
@Girish Mane The JournalNodes are for shared edits. They are responsible for keeping the Active and Standby NameNodes in sync in terms of filesystem edits. You do not need a JournalNode for each of your DataNodes. The normal approach is to use 3 JournalNodes to give the greatest level of high availability; it's the same idea behind 3x replication of data.
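For reference, the NameNodes find the JournalNode quorum through the shared-edits setting in hdfs-site.xml. A minimal sketch, assuming three hypothetical JournalNode hosts (jn1, jn2, jn3) and a nameservice called mycluster:

dfs.namenode.shared.edits.dir=qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster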
03-21-2017
08:40 PM
@tuxnet Preemption will not kill existing tasks that are running. As tasks for any given job finish, those resources are then made available to the jobs in the queue that are relying on preemption. The idea behind using the queues is to assign a minimum amount of cluster resources to a given user/job. With preemption enabled, jobs can get access to a larger percentage of resources when they are available. If a new job comes in that requires a larger minimum share of resources than is currently available, those resources will be made available as the currently running jobs' individual tasks complete. What do your capacity scheduler queues look like in terms of percentage of cluster resources? What are the min and max values?
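For illustration, a two-queue setup might look like this (a sketch with hypothetical queue names; in Ambari these appear as key=value entries in the capacity-scheduler configuration):

# Guaranteed (minimum) and maximum capacities per queue, in percent:
yarn.scheduler.capacity.root.queues=prod,adhoc
yarn.scheduler.capacity.root.prod.capacity=70
yarn.scheduler.capacity.root.prod.maximum-capacity=100
yarn.scheduler.capacity.root.adhoc.capacity=30
yarn.scheduler.capacity.root.adhoc.maximum-capacity=50
# Preemption itself is switched on at the ResourceManager level:
yarn.resourcemanager.scheduler.monitor.enable=true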
03-21-2017
08:31 PM
@Peter Teunissen If you log into the Cloudbreak Deployer node via SSH, you can access the logs in /var/lib/cloudbreak-deployer. You can also run the cbd logs command (as root) to see the log output in real time as the cluster is deploying.
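A sketch (the host name is a placeholder for your deployer node):

ssh root@cloudbreak-deployer-host
cd /var/lib/cloudbreak-deployer
cbd logs        # streams the deployer containers' logs to the terminal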
03-19-2017
03:22 PM
1 Kudo
@mqureshi @james.jones I recommend you read up on information about SolrCloud. The reference guide provides a good overview of how it works, starting on page 419: http://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.10.pdf

A SolrCloud cluster uses Zookeeper for cluster coordination: keeping track of which nodes are up, how many shards a collection has, which hosts are currently serving those shards, and so on. Zookeeper is also used to store configuration sets. These are the index and schema configuration files used by your indexes. When you create a collection using the Solr scripts, the configuration files for the collection are uploaded to Zookeeper. A collection is composed of 1 or more shard indexes and 0 or more replica indexes.

When you use HDFS to store the indexes, it is much easier to add/remove SolrCloud nodes in your cluster. You don't have to copy the indexes, which are normally stored locally. The new SolrCloud node is configured to coordinate with Zookeeper. Upon startup, the new SolrCloud node will be told by Zookeeper which shards it is responsible for and will then use the respective indexes stored on HDFS.

All of the index data itself is stored within the index directories on HDFS. These directories are self-contained. Solr stores collections within index directories, where each index has its own directory within the top-level Solr index directory. This is true for local storage and HDFS. When you replicate your HDFS index directories to another HDFS cluster, all of the data is maintained within the respective index directories.

HDFS: /solr/collectionname_shard1_replica1/<index files>
HDFS: /solr/collectionname_shard2_replica1/<index files>

1. In the case of having Solr running on a DR cluster, you would need to ensure the index configuration (schemas, configuration sets, etc.) is updated in the DR Solr Zookeeper. If you create collections on your primary cluster, then you would need to similarly create collections on the DR cluster. This is primarily to ensure the collection metadata exists in both clusters. As long as these settings are in sync, copying the index directories from one HDFS cluster to the other is all you need to do to keep the DR cluster in sync with the production cluster. As I mentioned above, both clusters will be configured to store indexes in an HDFS location. As long as the index directories exist, the SolrCloud nodes will read the indexes from those HDFS directories. Solr creates those index directories based on the name of the collection/index; that is how it knows which data goes with which index.

2. Yes, you should be able to do this. If you need to "restore" a collection from backup, then you have to copy each of the collection's index shards. If you create a collection with 5 shards, then you will have 5 index directories that you need to restore from DR.

Using something like Cross Data Center Replication in SolrCloud 6 is the easiest way to get Solr DR in place. Second to that, using the native Backup/Restore functionality in SolrCloud 5 is a viable alternative. Unfortunately, SolrCloud 4 has neither of these more user-friendly approaches. I highly recommend upgrading to at least Solr 5 to get a better handle on backups and disaster recovery.
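A sketch of the two moving parts for point 1 (host names, collection name, and paths are illustrative; in Solr 4.x the collection is created through the Collections API):

# On BOTH clusters: create the collection so its metadata and
# configuration set exist in each cluster's Zookeeper:
curl 'http://solrhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=5&collection.configName=myconfig'

# Then copy the HDFS index directories from production to DR:
hadoop distcp hdfs://prod-nn:8020/solr hdfs://dr-nn:8020/solr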
03-18-2017
12:46 AM
@Yogesh Sharma The _all field is analyzed by default, so you shouldn't have problems performing case-insensitive queries. You are also specifying the analyze_wildcard: true parameter, which will attempt to analyze the query string with wildcards before running the query. As you have shown, the query itself returns hits, so the problem is with the aggregations. For your aggregations you are using the include parameter, which expects a regular expression rather than a wildcard pattern. Can you try using ".*drama.*" as the include value instead of "*drama*"?
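A sketch of the adjusted request (the aggregation name and field are placeholders for whatever your original request used):

curl -XGET 'http://localhost:9200/movies/_search?pretty' -d '{
  "size": 0,
  "aggs": {
    "genre_terms": {
      "terms": {
        "field": "genre",
        "include": ".*drama.*"
      }
    }
  }
}'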
03-17-2017
09:44 PM
@mqureshi If Solr is storing the indexes on HDFS, then you have a fairly easy way of doing backups. You can use HDFS snapshots to take incremental backups of the Solr index directories on HDFS and then use distcp to copy those snapshots to another HDFS cluster. That provides the ability to have local backup copies and remote backup copies. If you didn't want to perform the HDFS snapshots, you could simply use distcp to replicate the HDFS data to another cluster. However, you lose the easy ability to restore an HDFS snapshot from a local backup.
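A sketch of that flow, assuming the indexes live under /solr on HDFS (paths and names are illustrative):

# One-time: allow snapshots on the Solr index directory:
hdfs dfsadmin -allowSnapshot /solr

# Take a named snapshot (incremental at the HDFS level):
hdfs dfs -createSnapshot /solr backup-20170317

# Copy the snapshot to the remote cluster with distcp:
hadoop distcp /solr/.snapshot/backup-20170317 hdfs://dr-nn:8020/solr-backups/backup-20170317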
03-17-2017
02:21 PM
1 Kudo
@mqureshi Cross Data Center Replication for Solr was released in Solr 6.x; it is not available in version 4.10.3. Take a look at page 409 of the reference guide, which covers using the ReplicationHandler to make backup copies of indexes: http://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.10.pdf You can always use standard filesystem methods for performing backups, but it isn't as clean as CDCR in Solr 6.x. Solr 5.x introduced the ability to back up and restore your indexes using the API. I would encourage customers to upgrade to at least Solr 5.x.
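In 4.x, a ReplicationHandler backup is triggered per core over HTTP. A sketch, with hypothetical host, core name, and backup location:

curl 'http://solrhost:8983/solr/mycollection_shard1_replica1/replication?command=backup&location=/backups&name=20170317'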
03-08-2017
02:48 PM
@Yogesh Sharma Have you disabled the _all field? That is the catch-all field used for a query when you don't specify a field. Your queries are not specifying a specific field, so they should be going against the _all field. By default the _all field should be able to handle mixed-case queries. Have you verified the query returns results without any of the aggregations?

GET /movies/_search?pretty
{
  "size": 10,
  "_source": false,
  "query": {
    "query_string": {
      "analyze_wildcard": true,
      "query": "*drama*"
    }
  }
}