Installing Solr on yarn using Slider
Labels: Apache Solr
Created on ‎12-10-2015 05:32 PM - edited ‎09-16-2022 02:52 AM
Hi,
I am trying to run Solr on YARN by following the lucidworksSolrSlider link, along with the getting-started guide at slider.incubator.apache.org/docs/getting_started.html
Here is my folder structure:
[solrs@ip-10-0-0-217 solr-slider]$ ls -lrt
total 131744
-rw-rw-r--. 1 solrs solrs      3182 Dec 10 01:17 README.md
drwxrwxr-x. 4 solrs solrs        32 Dec 10 01:17 package
-rw-rw-r--. 1 solrs solrs      2089 Dec 10 01:17 metainfo.xml
-rw-rw-r--. 1 solrs solrs     11358 Dec 10 01:17 LICENSE
-rw-rw-r--. 1 solrs solrs 134874517 Dec 10 01:37 solr-on-yarn.zip
-rw-rw-r--. 1 solrs solrs       277 Dec 10 01:49 resources-default.json
-rw-rw-r--. 1 solrs solrs      1355 Dec 10 15:33 appConfig-default.json
appConfig-default.json:
{
  "schema": "http://example.org/specification/v2.0.0",
  "metadata": {
  },
  "global": {
    "application.def": "/user/solrs/.slider/package/solryarn/solr-on-yarn.zip",
    "java_home": "/usr/jdk64/jdk1.8.0_40",
    "site.global.app_root": "${AGENT_WORK_ROOT}/app/install/solr-5.2.0-SNAPSHOT",
    "site.global.zk_host": "localhost:2181",
    "site.global.solr_host": "${SOLR_HOST}",
    "site.global.listen_port": "${SOLR.ALLOCATED_PORT}",
    "site.global.xmx_val": "1g",
    "site.global.xms_val": "1g",
    "site.global.gc_tune": "-XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewG$
    "site.global.zk_timeout": "15000",
    "site.global.server_module": "--module=http",
    "site.global.stop_key": "solrrocks",
    "site.global.solr_opts": ""
  },
  "components": {
    "slider-appmaster": {
      "jvm.heapsize": "512M"
    },
    "SOLR": {
    }
  }
}
resources-default.json:
{
  "schema" : "http://example.org/specification/v2.0.0",
  "metadata" : {
  },
  "global" : {
  },
  "components": {
    "slider-appmaster": {
    },
    "SOLR": {
      "yarn.role.priority": "1",
      "yarn.component.instances": "3",
      "yarn.memory": "1024"
    }
  }
}
Could you please tell me what values the following parameters in appConfig-default.json should have?
"site.global.app_root": "${AGENT_WORK_ROOT}/app/install/solr-5.2.0-SNAPSHOT",
"site.global.solr_host": "${SOLR_HOST}",
"site.global.listen_port": "${SOLR.ALLOCATED_PORT}",
In particular, where should I find "/app/install/solr-5.2.0-SNAPSHOT"?
My Environment: HDP 2.3, Slider Core-0.80.0.2.3.2.0-2950
Thanks, hoping for a quick reply.
Created ‎12-10-2015 06:15 PM
The only part of "site.global.app_root": "${AGENT_WORK_ROOT}/app/install/solr-5.2.0-SNAPSHOT" that you should change is the solr-5.2.0-SNAPSHOT. You should make this match the version of the Solr tarball you downloaded. (You can check the version by running "tar tf solr.tgz").
You probably also want to change "site.global.zk_host": "localhost:2181" to "site.global.zk_host": "${ZK_HOST}", which will configure Solr to use the same ZooKeeper instance Slider is using.
I think you can leave ${SOLR_HOST} as is, but I am not completely sure of the purpose of that parameter.
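If it helps, here is a rough Python sketch of the same check that "tar tf solr.tgz" does by eye: it reads the top-level directory name out of the tarball and builds the matching app_root value. The function names top_level_dirs and app_root_for are made up for this example, and it assumes the tarball contains exactly one top-level directory.

```python
import posixpath
import tarfile


def top_level_dirs(tar_path):
    """Return the set of top-level names found in a tar archive."""
    names = set()
    with tarfile.open(tar_path) as tar:
        for member in tar.getnames():
            # "./solr-5.3.1/bin/solr" -> "solr-5.3.1"
            first = posixpath.normpath(member).split("/", 1)[0]
            if first not in ("", "."):
                names.add(first)
    return names


def app_root_for(tar_path):
    """Build a site.global.app_root value from the tarball's directory name.

    Assumption: the archive unpacks into a single directory such as
    solr-5.3.1; anything else is reported as an error.
    """
    dirs = top_level_dirs(tar_path)
    if len(dirs) != 1:
        raise ValueError("expected one top-level directory, found: %s" % sorted(dirs))
    return "${AGENT_WORK_ROOT}/app/install/" + dirs.pop()
```

Calling app_root_for("solr.tgz") on your own tarball should give you the string to paste into appConfig-default.json; the ${AGENT_WORK_ROOT} part is left untouched on purpose.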
Created ‎12-10-2015 05:43 PM
@Gour Saha
Created ‎12-10-2015 08:05 PM
The directory ${AGENT_WORK_ROOT}/app/install/solr-* will be created for you by Slider. Slider will untar your Solr tarball to the ${AGENT_WORK_ROOT}/app/install directory. That's why Slider needs to know the name of the directory contained in your tarball.
Created ‎12-10-2015 09:31 PM
Do you think Solr on YARN is ready for a PoC?
Created ‎12-11-2015 11:19 AM
Thanks for the response, but the slider application failed to start again.
When I look at the HDFS path:
[solr@sandbox solr-slider]$ hadoop fs -cat /user/solr/.slider/cluster/solr-yarn4/app_config.json
{
  "schema" : "http://example.org/specification/v2.0.0",
  "metadata" : {
  },
  "global" : {
    "site.global.gc_tune" : "-XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime",
    "site.fs.default.name" : "hdfs://sandbox.hortonworks.com:8020",
    "site.global.solr_host" : "${SOLR_HOST}",
    "site.global.solr_opts" : "",
    "zookeeper.hosts" : "sandbox.hortonworks.com",
    "site.global.server_module" : "--module=http",
    "site.global.stop_key" : "solrrocks",
    "java_home" : "/usr/lib/jvm/java-1.7.0-openjdk.x86_64/",
    "site.fs.defaultFS" : "hdfs://sandbox.hortonworks.com:8020",
    "site.global.zk_timeout" : "15000",
    "env.MALLOC_ARENA_MAX" : "4",
    "zookeeper.path" : "/services/slider/users/solr/solr-yarn4",
    "site.global.listen_port" : "8983",
    "zookeeper.quorum" : "sandbox.hortonworks.com:2181",
    "site.global.xmx_val" : "1g",
    "site.global.zk_host" : "${ZK_HOST}",
    "site.global.app_root" : "${AGENT_WORK_ROOT}/app/install/solr-5.3.1-SNAPSHOT",
    "application.def" : "/user/solr/.slider/package/solr-yarn/solr-on-yarn.zip",
    "site.global.xms_val" : "1g"
  },
  "credentials" : {
  },
  "components" : {
    "slider-appmaster" : {
      "jvm.heapsize" : "512M"
    },
    "SOLR" : {
    }
  }
- Shouldn't variable names such as "${ZK_HOST}" be replaced with actual values?
- Where should I look for the Solr-specific logs? I am not able to find anything in the container logs.
- What is the value of ${AGENT_WORK_ROOT}, i.e. what is the absolute path?
- Is there any detailed documentation on how to deploy a Solr application on YARN via Slider?
Regards,
Created ‎12-11-2015 03:48 PM
ZK_HOST and AGENT_WORK_ROOT will be replaced by Slider. The AGENT_WORK_ROOT will have the form /hadoop/yarn/local/usercache/<userName>/appcache/<appID>/<containerID> (where /hadoop/yarn/local is the directory specified by the yarn.nodemanager.local-dirs in yarn-site.xml). Based on the solr_node.py script, it looks like the output of the Solr start command should end up in the slider-agent logs in the container log directory. If containers are failing to launch, information about that should be in the AM log, slider.log in the log directory for container 0001.
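To make the path pattern above concrete, here is a small Python illustration. It is not Slider's code, only string handling that mirrors the form described in this reply; the helper names, the user name, and the application and container IDs in the usage note are hypothetical.

```python
import posixpath


def agent_work_root(local_dir, user, app_id, container_id):
    """Compose a path of the form
    <local_dir>/usercache/<userName>/appcache/<appID>/<containerID>."""
    return posixpath.join(local_dir, "usercache", user, "appcache", app_id, container_id)


def expand(value, variables):
    """Replace ${NAME} placeholders in a config value.

    Illustration only: the real substitution is done by Slider when the
    container is launched, not by this function.
    """
    for name, replacement in variables.items():
        value = value.replace("${" + name + "}", replacement)
    return value
```

For example, with local_dir "/hadoop/yarn/local", user "solr", and made-up IDs such as "application_0000000000000_0001" and "container_0000000000000_0001_01_000002", expand("${AGENT_WORK_ROOT}/app/install/solr-5.3.1", {"AGENT_WORK_ROOT": agent_work_root(...)}) gives the absolute directory where you would expect the unpacked Solr to sit on the NodeManager host.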
Created ‎12-11-2015 04:39 PM
I've added a comment to my initial response that should solve your problem.
Created ‎12-11-2015 04:38 PM
Another thing I noticed is that the memory requested is pretty high if you're going to be running it on a VM. It might not be launching Solr because it doesn't have enough memory. I made these changes to appConfig and resources and was able to get Solr running on a VM that has 9GB of RAM. You might need to make additional adjustments for your setup, and also make sure yarn.scheduler.minimum-allocation-mb isn't too high.
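As a back-of-the-envelope check, the sketch below estimates how much memory the SOLR component asks YARN for. It assumes the scheduler rounds each container request up to a multiple of yarn.scheduler.minimum-allocation-mb, which is an assumption about your scheduler configuration and worth verifying on your cluster. It also ignores the Slider AM container, and the function names are made up for this example.

```python
def rounded_allocation_mb(requested_mb, minimum_allocation_mb):
    """Round a container request up to a multiple of the minimum allocation.

    Assumption: the scheduler normalizes requests this way.
    """
    multiples = (requested_mb + minimum_allocation_mb - 1) // minimum_allocation_mb
    return max(1, multiples) * minimum_allocation_mb


def total_component_mb(instances, requested_mb, minimum_allocation_mb):
    """Total memory for all instances of one component, after rounding."""
    return instances * rounded_allocation_mb(requested_mb, minimum_allocation_mb)
```

With the values from resources-default.json above (3 instances, yarn.memory of 1024), total_component_mb(3, 1024, 1024) comes to 3072 MB; with a hypothetical minimum allocation of 1536 MB the same request would grow to 4608 MB, which is why a high minimum can keep containers from launching on a small VM.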
Created ‎12-18-2015 05:30 PM
Thanks Billie for your response!
I was able to run Solr on YARN. The mistake was that "site.global.app_root" did not have the correct name of my Solr version, which was solr-5.3.1.
However, when I stop the Solr application via Slider (slider stop solr-yarn8) and restart it:
1) The cores I created disappear, which is bad.
2) New instances start on new ports. Can I fix the ports?
3) I am able to connect to only one of the Solr instances (Solr UI).
4) Is it possible yet to deploy SolrCloud on YARN using multiple instances of Solr?
Regards,
Rakesh
