Created on 12-10-2015 05:32 PM - edited 09-16-2022 02:52 AM
Hi,
I am trying to run Solr on YARN using the lucidworksSolrSlider project, with additional help from slider.incubator.apache.org/docs/getting_started.html.
Here is my folder structure:
[solrs@ip-10-0-0-217 solr-slider]$ ls -lrt
total 131744
-rw-rw-r--. 1 solrs solrs      3182 Dec 10 01:17 README.md
drwxrwxr-x. 4 solrs solrs        32 Dec 10 01:17 package
-rw-rw-r--. 1 solrs solrs      2089 Dec 10 01:17 metainfo.xml
-rw-rw-r--. 1 solrs solrs     11358 Dec 10 01:17 LICENSE
-rw-rw-r--. 1 solrs solrs 134874517 Dec 10 01:37 solr-on-yarn.zip
-rw-rw-r--. 1 solrs solrs       277 Dec 10 01:49 resources-default.json
-rw-rw-r--. 1 solrs solrs      1355 Dec 10 15:33 appConfig-default.json
appConfig-default.json:
{
  "schema": "http://example.org/specification/v2.0.0",
  "metadata": { },
  "global": {
    "application.def": "/user/solrs/.slider/package/solryarn/solr-on-yarn.zip",
    "java_home": "/usr/jdk64/jdk1.8.0_40",
    "site.global.app_root": "${AGENT_WORK_ROOT}/app/install/solr-5.2.0-SNAPSHOT",
    "site.global.zk_host": "localhost:2181",
    "site.global.solr_host": "${SOLR_HOST}",
    "site.global.listen_port": "${SOLR.ALLOCATED_PORT}",
    "site.global.xmx_val": "1g",
    "site.global.xms_val": "1g",
    "site.global.gc_tune": "-XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewG$
    "site.global.zk_timeout": "15000",
    "site.global.server_module": "--module=http",
    "site.global.stop_key": "solrrocks",
    "site.global.solr_opts": ""
  },
  "components": {
    "slider-appmaster": {
      "jvm.heapsize": "512M"
    },
    "SOLR": { }
  }
}
resources-default.json:
{
  "schema" : "http://example.org/specification/v2.0.0",
  "metadata" : { },
  "global" : { },
  "components": {
    "slider-appmaster": { },
    "SOLR": {
      "yarn.role.priority": "1",
      "yarn.component.instances": "3",
      "yarn.memory": "1024"
    }
  }
}
Could you please suggest what the values of the following parameters in the appConfig-default.json file should be:
"site.global.app_root": "${AGENT_WORK_ROOT}/app/install/solr-5.2.0-SNAPSHOT",
"site.global.solr_host": "${SOLR_HOST}",
"site.global.listen_port": "${SOLR.ALLOCATED_PORT}"
Basically, where should I find "/app/install/solr-5.2.0-SNAPSHOT"?
My Environment: HDP 2.3, Slider Core-0.80.0.2.3.2.0-2950
Thanks, hoping for a quick reply.
Created 12-10-2015 06:15 PM
The only part of "site.global.app_root": "${AGENT_WORK_ROOT}/app/install/solr-5.2.0-SNAPSHOT" that you should change is the solr-5.2.0-SNAPSHOT. You should make this match the version of the Solr tarball you downloaded. (You can check the version by running "tar tf solr.tgz").
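A minimal Python sketch of the same check, assuming you have the Solr tarball locally (the file name solr.tgz is just an example). It collects the first path component of every entry, which for a Solr release is normally a single directory such as solr-5.3.1; that name is what belongs at the end of app_root. The function name top_level_dirs is hypothetical and not part of Slider.

```python
import tarfile


def top_level_dirs(tarball_path):
    """Return the set of top-level entry names in a (possibly compressed) tarball."""
    names = set()
    # "r:*" lets tarfile auto-detect gzip, bzip2 or xz compression.
    with tarfile.open(tarball_path, "r:*") as tar:
        for member in tar.getmembers():
            name = member.name
            # Some tarballs store entries as "./solr-x.y.z/..."; drop that prefix.
            if name.startswith("./"):
                name = name[2:]
            first = name.split("/", 1)[0]
            if first and first != ".":
                names.add(first)
    return names
```

For example, top_level_dirs("solr.tgz") should return a one-element set for a standard Solr release; if it returns more than one name, the tarball does not unpack into a single directory.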
You probably also want to change "site.global.zk_host": "localhost:2181" to "site.global.zk_host": "${ZK_HOST}", which will configure Solr to use the same ZooKeeper instance Slider is using.
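If you prefer to script the edit, here is a small sketch that rewrites appConfig-default.json in place so that site.global.zk_host uses the ${ZK_HOST} placeholder. It assumes the file is valid JSON with a top-level "global" object, as in the config posted above; the helper name is hypothetical.

```python
import json


def set_zk_host_placeholder(path):
    """Set site.global.zk_host to "${ZK_HOST}" and return the previous value."""
    with open(path) as f:
        config = json.load(f)
    old = config["global"].get("site.global.zk_host")
    # Slider replaces ${ZK_HOST} with the ZooKeeper quorum it is using itself.
    config["global"]["site.global.zk_host"] = "${ZK_HOST}"
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return old
```

Python dicts keep insertion order, so the rewritten file lists the keys in the same order as before.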
I think you can leave ${SOLR_HOST} as is, but I am not completely sure of the purpose of that parameter.
Created 12-10-2015 05:43 PM
@Gour Saha
Created 12-10-2015 08:05 PM
The directory ${AGENT_WORK_ROOT}/app/install/solr-* will be created for you by Slider. Slider will untar your Solr tarball to the ${AGENT_WORK_ROOT}/app/install directory. That's why Slider needs to know the name of the directory contained in your tarball.
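Put differently, app_root has to equal the install prefix plus the tarball's top-level directory name. A minimal sketch of that comparison, assuming the "${AGENT_WORK_ROOT}/app/install/" prefix used in the configs in this thread; the function names are hypothetical.

```python
APP_ROOT_PREFIX = "${AGENT_WORK_ROOT}/app/install/"


def expected_app_root(tarball_top_dir):
    """Build the app_root value for a tarball that unpacks into tarball_top_dir."""
    return APP_ROOT_PREFIX + tarball_top_dir


def check_app_root(app_root, tarball_top_dir):
    """Return None when app_root matches, otherwise a short error message."""
    expected = expected_app_root(tarball_top_dir)
    if app_root == expected:
        return None
    return f"app_root is {app_root!r}, but the tarball unpacks to {expected!r}"
```

For example, check_app_root("${AGENT_WORK_ROOT}/app/install/solr-5.3.1-SNAPSHOT", "solr-5.3.1") reports a mismatch, because a directory named solr-5.3.1-SNAPSHOT is never created.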
Created 12-10-2015 09:31 PM
Do you think Solr on YARN is ready for a PoC?
Created 12-11-2015 11:19 AM
Thanks for the response, but the Slider application failed to start again.
When I look at the HDFS path:
[solr@sandbox solr-slider]$ hadoop fs -cat /user/solr/.slider/cluster/solr-yarn4/app_config.json
{
  "schema" : "http://example.org/specification/v2.0.0",
  "metadata" : { },
  "global" : {
    "site.global.gc_tune" : "-XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime",
    "site.fs.default.name" : "hdfs://sandbox.hortonworks.com:8020",
    "site.global.solr_host" : "${SOLR_HOST}",
    "site.global.solr_opts" : "",
    "zookeeper.hosts" : "sandbox.hortonworks.com",
    "site.global.server_module" : "--module=http",
    "site.global.stop_key" : "solrrocks",
    "java_home" : "/usr/lib/jvm/java-1.7.0-openjdk.x86_64/",
    "site.fs.defaultFS" : "hdfs://sandbox.hortonworks.com:8020",
    "site.global.zk_timeout" : "15000",
    "env.MALLOC_ARENA_MAX" : "4",
    "zookeeper.path" : "/services/slider/users/solr/solr-yarn4",
    "site.global.listen_port" : "8983",
    "zookeeper.quorum" : "sandbox.hortonworks.com:2181",
    "site.global.xmx_val" : "1g",
    "site.global.zk_host" : "${ZK_HOST}",
    "site.global.app_root" : "${AGENT_WORK_ROOT}/app/install/solr-5.3.1-SNAPSHOT",
    "application.def" : "/user/solr/.slider/package/solr-yarn/solr-on-yarn.zip",
    "site.global.xms_val" : "1g"
  },
  "credentials" : { },
  "components" : {
    "slider-appmaster" : {
      "jvm.heapsize" : "512M"
    },
    "SOLR" : { }
  }
- Shouldn't variables such as "${ZK_HOST}" be replaced with actual values?
- Where should I look for the Solr-specific logs? I am not able to find anything in the container logs.
- What is the value of ${AGENT_WORK_ROOT}, i.e. what is its absolute path?
- Is there any detailed documentation on how to deploy a Solr application on YARN via Slider?
Regards,
Created 12-11-2015 03:48 PM
ZK_HOST and AGENT_WORK_ROOT will be replaced by Slider. AGENT_WORK_ROOT will have the form /hadoop/yarn/local/usercache/<userName>/appcache/<appID>/<containerID> (where /hadoop/yarn/local is the directory specified by yarn.nodemanager.local-dirs in yarn-site.xml). Based on the solr_node.py script, it looks like the output of the Solr start command should end up in the slider-agent logs in the container log directory. If containers are failing to launch, information about that should be in the AM log, slider.log, in the log directory for container 0001.
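To make the substitution concrete, here is a rough Python illustration, not Slider's actual code: it builds a path of the form described above and expands ${NAME} tokens, leaving unknown ones untouched. It assumes local_dir is one entry of yarn.nodemanager.local-dirs (that property can list several comma-separated directories); the user, application and container IDs you pass in are your own, and all names here are hypothetical.

```python
import re


def agent_work_root(local_dir, user, app_id, container_id):
    """Return <local_dir>/usercache/<user>/appcache/<app_id>/<container_id>."""
    return "/".join(
        [local_dir.rstrip("/"), "usercache", user, "appcache", app_id, container_id]
    )


# Matches ${NAME}; dots are allowed so tokens like ${SOLR.ALLOCATED_PORT} are recognised.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def substitute(value, variables):
    """Replace ${NAME} tokens with entries from `variables`; unknown tokens are kept."""
    return _PLACEHOLDER.sub(lambda m: variables.get(m.group(1), m.group(0)), value)
```

Seeing "${ZK_HOST}" unexpanded in the stored app_config.json is therefore expected; in this picture the expansion happens when the container is launched, not when the config is written to HDFS.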
Created 12-11-2015 04:39 PM
I've added a comment to my initial response that should solve your problem.
Created 12-11-2015 04:38 PM
Another thing I noticed is that the memory requested is pretty high if you're going to be running it on a VM. Solr might not be launching because there isn't enough memory. I made these changes to appConfig and resources and was able to get Solr running on a VM that has 9GB of RAM. You might need to make additional adjustments for your setup, and also make sure yarn.scheduler.minimum-allocation-mb isn't too high.
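The reason yarn.scheduler.minimum-allocation-mb matters is that YARN rounds every container request up. A back-of-the-envelope sketch, assuming the scheduler rounds each request up to a multiple of the minimum allocation (the Capacity Scheduler's default behaviour; other schedulers may use a different increment). The AM container size is an assumption you must supply: the thread only shows the AM's 512M heap (jvm.heapsize), and the container has to be larger than that.

```python
def normalize_mb(request_mb, min_alloc_mb, max_alloc_mb):
    """Round a request up to a multiple of min_alloc_mb (assumed scheduler behaviour)."""
    if request_mb > max_alloc_mb:
        raise ValueError(f"{request_mb} MB exceeds yarn.scheduler.maximum-allocation-mb")
    # Integer ceiling division avoids floating-point surprises.
    rounded = -(-request_mb // min_alloc_mb) * min_alloc_mb
    return min(max(rounded, min_alloc_mb), max_alloc_mb)


def total_cluster_mb(instances, instance_mb, am_mb, min_alloc_mb, max_alloc_mb):
    """Estimate total YARN memory for the Solr containers plus the Slider AM."""
    per_instance = normalize_mb(instance_mb, min_alloc_mb, max_alloc_mb)
    am = normalize_mb(am_mb, min_alloc_mb, max_alloc_mb)
    return instances * per_instance + am
```

With the posted resources (3 instances, yarn.memory 1024) and a hypothetical 1024 MB AM container, total_cluster_mb(3, 1024, 1024, 1024, 8192) gives 4096 MB, but raising the minimum allocation to 2048 doubles that to 8192 MB, which is close to the whole 9GB VM.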
Created 12-18-2015 05:30 PM
Thanks, Billie, for your response!
I was able to run Solr on YARN. The mistake was that "site.global.app_root" did not have the correct name of my Solr version, which was solr-5.3.1.
However, when I stop the Solr application via Slider (slider stop solr-yarn8) and restart it:
1) the cores I created disappear, which is bad;
2) new instances start on new ports. Can I fix the ports?
3) I am able to connect to only one of the Solr instances (Solr UI).
4) Is it possible yet to deploy SolrCloud on YARN using multiple instances of Solr?
Regards,
Rakesh