Member since: 01-09-2014
Posts: 283
Kudos Received: 70
Solutions: 50
My Accepted Solutions

Views | Posted
---|---
1717 | 06-19-2019 07:50 AM
2762 | 05-01-2019 08:07 AM
2811 | 04-10-2019 08:49 AM
2711 | 03-20-2019 09:30 AM
2366 | 01-23-2019 10:58 AM
03-15-2017
03:58 PM
1. Simply copying the production (PRO) machine's index and collection folder in HDFS to the DR cluster — will it work?

This will not work, unfortunately. The Solr index and tlog files are in a constant state of being updated, and there is no way to ensure a consistent snapshot while Solr is running. This could be done if Solr were shut down; however, the core_node directories that exist under /solr/<collection_name> in HDFS are mapped to specific shards/replicas, and you would have to ensure that when creating the corresponding collection in DR, you map the core_node directories to the same shards/replicas at collection creation time.

2. Is there any possibility of keeping the CDH 5.4.8 and CDH 5.4.8 DR machines always in sync on index and collection?

Prior to CDH 5.9, the best way to do this is to have your indexing jobs publish documents to both collections. As of CDH 5.9, there is the ability to back up and restore collections, either locally or in DR: https://www.cloudera.com/documentation/enterprise/5-9-x/topics/search_backup_restore.html

3. What is the recommended way to take a backup of the production Solr indexes and collections to the DR cluster?

If you can't upgrade to CDH 5.9, then the recommended way to back up the Solr indexes is to stop the Solr service and do an HDFS snapshot or distcp to copy the indexes to a backup location (see the sketch below). If you need to run the same collection at the backup location, you would need to create it with the createNodeSet property (for Solr 4.10.3) to ensure the collection gets created on the proper nodes, and you'd have to verify that the core_node directories map to the same shards in the clusterstate.json as in production. -pd
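A minimal sketch of that stop-then-copy approach, assuming the Solr data lives under /solr and using a made-up DR NameNode address (dr-nn.example.com). Stop the Solr service first so the index and tlog files are quiescent:

```bash
# Allow and take an HDFS snapshot of the Solr data directory
hdfs dfsadmin -allowSnapshot /solr
hdfs dfs -createSnapshot /solr solr-backup-20170315

# Copy the snapshot to the DR cluster with distcp
hadoop distcp \
  /solr/.snapshot/solr-backup-20170315 \
  hdfs://dr-nn.example.com:8020/solr-backups/

# Restart the Solr service once the snapshot is taken
```

The snapshot gives you a read-consistent view of the stopped index, so the distcp can run while you bring production Solr back up.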
03-13-2017
09:11 AM
Based on this error:

17/03/11 23:35:34 WARN conf.FlumeConfiguration: Could not configure sink agent-sink due to: No channel configured for sink: agent-sink
org.apache.flume.conf.ConfigurationException: No channel configured for sink: agent-sink

Sinks can only have one channel that they are attached to, so the property is singular. Change the following line:

agent.sinks.agent-sink.channels = agent-chan

to:

agent.sinks.agent-sink.channel = agent-chan
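For context, a minimal sketch of the full wiring (the netcat source and memory channel are placeholder assumptions; the singular/plural distinction on the channel properties is the point):

```properties
agent.sources = agent-src
agent.channels = agent-chan
agent.sinks = agent-sink

agent.sources.agent-src.type = netcat
agent.sources.agent-src.bind = 0.0.0.0
agent.sources.agent-src.port = 44444
# A source can fan out to multiple channels, so its property is plural
agent.sources.agent-src.channels = agent-chan

agent.channels.agent-chan.type = memory

agent.sinks.agent-sink.type = logger
# A sink drains exactly one channel, so its property is singular
agent.sinks.agent-sink.channel = agent-chan
```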
03-13-2017
09:10 AM
Are you following this plugin directory architecture: http://flume.apache.org/FlumeUserGuide.html#the-plugins-d-directory

If you look in the Flume stderr.log, you should see the plugins path on the command line:

stderr.log:+ exec /opt/cloudera/parcels/CDH-5.10.2-1.cdh5.10.2.p0.19/lib/flume-ng/bin/flume-ng agent --conf /var/run/cloudera-scm-agent/process/434-flume-AGENT --classpath /var/run/cloudera-scm-agent/process/434-flume-AGENT/hbase-conf:/var/run/cloudera-scm-agent/process/434-flume-AGENT/hadoop-conf: --conf-file /var/run/cloudera-scm-agent/process/434-flume-AGENT/flume.conf --name tier2 -Djava.net.preferIPv4Stack=true -Duser.home=/var/lib/flume-ng -Xms1073741824 -Xmx1073741824 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/FLUME-1_FLUME-AGENT-1_pid30399.hprof -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -Dflume.monitoring.type=HTTP -Dflume.monitoring.port=24001 --plugins-path /usr/lib/flume-ng/plugins.d:/var/lib/flume-ng/plugins.d
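For reference, the layout the user guide describes looks like this (the plugin name my-custom-source is a made-up example; each plugin gets its own subdirectory):

```
plugins.d/
  my-custom-source/
    lib/            # the plugin's own jar(s)
      my-custom-source.jar
    libext/         # the plugin's dependency jars
      some-dependency.jar
    native/         # any native libraries
      libsomething.so
```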
02-15-2017
02:15 PM
2 Kudos
numFound is the number that should be returned each time. If it is different, there are a couple of possibilities:

1. You are indexing in real time, so numFound would keep increasing, or, if you're using the Lily HBase indexer, docs could be deleted.

2. The replicas for a given shard are out of sync. You can find out if this is the case by sending the same query to each replica in the shard, adding the distrib=false property to the URL string so each replica answers only from its own index:

http://solr.server/solr/collection1_shard1_replica1/select?q=*:*&distrib=false
http://solr2.server/solr/collection1_shard1_replica2/select?q=*:*&distrib=false

If those return different results and you aren't doing real-time indexing, then there is likely an issue, and you can use DELETEREPLICA and ADDREPLICA to re-create the out-of-sync replica: https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_mc_solr_service.html#id_s15_n33_45 -pd
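A quick sketch for comparing the counts from a shell, using the two example replica URLs above (hostnames are the placeholder values from this post):

```bash
for url in \
  "http://solr.server/solr/collection1_shard1_replica1" \
  "http://solr2.server/solr/collection1_shard1_replica2"
do
  echo -n "$url -> "
  # rows=0 skips the documents themselves; we only want the count
  curl -s "$url/select?q=*:*&distrib=false&rows=0&wt=json" \
    | grep -o '"numFound":[0-9]*'
done
```

Matching numFound values across replicas of the same shard means they are in sync.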
02-13-2017
10:54 AM
You are correct: once the batch of messages has been read from the queue and confirmed delivered to the channel (flushed to disk), the messages are marked as acknowledged and, depending on your settings in IBM MQ, can be deleted. -pd
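For illustration, a rough sketch of a Flume JMS source pointed at MQ. The JNDI context factory and provider URL are environment-specific assumptions (here, a file-based .bindings directory), and batchSize controls how many messages are read before the batch is committed to the channel and acknowledged:

```properties
agent.sources = mq-src
agent.sources.mq-src.type = jms
# JNDI lookup details are assumptions; adjust for your MQ setup
agent.sources.mq-src.initialContextFactory = com.sun.jndi.fscontext.RefFSContextFactory
agent.sources.mq-src.providerURL = file:///opt/mq/jndi
agent.sources.mq-src.connectionFactory = ConnectionFactory
agent.sources.mq-src.destinationName = MY.QUEUE
agent.sources.mq-src.destinationType = QUEUE
# Messages are acknowledged per committed batch of this size
agent.sources.mq-src.batchSize = 100
# mq-chan is a channel assumed to be defined elsewhere in the config
agent.sources.mq-src.channels = mq-chan
```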
01-31-2017
08:06 AM
Take a look at the following settings:

num.streams
num.producers

Increasing num.streams will increase the number of consumer threads that you have running, and increasing num.producers will allow you to produce more messages to the destination in parallel. https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330 -pd
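As a sketch, both settings can be passed on the MirrorMaker command line (the property file names and topic whitelist are placeholders; the flag names follow the wiki page above for the older MirrorMaker):

```bash
kafka-run-class.sh kafka.tools.MirrorMaker \
  --consumer.config source-consumer.properties \
  --producer.config target-producer.properties \
  --num.streams 4 \
  --num.producers 4 \
  --whitelist '.*'
```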
01-06-2017
12:13 PM
As I stated before, Flume can't consume from a remote HTTP server. You would need to have something that can consume from the remote server and then post to Flume. -pd
01-06-2017
12:11 PM
It seems like you are having problems even reaching HDFS. Have you tried a simple 'hdfs dfs -ls' from that Flume node? Are you running iptables? Can you ping/traceroute to the NameNode? -pd
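A quick checklist of those commands, run from the Flume node (the NameNode hostname is a placeholder):

```bash
hdfs dfs -ls /              # basic HDFS access
ping -c 3 nn.example.com    # basic network reachability to the NameNode
traceroute nn.example.com   # shows where the route breaks, if it does
iptables -L -n              # any local firewall rules in the way
```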
01-06-2017
11:51 AM
Take a look at the preferred leader election tool: https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools

This assumes that the desired leaders are listed first in the partition's replica list. If broker 30 is listed first, it will still be the leader for all those partitions after the election. -pd
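A sketch of triggering the election (the ZooKeeper address is a placeholder):

```bash
# Elect the first-listed (preferred) replica as leader for all partitions
kafka-preferred-replica-election.sh --zookeeper zk1.example.com:2181
```

To change which broker is preferred, the replica list itself has to be reordered first, e.g. with kafka-reassign-partitions.sh, so the desired broker appears first.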
12-02-2016
12:05 PM
Flume doesn't have the ability to poll an HTTP service; however, it can act as an HTTP service itself (http://flume.apache.org/FlumeUserGuide.html#http-source) that you can post JSON data (or other formats) to. I would suggest reviewing the documentation here to see some examples and different configuration options: http://flume.apache.org/FlumeUserGuide.html

In Cloudera Manager, you will be editing the Configuration File section, and that is the configuration that is read when Flume starts up. -pd
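A minimal sketch of an HTTP source with the default JSON handler (the agent/channel/sink names, port, and logger sink are placeholder assumptions):

```properties
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# HTTP source listening for POSTed events
a1.sources.r1.type = http
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```

The default handler expects a JSON array of events, each with optional headers and a body, so a test post could look like:

```bash
curl -X POST http://flume-host.example.com:5140 \
  -H 'Content-Type: application/json' \
  -d '[{"headers": {"source": "test"}, "body": "hello flume"}]'
```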