Data for all fields of a Solr collection ends up in a single field


Hi,

I can load data into a Solr collection successfully, but every column and its data ends up in a single field, even though I have defined separate fields in both schema.xml and the morphline conf file.

The table is in HDFS, and the data is stored in CSV format.
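One thing worth ruling out with this symptom is a separator mismatch: if the files do not actually use the separator the CSV reader is configured with, each line parses as a single value. Below is a minimal sketch for checking one raw line copied from the HDFS file. The `sample` line and the candidate list are made up for illustration; `\x01` (Ctrl-A) is included on the assumption that the table might have been written by Hive, whose default text-file field delimiter it is.

```python
# Hypothetical diagnostic: count how many fields one raw line yields
# under several candidate separators. Not part of the original post.
import csv

CANDIDATES = {
    "comma": ",",
    "tab": "\t",
    "pipe": "|",
    "ctrl-A (Hive default)": "\x01",
}

def field_counts(line, quotechar='"'):
    """Return {separator name: number of fields} for a single line."""
    counts = {}
    for name, sep in CANDIDATES.items():
        row = next(csv.reader([line], delimiter=sep, quotechar=quotechar))
        counts[name] = len(row)
    return counts

# Made-up sample line; replace it with a real line from the data file.
sample = "1001\x01g-17\x0139.7,-104.9\x013.0\x01CO"

for name, n in field_counts(sample).items():
    print(f"{name}: {n} field(s)")
```

The separator that yields the expected number of columns (24 in the conf file below) is the one the reader should be configured with.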

 

Attachments: Data_CSV.png, SolrComps1_Schema.png, SolrComps1_Schema_Query.png

$ solrctl instancedir --generate $HOME/solr_configs
$ solrctl instancedir --create solrcomps1 $HOME/solr_configs

$ solrctl collection --create solrcomps1 -s 1

$ hadoop jar /opt/cloudera/parcels/CDH-*/jars/search-mr-*-job.jar \
    org.apache.solr.hadoop.MapReduceIndexerTool \
    -D 'mapred.child.java.opts=-Xmx500m' \
    --morphline-file /home/username/solr_local/solrcomps1.conf \
    --output-dir hdfs://IP_Address:8020/tmp/load/solrcomps1 \
    --verbose --go-live \
    --zk-host ServerName1:2181,ServerName2:2181,ServerName3:2181/solr \
    --collection solrcomps1 \
    hdfs://IP_Address:8020/tmp/solr_morphline

 

****************************************************************************************************************

solrcomps1.conf file details:

 

# Specify server locations in a SOLR_LOCATOR variable; used later in
# variable substitutions:
SOLR_LOCATOR : {
  # Name of solr collection
  collection : solrcomps1

  # ZooKeeper ensemble
  zkHost : "dalxclasnp01.prd.den.vz.altidev.net:2181,dalxclasnp02.prd.den.vz.altidev.net:2181,dalxclasnp03.prd.den.vz.altidev.net:2181/solr"
}

# Specify an array of one or more morphlines, each of which defines an ETL
# transformation chain. A morphline consists of one or more (potentially
# nested) commands. A morphline is a way to consume records (e.g. Flume events,
# HDFS files or blocks), turn them into a stream of records, and pipe the stream
# of records through a set of easily configurable transformations on the way to
# a target application such as Solr.
morphlines : [
  {
    id : solrcomps1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
    commands : [
      {
        readCSV {
          separator : ","
          columns : ['id' ,'gid_s' ,'latlon_p' ,'beds_f' ,'state_s' ,'fips_s' ,'bath_f' ,'buildingarea_f' ,'stdlandusecode_s' ,'recordingdate_s' ,'recordingdate_dt' ,'nid_cb_s' ,'nid_ct_s' ,'nid_n_s' ,'mx_id_p_s' ,'mx_id_m_s' ,'mx_id_h_s' ,'yearbuilt_i' ,'situsstdzip5_s' ,'situsstdstreet_s' ,'max_radius_f' ,'base_dist_f' ,'deleted_flag_s' ,'update_timestamp_s']
          quoteChar : "\""
          charset : UTF-8
        }
      }

      {
        if {
          conditions : [
            {
              equals { id : [] }
            }
          ]
          then : [
            {
              dropRecord {}
            }
          ]
        }
      }

      # Consume the output record of the previous command and pipe another
      # record downstream.
      #
      # Command that deletes record fields that are unknown to Solr
      # schema.xml.
      #
      # Recall that Solr throws an exception on any attempt to load a document
      # that contains a field that isn't specified in schema.xml.
      {
        sanitizeUnknownSolrFields {
          # Location from which to fetch Solr schema
          solrLocator : ${SOLR_LOCATOR}
        }
      }

      # log the record at DEBUG level to SLF4J
      { logDebug { format : "output record: {}", args : ["@{}"] } }

      # load the record into a Solr server or MapReduce Reducer
      {
        loadSolr {
          solrLocator : ${SOLR_LOCATOR}
        }
      }
    ]
  }
]

****************************************************************************************************************
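To make the symptom concrete, here is a simplified model of how a positional CSV reader such as readCSV assigns values to the configured column names. It is a sketch under assumptions, not the Kite implementation: quoting is ignored, the `columnN` naming for surplus values is my understanding of the documented behavior, and the function name `read_csv_line` and both sample lines are hypothetical. It shows how a line that does not contain the configured separator would land entirely in the first column, `id`.

```python
# Simplified, hypothetical model of positional column assignment.
# Quoting and charset handling are deliberately left out.

COLUMNS = ["id", "gid_s", "latlon_p", "beds_f", "state_s"]  # first five names from the conf file

def read_csv_line(line, columns, separator=","):
    """Split a line on the separator and name the values by position."""
    record = {}
    for i, value in enumerate(line.split(separator)):
        # Surplus values get a generated name (assumed behavior).
        name = columns[i] if i < len(columns) else f"column{i}"
        record[name] = value
    return record

comma_line = "1001,g-17,pt,3.0,CO"      # made-up line that matches the separator
tab_line = "1001\tg-17\tpt\t3.0\tCO"    # made-up line that does not

print(read_csv_line(comma_line, COLUMNS))  # one entry per column
print(read_csv_line(tab_line, COLUMNS))    # whole line ends up under "id"
```

If the logDebug output from the morphline shows records shaped like the second case, the separator setting is the first thing to check.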

 

schema.xml field definitions:

****************************************************************************************************************


<!-- File Metadata -->
<field name="id" type="string" indexed="false" stored="true" /> 
<field name="gid_s" type="string" indexed="true" stored="true" /> 
<field name="latlon_p" type="string" indexed="true" stored="true" /> 
<field name="beds_f" type="float" indexed="true" stored="true" /> 
<field name="state_s" type="string" indexed="true" stored="true" /> 
<field name="fips_s" type="string" indexed="true" stored="true" /> 
<field name="bath_f" type="float" indexed="true" stored="true" /> 
<field name="buildingarea_f" type="float" indexed="true" stored="true" /> 
<field name="stdlandusecode_s" type="string" indexed="true" stored="true" /> 
<field name="recordingdate_s" type="string" indexed="true" stored="true" /> 
<field name="recordingdate_dt" type="string" indexed="true" stored="true" /> 
<field name="nid_cb_s" type="string" indexed="true" stored="true" /> 
<field name="nid_ct_s" type="string" indexed="true" stored="true" /> 
<field name="nid_n_s" type="string" indexed="true" stored="true" /> 
<field name="mx_id_p_s" type="string" indexed="true" stored="true" /> 
<field name="mx_id_m_s" type="string" indexed="true" stored="true" /> 
<field name="mx_id_h_s" type="string" indexed="true" stored="true" /> 
<field name="yearbuilt_i" type="int" indexed="true" stored="true" /> 
<field name="situsstdzip5_s" type="string" indexed="true" stored="true" /> 
<field name="situsstdstreet_s" type="string" indexed="true" stored="true" /> 
<field name="max_radius_f" type="double" indexed="true" stored="true" /> 
<field name="base_dist_f" type="double" indexed="true" stored="true" /> 
<field name="deleted_flag_s" type="string" indexed="true" stored="true" /> 
<field name="update_timestamp_s" type="string" indexed="true" stored="true" />

****************************************************************************************************************
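Since sanitizeUnknownSolrFields deletes any record field that schema.xml does not know about, it can also help to confirm that every column name in the conf file matches a schema field exactly. The following is a minimal sketch of that cross-check. The function names, the shortened field list, and the deliberately wrong name `yearbuilt` are hypothetical, and dynamicField patterns (which the generated template schema may define) are not considered.

```python
# Hypothetical cross-check: which morphline column names have no
# explicitly declared <field> in schema.xml? dynamicField rules are ignored.
import xml.etree.ElementTree as ET

def schema_field_names(fields_xml):
    """Collect the name attribute of every <field> in an XML fragment."""
    root = ET.fromstring(f"<fields>{fields_xml}</fields>")
    return {field.get("name") for field in root.iter("field")}

def unknown_columns(columns, fields_xml):
    """Return the column names that are not declared as schema fields."""
    known = schema_field_names(fields_xml)
    return [name for name in columns if name not in known]

# Shortened fragment of the schema from the post.
FIELDS_XML = """
<!-- File Metadata -->
<field name="id" type="string" indexed="false" stored="true" />
<field name="gid_s" type="string" indexed="true" stored="true" />
<field name="beds_f" type="float" indexed="true" stored="true" />
<field name="yearbuilt_i" type="int" indexed="true" stored="true" />
"""

# "yearbuilt" is intentionally misspelled to show a mismatch.
columns = ["id", "gid_s", "beds_f", "yearbuilt"]
print(unknown_columns(columns, FIELDS_XML))
```

Any name this reports would be removed from the record before loadSolr runs, unless a dynamicField rule in the full schema happens to match it.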