Data for all fields of Solr collection is coming into a single field


Hi,

 

I am able to load the data into the Solr collection successfully, but all of the fields and their data are coming into a single field, even though I have defined separate fields in schema.xml and in the morphline conf file.

The table is in HDFS and the data is stored in CSV format.
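
For reference, the first couple of raw rows can be printed straight from HDFS to confirm the on-disk delimiter really is a comma (this assumes the same /tmp/solr_morphline input path used in the indexer command below):

$ hadoop fs -cat hdfs://IP_Address:8020/tmp/solr_morphline/* | head -2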

 

[Attached screenshots: Data_CSV.png, SolrComps1_Schema.png, SolrComps1_Schema_Query.png]

 

$ solrctl instancedir --generate $HOME/solr_configs
$ solrctl instancedir --create solrcomps1 $HOME/solr_configs

$ solrctl collection --create solrcomps1 -s 1

$ hadoop jar /opt/cloudera/parcels/CDH-*/jars/search-mr-*-job.jar \
    org.apache.solr.hadoop.MapReduceIndexerTool \
    -D 'mapred.child.java.opts=-Xmx500m' \
    --morphline-file /home/username/solr_local/solrcomps1.conf \
    --output-dir hdfs://IP_Address:8020/tmp/load/solrcomps1 \
    --verbose --go-live \
    --zk-host ServerName1:2181,ServerName2:2181,ServerName3:2181/solr \
    --collection solrcomps1 \
    hdfs://IP_Address:8020/tmp/solr_morphline
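
A local dry run should also show whether readCSV is splitting the columns at all. This is only a sketch reusing the same paths as above (the /tmp/load/solrcomps1_dryrun output dir is a hypothetical throwaway path); --dry-run executes the morphline locally and prints the resulting documents to stdout instead of loading them into Solr:

$ hadoop jar /opt/cloudera/parcels/CDH-*/jars/search-mr-*-job.jar \
    org.apache.solr.hadoop.MapReduceIndexerTool \
    --morphline-file /home/username/solr_local/solrcomps1.conf \
    --output-dir hdfs://IP_Address:8020/tmp/load/solrcomps1_dryrun \
    --dry-run --verbose \
    --zk-host ServerName1:2181,ServerName2:2181,ServerName3:2181/solr \
    --collection solrcomps1 \
    hdfs://IP_Address:8020/tmp/solr_morphline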

 

****************************************************************************************************************

Contents of solrcomps1.conf:

 

# Specify server locations in a SOLR_LOCATOR variable; used later in
# variable substitutions:
SOLR_LOCATOR : {
  # Name of solr collection
  collection : solrcomps1

  # ZooKeeper ensemble
  zkHost : "dalxclasnp01.prd.den.vz.altidev.net:2181,dalxclasnp02.prd.den.vz.altidev.net:2181,dalxclasnp03.prd.den.vz.altidev.net:2181/solr"
}

# Specify an array of one or more morphlines, each of which defines an ETL
# transformation chain. A morphline consists of one or more (potentially
# nested) commands. A morphline is a way to consume records (e.g. Flume events,
# HDFS files or blocks), turn them into a stream of records, and pipe the stream
# of records through a set of easily configurable transformations on the way to
# a target application such as Solr.
morphlines : [
  {
    id : solrcomps1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]

    commands : [
      # Split each input line on commas into the 24 columns below.
      {
        readCSV {
          separator : ","
          columns : ['id', 'gid_s', 'latlon_p', 'beds_f', 'state_s', 'fips_s',
                     'bath_f', 'buildingarea_f', 'stdlandusecode_s',
                     'recordingdate_s', 'recordingdate_dt', 'nid_cb_s',
                     'nid_ct_s', 'nid_n_s', 'mx_id_p_s', 'mx_id_m_s',
                     'mx_id_h_s', 'yearbuilt_i', 'situsstdzip5_s',
                     'situsstdstreet_s', 'max_radius_f', 'base_dist_f',
                     'deleted_flag_s', 'update_timestamp_s']
          quoteChar : "\""
          charset : UTF-8
        }
      }

      # Drop any record whose id field is missing or empty.
      {
        if {
          conditions : [
            { equals { id : [] } }
          ]
          then : [
            { dropRecord {} }
          ]
        }
      }

      # Consume the output record of the previous command and pipe another
      # record downstream.
      #
      # Command that deletes record fields that are unknown to Solr
      # schema.xml.
      #
      # Recall that Solr throws an exception on any attempt to load a document
      # that contains a field that isn't specified in schema.xml.
      {
        sanitizeUnknownSolrFields {
          # Location from which to fetch Solr schema
          solrLocator : ${SOLR_LOCATOR}
        }
      }

      # log the record at DEBUG level to SLF4J
      { logDebug { format : "output record: {}", args : ["@{}"] } }

      # load the record into a Solr server or MapReduce Reducer
      {
        loadSolr {
          solrLocator : ${SOLR_LOCATOR}
        }
      }
    ]
  }
]

****************************************************************************************************************
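
For illustration only, here is a hypothetical input row (the values are made up, not from my data) that the readCSV block above is configured to split into 24 separate fields; note the quoted latlon_p value contains a comma, which is why quoteChar matters:

"1001","G100","39.7392,-104.9903","3.0","CO","08031","2.0","1850.0","R1","20180401","2018-04-01T00:00:00Z","CB1","CT1","N1","P1","M1","H1","1995","80202","MAIN ST","1.5","0.25","N","2018-04-21 10:00:00"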

 

schema.xml field definitions

****************************************************************************************************************


<!-- File Metadata -->
<field name="id" type="string" indexed="false" stored="true" /> 
<field name="gid_s" type="string" indexed="true" stored="true" /> 
<field name="latlon_p" type="string" indexed="true" stored="true" /> 
<field name="beds_f" type="float" indexed="true" stored="true" /> 
<field name="state_s" type="string" indexed="true" stored="true" /> 
<field name="fips_s" type="string" indexed="true" stored="true" /> 
<field name="bath_f" type="float" indexed="true" stored="true" /> 
<field name="buildingarea_f" type="float" indexed="true" stored="true" /> 
<field name="stdlandusecode_s" type="string" indexed="true" stored="true" /> 
<field name="recordingdate_s" type="string" indexed="true" stored="true" /> 
<field name="recordingdate_dt" type="string" indexed="true" stored="true" /> 
<field name="nid_cb_s" type="string" indexed="true" stored="true" /> 
<field name="nid_ct_s" type="string" indexed="true" stored="true" /> 
<field name="nid_n_s" type="string" indexed="true" stored="true" /> 
<field name="mx_id_p_s" type="string" indexed="true" stored="true" /> 
<field name="mx_id_m_s" type="string" indexed="true" stored="true" /> 
<field name="mx_id_h_s" type="string" indexed="true" stored="true" /> 
<field name="yearbuilt_i" type="int" indexed="true" stored="true" /> 
<field name="situsstdzip5_s" type="string" indexed="true" stored="true" /> 
<field name="situsstdstreet_s" type="string" indexed="true" stored="true" /> 
<field name="max_radius_f" type="double" indexed="true" stored="true" /> 
<field name="base_dist_f" type="double" indexed="true" stored="true" /> 
<field name="deleted_flag_s" type="string" indexed="true" stored="true" /> 
<field name="update_timestamp_s" type="string" indexed="true" stored="true" />

****************************************************************************************************************
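
To verify field by field where the values end up, a single document can be pulled back with a plain select query (a sketch, assuming the default Solr port 8983 on one of the hosts):

$ curl 'http://ServerName1:8983/solr/solrcomps1/select?q=*:*&rows=1&wt=json'

If the columns were parsed correctly, each one should come back as its own key in the returned document instead of everything appearing under a single field.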
