03-10-2017 07:22 PM
1 Kudo
Adding TTL on Solr:

Step 1: cd to the configset conf directory:

cd /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs/conf/

Step 2: vi managed-schema: add these 3 field definitions:

<field name="_timestamp_" type="date" indexed="true" stored="true" multiValued="false" />
<field name="_ttl_" type="string" indexed="true" multiValued="false" stored="true" />
<field name="_expire_at_" type="date" multiValued="false" indexed="true" stored="true" />

Step 3: vi solrconfig.xml. Replace the below 3 lines with the lines after them:

<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
  <!-- UUIDUpdateProcessorFactory will generate an id if none is present in the incoming document -->
  <processor class="solr.UUIDUpdateProcessorFactory"/>

with:

<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">_timestamp_</str>
  </processor>
  <processor class="solr.DefaultValueUpdateProcessorFactory">
    <str name="fieldName">_ttl_</str>
    <str name="value">+30SECONDS</str>
  </processor>
  <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
    <str name="ttlFieldName">_ttl_</str>
    <str name="ttlParamName">_ttl_</str>
    <int name="autoDeletePeriodSeconds">30</int>
    <str name="expirationFieldName">_expire_at_</str>
  </processor>
  <processor class="solr.FirstFieldValueUpdateProcessorFactory">
    <str name="fieldName">_expire_at_</str>
  </processor>
  <!-- UUIDUpdateProcessorFactory will generate an id if none is present in the incoming document -->
  <processor class="solr.UUIDUpdateProcessorFactory" />
Things that might be useful:

1. Make sure to start Solr like this so that the Solr-related configs go under /solr in ZooKeeper:
/opt/lucidworks-hdpsearch/solr/bin/solr start -c -z lake1.field.hortonworks.com:2181,lake2.field.hortonworks.com:2181,lake3.field.hortonworks.com:2181/solr

2. Create the collection like this:
/opt/lucidworks-hdpsearch/solr/bin/solr create -c tweets -d data_driven_schema_configs -s 1 -rf 1

3. To delete the collection:
http://testdemo.field.hortonworks.com:8983/solr/admin/collections?action=DELETE&name=tweets

4. Also remove its config from zkCli.sh: rmr /solr/configs/tweets

Thanks, Sujitha Sanku. Please ping me or email me at ssanku@hortonworks.com in case of any issues.
11-21-2016 09:37 PM
Special thanks to Michael Young for mentoring me through this.

Step 1: cd to the configset conf directory:

cd /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs/conf/

Step 2: vi managed-schema: add these 3 field definitions:

<field name="_timestamp_" type="date" indexed="true" stored="true" multiValued="false" />
<field name="_ttl_" type="string" indexed="true" multiValued="false" stored="true" />
<field name="_expire_at_" type="date" multiValued="false" indexed="true" stored="true" />

Step 3: vi solrconfig.xml in the same directory. Replace the below 3 lines with the lines after them:
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
<!-- UUIDUpdateProcessorFactory will generate an id if none is present in
the incoming document -->
<processor /> as <updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
<processor>
<str name="fieldName">_timestamp_</str>
</processor>
<processor>
<str name="fieldName">_ttl_</str>
<str name="value">+30SECONDS</str>
</processor>
<processor
class="solr.processor.DocExpirationUpdateProcessorFactory">
<str name="ttlFieldName">_ttl_</str>
<str name="ttlParamName">_ttl_</str>
<int name="autoDeletePeriodSeconds">30</int>
<str name="expirationFieldName">_expire_at_</str>
</processor>
<processor>
<str name="fieldName">_expire_at_</str>
</processor>
<!-- UUIDUpdateProcessorFactory will generate an id if none is present in
the incoming document --> <processor
class="solr.UUIDUpdateProcessorFactory" /> screen-shot-2016-11-21-at-101045-am.png Hope that helps. Thanks, Sujitha
08-18-2016 05:43 AM
1 Kudo
Solr indexing a MySQL database table on HDP 2.5 Tech Preview. Solr version used: 4.9.0.

Step 1: Download solr-4.9.0.zip from https://archive.apache.org/dist/lucene/solr/4.9.0/
Step 2: Extract the file.
Step 3: Modify solrconfig.xml and schema.xml and add db-data-config.xml, as described in a-c below.
Step 4: Add mysql-connector-java-5.0.8-bin.jar under the lib/ directory referenced by the third <lib> directive below.

a. vi solrconfig.xml: add these lines between the config tags:

<lib dir="../../../contrib/dataimporthandler/lib/" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-dataimporthandler-\d.*\.jar" />
<lib dir="../../../lib/" regex="mysql-connector-java-5.0.8-bin.jar" />

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>

b. vi schema.xml: add the line below:

<dynamicField name="*_name" type="text_general" multiValued="false" indexed="true" stored="true" />
c. Create a file called db-data-config.xml at the same path (the employees database it queries is created in MySQL in Step 5 below) and add:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/employees" user="root" password="hadoop" />
  <document>
    <entity name="id" query="select emp_no as 'id', first_name, last_name from employees limit 1000;" />
  </document>
</dataConfig>

d. After this is complete, run the command below to start Solr, then check that it is up and running at the URL below (8983 is Solr's default port):

java -jar start.jar
http://localhost:8983/solr/#/
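As an alternative to clicking through the admin UI in steps e-g, the import can also be triggered over HTTP. A minimal sketch, assuming the default collection1 core on localhost:8983:

# Kick off a full import via the /dataimport handler registered above
curl 'http://localhost:8983/solr/collection1/dataimport?command=full-import'

# Poll until the status response reports that indexing completed
curl 'http://localhost:8983/solr/collection1/dataimport?command=status'

# Spot-check a few of the imported rows
curl 'http://localhost:8983/solr/collection1/select?q=*:*&rows=5'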
e. In the core selector, select collection1.
f. Click on Data Import, expand Configuration, and check that it points to the db-data-config.xml file we created.
g. After completing Step 5 below, click Execute on that page.

Step 5: Setting up the database. Import an already-available database into MySQL. Ref: https://dev.mysql.com/doc/employee/en/employees-installation.html

shell> tar -xjf employees_db-full-1.0.6.tar.bz2
shell> cd employees_db/
shell> mysql -t < employees.sql

With this, the installation of the employees db in MySQL is complete.

Step 6: With this, our indexing using Solr is complete. To do: I will try indexing the MySQL tables using the latest version of Solr.

Reference: http://blog.comperiosearch.com/blog/2014/08/28/indexing-database-using-solr/

Hope this helps…. Thanks, Sujitha
07-07-2016 04:13 AM
2 Kudos
Story: From the documentation (https://cwiki.apache.org/confluence/display/AMBARI/Defining+a+Custom+Stack+and+Services) I was able to add a service to an existing stack definition in Ambari. Issue: I was not able to stop the service, or to delete it just in case. How did I solve the problem?
1. Create and add the stack:

cd /var/lib/ambari-server/resources/stacks/HDP/2.4/services

2. Create a directory that contains the service definition for SAMPLESRV:

mkdir /var/lib/ambari-server/resources/stacks/HDP/2.4/services/SAMPLESRV
cd /var/lib/ambari-server/resources/stacks/HDP/2.4/services/SAMPLESRV

3. Create a metainfo.xml as shown in the link above.

4. With this we have a service named SAMPLESRV, and it contains SAMPLESRV_MASTER, SAMPLESRV_SLAVE and SAMPLESRV_CLIENT.

5. Next we need to create the command scripts:

mkdir -p /var/lib/ambari-server/resources/stacks/HDP/2.4/services/SAMPLESRV/package/scripts
cd /var/lib/ambari-server/resources/stacks/HDP/2.4/services/SAMPLESRV/package/scripts

6. In the scripts directory, create the .py command scripts master.py, slave.py and sample_client.py under /var/lib/ambari-server/resources/stacks/HDP/2.4/services/SAMPLESRV/package/scripts. master.py and slave.py are where the issue was: the documentation doesn't mention the dummy PID file that needs to be created. Since we have not installed a real service, there is no PID file created by one. Therefore we artificially create the PID file on start, remove it on stop, and check the process status of the dummy PID in the status command; see the sketch below.
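In plain shell terms, the start/stop/status handlers in master.py and slave.py boil down to the following (the PID path is my own placeholder; the Python scripts run the equivalent commands):

# start: fake a running daemon by writing a dummy PID file
mkdir -p /var/run/samplesrv
echo $$ > /var/run/samplesrv/samplesrv.pid

# stop: remove the dummy PID file
rm -f /var/run/samplesrv/samplesrv.pid

# status: the status check reduces to "does the dummy PID file exist?"
test -f /var/run/samplesrv/samplesrv.pid && echo RUNNING || echo NOT_RUNNING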
7. Then restart Ambari (ambari-server restart) and add the service to the stack as shown in the document; I just don't want to duplicate those steps here. Hope this helps....
06-24-2016 12:41 AM
2 Kudos
How to make a MySQL database Hive's metastore:

Install MySQL if not available (on a Mac, via Homebrew):

brew update
brew doctor
brew upgrade
brew install mysql
mysql.server restart
mysql_secure_installation

Log in to MySQL:

mysql -u root -p
Enter password:

Happy MySQL learning…. MySQL is already installed on the Hortonworks sandbox.

Steps: Confirm with mysql -u root -p. Import an already-available database into MySQL. Ref: https://dev.mysql.com/doc/employee/en/employees-installation.html

shell> tar -xjf $HOME/Downloads/employees_db-full-1.0.6.tar.bz2
shell> cd employees_db/
shell> mysql -t < employees.sql

With this, the installation of the employees db in MySQL is complete.

Configuration of the MySQL instance with Hive: create the MySQL metastore database for Hive:

[root@host]# mysqladmin -u root create hivedb
mysql> USE hivedb;
mysql> CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'hive'@'localhost';

With this (and with hive-site.xml pointing javax.jdo.option.ConnectionURL and the connection user/password at hivedb), the MySQL database is Hive's new metastore.

Suppose we want to perform a full import of the 'employees' and 'salaries' tables into HDP. Tables created in Hive:

CREATE DATABASE empl;
USE empl;

CREATE EXTERNAL TABLE IF NOT EXISTS employees (
  emp_no INT,
  birth_date DATE,
  first_name VARCHAR(14),
  last_name VARCHAR(16),
  gender STRING,
  hire_date DATE
) STORED AS TEXTFILE;

CREATE TABLE IF NOT EXISTS salaries (
  emp_no INT,
  salary INT,
  from_date DATE,
  to_date DATE
) STORED AS TEXTFILE;

sqoop import --connect jdbc:mysql://172.16.16.128:3306/employees --username=hive --password=hive --driver com.mysql.jdbc.Driver --table=employees --hive-import --hive-table=empl.employees --target-dir=wp_users_import --direct

sqoop import --connect jdbc:mysql://172.16.16.128:3306/employees --username=hive --password=hive --driver com.mysql.jdbc.Driver --table=salaries --hive-import --hive-table=empl.salaries --target-dir=wp_users_import --direct
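Before cleansing, a quick sanity check that both imports landed. A sketch, assuming the hive CLI on the path and the empl database created above:

# Row counts should match select count(*) against the same tables in MySQL
hive -e 'use empl; show tables; select count(*) from employees; select count(*) from salaries;'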
Suppose we need to perform some cleansing of the data using Hive's regex expressions:

use empl;
drop table empl.empl_clean;
show tables;

create table empl.empl_clean(emp_no INT, birth_date STRING, first_name STRING, last_name STRING, gender STRING, hire_date STRING);

insert overwrite table empl.empl_clean
SELECT
  regexp_replace(employees.emp_no, '\t', '') emp_no,
  regexp_replace(employees.birth_date, '\t', '') birth_date,
  regexp_replace(employees.first_name, '\t', '') first_name,
  regexp_replace(employees.last_name, '\t', '') last_name,
  regexp_replace(employees.gender, '\t', '') gender,
  regexp_replace(employees.hire_date, '\t', '') hire_date
from empl.employees;

select * from empl.empl_clean limit 100;

Cleansing the salaries table:

use empl;
drop table empl.salary_clean;

create table empl.salary_clean(emp_no INT, salary INT, from_date STRING, to_date STRING);

insert overwrite table empl.salary_clean
SELECT
  regexp_replace(salaries.emp_no, '\t', '') emp_no,
  regexp_replace(salaries.salary, '\t', '') salary,
  regexp_replace(salaries.from_date, '\t', '') from_date,
  regexp_replace(salaries.to_date, '\t', '') to_date
from empl.salaries;

select * from empl.salary_clean limit 100;

Happy Learning….
06-21-2016 08:59 PM
6 Kudos
SQOOP CONNECTIONS: Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.

Reference: Sqoop user guide: https://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html

JDBC ORACLE: Example for Import:

sqoop-import --connect jdbc:oracle:thin:@db.test.com:PORT:INSTANCE_NAME --table DW_DATAMART.HCM_EMPLOYEE_D --fields-terminated-by '\t' --lines-terminated-by '\n' --username SSANKU -P

JDBC ORACLE: Example for Select: The eval tool allows users to quickly run simple SQL queries against a database; results are printed to the console. This allows users to preview their import queries to ensure they import the data they expect.

sqoop-eval --connect jdbc:oracle:thin:@db.test.com:PORT:INSTANCE_NAME --query 'select * from DW_DATAMART.HCM_COMPANY_D'

JDBC INFORMIX: Example for Import:

sqoop-import --connect jdbc:informix-sqli://4jane.soi.com:15062/common:INFORMIXSERVER=ids_4jane --driver com.informix.jdbc.IfxDriver --table portal_request_params --username username -P

Sqoop Import to HBASE table: Examples:

sqoop-import --connect jdbc:oracle:thin:@db.test.com:PORT:INSTANCE_NAME --username ssanku -P --table DW_DATAMART.PAY_PAY_CHK_OPTION_D --hbase-table DW_DATAMART.PAY_PAY_CHK_OPTION_D --column-family cf1 --hbase-create-table

If no primary key is defined on the Oracle table:

sqoop-import --connect jdbc:oracle:thin:@db.test.com:1725:hrlites --username ssanku -P --table PSMERCHANTID --hbase-table PSMERCHANTID --column-family cf --hbase-row-key MERCHANTID --hbase-create-table --split-by MERCHANTID

sqoop-import --connect jdbc:oracle:thin:@db.test.com:PORT:INSTANCE_NAME --username ssanku -P --table DW_DATAMART.PAY_PAYGROUP_D --hbase-table DW_DATAMART.PAY_PAYGROUP_D --column-family cf1 --hbase-create-table

sqoop-import --connect jdbc:oracle:thin:@db.test.com:1725:hrlites --username ssanku -P --table PSMERCHANTID --hbase-table PSMERCHANTID --column-family cf --hbase-create-table --split-by MERCHANTID

Sqoop Import to HIVE table from Mysql Database: Examples:

sqoop import --connect jdbc:mysql://172.16.16.128:3306/employees --username=hive --password=hive --driver com.mysql.jdbc.Driver --table=employees --hive-import --hive-table=empl.employees --target-dir=wp_users_import --direct

sqoop import --connect jdbc:mysql://172.16.16.128:3306/employees --username=hive --password=hive --driver com.mysql.jdbc.Driver --table=salaries --hive-import --hive-table=empl.salaries --target-dir=wp_users_import --direct
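The intro also mentions exporting data back into an RDBMS, so for completeness here is a hedged sqoop-export sketch; employees_export and the export directory are placeholders, and the target MySQL table must exist before the export runs:

# Export HDFS data back into MySQL; '\001' (Ctrl-A) is Hive's default field delimiter
sqoop export --connect jdbc:mysql://172.16.16.128:3306/employees \
  --username hive --password hive \
  --table employees_export \
  --export-dir /apps/hive/warehouse/empl.db/empl_clean \
  --input-fields-terminated-by '\001'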