Member since 04-04-2016 · 147 Posts · 40 Kudos Received · 16 Solutions
11-21-2016
09:37 PM
Special thanks to Michael Young for mentoring me.

Step 1: Go to the configset's conf directory:

cd /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs/conf/

Step 2: vi managed-schema and add these three field definitions:

<field name="_timestamp_" type="date" indexed="true" stored="true" multiValued="false" />
<field name="_ttl_" type="string" indexed="true" stored="true" multiValued="false" />
<field name="_expire_at_" type="date" indexed="true" stored="true" multiValued="false" />

Step 3: vi solrconfig.xml in the same directory. Replace this chain definition:
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
  <!-- UUIDUpdateProcessorFactory will generate an id if none is present in
       the incoming document -->
  <processor class="solr.UUIDUpdateProcessorFactory" />

with:

<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">_timestamp_</str>
  </processor>
  <processor class="solr.DefaultValueUpdateProcessorFactory">
    <str name="fieldName">_ttl_</str>
    <str name="value">+30SECONDS</str>
  </processor>
  <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
    <str name="ttlFieldName">_ttl_</str>
    <str name="ttlParamName">_ttl_</str>
    <int name="autoDeletePeriodSeconds">30</int>
    <str name="expirationFieldName">_expire_at_</str>
  </processor>
  <processor>
    <str name="fieldName">_expire_at_</str>
  </processor>
  <!-- UUIDUpdateProcessorFactory will generate an id if none is present in
       the incoming document -->
  <processor class="solr.UUIDUpdateProcessorFactory" />

Hope that helps. Thanks, Sujitha
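To make the chain's arithmetic concrete: DocExpirationUpdateProcessorFactory computes _expire_at_ by adding the date-math TTL to the time the document is indexed. Here is a small Python sketch of that computation (my own illustration, not Solr code; only the simple +N<UNIT> form of date math is handled):

```python
from datetime import datetime, timedelta
import re

def apply_ttl(timestamp, ttl):
    """Compute the expiration instant the way the chain does:
    _expire_at_ = _timestamp_ + _ttl_ (only +N<UNIT> is handled here)."""
    m = re.fullmatch(r"\+(\d+)(SECONDS|MINUTES|HOURS|DAYS)", ttl)
    if not m:
        raise ValueError("unsupported ttl: %s" % ttl)
    n = int(m.group(1))
    unit = m.group(2).lower()  # timedelta takes seconds=, minutes=, ...
    return timestamp + timedelta(**{unit: n})

indexed_at = datetime(2016, 11, 21, 10, 5, 24)
print(apply_ttl(indexed_at, "+30SECONDS"))  # 2016-11-21 10:05:54
```

With autoDeletePeriodSeconds set to 30, Solr sweeps for and deletes documents whose _expire_at_ has passed every 30 seconds.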
08-18-2016
05:43 AM
1 Kudo
Indexing a MySQL database table with Solr on HDP 2.5 Tech Preview. Solr version used: 4.9.0.

Step 1: Download solr-4.9.0.zip from https://archive.apache.org/dist/lucene/solr/4.9.0/
Step 2: Extract the file.
Step 3: Modify solrconfig.xml and schema.xml, and add db-data-config.xml (steps a-c below).
Step 4: Add the MySQL connector JAR where the <lib> directive in step (a) expects it.
a. vi solrconfig.xml: add these lines between the <config> tags:

<lib dir="../../../contrib/dataimporthandler/lib/" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-dataimporthandler-\d.*\.jar" />
<lib dir="../../../lib/" regex="mysql-connector-java-5.0.8-bin.jar" />

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>
b. vi schema.xml: add the line below:

<dynamicField name="*_name" type="text_general" multiValued="false" indexed="true" stored="true" />
c. Create a file called db-data-config.xml at the same path (later in this post I create the employees database in the MySQL server that it points to). Add these lines:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/employees" user="root" password="hadoop" />
  <document>
    <entity name="id" query="select emp_no as 'id', first_name, last_name from employees limit 1000;" />
  </document>
</dataConfig>

d. After this is complete, run the command below to start Solr, then check that it is up and running at the URL below (8983 is Solr's default port):

java -jar start.jar
http://localhost:8983/solr/#/
e. Select collection1 in the Core Selector.
f. Click Data Import, expand Configuration, and check that it points to the db-data-config.xml file we created.
g. After completing Step 5 below, click Execute on that page.

Step 5: Set up the database. Import an already available database into MySQL (ref: https://dev.mysql.com/doc/employee/en/employees-installation.html):

shell> tar -xjf employees_db-full-1.0.6.tar.bz2
shell> cd employees_db/
shell> mysql -t < employees.sql

With this, the installation of the employees database in MySQL is complete.

Step 6: With this, our indexing using Solr is complete. To do: I will try indexing the MySQL tables using the latest version of Solr. Reference: http://blog.comperiosearch.com/blog/2014/08/28/indexing-database-using-solr/ Hope this helps. Thanks, Sujitha
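For reference, the Execute button in step (g) just issues a request to the /dataimport handler registered in step (a). A hypothetical Python helper that builds those request URLs (host, port, and core name are the ones from this post; the helper itself is my own, not part of Solr):

```python
# Base URL assumes the local Solr instance and collection1 core from this post.
BASE = "http://localhost:8983/solr/collection1/dataimport"

def dih_url(command, clean=True, commit=True):
    """Build a DataImportHandler request URL.
    command is one of: full-import, delta-import, status."""
    if command == "status":
        return BASE + "?command=status"
    return "%s?command=%s&clean=%s&commit=%s" % (
        BASE, command, str(clean).lower(), str(commit).lower())

print(dih_url("full-import"))
# http://localhost:8983/solr/collection1/dataimport?command=full-import&clean=true&commit=true
```

The status command is handy for polling how many documents the import fetched.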
01-18-2017
10:25 AM
Hello. I did something similar and it works fine when starting/stopping the service, but restart fails. It looks like Ambari runs a status check after the stop, and that check fails because the pid file has already been deleted: Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stacks/HDP/2.5/services/CATALOGER/package/scripts/application.py", line 28, in <module>
Master().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
method(env)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 709, in restart
self.status(env)
File "/var/lib/ambari-agent/cache/stacks/HDP/2.5/services/CATALOGER/package/scripts/application.py", line 25, in status
Execute ( format("cat {pid_file}") );
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 273, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 71, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 93, in checked_call
tries=tries, try_sleep=try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 141, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 294, in _call
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'cat /opt/app/application.pid' returned 1. cat: /opt/app/application.pid: No such file or directory
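For what it's worth, Ambari's restart flow calls status() right after stop(), and it expects status() to raise ComponentIsNotRunning when the process is down; a status() that just cats the pid file fails exactly as in the traceback above. A minimal sketch of that convention (plain Python with a stand-in exception class; a real service script would import ComponentIsNotRunning from resource_management.core.exceptions):

```python
import os

class ComponentIsNotRunning(Exception):
    """Stand-in for resource_management.core.exceptions.ComponentIsNotRunning."""

def status(pid_file):
    # Raising ComponentIsNotRunning (instead of failing on a missing pid
    # file) tells Ambari the component is down, so restart can proceed.
    if not os.path.isfile(pid_file):
        raise ComponentIsNotRunning()
    with open(pid_file) as f:
        pid = int(f.read().strip())
    try:
        os.kill(pid, 0)  # signal 0: existence check only, sends nothing
    except OSError:
        raise ComponentIsNotRunning()
```

With this shape, the post-stop status check reports "not running" cleanly instead of raising Fail on the missing /opt/app/application.pid.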
06-24-2016
12:41 AM
2 Kudos
How to make a MySQL database Hive's metastore:

Install MySQL if not available (here via Homebrew on a Mac):

brew update
brew doctor
brew upgrade
brew install mysql
mysql.server restart
mysql_secure_installation

Log in to MySQL with mysql -u root -p and enter the password. Happy MySQL learning….

MySQL is already installed on the Hortonworks sandbox; confirm with mysql -u root -p.

Import an already available database into MySQL (ref: https://dev.mysql.com/doc/employee/en/employees-installation.html):

shell> tar -xjf $HOME/Downloads/employees_db-full-1.0.6.tar.bz2
shell> cd employees_db/
shell> mysql -t < employees.sql

With this, the installation of the employees database in MySQL is complete.

Configuration of the MySQL instance with Hive: create Hive's metastore database in MySQL:

[root@host]# mysqladmin -u root create hivedb
mysql> USE hivedb;
mysql> CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'hive'@'localhost';

With this we confirm that the MySQL database is Hive's new metastore. Suppose we want to perform a full import of the 'employees'
and 'salaries' tables into HDP. Tables created in Hive:

create database empl;
use empl;
CREATE EXTERNAL TABLE IF NOT EXISTS employees ( emp_no INT, birth_date DATE, first_name VARCHAR(14), last_name VARCHAR(16), gender STRING, hire_date DATE ) STORED AS TEXTFILE;
CREATE TABLE IF NOT EXISTS salaries ( emp_no INT, salary INT, from_date DATE, to_date DATE ) STORED AS TEXTFILE;

sqoop import --connect jdbc:mysql://172.16.16.128:3306/employees --username=hive --password=hive --driver com.mysql.jdbc.Driver --table=employees --hive-import --hive-table=empl.employees --target-dir=wp_users_import --direct

sqoop import --connect jdbc:mysql://172.16.16.128:3306/employees --username=hive --password=hive --driver com.mysql.jdbc.Driver --table=salaries --hive-import --hive-table=empl.salaries --target-dir=wp_users_import --direct

Suppose we need to perform some cleansing of the data
using Hive regex expressions:

use empl;
drop table empl.empl_clean;
show tables;
create table empl.empl_clean(emp_no INT, birth_date STRING, first_name STRING, last_name STRING, gender STRING, hire_date STRING);

insert overwrite table empl.empl_clean
SELECT regexp_replace(employees.emp_no, '\t', '') emp_no,
       regexp_replace(employees.birth_date, '\t', '') birth_date,
       regexp_replace(employees.first_name, '\t', '') first_name,
       regexp_replace(employees.last_name, '\t', '') last_name,
       regexp_replace(employees.gender, '\t', '') gender,
       regexp_replace(employees.hire_date, '\t', '') hire_date
from empl.employees;

select * from empl.empl_clean limit 100;

Cleansing the salaries table:

use empl;
drop table empl.salary_clean;
create table empl.salary_clean(emp_no INT, salary INT, from_date STRING, to_date STRING);

insert overwrite table empl.salary_clean
SELECT regexp_replace(salaries.emp_no, '\t', '') emp_no,
       regexp_replace(salaries.salary, '\t', '') salary,
       regexp_replace(salaries.from_date, '\t', '') from_date,
       regexp_replace(salaries.to_date, '\t', '') to_date
from empl.salaries;

select * from empl.salary_clean limit 100;

Happy Learning….
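The regexp_replace(col, '\t', '') calls above do nothing more than strip stray tab characters from the imported string columns. A quick Python equivalent of the same cleansing (my own illustration, not Hive code):

```python
import re

def strip_tabs(value):
    # Same effect as Hive's regexp_replace(value, '\t', '')
    return re.sub(r"\t", "", value)

print(strip_tabs("Georgi\t1953-09-02"))  # Georgi1953-09-02
```

Any value without embedded tabs passes through unchanged, so the insert-overwrite is safe to run on already-clean rows.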
06-21-2016
08:59 PM
6 Kudos
SQOOP CONNECTIONS: Sqoop
is a tool designed to transfer data between Hadoop and relational databases.
You can use Sqoop to import data from a relational database management system
(RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS),
transform the data in Hadoop MapReduce, and then export the data back into an
RDBMS. Reference: the Sqoop user guide: https://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html

JDBC Oracle, example for import:

sqoop-import --connect jdbc:oracle:thin:@db.test.com:PORT:INSTANCE_NAME --table DW_DATAMART.HCM_EMPLOYEE_D --fields-terminated-by '\t' --lines-terminated-by '\n' --username SSANKU -P

JDBC Oracle, example for select: the eval tool allows users to quickly run simple SQL queries against a database; results are printed to the console. This allows users to preview their import queries to ensure they import the data they expect. Note that sqoop-eval takes the statement via --query (or -e):

sqoop-eval --connect jdbc:oracle:thin:@db.test.com:PORT:INSTANCE_NAME --query "select * from DW_DATAMART.HCM_COMPANY_D"

JDBC Informix, example for import:

sqoop-import --connect jdbc:informix-sqli://4jane.soi.com:15062/common:INFORMIXSERVER=ids_4jane --driver com.informix.jdbc.IfxDriver --table portal_request_params --username username -P

Sqoop import into an HBase table, examples:

sqoop-import --connect jdbc:oracle:thin:@db.test.com:PORT:INSTANCE_NAME --username ssanku -P --table DW_DATAMART.PAY_PAY_CHK_OPTION_D --hbase-table DW_DATAMART.PAY_PAY_CHK_OPTION_D --column-family cf1 --hbase-create-table

If no primary key is defined on the Oracle table, supply the row key and split column explicitly:

sqoop-import --connect jdbc:oracle:thin:@db.test.com:1725:hrlites --username ssanku -P --table PSMERCHANTID --hbase-table PSMERCHANTID --column-family cf --hbase-row-key MERCHANTID --hbase-create-table --split-by MERCHANTID

sqoop-import --connect jdbc:oracle:thin:@db.test.com:PORT:INSTANCE_NAME --username ssanku -P --table DW_DATAMART.PAY_PAYGROUP_D --hbase-table DW_DATAMART.PAY_PAYGROUP_D --column-family cf1 --hbase-create-table

sqoop-import --connect jdbc:oracle:thin:@db.test.com:1725:hrlites --username ssanku -P --table PSMERCHANTID --hbase-table PSMERCHANTID --column-family cf --hbase-create-table --split-by MERCHANTID

Sqoop import into a Hive table from a MySQL database, examples:

sqoop import --connect jdbc:mysql://172.16.16.128:3306/employees --username=hive --password=hive --driver com.mysql.jdbc.Driver --table=employees --hive-import --hive-table=empl.employees --target-dir=wp_users_import --direct

sqoop import --connect jdbc:mysql://172.16.16.128:3306/employees --username=hive --password=hive --driver com.mysql.jdbc.Driver --table=salaries --hive-import --hive-table=empl.salaries --target-dir=wp_users_import --direct
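One detail worth calling out in the Hive import examples: each option must be a single token (--username=hive); a stray space after the double dash (-- username=hive) breaks the command, which is easy to introduce when copying from a wrapped forum post. A hypothetical Python helper that composes the command as one clean string (connection values are the ones from this post; the helper is my own, not part of Sqoop):

```python
import shlex

def sqoop_hive_import(jdbc_url, user, password, table, hive_table, target_dir):
    """Compose a sqoop hive-import command line from its parts."""
    # Each option is one token; a space after "--" would split the
    # option in two and sqoop would fail to parse it.
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username=%s" % user,
        "--password=%s" % password,
        "--driver", "com.mysql.jdbc.Driver",
        "--table=%s" % table,
        "--hive-import",
        "--hive-table=%s" % hive_table,
        "--target-dir=%s" % target_dir,
        "--direct",
    ]
    return " ".join(shlex.quote(a) for a in args)

print(sqoop_hive_import("jdbc:mysql://172.16.16.128:3306/employees",
                        "hive", "hive", "employees",
                        "empl.employees", "wp_users_import"))
```

Building the argument list first and joining it last also makes it harder to lose a value to shell word-splitting.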