Member since 04-04-2016
166 Posts
168 Kudos Received
29 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2901 | 01-04-2018 01:37 PM |
| | 4915 | 08-01-2017 05:06 PM |
| | 1574 | 07-26-2017 01:04 AM |
| | 8919 | 07-21-2017 08:59 PM |
| | 2609 | 07-20-2017 08:59 PM |
05-06-2016 05:41 PM · 3 Kudos
@Nilesh Below is the solution.

Input:
mysql> select * from SERDES;
+----------+------+----------------------------------------------------+
| SERDE_ID | NAME | SLIB                                               |
+----------+------+----------------------------------------------------+
|       56 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|       57 | NULL | org.apache.hadoop.hive.ql.io.orc.OrcSerde          |
|       58 | NULL | NULL                                               |
|       59 | NULL | org.apache.hadoop.hive.ql.io.orc.OrcSerde          |
|       60 | NULL | org.apache.hadoop.hive.ql.io.orc.OrcSerde          |
|       61 | NULL | org.apache.hadoop.hive.ql.io.orc.OrcSerde          |
|       62 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
+----------+------+----------------------------------------------------+
7 rows in set (0.00 sec)

Command:
sqoop import --connect jdbc:mysql://test:3306/hive \
  --username hive \
  --password test \
  --table SERDES \
  --hcatalog-database test \
  --hcatalog-table SERDES \
  --create-hcatalog-table \
  --hcatalog-storage-stanza "stored as orcfile" \
  --outdir sqoop_import \
  -m 1 \
  --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
  --driver com.mysql.jdbc.Driver

Logs:
...
16/05/06 13:30:46 INFO hcat.SqoopHCatUtilities: HCatalog Create table statement: create table `demand_db`.`serdes` ( `serde_id` bigint, `name` varchar(128), `slib` varchar(4000)) stored as orcfile
...
16/05/06 13:32:55 INFO mapreduce.Job: Job job_1462201699379_0089 running in uber mode : false
16/05/06 13:32:55 INFO mapreduce.Job:  map 0% reduce 0%
16/05/06 13:33:07 INFO mapreduce.Job:  map 100% reduce 0%
16/05/06 13:33:09 INFO mapreduce.Job: Job job_1462201699379_0089 completed successfully
16/05/06 13:33:09 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=297179
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=676
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=14484
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=7242
        Total vcore-seconds taken by all map tasks=7242
        Total megabyte-seconds taken by all map tasks=11123712
    Map-Reduce Framework
        Map input records=8
        Map output records=8
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=92
        CPU time spent (ms)=4620
        Physical memory (bytes) snapshot=353759232
        Virtual memory (bytes) snapshot=3276144640
        Total committed heap usage (bytes)=175112192
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=0
16/05/06 13:33:09 INFO mapreduce.ImportJobBase: Transferred 676 bytes in 130.8366 seconds (5.1668 bytes/sec)
16/05/06 13:33:09 INFO mapreduce.ImportJobBase: Retrieved 8 records.

Output:
hive> select * from serdes;
OK
56  NULL  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
57  NULL  org.apache.hadoop.hive.ql.io.orc.OrcSerde
58  NULL  NULL
59  NULL  org.apache.hadoop.hive.ql.io.orc.OrcSerde
60  NULL  org.apache.hadoop.hive.ql.io.orc.OrcSerde
61  NULL  org.apache.hadoop.hive.ql.io.orc.OrcSerde
62  NULL  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
63  NULL  org.apache.hadoop.hive.ql.io.orc.OrcSerde
Time taken: 2.711 seconds, Fetched: 8 row(s)
hive>
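When scripting imports like this one, a quick completeness check is to compare the `Retrieved N records.` line in the Sqoop output with a `SELECT COUNT(*)` taken on the source table at import time. A minimal Python sketch of that check follows; the helper names are hypothetical, and the regular expression only assumes the log line format shown in the logs above.

```python
import re


def retrieved_records(log_text):
    """Return N from Sqoop's 'Retrieved N records.' log line, or None if absent."""
    match = re.search(r"Retrieved (\d+) records\.", log_text)
    return int(match.group(1)) if match else None


def import_is_complete(log_text, source_row_count):
    """True when Sqoop reports retrieving exactly source_row_count rows.

    source_row_count is assumed to come from a SELECT COUNT(*) on the
    source table taken at import time (hypothetical workflow).
    """
    return retrieved_records(log_text) == source_row_count


if __name__ == "__main__":
    log = ("16/05/06 13:33:09 INFO mapreduce.ImportJobBase: "
           "Retrieved 8 records.")
    print(retrieved_records(log))      # 8
    print(import_is_complete(log, 8))  # True
```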
05-06-2016 04:34 PM · 3 Kudos
@Kaliyug Antagonist

Unicode file:
[root@test test]# pwd
/root/test
[root@test test]# cat xyz
Les caractères accentués (Français)
En données nous avons confiance
Données, données, partout et tous les noeuds étaient déconnecté
Données, données, partout
[root@test test]#

External table DDL:
create external table demand_db.unicode (data string) COMMENT 'External table for data cleansing' LOCATION '/tmp/test/';

External table location:
[root@test ~]# hdfs dfs -mkdir -p /tmp/test
[root@test ~]# hdfs dfs -chmod -R 777 /tmp/test
[root@test ~]# hdfs dfs -ls /tmp

Output:
hive> create external table unicode
    > (data string)
    > COMMENT 'External table for data cleansing'
    > LOCATION '/tmp/test/';
OK
Time taken: 0.502 seconds
hive> select * from unicode;
OK
Les caractères accentués (Français)
En données nous avons confiance
Données, données, partout et tous les noeuds étaient déconnecté
Données, données, partout
Time taken: 0.897 seconds, Fetched: 8 row(s)
hive>

Conclusion: You do not need to convert the Unicode character set, and the STRING type works as-is in this case.
Thanks
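This works because the file is already UTF-8, which is what Hive's default text format reads. If you are not sure how an incoming file was encoded, a minimal pre-check is to try decoding its raw bytes as UTF-8 before copying it into the table location. The sketch below is plain Python and the helper name is hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    line = "Les caractères accentués (Français)"
    print(is_valid_utf8(line.encode("utf-8")))    # True
    # The same text saved as Latin-1 is not valid UTF-8.
    print(is_valid_utf8(line.encode("latin-1")))  # False
```

For a local file, `is_valid_utf8(open("xyz", "rb").read())` runs the same check; a file that fails would need converting to UTF-8 first.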
05-06-2016 04:10 PM
@omkar pathallapalli Let us know if the solution I posted worked for you. Thanks
05-06-2016 05:54 AM
@santosh rai Out of curiosity, do you have a specific use case for using 2.2?
05-06-2016 04:49 AM · 2 Kudos
@omkar pathallapalli This issue should be resolved by adding --driver com.mysql.jdbc.Driver at the end of the sqoop command. For example:
sqoop import --connect $DB_CONNECTION --username $DB_USERNAME --password $DB_PASSWORD --table salaries --target-dir /tmp/salaries --outdir sqoop_import -m 1 --fields-terminated-by ',' --driver com.mysql.jdbc.Driver
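If you drive this import from a script, passing the arguments as a list to Python's `subprocess.run` sidesteps shell quoting of values such as the `','` delimiter. A minimal sketch, assuming the same `DB_CONNECTION`, `DB_USERNAME`, and `DB_PASSWORD` environment variables and a single mapper (`-m 1`); the function names are hypothetical.

```python
import os
import subprocess


def sqoop_import_args(connect, username, password, table, target_dir):
    """Build the argument list for the sqoop import shown above."""
    return [
        "sqoop", "import",
        "--connect", connect,
        "--username", username,
        "--password", password,
        "--table", table,
        "--target-dir", target_dir,
        "--outdir", "sqoop_import",
        "-m", "1",  # single mapper
        # No shell is involved, so the comma needs no quoting.
        "--fields-terminated-by", ",",
        # Name the JDBC driver class explicitly, as suggested in this post.
        "--driver", "com.mysql.jdbc.Driver",
    ]


def run_import():
    """Run the import with credentials taken from the environment.

    Requires sqoop on the PATH and DB_CONNECTION, DB_USERNAME and
    DB_PASSWORD to be set; raises KeyError if any variable is missing.
    """
    args = sqoop_import_args(
        os.environ["DB_CONNECTION"],
        os.environ["DB_USERNAME"],
        os.environ["DB_PASSWORD"],
        "salaries",
        "/tmp/salaries",
    )
    subprocess.run(args, check=True)
```

On an edge node with Sqoop installed, calling `run_import()` would execute the same import as the shell command.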
Thanks
05-06-2016 03:29 AM · 3 Kudos
@Sunile Manjee The supported policies for late data handling are:
- backoff: Take the maximum late cut-off and check at every specified interval.
- exp-backoff (default): Recommended. Take the maximum late cut-off and check at exponentially determined times.
- final: Take the maximum late cut-off and check once.

For example, a late cut-off of hours(8) means data can be delayed by up to 8 hours:
<late-arrival cut-off="hours(8)"/>

The late input in the following process specification is handled by the /apps/myapp/latehandle workflow:
<late-process policy="exp-backoff" delay="hours(2)">
<late-input input="input" workflow-path="/apps/myapp/latehandle" />
</late-process>

So, for up to 8 hours, until the feed arrives, the workflow will be retried. Once the feed arrives within that window, the window is reset. Inside /apps/myapp/latehandle you can put your own logic (for example, a Sqoop, Hive, or shell action); that processing determines what happens to the late feed. In simple scenarios you can run the actual workflow again; otherwise you can write a dedicated workflow that handles the dependencies and boundary cases.
Thanks
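To make the difference between the three policies concrete, here is a simplified, hypothetical model of when late-data checks would fire for a delay of hours(2) and the 8-hour cut-off described above. It is not Falcon's scheduler, and the exact exp-backoff intervals Falcon computes may differ; it only illustrates fixed intervals versus doubling intervals versus a single check.

```python
from datetime import timedelta


def late_check_offsets(policy, delay, cutoff):
    """Offsets after the nominal instance time at which late data is rechecked.

    Simplified, hypothetical model of the three policies, not Falcon's code:
      backoff     - recheck every `delay` until `cutoff`
      exp-backoff - double the wait after each recheck until `cutoff`
      final       - recheck once, at `cutoff`
    """
    if policy == "final":
        return [cutoff]
    if policy not in ("backoff", "exp-backoff"):
        raise ValueError(f"unknown policy: {policy}")
    offsets = []
    step = delay
    t = delay
    while t <= cutoff:
        offsets.append(t)
        if policy == "exp-backoff":
            step *= 2  # wait twice as long before the next check
        t += step
    return offsets


if __name__ == "__main__":
    delay, cutoff = timedelta(hours=2), timedelta(hours=8)
    for policy in ("backoff", "exp-backoff", "final"):
        offsets = late_check_offsets(policy, delay, cutoff)
        print(policy, [int(o.total_seconds() // 3600) for o in offsets])
    # backoff [2, 4, 6, 8]
    # exp-backoff [2, 6]
    # final [8]
```

Under this model, exp-backoff checks often right after the nominal time and less often later, which is why it is a reasonable default when most late data arrives soon after it is due.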
04-20-2016 07:19 PM · 3 Kudos
After my initial research, below is what I found about the security options in HDF:
1. To make the user interface accessible over HTTPS instead of HTTP, edit the security properties section of the nifi.properties file.
2. User authentication is handled by a Login Identity Provider, a pluggable mechanism for authenticating users with their username and password.
a. A Login Identity Provider can integrate with a directory server to authenticate users via LDAP. Username/password authentication is enabled by referencing this provider in nifi.properties.
b. A Login Identity Provider can also integrate with a Kerberos Key Distribution Center (KDC) to authenticate users. NiFi can also be configured to use Kerberos SPNEGO (or "Kerberos Service") for authentication.
Note: By default, NiFi requires client certificates to authenticate users over HTTPS, so the Login Identity Provider to use must be configured explicitly in the nifi.properties file.
3. Levels of access in HDF are controlled through the Authority Provider: an administrator user is set up, who can then grant the appropriate roles to users who request access. The following roles are supported:
i) Administrator
ii) Data Flow Manager
iii) Read Only
iv) Provenance
v) NiFi
4. Out of the box, NiFi provides several options to encrypt and decrypt data. The EncryptContent processor encrypts and decrypts data, both internally within NiFi and in integration with external systems such as openssl and other data sources and consumers.
Detailed information can be found in the HDF documentation: https://docs.hortonworks.com/HDPDocuments/HDF1/HDF-1.2/bk_AdminGuide/content/ch_administration_guide.html
Thanks
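Item 1 and the note under item 2 both come down to entries in nifi.properties. As a rough pre-flight check, the sketch below parses a nifi.properties file and lists HTTPS and login settings that are still empty. The property names are the ones I believe NiFi 0.x (the basis of HDF 1.2) uses, so treat them as assumptions to verify against the Admin Guide linked in the post; the sample values (port, keystore paths, `ldap-provider`) are placeholders, and the parser handles only simple `key=value` lines, not the full Java properties syntax.

```python
def parse_properties(text):
    """Parse simple key=value lines into a dict.

    Simplified: ignores blank lines and '#' comments, and does not handle
    ':' separators or line continuations from the full Java format.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Assumed NiFi 0.x / HDF 1.2 property names; verify against your version.
HTTPS_KEYS = (
    "nifi.web.https.port",
    "nifi.security.keystore",
    "nifi.security.keystoreType",
    "nifi.security.keystorePasswd",
    "nifi.security.truststore",
)


def https_gaps(props):
    """Return human-readable gaps in an HTTPS-only, login-enabled setup."""
    gaps = [key for key in HTTPS_KEYS if not props.get(key)]
    if props.get("nifi.web.http.port"):
        gaps.append("nifi.web.http.port is set, so plain HTTP is still enabled")
    if not props.get("nifi.security.user.login.identity.provider"):
        gaps.append("no login identity provider set; users need client certificates")
    return gaps


if __name__ == "__main__":
    sample = """
    nifi.web.http.port=
    nifi.web.https.port=9443
    nifi.security.keystore=/opt/nifi/conf/keystore.jks
    nifi.security.keystoreType=JKS
    nifi.security.keystorePasswd=changeit
    nifi.security.truststore=/opt/nifi/conf/truststore.jks
    nifi.security.user.login.identity.provider=ldap-provider
    """
    print(https_gaps(parse_properties(sample)))  # []
```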
04-20-2016 07:05 PM
@emaxwell Thank you
04-20-2016 02:23 PM · 2 Kudos
I am planning to use HDF for a particular use case: ingesting a large number of flat files and some sensitive metadata from relational databases. It will work in conjunction with an HDP 2.4 cluster. My question is: apart from the out-of-the-box security provided by Apache NiFi itself, what other security best practices should be implemented for HDF? For more context, the HDP cluster will be secured using Kerberos, Ranger, and Knox. Thanks.
Labels:
- Apache NiFi
- Cloudera DataFlow (CDF)
04-20-2016 03:01 AM · 1 Kudo
Hi @Peter Coates, assuming you have a moderate number of files, have you tried the option below?
bash$ hadoop distcp2 -f hdfs://nn1:8020/srclist hdfs://nn2:8020/bar/foo
where srclist contains the following (you can populate this file from a recursive listing):
hdfs://nn1:8020/foo/dir1/a
hdfs://nn1:8020/foo/dir2/b
More info here: https://hadoop.apache.org/docs/r1.2.1/distcp2.html
Please let me know if this works. Thanks
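One way to produce srclist from a recursive listing is to filter `hdfs dfs -ls -R` output down to regular files and prefix each path with the source NameNode URI. A minimal sketch follows; the helper is hypothetical, and it assumes the default eight-column listing format and paths without spaces. The sample listing in the usage block is made up to match the srclist above.

```python
def srclist_from_ls(ls_output, namenode_uri="hdfs://nn1:8020"):
    """Turn `hdfs dfs -ls -R` output into distcp -f source-list lines.

    Keeps regular files only (permission string not starting with 'd')
    and prefixes each path with the source NameNode URI. The path is
    taken as the last field, so paths must not contain spaces.
    """
    lines = []
    for row in ls_output.splitlines():
        fields = row.split()
        # Skip blank lines, short header lines such as "Found N items",
        # and directory entries.
        if len(fields) < 8 or fields[0].startswith("d"):
            continue
        lines.append(namenode_uri + fields[-1])
    return lines


if __name__ == "__main__":
    sample = (
        "drwxr-xr-x   - hdfs hdfs          0 2016-04-20 03:01 /foo/dir1\n"
        "-rw-r--r--   3 hdfs hdfs       1024 2016-04-20 03:01 /foo/dir1/a\n"
        "drwxr-xr-x   - hdfs hdfs          0 2016-04-20 03:01 /foo/dir2\n"
        "-rw-r--r--   3 hdfs hdfs       2048 2016-04-20 03:01 /foo/dir2/b\n"
    )
    print("\n".join(srclist_from_ls(sample)))
    # hdfs://nn1:8020/foo/dir1/a
    # hdfs://nn1:8020/foo/dir2/b
```

Write the returned lines to a local file, upload it with `hdfs dfs -put`, and pass its HDFS URI to `-f`.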