Member since: 05-16-2016
Posts: 783
Kudos Received: 111
Solutions: 39
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 877 | 06-12-2019 09:27 AM |
| | 1712 | 05-27-2019 08:29 AM |
| | 3224 | 05-27-2018 08:49 AM |
| | 2894 | 05-05-2018 10:47 PM |
| | 1935 | 05-05-2018 07:32 AM |
02-04-2017
09:17 PM
If you are having trouble passing the special characters, try using --map-column-java and you should be able to get past those errors; type cast to your needs. Alternatively, you can use --map-column-hive. For example:
sqoop import ... --map-column-java id=String,value=Integer
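A fuller sketch of the same idea, assuming a hypothetical MySQL connection string, credentials, table, and target directory; the only part that matters here is the --map-column-java override:
# hypothetical connection details; adjust to your environment
sqoop import \
--connect jdbc:mysql://dbhost/sampledb \
--username dbuser \
--password dbpass \
--table sample_table \
--map-column-java id=String,value=Integer \
--target-dir /user/example/sample_table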
... View more
02-04-2017
10:25 AM
There is no direct way of enforcing a strict mode in Impala. However, you can control the runtime behaviour of queries by issuing SET statements, for example to control the memory limit (MEM_LIMIT) or MAX_SCAN_RANGE_LENGTH. Please refer to this link: SET STATEMENT
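A quick sketch of those query options in an impala-shell session; the values are illustrative only and should be sized for your cluster:
SET MEM_LIMIT=2000000000;
SET MAX_SCAN_RANGE_LENGTH=4194304;
SELECT count(*) FROM sample_table;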
... View more
01-30-2017
08:26 AM
1 Kudo
As far as I am concerned, I don't think there is a way to detect and kill them automatically. One way to handle a long-running query is to inspect your query flow and streaming. Also enable strict mode - in this mode some risky queries will not be performed, such as a Cartesian product or ordering without a LIMIT clause.
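A minimal sketch of turning strict mode on for a Hive session (newer Hive releases split this into the hive.strict.checks.* properties; the classic switch is shown here):
SET hive.mapred.mode=strict;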
... View more
01-29-2017
09:43 AM
Could you add --verbose to your sqoop command and paste the full log?
... View more
01-28-2017
12:43 AM
Is it authorizedkey or authorized_key? Should the file be in the user:hdfs group? Could you please clarify?
... View more
01-25-2017
12:04 AM
Hi, can anyone help me out with the below configuration and please let me know how to generate the SSH private key file? We are configuring an HA NameNode cluster for testing purposes.
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/exampleuser/.ssh/id_rsa</value>
</property>
This is not my area - I have no idea how to generate this file: /home/exampleuser/.ssh/id_rsa. Thanks
... View more
01-24-2017
06:31 PM
You can use Impala or Hive to convert the data into Parquet; streaming from Flume is a good option. Below is a sample Flume configuration.
agent1.sources = kafka-source
agent1.channels = memory-channel
agent1.sinks = hdfs-sink
agent1.sources.kafka-source.type = org.apache.flume.source.kafka.KafkaSource
agent1.sources.kafka-source.zookeeperConnect = zookeeperHost:2191
agent1.sources.kafka-source.topic = hello-kafka-topic_Test
agent1.sources.kafka-source.groupId = flume
agent1.sources.kafka-source.channels = memory-channel
agent1.sources.kafka-source.interceptors = i1
agent1.sources.kafka-source.interceptors.i1.type = timestamp
agent1.channels.memory-channel.type = memory
agent1.channels.memory-channel.capacity = 10000
agent1.channels.memory-channel.transactionCapacity = 1000
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.hdfs.path = hdfs://matty:8020/tmp/kafka/%{topic}/%y-%m-%d
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
agent1.sinks.hdfs-sink.channel = memory-channel
To create the table as Parquet in Hive:
CREATE TABLE parquet_test_table (
id int,
str string)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';
We store the data in Avro format from Flume, then we use Impala to convert the data to Parquet.
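A minimal sketch of that Avro-to-Parquet conversion step in Impala, assuming a hypothetical Avro-backed table named avro_events:
CREATE TABLE parquet_events STORED AS PARQUET AS SELECT * FROM avro_events;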
... View more
01-24-2017
01:35 AM
sqoop job --exec jd -- --delete-target-dir --target-dir /user/matt/stud/studs --as-avrodatafile
My environment: sqoop-1.4.4-cdh5.0.0. I tried the same as you did, but I did get the "file already existing" error under DEBUG, which is ignored. If you don't mind, could you share your directory structure? Also, could you please try using --verbose in your sqoop --exec? It will give us some idea; share the output with us if you can. What version of Sqoop and CDH are you running on your machine?
-rw-r--r-- 1 matt supergroup 0 2017-01-24 04:09 /user/matt/stud/_SUCCESS
-rw-r--r-- 1 matt supergroup 14 2017-01-24 04:09 /user/matt/stud/part-m-00000
-rw-r--r-- 1 matt supergroup 20 2017-01-24 04:09 /user/matt/stud/part-m-00001
-rw-r--r-- 1 matt supergroup 24 2017-01-24 04:09 /user/matt/stud/part-m-00002
-rw-r--r-- 1 matt supergroup 6 2017-01-24 04:09 /user/matt/stud/part-m-00003
drwxr-xr-x - matt supergroup 0 2017-01-24 04:23 /user/matt/stud/studs
[matt@localhost ~]$ hadoop fs -ls /user/matt/stud/studs/
Found 5 items
-rw-r--r-- 1 matt supergroup 0 2017-01-24 04:23 /user/matt/stud/studs/_SUCCESS
-rw-r--r-- 1 matt supergroup 301 2017-01-24 04:23 /user/matt/stud/studs/part-m-00000.avro
-rw-r--r-- 1 matt supergroup 307 2017-01-24 04:23 /user/matt/stud/studs/part-m-00001.avro
-rw-r--r-- 1 matt supergroup 311 2017-01-24 04:23 /user/matt/stud/studs/part-m-00002.avro
-rw-r--r-- 1 matt supergroup 293 2017-01-24 04:23 /user/matt/stud/studs/part-m-00003.avro
... View more
01-15-2017
07:32 PM
A few suggestions that could help you out. How many bytes per reducer have you configured? Check how many reducers are being invoked. If you are doing a join, make sure that the large table is stated last in the query, so it can be streamed. I would consider enabling the parallel execution mode in Hive. Enable Local Mode if you can. Have you enabled JVM reuse?
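A minimal sketch of the Hive session settings behind those suggestions; the property names are the classic MR-era ones and the values are illustrative only:
SET hive.exec.reducers.bytes.per.reducer=268435456;
SET hive.exec.parallel=true;
SET hive.exec.mode.local.auto=true;
SET mapred.job.reuse.jvm.num.tasks=5;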
... View more
01-01-2017
06:47 PM
Can this be done in production, mate?
... View more
12-31-2016
03:50 AM
The MapReduce job is succeeding and I am able to check the results in HDFS. The problem is that when I try to look in the history server, the jobs are not there. Checking the logs, I found this error:
16/12/31 06:34:27 ERROR hs.HistoryFileManager: Error while trying to move a job to done
org.apache.hadoop.security.AccessControlException: Permission denied: user=mapred, access=READ, inode="/user/history/done_intermediate/matt/job_1483174306930_0005.summary":matt:hadoop:-rwxrwx---
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:265)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:251)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:182)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5461)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5443)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:5405)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1680)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1632)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1612)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1586)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:482)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:322)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
at sun.reflect.GeneratedConstructorAccessor29.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1139)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1127)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1117)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:264)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:231)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:224)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1290)
at org.apache.hadoop.fs.Hdfs.open(Hdfs.java:309)
at org.apache.hadoop.fs.Hdfs.open(Hdfs.java:54)
at org.apache.hadoop.fs.AbstractFileSystem.open(AbstractFileSystem.java:619)
at org.apache.hadoop.fs.FileContext$6.next(FileContext.java:785)
at org.apache.hadoop.fs.FileContext$6.next(FileContext.java:781)
at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
at org.apache.hadoop.fs.FileContext.open(FileContext.java:781)
at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.getJobSummary(HistoryFileManager.java:953)
at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.access$400(HistoryFileManager.java:82)
at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo.moveToDone(HistoryFileManager.java:370)
at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo.access$1400(HistoryFileManager.java:295)
at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$1.run(HistoryFileManager.java:843)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
It looks like a permission issue, but I am not sure where I should change it and what the chmod value should be. Below is my current configuration:
sudo -u hdfs hadoop fs -mkdir /user
$ sudo -u hdfs hadoop fs -mkdir /user/matt
$ sudo -u hdfs hadoop fs -chown matt /user/matt
$ sudo -u hdfs hadoop fs -mkdir /user/history
$ sudo -u hdfs hadoop fs -chmod 1777 /user/history
$ sudo -u hdfs hadoop fs -chown mapred:hadoop \
/user/history
Can someone please help me with this issue?
... View more
Labels:
- CDH Manual Installation
- MapReduce
- YARN
11-29-2016
05:42 AM
Since you are going to use a third-party webpage, I am assuming that you won't be able to integrate or deploy the Flume SDK. If the webpage is OK with sending data via HTTP rather than using Flume's RPC, then I think the HTTP source would be a good fit. From a client point of view, the HTTP source will act like a web server that accepts Flume events. You can either write your own handler or use HTTPSourceXMLHandler in your configuration; the default handler accepts JSON format. The format which HTTPSourceXMLHandler accepts is stated below:
<events>
<event 1 2 3 ..>
<headers>
<header 1 2 3 ..>
</header>
<body> </body>
</event..>
</events>
The handler will parse the XML into Flume events and pass them on to the HTTP source, which will then pass them on to the channel and on to the sink or another agent, depending on the flow.
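A minimal sketch of an HTTP source definition in the same style as the Kafka example above; the agent name, port, and channel name are assumptions, and the handler shown is Flume's built-in JSON handler (swap in your own handler class if you go the XML route):
agent1.sources = http-source
agent1.sources.http-source.type = http
agent1.sources.http-source.port = 8081
agent1.sources.http-source.handler = org.apache.flume.source.http.JSONHandler
agent1.sources.http-source.channels = memory-channel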
... View more
11-21-2016
11:34 PM
Check the status of all the Impala and Hive daemons using the commands below; if any one of them is not running, please start it and fire INVALIDATE METADATA to refresh.
sudo service impala-state-store status
sudo service impala-catalog status
sudo service hive-metastore status
sudo service impala-server status
Note - if a service is not started, replace status with start.
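A quick sketch of the refresh step from the shell, assuming impala-shell is on the path:
impala-shell -q "INVALIDATE METADATA;"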
... View more
11-16-2016
08:20 AM
I will certainly try to write the query in a single line as per your suggestion, but I am wondering why we need the placeholder $CONDITIONS when we are forcing Sqoop to perform only one job by using --num-mappers 1?
... View more
11-15-2016
02:44 AM
I am facing a SQL error and a Sqoop error in two scenarios. I am doing this for testing.
DB: MySQL
Sqoop version: 1.4.4-cdh5.0.0
My table citi:
+------+------------+-----------+
| id | country_id | city |
+------+------------+-----------+
| 10 | 101 | omaha |
| 11 | 102 | coloumbus |
| 12 | 103 | buff |
+------+------------+-----------+
Table country:
+------------+---------+
| country_id | country |
+------------+---------+
| 101 | us |
| 102 | in |
| 103 | nz |
+------------+---------+
Below is my sqoop import:
sqoop import \
> --connect jdbc:mysql://localhost/ms4 \
> --username xxx \
> --password yyy \
> --query 'SELECT citi.id, \
> country.name, \
> citi.city \
> FROM citi \
> JOIN country USING(country_id) \
> --num-mappers 1 \
> --target-dir cities
Below is the error I am seeing. I don't find anything wrong with my --query, to my knowledge.
16/11/15 05:27:02 INFO manager.SqlManager: Executing SQL statement: SELECT citi.id, \
country.name, \
citi.city \
FROM citi \
JOIN country USING(country_id) \
WHERE (1 = 0)
16/11/15 05:27:02 ERROR manager.SqlManager: Error executing statement: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '\
country.name, \
citi.city \
FROM citi \
JOIN country USING(country_id) \
WHERE' at line 1
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '\
country.name, \
citi.city \
FROM citi \
JOIN country USING(country_id) \
WHERE' at line 1
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.Util.getInstance(Util.java:386)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3529)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1990)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2625)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2119)
at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2283)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:699)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:708)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:243)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForQuery(SqlManager.java:233)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:356)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1298)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1110)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:396)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:506)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:222)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:231)
at org.apache.sqoop.Sqoop.main(Sqoop.java:240)
16/11/15 05:27:02 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: No columns to generate for ClassWriter
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1116)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:396)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:506)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:222)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:231)
at org.apache.sqoop.Sqoop.main(Sqoop.java:240)
As a workaround, and to see whether the WHERE $CONDITIONS is the problem or not, I skip WHERE $CONDITIONS by forcing Sqoop to use one mapper:
sqoop import \
> --connect jdbc:mysql://localhost/movielens \
> --username training \
> --password training \
> --query 'SELECT citi.id, \
> country.name, \
> citi.city \
> FROM citi \
> JOIN country USING(country_id)' \
> --num-mappers 1 \
> --target-dir cities
I am pretty sure we can force Sqoop to avoid parallelism, but it is still complaining and throwing an error:
ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Query [SELECT citi.id, \
country.name, \
citi.city \
FROM citi \
JOIN country USING(country_id)] must contain '$CONDITIONS' in WHERE clause.
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:352)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1298)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1110)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:396)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:506)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:222)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:231
I would greatly appreciate any kind of information or solution. Thanks
... View more
Labels:
- Sqoop
11-04-2016
09:01 PM
We had this exception for a while and it went away by itself. As far as I am concerned, this exception occurs when the NameNode block locations are not fresh. Check if you have an HDFS block skew condition. If you see this often then it is a problem, because it clearly denotes that some block is missing; otherwise you can ignore it.
... View more
11-04-2016
12:07 AM
Can you verify where your logs are pointing? Also verify this in your hive-site.xml and make sure the value is true.
<property>
<name>hive.server2.logging.operation.enabled</name>
<value>true</value>
</property>
The above should help you. Also, for further information refer to this link: https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-LogFiles
... View more
11-02-2016
07:06 PM
1 Kudo
Sqoop export will transfer data to the database using INSERT statements. As soon as the user fires the export command, Sqoop will connect to the database to fetch the metadata about the table. The only prerequisite that pertains to the sqoop export command is that the table (--table parameter) must exist prior to running Sqoop. Whether the table has a primary key or not is up to your design. The user has to make sure that there are no constraint violations while performing the Sqoop export (i.e. the INSERTs).
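A minimal sketch of the command, assuming a hypothetical MySQL connection string, an already-existing target table, and an HDFS directory holding the data to export:
sqoop export \
--connect jdbc:mysql://dbhost/sampledb \
--username dbuser \
--password dbpass \
--table sample_table \
--export-dir /user/example/sample_table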
... View more
11-02-2016
04:31 AM
2 Kudos
1. Type SHOW TABLES in Hive and note down the tables. 2. Check under /user/hive/warehouse/ using Hue -> File Browser or the command line whether the customers or categories folders are already populated. If so, remove them using Hue -> File Browser -> Delete, or with a DROP TABLE command from Hive. Then re-run the script and please let me know. Or simply change the last line of the script:
sqoop import-all-tables \
-m 1 \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username=retail_dba \
--password=cloudera \
--compression-codec=snappy \
--as-sequencefile \
--warehouse-dir=/user/hive/warehouse \
--hive-overwrite
... View more
11-01-2016
09:35 PM
Could you replace --as-parquetfile with --as-sequencefile and let me know if you are able to get past the error?
... View more
11-01-2016
07:43 PM
It's throwing a ClassCastException, meaning something is trying to cast java.lang.String to org.apache.avro.generic.IndexedRecord, which is not compatible. Could you provide the table schema and your sqoop import command?
... View more
11-01-2016
04:42 AM
Could you let us know the version you are using? Also, as far as I am concerned, only left semi joins are supported in Hive.
... View more
10-24-2016
10:39 PM
1 Kudo
You have to bucket the Hive table, but not sort it. Streaming to an unpartitioned table is currently not supported. In your case please check the schema of your table m_tel_record.
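A minimal sketch of a table layout that Hive streaming can write to; the table, column, and partition names are hypothetical, and the key points are the CLUSTERED BY bucketing, ORC storage, and the transactional table property:
CREATE TABLE m_tel_record_acid (
  caller_id STRING,
  call_ts BIGINT,
  duration INT
)
PARTITIONED BY (call_date STRING)
CLUSTERED BY (caller_id) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');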
... View more
10-20-2016
09:42 PM
Execute the commands below to get better insight:
SHOW LOCKS <TABLE_NAME>;
SHOW LOCKS <TABLE_NAME> EXTENDED;
SHOW LOCKS <TABLE_NAME> PARTITION (<PARTITION_DESC>);
SHOW LOCKS <TABLE_NAME> PARTITION (<PARTITION_DESC>) EXTENDED;
Does your Hive support concurrency? hive.support.concurrency defaults to false. Are you using HiveServer2?
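A minimal sketch of enabling concurrency in hive-site.xml; the ZooKeeper-based lock manager shown is an assumption about your setup:
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.lock.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager</value>
</property>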
... View more
10-13-2016
07:43 PM
kerjo, I was thinking of a workaround of type casting on the Hive side. I understand that your --map-column-hive is being ignored. Correct me if I am wrong.
... View more
10-13-2016
06:37 AM
I would consider trying to type cast BIGINT to TIMESTAMP. Also please refer to this document; I read it a long time back. I am quoting from the Cloudera manual: "If you use Sqoop to convert RDBMS data to Parquet, be careful with interpreting any resulting values from DATE, DATETIME, or TIMESTAMP columns. The underlying values are represented as the Parquet INT64 type, which is represented as BIGINT in the Impala table. The Parquet values represent the time in milliseconds, while Impala interprets BIGINT as the time in seconds. Therefore, if you have a BIGINT column in a Parquet table that was imported this way from Sqoop, divide the values by 1000 when interpreting as the TIMESTAMP type." I guess there is an underlying problem with TIMESTAMP when you use Parquet files. http://www.cloudera.com/documentation/archive/impala/2-x/2-1-x/topics/impala_parquet.html#parquet_data_types_unique_1
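A minimal sketch of that divide-by-1000 cast in Impala, with a hypothetical column and table name:
SELECT CAST(event_time_ms DIV 1000 AS TIMESTAMP) AS event_time
FROM parquet_events_table;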
... View more
10-12-2016
09:07 PM
1 Kudo
The impalad daemon is the one that is not able to access the JAR for query processing, since you have set the HDFS permission to 700. Your assumption is right, and that is what I was referring to in my previous post by stating that Impala does not support HDFS-level user impersonation.
... View more