Member since 09-17-2015 · 103 Posts · 61 Kudos Received · 18 Solutions
08-11-2021
08:47 AM
Some people use the Boto3 library to browse their Amazon S3 buckets from Python, and I was looking for the equivalent for Azure. This is far from optimized, but it can be a starting point.
First things first, we need the Azure access key. Keep in mind that these keys are meant to be rotated.
In the Azure portal, go to the Storage Account in the Resource Group defined for your account and click Access keys.
There are two keys (for rotation without interruption); let's copy the first one.
In my CML project, I define an AZURE_STORAGE_TOKEN environment variable with that key:
As you can see above, the 'STORAGE' variable has been populated. If you want it populated automatically, here's some code:
!pip3 install git+https://github.com/fletchjeff/cmlbootstrap#egg=cmlbootstrap
import os
from cmlbootstrap import CMLBootstrap

# Instantiate the API wrapper
cml = CMLBootstrap()

# Set the STORAGE environment variable if it isn't already defined
try:
    storage = os.environ["STORAGE"]
except KeyError:
    storage = cml.get_cloud_storage()
    storage_environment_params = {"STORAGE": storage}
    storage_environment = cml.create_environment_variable(storage_environment_params)
    os.environ["STORAGE"] = storage
Now the project! Install the required libraries:
pip3 install azure-storage-file-datalake
Here is the code listing files under the "datalake" path. It doesn't handle all exceptions; it's really a starting point only and not meant for a production environment.
!pip3 install azure-storage-file-datalake
import os, uuid, sys, re
from azure.storage.filedatalake import DataLakeServiceClient
from azure.core._match_conditions import MatchConditions
from azure.storage.filedatalake._models import ContentSettings

def initialize_storage_account(storage_account_name, storage_account_key):
    try:
        global service_client
        service_client = DataLakeServiceClient(
            account_url="{}://{}.dfs.core.windows.net".format("https", storage_account_name),
            credential=storage_account_key)
    except Exception as e:
        print(e)

def list_directory_contents(path):
    try:
        file_system_client = service_client.get_file_system_client(container)
        paths = file_system_client.get_paths(path)
        for p in paths:
            print(p.name)
    except Exception as e:
        print(e)

storage = os.environ['STORAGE']
storage_account_key = os.environ['AZURE_STORAGE_TOKEN']
m = re.search(r'abfs://(.+?)@(.+?)\.dfs\.core\.windows\.net', storage)
if m:
    container = m.group(1)
    storage_name = m.group(2)
    initialize_storage_account(storage_name, storage_account_key)
    list_directory_contents("datalake")
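As a quick sanity check of the abfs:// parsing above, here is a standalone test that needs no Azure connection (the STORAGE value is a made-up example):

```python
import re

# Example STORAGE value (made up) in the abfs:// form CML exposes.
storage = "abfs://data@myaccount.dfs.core.windows.net"

# Same pattern as above: raw string, dots escaped.
m = re.search(r'abfs://(.+?)@(.+?)\.dfs\.core\.windows\.net', storage)
print(m.group(1), m.group(2))  # → data myaccount (container, account name)
```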
Happy browsing!
09-22-2020
02:00 AM
2 Kudos
In Cloudera Machine Learning (or CDSW for the on-prem version), projects are backed by git. You might want to use GitHub for your projects, so here is a simple way to do that.
First things first: there are basically two ways of interacting with git/GitHub, HTTPS or SSH. We'll use the latter to make authentication easy. You might also consider SSO or 2FA to enhance security; here we'll focus on the basics.
To make this authentication work under the hood, copy your SSH key from CML to GitHub.
Find your SSH key in the Settings of CML:
Copy that key and add it in GitHub, under SSH and GPG keys in your github.com settings: Add SSH key.
Put cdsw in the Title and paste the key contents into the Key field:
Let's start with creating a new project on github.com:
The important thing here is the access mode we want to use: SSH
In CML, start a new project with a template:
Open a Terminal window in a new session:
Convert the project to a git project: cdsw@qp7h1qllrh9dx1hd:~$ git init
Initialized empty Git repository in /home/cdsw/.git/
Add all files to git: cdsw@qp7h1qllrh9dx1hd:~$ git add .
Commit the project: cdsw@qp7h1qllrh9dx1hd:~$ git commit -m "initial commit"
[master (root-commit) 5d75525] initial commit
47 files changed, 14086 insertions(+)
create mode 100755 .gitignore
create mode 100644 LICENSE.txt
create mode 100755 als.py
[...]
Add a remote origin server with the "URL" of the remote repository where your local repository will be pushed: cdsw@qp7h1qllrh9dx1hd:~$ git remote add origin git@github.com:laurentedel/MyProject.git
Rename the current branch to master: cdsw@qp7h1qllrh9dx1hd:~$ git branch -M master
Finally, push the changes (so all files for the first commit) to our master, so on github.com: cdsw@qp7h1qllrh9dx1hd:~$ git push -u origin master
The authenticity of host 'github.com (140.82.113.4)' can't be established.
RSA key fingerprint is SHA256:nThbg6kXUpJWGl7E1IGOCspRomTxdCARLviKw6E5SY8.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'github.com,140.82.113.4' (RSA) to the list of known hosts.
Counting objects: 56, done.
Delta compression using up to 16 threads.
Compressing objects: 100% (46/46), done.
Writing objects: 100% (56/56), 319.86 KiB | 857.00 KiB/s, done.
Total 56 (delta 1), reused 0 (delta 0)
remote: Resolving deltas: 100% (1/1), done.
To github.com:laurentedel/MyProject.git
* [new branch] master -> master
Branch 'master' set up to track remote branch 'master' from 'origin'.
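The whole sequence above can be put into one script. This is a sketch: the remote URL is a placeholder, and the push is commented out so the script runs locally without credentials.

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
git init                               # turn the directory into a git repo
echo "# MyProject" > README.md
git add .
git -c user.name=demo -c user.email=demo@example.com commit -m "initial commit"
git branch -M master                   # rename the current branch to master
git remote add origin git@github.com:youruser/MyProject.git
# git push -u origin master            # needs the SSH key set up as above
git log --oneline
```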
There you go!
Now we can use the usual git commands. Modify a file: cdsw@qp7h1qllrh9dx1hd:~$ echo "# MyProject" >> README.md
What's our status? cdsw@qp7h1qllrh9dx1hd:~$ git status
On branch master
Your branch is up to date with 'origin/master'.
Untracked files:
(use "git add <file>..." to include in what will be committed)
README.md
nothing added to commit but untracked files present (use "git add" to track)
Commit/push: cdsw@qp7h1qllrh9dx1hd:~$ git add README.md
cdsw@qp7h1qllrh9dx1hd:~$ git commit -m "adding a README"
[master 7008e88] adding a README
1 file changed, 1 insertion(+)
create mode 100644 README.md
cdsw@qp7h1qllrh9dx1hd:~$ git push -u origin master
Warning: Permanently added the RSA host key for IP address '140.82.114.4' to the list of known hosts.
Counting objects: 3, done.
Delta compression using up to 16 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 290 bytes | 18.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0)
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
To github.com:laurentedel/MyProject.git
5d75525..7008e88 master -> master
Branch 'master' set up to track remote branch 'master' from 'origin'.
Happy commits!
12-03-2018
04:28 PM
This article was written against HDP 2.5.3; you may need to adjust some parameters to match your version. We'll set the Kafka log level through the Log4j controller MBean with jConsole. The first step is to enable JMX access: add the following to the kafka-env template in the Kafka configs:
export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false"
export JMX_PORT="9999"
To avoid JMX port conflicts like the one described in https://community.hortonworks.com/articles/73750/kafka-jmx-tool-is-failing-with-port-already-in-use.html, let's modify /usr/hdp/current/kafka/bin/kafka-run-class.sh on all broker nodes. Replace:
# JMX port to use
if [ $JMX_PORT ]; then
  KAFKA_JMX_OPTS="$KAFKA_JMX_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT"
fi
with:
# JMX port to use
if [ $ISKAFKASERVER = "true" ]; then
  JMX_REMOTE_PORT=$JMX_PORT
else
  JMX_REMOTE_PORT=$CLIENT_JMX_PORT
fi
if [ $JMX_REMOTE_PORT ]; then
  KAFKA_JMX_OPTS="$KAFKA_JMX_OPTS -Dcom.sun.management.jmxremote.port=$JMX_REMOTE_PORT"
fi
After the brokers have been restarted, let's change the log level with jConsole:
$ jconsole <BROKER_FQDN>:<JMX_PORT>
This launches a jConsole window asking whether to retry insecurely; go ahead with that. Go to the MBeans tab, then kafka/kafka.log4jController/Attributes, and double-click the Value of Loggers to list all Log4j loggers. You can see the kafka logger among those shown is set to INFO. We can check it using the getLogLevel operation, entering kafka as the loggerName. Fortunately, you can also change the level without restarting, using the setLogLevel operation with DEBUG or TRACE, for example.
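Before launching jConsole, it can be handy to check that the broker's JMX port is actually reachable. Here is a small helper for that (the broker hostname in the example call is a placeholder):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. port_open("broker1.example.com", 9999)
```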
10-17-2017
10:09 AM
2 Kudos
When spark-shell starts, it tries to bind to port 4040 for the Spark UI. If that port is already taken by another active spark-shell session, it then tries 4041, then 4042, and so on. Each time the binding doesn't succeed, a huge WARN stack trace is printed, which can be filtered out:
[user@serv hive]$ SPARK_MAJOR_VERSION=2 spark-shell
SPARK_MAJOR_VERSION is set to 2, using Spark2
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/09/20 11:49:43 WARN AbstractLifeCycle: FAILED ServerConnector@2d258eff{HTTP/1.1}
{0.0.0.0:4040}: java.net.BindException: Address already in use
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.spark_project.jetty.server.ServerConnector.open(ServerConnector.java:321)
at org.spark_project.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80)
at org.spark_project.jetty.server.ServerConnector.doStart(ServerConnector.java:236)
at org.spark_project.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$newConnector$1(JettyUtils.scala:333)
at org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$httpConnect$1(JettyUtils.scala:365)
at org.apache.spark.ui.JettyUtils$$anonfun$7.apply(JettyUtils.scala:368)
at org.apache.spark.ui.JettyUtils
To filter that stack trace, let's set that class's log4j verbosity to ERROR in /usr/hdp/current/spark2-client/conf/log4j.properties:
# Added for not having stack traces when binding to SparkUI
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
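As an alternative (a suggestion beyond the original article, not a logging fix): you can avoid the retry noise entirely by pinning the UI to a known free port with the spark.ui.port property, or by disabling the UI for throwaway shells with spark.ui.enabled=false:

```
SPARK_MAJOR_VERSION=2 spark-shell --conf spark.ui.port=4050
```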
06-20-2017
01:28 PM
1 Kudo
I adapted two of the queries for Postgres (and added a vacuum):
CREATE TEMPORARY TABLE tmp_request_id AS SELECT MAX(request_id) AS request_id FROM request WHERE create_time <= (SELECT (EXTRACT(epoch FROM NOW()) - 2678400) * 1000 AS epoch_1_month_ago_times_1000);
CREATE TEMPORARY TABLE tmp_task_id AS SELECT MAX(task_id) AS task_id FROM host_role_command WHERE request_id <= (SELECT request_id FROM tmp_request_id);
CREATE TEMPORARY TABLE tmp_upgrade_ids AS SELECT upgrade_id FROM upgrade WHERE request_id <= (SELECT request_id FROM tmp_request_id);
DELETE FROM execution_command WHERE task_id <= (SELECT task_id FROM tmp_task_id);
DELETE FROM host_role_command WHERE task_id <= (SELECT task_id FROM tmp_task_id);
DELETE FROM role_success_criteria WHERE request_id <= (SELECT request_id FROM tmp_request_id);
DELETE FROM stage WHERE request_id <= (SELECT request_id FROM tmp_request_id);
DELETE FROM topology_logical_task;
DELETE FROM requestresourcefilter WHERE request_id <= (SELECT request_id FROM tmp_request_id);
DELETE FROM requestoperationlevel WHERE request_id <= (SELECT request_id FROM tmp_request_id);
DELETE FROM upgrade_item WHERE upgrade_group_id IN (SELECT upgrade_group_id FROM upgrade_group WHERE upgrade_id IN (SELECT upgrade_id FROM tmp_upgrade_ids));
DELETE FROM upgrade_group WHERE upgrade_id IN (SELECT upgrade_id FROM tmp_upgrade_ids);
DELETE FROM upgrade WHERE request_id <= (SELECT request_id FROM tmp_request_id);
DELETE FROM request WHERE request_id <= (SELECT request_id FROM tmp_request_id);
DELETE FROM topology_host_task;
DELETE FROM topology_host_request;
DELETE FROM topology_logical_request;
DELETE FROM topology_host_info;
DELETE FROM topology_hostgroup;
DELETE FROM topology_request;
DROP TABLE tmp_upgrade_ids;
DROP TABLE tmp_task_id;
DROP TABLE tmp_request_id;
VACUUM FULL VERBOSE ANALYZE;
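A note on the magic number in the first query: 2678400 is 31 days in seconds, and the ×1000 converts to milliseconds, since create_time is stored as epoch millis. A quick check of the arithmetic:

```python
import time

SECONDS_31_DAYS = 31 * 24 * 3600           # = 2678400, the constant in the SQL
cutoff_ms = (time.time() - SECONDS_31_DAYS) * 1000  # same cutoff the query computes
print(SECONDS_31_DAYS)                     # → 2678400
```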
01-18-2017
02:17 PM
On RHEL/CentOS you might encounter an exception when trying to stop or restart Oozie:
resource_management.core.exceptions.Fail: Execution of 'cd /var/tmp/oozie && /usr/hdp/current/oozie-server/bin/oozie-stop.sh' returned 1. -bash: line 0: cd: /var/tmp/oozie: No such file or directory
This is likely caused by the /etc/cron.daily/tmpwatch cron script, which deletes files and directories unmodified for more than 30 days:
[root@local ~]# cat /etc/cron.daily/tmpwatch
#! /bin/sh
flags=-umc
/usr/sbin/tmpwatch "$flags" -x /tmp/.X11-unix -x /tmp/.XIM-unix \
-x /tmp/.font-unix -x /tmp/.ICE-unix -x /tmp/.Test-unix \
-X '/tmp/hsperfdata_*' 10d /tmp
/usr/sbin/tmpwatch "$flags" 30d /var/tmp
for d in /var/{cache/man,catman}/{cat?,X11R6/cat?,local/cat?}; do
if [ -d "$d" ]; then
/usr/sbin/tmpwatch "$flags" -f 30d "$d"
fi
done
Just recreate the directory and you're good to go:
[root@local ~]# mkdir /var/tmp/oozie
[root@local ~]# chown oozie:hadoop /var/tmp/oozie
[root@local ~]# chmod 755 /var/tmp/oozie
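To keep tmpwatch from deleting the directory again, you could also exclude it in the cron script; tmpwatch's -x flag excludes a path. This is a suggested tweak beyond the original fix:

```
/usr/sbin/tmpwatch "$flags" -x /var/tmp/oozie 30d /var/tmp
```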
06-28-2016
01:43 PM
1 Kudo
When using a PostgreSQL database for Hue, you might have encountered:
[root@hue ~]# cd /usr/lib/hue
[root@hue hue]# source ./build/env/bin/activate
(env)[root@hue hue]# hue syncdb
Traceback (most recent call last):
File "/usr/lib/hue/build/env/bin/hue", line 9, in <module>
load_entry_point('desktop==2.6.1', 'console_scripts', 'hue')()
File "/usr/lib/hue/desktop/core/src/desktop/manage_entry.py", line 60, in entry
execute_manager(settings)
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/core/management/__init__.py", line 438, in execute_manager
utility.execute()
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/core/management/__init__.py", line 379, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/core/management/__init__.py", line 261, in fetch_command
klass = load_command_class(app_name, subcommand)
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/core/management/__init__.py", line 67, in load_command_class
module = import_module('%s.management.commands.%s' % (app_name, name))
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/utils/importlib.py", line 35, in import_module
__import__(name)
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/South-0.8.2-py2.6.egg/south/management/commands/__init__.py", line 10, in <module>
import django.template.loaders.app_directories
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/template/loaders/app_directories.py", line 21, in <module>
mod = import_module(app)
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/utils/importlib.py", line 35, in import_module
__import__(name)
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/contrib/admin/__init__.py", line 1, in <module>
from django.contrib.admin.helpers import ACTION_CHECKBOX_NAME
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/contrib/admin/helpers.py", line 1, in <module>
from django import forms
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/forms/__init__.py", line 17, in <module>
from models import *
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/forms/models.py", line 6, in <module>
from django.db import connections
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/db/__init__.py", line 77, in <module>
connection = connections[DEFAULT_DB_ALIAS]
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/db/utils.py", line 91, in __getitem__
backend = load_backend(db['ENGINE'])
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/db/utils.py", line 32, in load_backend
return import_module('.base', backend_name)
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/utils/importlib.py", line 35, in import_module
__import__(name)
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/db/backends/postgresql_psycopg2/base.py", line 24, in <module>
raise ImproperlyConfigured("Error loading psycopg2 module: %s" % e)
django.core.exceptions.ImproperlyConfigured: Error loading psycopg2 module: No module named psycopg2
Download the psycopg2 module from https://pypi.python.org/packages/source/p/psycopg2/psycopg2-2.6.1.tar.gz#md5=842b44f8c95517ed5b792081a2370da1 and install it with easy_install:
(env)[root@hue hue]# easy_install /root/psycopg2-2.6.1.tar.gz
Processing psycopg2-2.6.1.tar.gz
Running psycopg2-2.6.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-IDKkmV/psycopg2-2.6.1/egg-dist-tmp-CR0nZv
zip_safe flag not set; analyzing archive contents...
psycopg2.tests.test_types_basic: module references __file__
psycopg2.tests.test_module: module references __file__
Adding psycopg2 2.6.1 to easy-install.pth file
Installed /usr/lib/hue/build/env/lib/python2.6/site-packages/psycopg2-2.6.1-py2.6-linux-x86_64.egg
Processing dependencies for psycopg2==2.6.1
Finished processing dependencies for psycopg2==2.6.1
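To confirm the driver is now visible from the virtualenv without running the whole syncdb, a small generic check works (this is my own snippet, not part of Hue; run it inside the activated env):

```python
import importlib.util

def has_module(name):
    """Return True if `name` is importable in the current environment."""
    return importlib.util.find_spec(name) is not None

print("psycopg2 importable:", has_module("psycopg2"))
```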
Done! (You might, however, consider Ambari Views.)
05-12-2016
02:00 AM
1 Kudo
Using Sqoop (SQL from/to Hadoop), you can use a password file instead of a plaintext password, which is more secure:
<arg>--password-file</arg>
<arg>hdfs://NAMENODE/teradata.password</arg>
You may end up with an error like:
3737 [main] ERROR org.apache.sqoop.teradata.TeradataSqoopExportHelper - Exception running Teradata export job
com.teradata.connector.common.exception.ConnectorException: java.sql.SQLException: [Teradata Database] [TeraJDBC 15.00.00.20] [Error 8017] [SQLState 28000] The UserId, Password or Account is invalid.
If you created the password file with vi, it ends with a line-feed control character, making the password invalid. To check whether a LF ends the password file, use od (dump the file in octal format):
[root@localhost ~]# od -c teradata.password
0000000 P a s s w o r d \n
0000011
You'll have to delete the newline control character using tr:
[root@localhost ~]# tr -d '\n' < teradata.password > teradata.password.new
[root@localhost ~]# od -c teradata.password.new
0000000 P a s s w o r d
0000010
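To illustrate in Python why the trailing newline matters (the temp path is arbitrary; rstrip has the same effect here as tr -d '\n'):

```python
import os, tempfile

path = os.path.join(tempfile.gettempdir(), "teradata.password.demo")
with open(path, "wb") as f:
    f.write(b"Password\n")            # what vi typically writes

raw = open(path, "rb").read()
clean = raw.rstrip(b"\n")             # strip the trailing newline
print(raw == b"Password", clean == b"Password")   # → False True
```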
02-01-2016
05:08 PM
5 Kudos
I'll use the iperf tool, which is available in the EPEL repo if you're using RHEL/CentOS. Installation is pretty straightforward:
# yum -y install iperf
To use it, run one instance on the first machine as the server and another on the second machine as the client. Run as a server (-s) on port (-p) 2000:
[root@machine01 ~]# iperf -s -p 2000
------------------------------------------------------------
Server listening on TCP port 2000
TCP window size: 85.3 KByte (default)
Now run the client on the other machine for 25 seconds (-t 25), with 5 seconds between periodic reports (-i 5), so we'll get 5 reports:
[root@machine02 ~]# iperf -c machine01 -p 2000 -i5 -t25
------------------------------------------------------------
Client connecting to machine01, TCP port 2000
TCP window size: 23.2 KByte (default)
------------------------------------------------------------
[ 3] local 10.195.196.18 port 45284 connected with 10.195.196.48 port 2000
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 5.0 sec 3.04 GBytes 5.22 Gbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 5.0-10.0 sec 3.37 GBytes 5.80 Gbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 10.0-15.0 sec 3.34 GBytes 5.73 Gbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 15.0-20.0 sec 3.34 GBytes 5.74 Gbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 20.0-25.0 sec 3.13 GBytes 5.38 Gbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-25.0 sec 16.2 GBytes 5.58 Gbits/sec
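A note on the units in that summary line: iperf's GBytes are binary (2^30 bytes) while its Gbits/sec are decimal (10^9 bits), which is why 16.2 GBytes over 25 seconds comes out near 5.58 Gbit/s:

```python
gib = 16.2                     # iperf "GBytes" are binary gibibytes
seconds = 25.0
bits = gib * 2**30 * 8
gbits_per_sec = bits / seconds / 1e9   # iperf "Gbits/sec" are decimal
print(round(gbits_per_sec, 2))         # ≈ 5.57; iperf shows 5.58 from the unrounded transfer
```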
Or simply send 10 GB over the network:
[root@machine02 ~]# iperf -c machine01 -p 2000 -n 10000M
------------------------------------------------------------
Client connecting to machine01, TCP port 2000
TCP window size: 23.2 KByte (default)
------------------------------------------------------------
[ 3] local 10.195.196.18 port 45619 connected with 10.195.196.48 port 2000
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-15.4 sec 9.77 GBytes 5.44 Gbits/sec
01-13-2016
11:10 PM
10 Kudos
In the documentation, you can find in the Sqoop section: "You can use Sqoop to import data into HDFS or directly into Hive. However, Sqoop can only import data into Hive as a text file or as a SequenceFile. To use the ORC file format, you must use a two-phase approach: first use Sqoop to move the data into HDFS, and then use Hive to convert the data into the ORC file format [...]" However, we can use the Sqoop-HCatalog integration feature, which provides a table abstraction. Let's use the HDP 2.3.2 sandbox, starting by creating our Hive table, stored as ORC:
[root@sandbox ~]# hive
hive> CREATE TABLE cds (id int, artist string, album string) STORED AS ORCFILE;
hive> INSERT INTO TABLE cds values (1,"The Shins","Port of Morrow");
hive> select * from cds;
1 The Shins Port of Morrow
Now let's build our MySQL table:
[root@sandbox ~]# mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 85
Server version: 5.1.73 Source distribution
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> use test;
Database changed
mysql> CREATE TABLE albums (id INT, artist varchar(255), album varchar(255), primary key(id));
Query OK, 0 rows affected (0.03 sec)
mysql> INSERT INTO albums VALUES (2, 'Family Of The Year', 'Loma Vista'), (3, 'Michel Petrucciani', 'Trio in Tokyo');
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql> GRANT ALL PRIVILEGES ON * . * TO 'newbie'@'localhost';
Query OK, 0 rows affected (0.01 sec)
mysql> exit;
Bye
Now the import part, with "--driver com.mysql.jdbc.Driver" added to avoid an error like:
16/01/13 22:04:14 ERROR manager.SqlManager: Error reading from database: java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@4d574915 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
as described by @Mark Lochbihler in this HCC article:
[root@sandbox ~]# sqoop import --connect jdbc:mysql://localhost/test --username newbie --table albums --hcatalog-table cds --driver com.mysql.jdbc.Driver
[...]
16/01/13 22:58:20 INFO mapreduce.ImportJobBase: Beginning import of albums
16/01/13 22:58:21 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM albums AS t WHERE 1=0
16/01/13 22:58:21 INFO hcat.SqoopHCatUtilities: Configuring HCatalog for import job
[...]
16/01/13 23:00:03 INFO mapreduce.ImportJobBase: Retrieved 2 records.
We retrieved our 2 records; let's check that in Hive:
hive> select * from cds;
OK
1 The Shins Port of Morrow
2 Family Of The Year Loma Vista
3 Michel Petrucciani Trio in Tokyo
Time taken: 11.485 seconds, Fetched: 3 row(s)