Member since 03-04-2019
11 Posts, 0 Kudos Received, 0 Solutions
11-20-2019
07:55 PM
I have a Dell 7920 server running Ubuntu 16.04 with OpenJDK 8 installed. Earlier in the year, I installed CDH 5.16.2 in anticipation of a project, which was subsequently carried out on a separate platform (a management decision). Last week I installed CUDA and tested it to ensure that it was working. Running nvcc -V gives the following:

Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

I had been using the CDH 5.16.2 instance to train myself and have decided to upgrade it. From the documentation, I found that upgrading in place from CDH 5.16.2 to CDH 6.3.1 would require far more work than simply uninstalling CDH 5.16.2 and then installing 6.3.1. I therefore uninstalled 5.16.2 as per
https://docs.cloudera.com/documentation/enterprise/5-16-x/topics/cm_ig_uninstall_cm.html

After the uninstallation, I found that many CDH 5.16 files still existed on my system, especially in /etc/apt/sources.list.d, which I removed and stored elsewhere.

I then carried out Steps 1 and 2 of the installation process (the latter was already done, since I have OpenJDK 8):
https://docs.cloudera.com/documentation/enterprise/latest/topics/configure_cm_repo.html#cm_repo

1. Download the cloudera.list file for your OS version to the /etc/apt/sources.list.d/ directory on the Cloudera Manager Server host. You can find the URL in the Repo File column in the Cloudera Manager 6 Version and Download Information table for the Cloudera Manager version you want to install.
2. Import the repository signing GPG key (I had removed the previous key prior to this step):

wget https://archive.cloudera.com/cm6/6.3.0/ubuntu1604/apt/archive.key
sudo apt-key add archive.key

Update your system package index by running:

sudo apt-get update

I then executed the command in Step 3:
https://docs.cloudera.com/documentation/enterprise/latest/topics/install_cm_server.html

sudo apt-get install cloudera-manager-daemons cloudera-manager-agent cloudera-manager-server

for which I receive the following messages:

Reading package lists... Done
Building dependency tree
Reading state information... Done
You might want to run 'apt-get -f install' to correct these:
The following packages have unmet dependencies:
 cloudera-manager-server-db-2 : Depends: cloudera-manager-server (= 5.16.1-1.cm5161.p0.1~xenial-cm5) but 6.3.1~1466458.ubuntu1604 is to be installed
 nvidia-cuda-toolkit : Depends: nvidia-cuda-dev (= 7.5.18-0ubuntu1) but it is not going to be installed
E: Unmet dependencies. Try 'apt-get -f install' with no packages (or specify a solution).

I am now a little lost as to how I should proceed. Please advise. Thank you.
11-13-2019
12:54 AM
Thanks @Shelton. Just to check: when Python connects directly to the MySQL database, wouldn't that imply that only Python is working on the tables, and not Spark? All the more so since an ODBC connection is used rather than JDBC. Is there a way for Spark to access the MySQL database directly, similar to what @jsensharma pointed out above?
11-10-2019
07:28 PM
Hi @jsensharma. Thanks for the reply. I realised that I had not worded my query properly: I am actually looking to connect to MySQL / Postgres through Apache Spark from a Jupyter Notebook. I am more comfortable working with Python and am not at all familiar with Scala/Java (can't stand them either). Apologies in advance for the vague request earlier.
11-10-2019
05:52 PM
I have installed Cloudera 5.16.2 on Ubuntu 16.04.
Most of the data I need to work on is in MySQL or PostgreSQL databases, and I want to use Apache Spark to work on this data directly. I prefer working with PySpark and have installed and configured Anaconda through Cloudera Manager.
How would I connect to MySQL / Postgres using Apache Spark on a Jupyter Notebook? Could you give me a step by step guide to achieve this?
Thank you
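For reference, the kind of connection being asked about can be sketched with Spark's built-in "jdbc" data source. This is only a minimal sketch under stated assumptions: it assumes Spark 2 or later (SparkSession), a connector jar available to Spark, and placeholder values for the jar path, host, table, and credentials. The helper names jdbc_options, read_table, and demo are hypothetical.

```python
from typing import Dict


def jdbc_options(kind: str, host: str, port: int, database: str,
                 table: str, user: str, password: str) -> Dict[str, str]:
    """Build the option map for Spark's built-in "jdbc" data source."""
    drivers = {
        # Driver class names as used by the MySQL and PostgreSQL connector jars.
        "mysql": "com.mysql.jdbc.Driver",
        "postgresql": "org.postgresql.Driver",
    }
    if kind not in drivers:
        raise ValueError("unsupported database kind: " + kind)
    return {
        "url": "jdbc:{}://{}:{}/{}".format(kind, host, port, database),
        "driver": drivers[kind],
        "dbtable": table,
        "user": user,
        "password": password,
    }


def read_table(spark, options: Dict[str, str]):
    """Load one table as a Spark DataFrame through JDBC."""
    return spark.read.format("jdbc").options(**options).load()


def demo():
    """Example notebook cell; needs pyspark and a reachable database."""
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("jdbc-sketch")
             # Placeholder path: point this at the real connector jar.
             .config("spark.jars", "/path/to/connector.jar")
             .getOrCreate())
    # Placeholder host, database, table, and credentials.
    opts = jdbc_options("mysql", "localhost", 3306, "mydb",
                        "mytable", "myuser", "mypassword")
    read_table(spark, opts).show(5)
```

Calling demo() from a notebook cell would start the session and print the first rows. With this approach Spark itself reads the table over JDBC, so the data arrives as a distributed DataFrame rather than through a Python-only database client.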
11-10-2019
05:27 PM
I had to install Java 8 and set JAVA_HOME to point to it.
06-18-2019
05:58 PM
Hi Istvan, I had managed to carry out the changes successfully. Am grateful for your assistance.
05-28-2019
02:57 AM
Hi Istvan,

Thank you for your reply, and apologies for the delay. I have carried out the changes you advised, including changing the following:

Solr HTTP Port (default 8983) to 8986
Solr Admin Port (default 8984) to 8987
Solr HTTPS Port (default 8985) to 8988
ZK Client Port (default 2181) to 2182
Quorum Port (default 3181) to 3182
Election Port (default 4181) to 4182
JMX Remote Port (default 9010) to 9011

I then started Cloudera, because it had not been running since I shut it down months earlier. Everything worked normally. The Banana dashboard I mentioned earlier worked fine, indicating that there is no longer a conflict with Cloudera Search, as had happened much earlier.

However, in trying to create a collection in Cloudera Search as a means of testing my installation, I found that the solr-env.sh file still has SOLR_ZK_ENSEMBLE pointing to port 2181. Since you had stated that Cloudera Manager would update the affected client configurations on starting/restarting, I am at a loss as to how this could be. Should I change these configurations manually? If so, what should I change, and what is the process for this? Thank you.
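To see exactly which ZooKeeper endpoints a given solr-env.sh still refers to, the SOLR_ZK_ENSEMBLE line can be pulled out with a few lines of Python. This is a rough sketch that assumes the value has the usual comma-separated host:port form with an optional trailing chroot such as /solr; the function name zk_endpoints is hypothetical.

```python
import re
from typing import List, Tuple


def zk_endpoints(env_text: str) -> List[Tuple[str, int]]:
    """Return (host, port) pairs from the SOLR_ZK_ENSEMBLE line, if present."""
    match = re.search(r"^\s*(?:export\s+)?SOLR_ZK_ENSEMBLE=(\S+)",
                      env_text, re.MULTILINE)
    if match is None:
        return []
    value = match.group(1).strip("\"'")
    # Drop a trailing chroot path such as /solr (assumed format).
    hosts_part = value.split("/", 1)[0]
    endpoints = []
    for item in hosts_part.split(","):
        host, _, port = item.partition(":")
        # 2181 is ZooKeeper's default client port when none is given.
        endpoints.append((host, int(port) if port else 2181))
    return endpoints
```

Passing the file's contents to zk_endpoints and comparing the ports against the new ZK Client Port (2182) shows whether that copy of the client configuration has been regenerated.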
05-09-2019
07:28 PM
I have completed the installation of CDH 5.16.1 on Ubuntu 16.04 LTS with OpenJDK 8.
My employer has developed a UI using Banana (currently packaged with Fusion) that I am required to run on a VM hosted on my machine. The Banana UI needs Solr both to operate and to store data collections that can be visualised. In this case, the Solr instance within the VM uses ports 8983 and 7574. The Banana UI also has a unique security feature developed by my employer (which I have yet to familiarise myself with) that does not yet allow me to host the entire contents of the VM directly on my server, which also hosts the Cloudera CDH instance.
Due to the above arrangement, it is not possible for me to start up Cloudera CDH (though Cloudera Manager is still running). This has proven to be a difficulty, since I need to use Solr and also the various other tools (Spark etc.).
Could you tell me how I could change the default port for Cloudera Solr (from 8983 to an appropriate port) and for Cloudera ZooKeeper (from 2181 to an appropriate port)? I am not very familiar with the layout or intricacies of Cloudera Manager on CDH 5.16.1. Thank you.
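Before choosing replacement ports, it can help to confirm which of the contested ports are already taken on the host. The following is a minimal sketch using only the Python standard library; the function names port_in_use and report are hypothetical, and the check only covers TCP listeners reachable on the given address.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def report(ports):
    """Map each port to whether it is currently taken."""
    return {port: port_in_use(port) for port in ports}
```

For the setup described here, report([8983, 7574, 2181]) would show which of the Solr and ZooKeeper ports are currently occupied, for example by the VM's Solr instance.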
05-09-2019
06:47 PM
My apologies for the delay. I was occupied with an urgent, unrelated project. I have used your advice and Sqoop works fine now. Thank you.
03-10-2019
09:10 PM
I run a PC with Ubuntu 16.04 LTS. I have installed CDH 5.16.1 (installation by parcels) and accidentally allowed it to install Oracle Java 7 into my default Java directory (/usr/lib/jvm). Prior to my CDH installation I had installed OpenJDK 8 on my PC in the same location.
I then downloaded the PostgreSQL JDBC driver for Java 8 and higher (postgresql-42.2.5.jar) and placed it in the recommended directory (/var/lib/sqoop).
In attempting to connect to my database and list the tables present, I got the following error:
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/postgresql/Driver : Unsupported major.minor version 52.0
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:190)
    at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:872)
    at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
    at org.apache.sqoop.manager.CatalogQueryManager.listTables(CatalogQueryManager.java:102)
    at org.apache.sqoop.tool.ListTablesTool.run(ListTablesTool.java:49)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
I understand that it points to a conflict between the JDK versions present on my PC. I then replaced the JDBC driver with postgresql-42.2.5.jre7.jar but still received the same error.
I then changed the Java Home Directory in Cloudera Manager to that of my OpenJDK 8 and once again replaced the JDBC driver with postgresql-42.2.5.jar in the same location, but the error remains.
To check whether the error might lie with Sqoop itself, I removed the JDBC driver from /var/lib/sqoop, but I still receive the same error.
Please help.
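For context on the error text: the "52.0" is the class-file version of org/postgresql/Driver, and major version 52 corresponds to Java 8, so the message suggests that the JVM Sqoop is launched with is older than Java 8. The mapping can be checked with a short sketch that reads the 8-byte header of a .class file; the function names class_major_version and java_release are hypothetical.

```python
import struct

CLASS_MAGIC = 0xCAFEBABE  # first four bytes of every Java class file


def class_major_version(header: bytes) -> int:
    """Read the major version from the first 8 bytes of a .class file."""
    # Layout: u4 magic, u2 minor_version, u2 major_version (big-endian).
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != CLASS_MAGIC:
        raise ValueError("not a Java class file")
    return major


def java_release(major: int) -> int:
    """Map a class-file major version to its Java release (Java 5 and later)."""
    if major < 49:
        raise ValueError("mapping only covers Java 5 (major 49) and later")
    return major - 44
```

Here java_release(52) gives 8 and java_release(51) gives 7, which matches a driver built for Java 8 being loaded by a Java 7 runtime.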
03-04-2019
12:28 AM
I have completed the installation of CDH 5.16.1 on Ubuntu 16.04 LTS with OpenJDK 8. I am now in the process of testing the different components to ensure each works as expected.
In my attempt to test Sqoop using the Cloudera Hadoop Tutorial, I ran the following command:
sqoop import-all-tables \
-m {{cluster_data.worker_node_hostname.length}} \
--connect jdbc:mysql://{{cluster_data.manager_node_hostname}}:3306/retail_db \
--username=retail_dba \
--password=cloudera \
--compression-codec=snappy \
--as-parquetfile \
--warehouse-dir=/user/hive/warehouse \
--hive-import
but received the following error:
Could not load db driver class: com.mysql.jdbc.Driver
In my search for a solution, I downloaded the MySQL connector and placed it in the following folders, according to the advice given:
/var/lib/sqoop
/opt/cloudera/parcels/CDH-5.16.1-1.cdh5.16.1.p0.3/lib/sqoop
I am still getting the error above. Please help.
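One thing worth ruling out is that the file copied into those folders is not the connector jar itself (for example, a still-packed archive) or does not contain the class named in the error. A minimal sketch of such a check, using only the Python standard library, follows; the function name jar_has_class is hypothetical, and the jar path passed to it is whatever file was actually copied.

```python
import zipfile


def jar_has_class(jar_path: str, class_name: str) -> bool:
    """Return True if the jar holds the .class entry for class_name."""
    # A jar is a zip archive; classes are stored as package/Name.class.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

Calling jar_has_class on the copied file with "com.mysql.jdbc.Driver" should return True for a usable connector; a zipfile.BadZipFile error would indicate that the file is not a jar at all.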