Member since: 06-16-2016
Posts: 22
Kudos Received: 0
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2422 | 10-17-2019 05:47 AM
 | 6750 | 05-17-2017 02:00 AM
08-20-2020
05:59 AM
The Phoenix parcel needs to be removed from my cluster, but I have no idea whether someone actually created a Phoenix table using sqlline. The Cloudera documentation just says: "Before you uninstall the Phoenix parcel, you must disable all your Phoenix system tables, and tables created using Phoenix." How can I find all tables created using Phoenix, either with hbase shell or sqlline?
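One way to get that list might be to query Phoenix's own SYSTEM.CATALOG metadata table from sqlline. The sketch below is not verified on this cluster; the phoenix-sqlline.py parcel path and the zk-host:2181 ZooKeeper quorum are assumptions that need to be adapted to your environment:

# Hedged sketch: list the schemas/tables Phoenix has registered ('u' = user table, 'v' = view).
# The parcel path and zk-host:2181 below are placeholders, not values from this cluster.
echo "SELECT DISTINCT TABLE_SCHEM, TABLE_NAME, TABLE_TYPE FROM SYSTEM.CATALOG WHERE TABLE_TYPE IN ('u','v');" > /tmp/phoenix_tables.sql
/opt/cloudera/parcels/CLABS_PHOENIX/bin/phoenix-sqlline.py zk-host:2181 /tmp/phoenix_tables.sql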
Labels:
- Apache Phoenix
10-17-2019
05:47 AM
Thanks, that put me in the right direction. For completeness: just setting SPARK_HOME was not sufficient, py4j was still missing; setting PYTHONPATH as well fixed that issue.
export SPARK_HOME=/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH
Now pyspark shows: version 2.3.0.cloudera3
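A quick way to confirm which build the Python shell actually picks up (hypothetical session, assuming the two exports above are already in the environment):

# Launch the shell from the Spark 2 parcel and check the context version.
$SPARK_HOME/bin/pyspark
# >>> sc.version
# u'2.3.0.cloudera3'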
10-16-2019
07:14 AM
We're on Cloudera Manager 5.14.2 with CDH 5.13.3.
Both Spark 1.6 and Spark 2.3.3 are installed (some apps are still using Spark 1.6, so we can't remove it yet).
When I start pyspark with the config file for Spark 2, it still runs pyspark with Spark 1.6,
e.g.
pyspark --properties-file /etc/spark2/conf/spark-defaults.conf
After the ASCII Spark logo it shows: version 1.6.0
In verbose mode the paths are pointing to Spark 2:
spark.yarn.jars,local:/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/jars/*
Why is pyspark still referring to Spark 1.6?
How can I force it to use Spark 2.3.3?
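For reference, a hedged sketch of two ways this is commonly addressed; the pyspark2 wrapper name is an assumption about how the CDS parcel registers its binaries, and the SPARK_HOME route is the one that ultimately worked here (see the 10-17-2019 reply above):

# The Spark 2 parcel typically installs separately named shells, so the plain
# `pyspark` wrapper keeps resolving to Spark 1.6:
pyspark2 --properties-file /etc/spark2/conf/spark-defaults.conf
# Alternatively, point SPARK_HOME (and PYTHONPATH) at the SPARK2 parcel before
# calling pyspark, as in the 10-17-2019 reply above.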
Labels:
- Apache Spark
- Apache YARN
04-01-2019
12:36 AM
Note that on CDH 5.9.1 there's also a hardcoded size limit in Hive itself, so you also need to upgrade to a later CDH version; Hive on CDH 5.13.3 does not check for the hardcoded size limit of 4000 characters.
04-01-2019
12:34 AM
Oracle varchar2 has a maximum size of 4000 characters. Oracle 12c introduces a new parameter, max_string_size; the default value is "standard", which limits varchar2 to 4000 characters. It can be changed to "extended", which allows varchar2 up to 32k, but that affects the whole instance and can't be reverted back to "standard", so in case of any issues you're stuck with this parameter. It's also a new feature, so there's a chance of bugs or side effects. Besides that, the latest Apache Hive uses CLOB instead of varchar2(4000), so that's my preferred approach.
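For reference, a minimal check of which mode the metastore instance is currently running in (sketch only; assumes SQL*Plus access as a privileged user on the database host):

# Show the current max_string_size setting on the Oracle instance.
sqlplus -s "/ as sysdba" <<'SQL'
SHOW PARAMETER max_string_size
EXIT;
SQL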
03-25-2019
02:28 AM
Hi,
On CDH 5.9.1 we're having an issue with an external table definition in Hive pointing to an HBase table with a lot of columns. The hbase.columns.mapping in the external table definition is longer than 4000 characters and causes an error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Put request failed : INSERT INTO SERDE_PARAMS (PARAM_VALUE,SERDE_ID,PARAM_KEY) VALUES (?,?,?) )
The Hive Metastore is using an Oracle backend to store table definitions. On Oracle 12c the column is created with datatype varchar2, which is limited by default to 4000 characters. There seem to be two solutions:
1. change max_string_size on Oracle from standard to extended
2. change the datatype from varchar2 to clob (using add column, copy data, drop old column, rename new column), as sketched below
My preference would be option 2, because it only applies to the affected tables, while the max_string_size parameter affects the whole instance and can't be rolled back in case of issues. But the clob datatype only works with regular selects; it can't handle functions like to_char, substr, instr.
Tables affected:
- COLUMNS_V2
- TABLE_PARAMS
- SERDE_PARAMS
- SD_PARAMS
Is there any reason to avoid changing the datatype from varchar2 to clob for these Hive Metastore tables in the Oracle backend?
thanks
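For illustration, a hedged sketch of option 2 for one of the affected tables; the SERDE_PARAMS column names come from the error above, but the connection string is a placeholder, and this should only be run against a stopped metastore with a backup in place:

# Add-column / copy / drop / rename approach for SERDE_PARAMS.PARAM_VALUE.
# hive_metastore_owner/secret is a hypothetical connection, not a real credential.
sqlplus -s hive_metastore_owner/secret <<'SQL'
ALTER TABLE SERDE_PARAMS ADD (PARAM_VALUE_CLOB CLOB);
UPDATE SERDE_PARAMS SET PARAM_VALUE_CLOB = PARAM_VALUE;
COMMIT;
ALTER TABLE SERDE_PARAMS DROP COLUMN PARAM_VALUE;
ALTER TABLE SERDE_PARAMS RENAME COLUMN PARAM_VALUE_CLOB TO PARAM_VALUE;
EXIT;
SQL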
Labels:
- Apache HBase
- Apache Hive
06-07-2017
07:41 AM
Thanks, it was the http:// prefix in the proxy server config; removing it fixed the parcel download.
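For anyone hitting the same thing: based on this thread's resolution, the proxy field in Administration > Settings > Network appears to want the bare hostname, while the shell variable used for the manual test takes the full URL form, e.g.:

# CM proxy server field: abcd.net   (no http:// prefix), port: 8080
# Equivalent manual test from the CM host:
export http_proxy=http://abcd.net:8080
curl -sI http://xxx.net/apps/cloudera/kafka/manifest.json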
06-07-2017
07:16 AM
I also tested the repo using the actual IP address instead of xxx.net; the error message shows connectException: http://1.2.3.4/apps/cloudera/kafka/manifest.json, so I got the same error, while the same test using curl ran fine again. I've been searching other cases and found some references to issues when an Oracle db is used for Cloudera Manager. I've got the same setup with an Oracle db; are there any specific tables I could check for issues?
06-07-2017
06:41 AM
xxx.net is the repo server. I've configured the proxy server and the proxy port via Administration > Settings > Network in CM:
proxy server: http://abcd.net
port: 8080
Then, when checking for parcels, the error message shows connectException: http://xxx.net/apps/cloudera/kafka/manifest.json
Setting the same proxy on the command line on the same host:
export http_proxy=http://abcd.net:8080
curl http://xxx.net/apps/cloudera/kafka/manifest.json
The manual test runs fine.
06-07-2017
05:39 AM
Hi,
I've configured a proxy server for the parcel download from a remote repo. I've restarted the cloudera-scm-server and cloudera-scm-agent processes, but I'm still getting the same error (note: I masked the actual hostname as xxx.net). If I use curl from the same host, using the same http proxy, I can download the manifest file, so I guess the issue must be somewhere in Cloudera Manager. Any suggestion welcome.
2017-06-07 11:01:06,913 ERROR ParcelUpdateService:com.cloudera.parcel.components.ParcelDownloaderImpl: (19 skipped) Unable to retrieve remote parcel repository manifest
java.util.concurrent.ExecutionException: java.net.ConnectException: http://xxx.net/apps/cloudera/kafka/manifest.json
at com.ning.http.client.providers.netty.NettyResponseFuture.abort(NettyResponseFuture.java:297)
at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:104)
at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:399)
at org.jboss.netty.channel.DefaultChannelFuture.addListener(DefaultChannelFuture.java:145)
at com.ning.http.client.providers.netty.NettyAsyncHttpProvider.doConnect(NettyAsyncHttpProvider.java:1041)
at com.ning.http.client.providers.netty.NettyAsyncHttpProvider.execute(NettyAsyncHttpProvider.java:858)
at com.ning.http.client.AsyncHttpClient.executeRequest(AsyncHttpClient.java:512)
at com.ning.http.client.AsyncHttpClient$BoundRequestBuilder.execute(AsyncHttpClient.java:234)
at com.cloudera.parcel.components.ParcelDownloaderImpl.getRepositoryInfoFuture(ParcelDownloaderImpl.java:534)
at com.cloudera.parcel.components.ParcelDownloaderImpl.getRepositoryInfo(ParcelDownloaderImpl.java:492)
at com.cloudera.parcel.components.ParcelDownloaderImpl.syncRemoteRepos(ParcelDownloaderImpl.java:344)
at com.cloudera.parcel.components.ParcelDownloaderImpl$1.run(ParcelDownloaderImpl.java:416)
at com.cloudera.parcel.components.ParcelDownloaderImpl$1.run(ParcelDownloaderImpl.java:411)
at com.cloudera.cmf.persist.ReadWriteDatabaseTaskCallable.call(ReadWriteDatabaseTaskCallable.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: http://xxx.net/apps/cloudera/kafka/manifest.json
at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:100)
... 16 more
Caused by: java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:101)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
Labels:
- Cloudera Manager