Member since: 10-25-2019
Posts: 15
Kudos Received: 7
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1161 | 05-21-2020 06:39 AM
 | 1143 | 05-21-2020 05:05 AM
 | 6537 | 05-17-2020 11:58 AM
 | 5533 | 05-05-2020 01:22 AM
09-17-2020 11:33 PM
Use hadoop as the root password (you may be asked to change it).
07-22-2020 09:24 PM
Ambari Files View (same problem for the Hue File Browser) is not the right tool if you want to upload (very) big files. It runs in a JVM, and uploading big files will use more memory: you will hit the maximum available memory very quickly and cause performance issues for other users while you are uploading. BTW it's possible to add other Ambari server view instances to spread the load (they may be dedicated to some teams/projects). For very big files, prefer CLI tools: scp to an edge node with a big filesystem + hdfs dfs -put, or distcp, or use an object storage accessible from your Hadoop cluster with good network bandwidth. A sketch of the first option follows.
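For example, a rough sketch of the scp + hdfs dfs -put route (the host names, paths and bigfile.dat are placeholders to adapt):

# copy the file to an edge node that has enough local disk
scp /local/path/bigfile.dat user@edge-node:/data/staging/

# from the edge node, push it into HDFS
ssh user@edge-node
hdfs dfs -put /data/staging/bigfile.dat /user/myuser/

# verify, then clean up the local staging copy
hdfs dfs -ls /user/myuser/bigfile.dat && rm /data/staging/bigfile.dat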
07-22-2020 08:56 PM
Increasing the VDI size doesn't mean the partitions/filesystems inside the guest VM will be increased: you will need to resize the partitions/PVs/VGs/LVs too, and possibly move some partitions. For example, with a disk laid out as |/// / FS ///|.... swap ....|--> free space ///|, you need to swapoff/delete the swap, resize the / filesystem, then recreate the swap and swapon it again. Use gparted or CLI tools (pvresize, lvresize for LVs, resize2fs for filesystems) to grow the filesystems. It is important that you understand how to partition disks in Linux: you may lose data if you're not sure what you're doing.

Example of what you may need to do if using LVM:

sudo pvresize /dev/sdb                              # if your VDI is seen as a second disk inside the VM (used here as an LVM PV)
sudo lvresize -l +100%FREE /dev/mapper/datavg-data  # grow the LV to the maximum available space
sudo e2fsck -f /dev/mapper/datavg-data              # check the FS before resizing
sudo resize2fs /dev/mapper/datavg-data              # grow the FS inside the LV to the maximum available space
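If the filesystem sits on a plain partition rather than LVM, a similar sketch (assuming the cloud-utils growpart tool is available and / is an ext4 filesystem on /dev/sda1; adapt to your layout):

sudo growpart /dev/sda 1    # extend partition 1 into the new free space
sudo resize2fs /dev/sda1    # grow the ext4 FS to fill the partition
df -h /                     # verify the new size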
05-21-2020 06:39 AM
2 Kudos
The apt-get installation doesn't seem to install any bitcoin package, and the same goes via the Python package managers (pip, ...). It's probably a mistake in the Dockerfile. In any case the Docker image is old, and the GitHub repo doesn't seem to exist any more.
05-21-2020 05:05 AM
1 Kudo
Cloudera CDP is based on a Cloudera Runtime version plus a Cloudera Manager version that is compatible with that runtime: https://docs.cloudera.com/cdpdc/7.0/release-guide/topics/cdpdc-release-notes-links.html At the time of writing, CDP DC 1.0 uses Cloudera Runtime 7.0.3 and Cloudera Manager 7.0.3. The Cloudera Runtime component versions are meant to keep a consistent set of Hadoop component versions that work together. It will also make it easier to migrate from CDH/HDP if your service/component versions are the same as, or close to, the runtime component versions of CDP. If I'm not wrong, there is currently only one CDP DC version (1.0), with minor updates of CM and of the runtime component versions: https://docs.cloudera.com/cloudera-manager/7.0.3/release-notes/topics/cm-release-notes.html https://docs.cloudera.com/runtime/7.0.3/release-notes/topics/rt-runtime-component-versions.html
05-17-2020 11:58 AM
1 Kudo
If you have "out of memory" errors in your NiFi app log, it may indicate that you need to add more memory (or a possible memory leak). Increasing the heap too much will cause long GC pauses (especially on old-generation objects), so you may need a more efficient GC if you use a large heap size: in bootstrap.conf, uncomment the line #java.arg.13=-XX:+UseG1GC (see the sketch below).

As you know, some processors are memory intensive, and queues between processors use memory until nifi.queue.swap.threshold is reached (default 20000: once reached, NiFi pushes the remaining flowfiles to disk). Too much swapping in queues will really hurt performance (memory + disk).

There are many points that can help you determine where the problem may be, e.g. for (processor 1) --> queue --> (processor 2):
- Check queue occupancy: if a queue is always full, parallelize more in the following processor, especially if it is not memory intensive. For example, if the processor is CPU intensive and looks like it's not consuming quickly from the queue, adding more threads will help relieve the queue; adding back-pressure/rate control upstream also helps.
- In that case, just checking the "in" and "out" stats of processor 2 will confirm it (high IN, low OUT): increase the number of threads (Scheduling > Concurrent Tasks). Sometimes tuning "Run Schedule" and "Run Duration" may also help, depending on the nature/size of the incoming flowfiles, the processor type and how parallelization is impacted.
- Take a look at "Total queued data" in the status bar (under the components bar at the top of the UI).

Keep in mind that processors may be I/O, CPU or memory intensive (or all of these), and parallelizing more to reduce backlogged data may solve memory problems but will cause more thread overuse. Adding more NiFi nodes remains the ultimate solution for increasing resources.

To increase performance, you may also want to take a look at nifi.provenance.repository.index.threads, nifi.provenance.repository.query.threads and nifi.provenance.repository.index.shard.size, and at using different filesystems (with good IOPS) for the provenance repository, content repository and flowfile repository directories, and more.
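As a rough sketch, the relevant lines would look like this (the heap sizes and thread counts are example values to tune for your hosts, not recommendations):

# conf/bootstrap.conf -- heap size and G1GC
java.arg.2=-Xms4g
java.arg.3=-Xmx4g
java.arg.13=-XX:+UseG1GC

# conf/nifi.properties -- queue swap threshold and provenance repository tuning
nifi.queue.swap.threshold=20000
nifi.provenance.repository.index.threads=2
nifi.provenance.repository.query.threads=2
nifi.provenance.repository.index.shard.size=500 MB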
05-13-2020 04:56 AM
Your problem may be caused by several things. Firstly, I think 1024 is not enough; you should increase it. The number of open files may also be growing day after day (an application may stream more data from/into split files, a Spark application may import/open more libraries today, etc.). Please check the files opened by the user that runs the Spark jobs to find the possible cause:

lsof -u myUser ( | wc -l ... )

Also check lsof per directory (lsof +D directory), and find how many open files there are per job, how many jobs are running, etc. A sketch for raising the limit follows.
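A minimal sketch for raising the open-files limit persistently (assuming myUser is the account running the Spark jobs; the values are examples):

# count the files currently opened by the user
lsof -u myUser | wc -l

# check the current limit for that user
sudo -u myUser bash -c 'ulimit -n'

# raise the nofile limits in /etc/security/limits.conf, e.g.:
# myUser  soft  nofile  16384
# myUser  hard  nofile  32768

# log in again (or restart the service) and verify
sudo -u myUser bash -c 'ulimit -n'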
05-13-2020 04:07 AM
1 Kudo
The point is to correctly set the FQDN. It doesn't matter how it's done, as long as it is correctly configured on all hosts in the cluster (/etc/sysconfig/network, NetworkManager commands, /etc/hosts while avoiding mapping 127.0.0.1 / ::1 to the FQDN, or via another admin tool). Some key services need a correctly set FQDN: Kerberos realm trust, for example, is based on the domain and thus resolves the FQDN locally; getting host instead of host.domain.tld may cause issues in many Hadoop services. A quick sketch follows.
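A minimal sketch of setting and verifying the FQDN on one host (node1.example.com and the IP are placeholders):

# set the FQDN (systemd-based distros)
sudo hostnamectl set-hostname node1.example.com

# map the host's real IP (not 127.0.0.1 / ::1) to the FQDN in /etc/hosts, e.g.:
# 192.168.1.10   node1.example.com   node1

# verify: both should print node1.example.com
hostname -f
python -c 'import socket; print(socket.getfqdn())'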
05-09-2020 12:21 PM
Could you confirm that you can run queries on any database first, and check that the Ambari DB is restored:

sudo -u postgres psql
\l+
...

If OK, could you check that the Ambari JDBC connector is updated (using the adequate connector path, and downloading/installing it first if needed)?

ambari-server setup --jdbc-db=postgres --jdbc-driver=/usr/share/java/postgresql96-jdbc.jar

In case the new JDBC version is installed in the same location, make sure the old one is not still in use (lsof, then kill the process or restart the host/VM).
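For that last check, a small sketch (the jar path matches the command above):

# list processes that still hold the old driver jar open
sudo lsof /usr/share/java/postgresql96-jdbc.jar

# if the Ambari server holds it, restarting it reloads the new jar
sudo ambari-server restart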
05-09-2020 02:43 AM
After you renamed the old installation's PG data dir and created a new one using initdb (in the old installation dir), a pg_upgrade step needs to be done (https://www.postgresql.org/docs/9.6/pgupgrade.html). For example, with /usr/pgsql-9.6 as your new pgsql 9.6 installation path and /usr/bin as the old bin dir:

/usr/pgsql-9.6/bin/pg_upgrade --old-datadir /var/lib/pgsql/data/ --new-datadir /var/lib/pgsql/9.6/data/ --old-bindir /usr/bin/ --new-bindir /usr/pgsql-9.6/bin/

Restart Postgres after pg_upgrade is done. If psql --version still gives the old version, update PATH, and also the systemd unit files, /etc/ files, etc., and pgsql_profile (/var/lib/pgsql/.pgsql_profile) / .bash_profile. A sketch of these checks follows.
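A minimal sketch of the post-upgrade checks (paths match the example above; the systemd unit name may differ on your distro):

# restart the new cluster
sudo systemctl restart postgresql-9.6

# make sure the new binaries win on PATH for the postgres user
echo 'export PATH=/usr/pgsql-9.6/bin:$PATH' | sudo tee -a /var/lib/pgsql/.pgsql_profile

# verify with a login shell so the profile is loaded
sudo -iu postgres psql --version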