Member since: 10-25-2019
Posts: 15
Kudos Received: 7
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1190 | 05-21-2020 06:39 AM |
|  | 1169 | 05-21-2020 05:05 AM |
|  | 6715 | 05-17-2020 11:58 AM |
|  | 5699 | 05-05-2020 01:22 AM |
09-17-2020
11:33 PM
Use hadoop as the root password (you may be asked to change it).
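If this is the HDP sandbox (an assumption; the original thread isn't shown here, but hadoop is the sandbox's documented default root password), the first login typically looks like:

```bash
# SSH into the sandbox as root; host and port are the HDP sandbox defaults (assumed context)
ssh root@sandbox-hdp.hortonworks.com -p 2222
# password: hadoop -- you may be prompted to set a new one on first login
```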
07-22-2020
09:24 PM
The Ambari Files View (same problem with the Hue File Browser) is not the right tool if you want to upload (very) big files. It runs in a JVM, and uploading big files uses memory: you will hit the maximum available memory very quickly and cause performance issues for other users while you are uploading. BTW, it's possible to add other Ambari view server instances to improve performance (they can be dedicated to specific teams/projects). For very big files, prefer CLI tools: scp to an edge node with a big filesystem + hdfs dfs -put, or distcp, or use an object storage accessible from your Hadoop cluster with good network bandwidth.
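A minimal sketch of the CLI route, with hypothetical hostnames and paths:

```bash
# 1. Copy the big file to an edge node that has a large local filesystem
scp /local/bigfile.dat user@edge-node.example.com:/data/staging/

# 2. From the edge node, push it into HDFS
hdfs dfs -put /data/staging/bigfile.dat /user/myuser/data/

# Alternative: copy directly between clusters (or compatible storage) with distcp
hadoop distcp hdfs://cluster-a/data/bigfile.dat hdfs://cluster-b/data/
```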
07-22-2020
08:56 PM
Increasing the VDI size doesn't mean the partitions/filesystems inside the guest VM will grow automatically: you will need to resize partitions/PVs/VGs/LVs too, and possibly move some partitions (example disk layout: |/// / FS ///|.... swap ....|/// free space ///|; here you need to swapoff/delete the swap, resize the / filesystem, recreate the swap and swapon again, etc.). Use gparted or CLI tools (pvresize, lvresize for LVs, resize2fs for filesystems) to grow the filesystems. It is important that you understand how disk partitioning works in Linux: you may lose data if you're not sure what you're doing.

Example of what you may need to do if using LVM:

```bash
sudo pvresize /dev/sdb                               # if your VDI is seen as a second disk inside your VM (here it's used as an LVM PV)
sudo lvresize -l +100%FREE /dev/mapper/datavg-data   # grow the LV to the maximum available
sudo e2fsck -f /dev/mapper/datavg-data
sudo resize2fs /dev/mapper/datavg-data               # grow the FS inside the LV to the maximum available
```
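Before and after resizing, it helps to confirm what the kernel and LVM actually see (a sketch; device names follow the example above):

```bash
lsblk        # block devices and partition sizes as the kernel sees them
sudo pvs     # LVM physical volumes and their free space
sudo vgs     # volume groups
sudo lvs     # logical volumes
df -h /      # confirm the filesystem actually grew after resize2fs
```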
05-21-2020
06:39 AM
2 Kudos
The apt-get installation doesn't seem to install any bitcoin package, and the same goes for the Python package manager (pip). It's probably a mistake in the Dockerfile. In any case, the Docker image is old, and the GitHub repo doesn't seem to exist any more.
05-21-2020
05:05 AM
1 Kudo
Cloudera CDP is based on a Cloudera Runtime version plus a Cloudera Manager version that is compatible with that runtime: https://docs.cloudera.com/cdpdc/7.0/release-guide/topics/cdpdc-release-notes-links.html

At the time of writing, CDP DC 1.0 uses Cloudera Runtime 7.0.3 and Cloudera Manager 7.0.3. The Cloudera Runtime component versions aim to keep a consistent set of Hadoop component versions that work together. This also makes it easier to migrate from CDH/HDP if your service/component versions are the same as, or close to, the runtime component versions of CDP. If I'm not mistaken, there is currently only one CDP DC version (1.0), with minor updates of CM and the Cloudera Runtime component versions:
https://docs.cloudera.com/cloudera-manager/7.0.3/release-notes/topics/cm-release-notes.html
https://docs.cloudera.com/runtime/7.0.3/release-notes/topics/rt-runtime-component-versions.html
05-17-2020
11:58 AM
1 Kudo
If you have "out of memory" errors in your nifi-app.log, it may indicate that you need to add more memory (or a possible memory leak). Increasing the heap too much will cause long GC pauses (especially on old-generation objects), so you may need a more efficient GC if you use a large heap: in bootstrap.conf, uncomment the line #java.arg.13=-XX:+UseG1GC.

As you know, some processors are memory intensive, and queues between processors use memory until nifi.queue.swap.threshold is reached (default 20000; once reached, NiFi swaps the remaining FlowFiles to disk). Too much swapping in queues will really hurt performance (memory + disk).

There are several points that can help you determine where the problem may be. For example, with (processor 1) --> queue --> (processor 2):

- Check queue occupancy: if the queue is always full for some processors, parallelize more in the downstream processor (especially if it is not memory intensive; for example, if it is CPU intensive and does not seem to ingest quickly from the queue, adding more threads will help relieve the queue). Also consider adding back-pressure/rate control upstream.
- In that case, just checking the IN and OUT stats of processor 2 will confirm it (high IN, low OUT): increase the number of threads (Scheduling > Concurrent Tasks). Sometimes tuning "Run Schedule" and "Run Duration" also helps, depending on the nature/size of the incoming FlowFiles, the processor type, and how it is impacted by parallelization.
- Take a look at "Total queued data" in the status bar (under the components toolbar at the top of the UI).

Keep in mind that processors may be I/O, CPU, or memory intensive (or all three), and parallelizing more to reduce backlogged data may solve memory problems but will cause more thread contention. Adding more NiFi nodes remains the ultimate solution to increase resources.

To increase performance, you may also want to take a look at
nifi.provenance.repository.index.threads
nifi.provenance.repository.query.threads
nifi.provenance.repository.index.shard.size
and use different filesystems (with good IOPS) for the provenance repository, content repository, and flowfile repository directories, and more.
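For reference, a sketch of the settings mentioned above in the two config files; the heap and thread values are illustrative examples, not recommendations:

```properties
# bootstrap.conf -- heap size (java.arg.2/3 are the standard Xms/Xmx entries) and GC
java.arg.2=-Xms8g
java.arg.3=-Xmx8g
# uncomment to switch to G1GC for large heaps
java.arg.13=-XX:+UseG1GC

# nifi.properties -- queue swap threshold and provenance repository tuning
nifi.queue.swap.threshold=20000
nifi.provenance.repository.index.threads=2
nifi.provenance.repository.query.threads=2
nifi.provenance.repository.index.shard.size=500 MB
```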
05-13-2020
04:56 AM
Your problem may have several causes. First, I think 1024 is not enough; you should increase it. The number of open files may also be growing day after day (an application may stream more data from/into split files, a Spark application may import/open more libraries today than yesterday, etc.). To find the possible cause, check the files opened by the user that runs the Spark jobs with lsof (lsof -u myUser, lsof +D directory, piped to wc -l to count), and work out how many open files each job holds, how many jobs are running, etc.; see the sketch below.
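A sketch of those checks plus raising the limit; the user name, directory, and values are examples:

```bash
# Count files currently open by the user that runs the Spark jobs
lsof -u myUser | wc -l

# Count open files under a given directory (e.g. a Spark work dir)
lsof +D /var/run/spark/work | wc -l

# Raise the limit persistently in /etc/security/limits.conf (example values):
#   myUser  soft  nofile  16384
#   myUser  hard  nofile  32768

# Then verify the new soft limit in a fresh session for that user
ulimit -n
```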
05-13-2020
04:07 AM
1 Kudo
The point is to correctly set the FQDN. No matter how it's done, as long as it is correctly configured on all hosts in the cluster (/etc/sysconfig/network, NetworkManager commands, /etc/hosts while avoiding mapping 127.0.0.1 or ::1 to the FQDN, or via another admin tool). Some key services need a correctly set FQDN: a Kerberos realm trust, for example, is based on the domain and thus on the locally resolved FQDN, so getting host instead of host.domain.etc may cause issues in many Hadoop services.
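A quick sketch of setting and verifying the FQDN on one host (hostname and IP are examples):

```bash
# Set the hostname to the full FQDN
sudo hostnamectl set-hostname node1.example.com

# Verify: this must return the FQDN, not just the short hostname
hostname -f

# /etc/hosts should map the host's real IP (not 127.0.0.1) to the FQDN first:
# 192.168.1.10   node1.example.com   node1
```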
05-09-2020
12:21 PM
Could you first confirm you can run queries on any database and check that the Ambari DB is restored (sudo -u postgres psql, then \l+, etc.)? If OK, could you check that the Ambari JDBC connector is updated (using the correct connector path; download and install it if needed)? If the new JDBC version is installed in the same location, make sure the old one is not still in use (check with lsof, then kill the process or restart the host/VM). See the sketch below.
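The same checks as commands; the connector path comes from the steps above, the rest is illustrative:

```bash
# 1. Confirm PostgreSQL is up and the Ambari database is restored
sudo -u postgres psql -c '\l+'

# 2. Re-register the PostgreSQL JDBC driver with Ambari
ambari-server setup --jdbc-db=postgres --jdbc-driver=/usr/share/java/postgresql96-jdbc.jar

# 3. If the new jar replaced an old one in the same location, make sure it is not still held open
lsof /usr/share/java/postgresql96-jdbc.jar
ambari-server restart
```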
05-09-2020
02:43 AM
After you renamed the old installation's PG data directory and created a new one using initdb (in the old installation dir), a pg_upgrade step needs to be done (https://www.postgresql.org/docs/9.6/pgupgrade.html). For example (with /usr/pgsql-9.6 as your new pgsql 9.6 bin dir installation path and /usr/bin/ as the old bin dir):

```bash
/usr/pgsql-9.6/bin/pg_upgrade \
  --old-datadir /var/lib/pgsql/data/ \
  --new-datadir /var/lib/pgsql/9.6/data/ \
  --old-bindir /usr/bin/ \
  --new-bindir /usr/pgsql-9.6/bin/
```

Restart PostgreSQL after pg_upgrade is done. If psql --version still gives the old version, update PATH, and also the systemd unit files, pgsql_profile (/var/lib/pgsql/.pgsql_profile), bash_profile, /etc/ files, etc.; see the sketch below.
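A sketch of the post-upgrade check, with the paths from the example above:

```bash
# Verify the client in PATH now matches the new server version
psql --version

# If the old version still shows, put the new bin dir first in PATH,
# e.g. in /var/lib/pgsql/.pgsql_profile or ~/.bash_profile:
export PATH=/usr/pgsql-9.6/bin:$PATH
```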