How large are the models, and what floating point precision are you using?
If the plan is to train a single model across two RTX 4090s, throughput may be severely limited by inter-card communication over the PCIe bus (the RTX 4090 does not support NVLink), unless your training pipeline has specific features that mitigate this.
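Whether you are even forced into the multi-GPU scenario depends on the answers to the first question: parameter count and precision determine whether the weights fit on one card (24 GB on an RTX 4090, 48 GB on an RTX 5880 Ada). A rough back-of-the-envelope sketch (the function name and the precision table are mine, and this covers weights only, not optimiser state or activations):

```python
# Lower-bound memory estimate for a model's weights alone.
# Optimizer states, gradients and activations add a large multiple
# on top, so treat this as a floor, not a capacity guarantee.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}

def weight_memory_gb(n_params: float, precision: str = "fp16") -> float:
    """Memory needed just to hold the weights, in GB."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9

# A 7B-parameter model in fp16 needs ~14 GB for weights alone,
# which already crowds a 24 GB RTX 4090 once training state is added:
print(weight_memory_gb(7e9, "fp16"))  # -> 14.0
```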
If you are dealing with smaller models and are performing several training runs in parallel (e.g. for hyperparameter optimisation), then the two RTX 4090s will give vastly higher aggregate throughput than a single RTX 5880 Ada.
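For that parallel-runs case, the standard trick is to pin each training process to one card via `CUDA_VISIBLE_DEVICES`. A minimal sketch, assuming a hypothetical `train.py` with an `--lr` flag (both placeholders, not from your setup):

```python
import os
import subprocess

def gpu_env(gpu_id: int) -> dict:
    """Environment that restricts CUDA to a single card."""
    return dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))

def launch_on_gpu(gpu_id: int, cmd: list) -> subprocess.Popen:
    """Start one training process pinned to one GPU."""
    return subprocess.Popen(cmd, env=gpu_env(gpu_id))

# Usage (hypothetical script): one independent run per RTX 4090.
# procs = [launch_on_gpu(g, ["python", "train.py", f"--lr={lr}"])
#          for g, lr in enumerate([1e-3, 3e-4])]
# for p in procs:
#     p.wait()
```

Because the two runs never exchange gradients, the PCIe bottleneck from the single-model scenario does not apply here.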
A further advantage of the single RTX 5880 Ada is much lower power consumption (and heat generation): 285 W for the 5880 Ada vs. roughly 900 W for two RTX 4090s at 450 W each.