Unfortunately, it is not as simple as in the TD-DFT case.
(In ORCA you request a state-specific TD-DFT gradient via the IROOT keyword.)
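For comparison, a TD-DFT excited-state optimization could be set up roughly as in the sketch below. The functional, basis set, number of roots and the water geometry are placeholders chosen for illustration and are not taken from the question; only the use of IROOT to select the state reflects the point above.

```
# Sketch only: method, basis, root count and geometry are assumed placeholders
! B3LYP def2-SVP Opt

%tddft
  nroots 5     # number of excited states to compute
  iroot  1     # state whose gradient is followed in the optimization
end

* xyz 0 1
O   0.000000   0.000000   0.000000
H   0.000000   0.757000   0.587000
H   0.000000  -0.757000   0.587000
*
```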
For CASSCF, if you do a state-averaged calculation, your orbitals are optimized only for the weighted average of all states included in the averaging. That means that no individual state is at a variational minimum with respect to the orbitals, so the regular CASSCF gradient equation does not apply. In this case one has to solve the CP-CASSCF (coupled-perturbed CASSCF) equations to get the state-specific gradients. This comes at a significant extra cost. We are working on that. Hence, for the time being, you cannot get what you want unless you manage to converge the CASSCF for each state individually.
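Schematically, the argument can be written as follows. This is only a sketch in notation I am assuming here: w_I are the averaging weights, kappa the orbital rotation parameters, and x a nuclear coordinate. The actual CP-CASSCF equations couple orbital and CI parameters and are more involved.

```latex
% State-averaged energy and its stationarity condition
E_{\mathrm{SA}} = \sum_I w_I E_I ,
\qquad
\frac{\partial E_{\mathrm{SA}}}{\partial \kappa} = 0
\quad\text{but, in general,}\quad
\frac{\partial E_I}{\partial \kappa} \neq 0 .

% Gradient of a single state: the second term needs the orbital response
\frac{\mathrm{d} E_I}{\mathrm{d} x}
  = \frac{\partial E_I}{\partial x}
  + \frac{\partial E_I}{\partial \kappa}\,
    \frac{\mathrm{d} \kappa}{\mathrm{d} x} .
```

For a state-specific CASSCF the second term vanishes because the orbitals are variational for that state, which is why converging each state individually avoids the response equations.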