How are optimization problems solved?

To me it was always something of a mystery how optimization problems are solved. After all, the only way we can compute an unknown is by reducing the problem to a linear equation; in the context of minimization, getting to a linear equation that characterizes the minimum is always done by taking some derivative and setting it to zero. I didn't know anything beyond this for actually doing a computation -- without derivatives, I thought, we would be completely lost. So how can we find the minimum in all sorts of problems, including those with non-differentiable objective/constraint functions, or, worse yet, with inequality constraints?

It turns out that we somehow convert such problems into differentiable ones. For some problems, though, how this conversion is possible can be difficult to see.

For convex optimization problems, the solution hierarchy goes like this:

$$ \begin{matrix} \text{Non-differentiable objective function with inequality constraints} \\ \downarrow \\ \text{Smooth objective function, inequality constraints} \\ \downarrow \\ \text{Smooth unconstrained} \\ \downarrow \\ \text{Quadratic unconstrained} \\ \downarrow \\ \text{Linear equations} \end{matrix} $$
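The bottom two rungs of this hierarchy can be illustrated with Newton's method: each iteration minimizes a local quadratic model of the objective, and minimizing that quadratic amounts to solving a linear system. Here is a minimal sketch in NumPy; the particular convex function is my own illustrative choice, not taken from anywhere above.

```python
import numpy as np

# Minimize the smooth convex function
#   f(x) = exp(x0 + x1 - 1) + exp(x0 - x1 - 1) + exp(-x0 - 1)
# with Newton's method. Each step minimizes the local quadratic model
#   f(x) + g.d + 0.5 d.H d  over the step d, which reduces to the
# linear system  H d = -g  -- the bottom of the hierarchy.

def grad_hess(x):
    u1 = np.exp(x[0] + x[1] - 1)
    u2 = np.exp(x[0] - x[1] - 1)
    u3 = np.exp(-x[0] - 1)
    g = np.array([u1 + u2 - u3, u1 - u2])          # gradient
    H = np.array([[u1 + u2 + u3, u1 - u2],
                  [u1 - u2,      u1 + u2]])        # Hessian
    return g, H

x = np.zeros(2)
for _ in range(20):
    g, H = grad_hess(x)
    x -= np.linalg.solve(H, g)   # solve the quadratic subproblem exactly

print(x)   # converges to roughly (-0.347, 0.0)
```

After a handful of iterations the gradient is zero to machine precision; the succession of quadratic problems has solved the smooth unconstrained one.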

That is, the only problems we know how to solve directly (without iteration) are quadratic and unconstrained: setting the gradient to zero yields a system of linear equations. If we more generally have a smooth unconstrained function, we can solve it iteratively -- by solving a succession of quadratic unconstrained optimization problems. Going higher in the hierarchy, we can solve problems that involve inequality constraints iteratively by solving at each iteration a problem that involves a smooth unconstrained objective function. At the top of the hierarchy we have problems with non-differentiable objective functions. We first convert such a problem into one with a differentiable objective by adding extra variables and inequality constraints that exactly represent the non-differentiable function. For example, if we want to minimize $$|x|$$ w.r.t. $$x$$ (assume unconstrained), we can instead minimize $$t$$ w.r.t. $$x,t$$ subject to $$-t \leq x \leq t$$. The epigraph trick used here is a very common way to convert problems with non-differentiable objective functions into problems with differentiable ones.
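The epigraph trick can be carried out concretely with an off-the-shelf LP solver, since the reformulated problem is linear in $$(x, t)$$. A small sketch using `scipy.optimize.linprog` (my choice of solver, and I use $$|x - 3|$$ instead of $$|x|$$ just so the answer is nontrivial):

```python
from scipy.optimize import linprog

# Epigraph reformulation of:  minimize |x - 3|  over x.
# Equivalently:  minimize t  over (x, t)  subject to  -t <= x - 3 <= t.
# Split the two-sided constraint into linprog's A_ub @ z <= b_ub form,
# with the variable vector z = (x, t):
#   x - t <= 3    and    -x - t <= -3
c = [0, 1]                         # objective: minimize t
A_ub = [[1, -1], [-1, -1]]
b_ub = [3, -3]
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None), (0, None)])  # x free, t >= 0

print(res.x)   # optimal (x, t): x = 3, t = |x - 3| = 0
```

Note that at the optimum one of the two inequalities is tight, so $$t$$ equals $$|x - 3|$$ exactly; the absolute value never had to be differentiated.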