Gradient descent reduces the error produced by a Cost Function in order to determine the best parameters for a hypothesis.
See also Normal Equation for faster, but less efficient at higher number of features.
Repeat these steps for a set number of repetitions, or until the reduction between runs becomes negligible.
The equation before calculating the partial derivative:
$$ \text{Repeat} \left\{ \theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta) \right\} $$
Once you calculate the partial derivative, the equation is:
$$ \text{Repeat} \left\{ \theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} \right\} $$
These should be repeated simultaneously for all $\theta$. Keep in mind that $x_0$ is assumed to be equal to 1, so for $\theta_0$ you can remove $x_j^{(i)}$.
To run gradient descent in a vectorized way you first need to prep a matrix $X$ with a column for each training set. You will also need a vector $y$ with all of the output variables.
Given a matrix $X$ where each column is an item of a training set and each row contains the variables $x_0 \rightarrow x_n$ (where $x_0=1$), you can calculate the new $\theta$ values using the following formula (which uses the Linear Hypothesis).
$$ \theta = \theta - \frac{\alpha}{m} X^T(X\theta - y) $$
Or the following using the Logistic Hypothesis :
$$ \theta = \theta - \frac{\alpha}{m} X^T(g(X\theta) - y) $$