How to speed up Python/NumPy code?

Is there a way to avoid the for loops and speed up the code below?
All arrays are NumPy arrays, and I can’t figure out how to vectorize this operation.

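import numpy as np

# Y is an n x m NumPy array
rows, cols = Y.shape
M = np.zeros((cols, cols))
for i in range(cols):
    for j in range(i, cols):  # upper triangle only
        # fraction of rows where columns i and j agree
        M[i, j] = sum(Y[:, i] == Y[:, j])/rows
        M[j, i] = M[i, j]  # mirror into the lower triangle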

This implements the following expression, where n is the number of rows and [[.]] is 1 when the condition holds and 0 otherwise:

M(i, j) = \frac{1}{n}\sum_{p=1}^{n} [[y_{pi} = y_{pj}]]

PS: After writing this post, I replaced sum with np.sum, and the speedup is already significant. I would still appreciate input, especially on ways to eliminate the for loops.
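
For reference, the difference between the two calls can be sketched as follows. The array length and the random data are assumptions for illustration only. The built-in sum walks the boolean array element by element in Python, whereas np.sum does the reduction in compiled code.

```python
import timeit

import numpy as np

# Hypothetical data: two columns of 10,000 small integers.
rng = np.random.default_rng(0)
a = rng.integers(0, 3, size=10_000)
b = rng.integers(0, 3, size=10_000)

matches = a == b  # boolean array, True where the entries agree

# Both calls count the True entries and give the same number.
count_builtin = sum(matches)
count_numpy = np.sum(matches)

# Rough timing comparison; absolute numbers depend on the machine.
t_builtin = timeit.timeit(lambda: sum(matches), number=20)
t_numpy = timeit.timeit(lambda: np.sum(matches), number=20)
print(f"built-in sum: {t_builtin:.4f}s, np.sum: {t_numpy:.4f}s")
```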

Hi @sanjayk,
From the nested loop ranges, it seems the inner loop runs from i to cols - 1, but in the equation the summation runs from 1 to n (the number of rows).
Could you check it again?

Could you mention which module this expression is part of?

If Y is an n x m matrix, then M is an m x m matrix. By definition, M(i, j) = M(j, i), so M is symmetric. The loops iterate over columns, so Y[:, i] is a vector of length n (column i). As I understand it, sum(Y[:, i] == Y[:, j]) therefore iterates over the n rows implicitly, without n appearing anywhere in the code.

Finally, since M is symmetric, I compute only the upper triangle (that’s why the second for loop starts at i) and copy it to the lower triangle with M[j, i] = M[i, j].
Did I miss something?
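
A tiny example may make the implicit iteration concrete. The 4 x 3 array below is made up for illustration. Comparing two columns gives a boolean vector with one entry per row, and summing it counts the rows where the two columns agree.

```python
import numpy as np

# Hypothetical 4 x 3 array (n = 4 rows, m = 3 columns).
Y = np.array([[1, 1, 0],
              [0, 1, 0],
              [2, 2, 2],
              [1, 0, 1]])
rows, cols = Y.shape

eq = Y[:, 0] == Y[:, 1]   # one boolean per row: [True, False, True, False]
M_01 = np.sum(eq) / rows  # fraction of rows where columns 0 and 1 agree
print(eq.shape, M_01)     # (4,) 0.5
```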

This is not from any module; it is a general Python/NumPy programming question. Given the expression in the post, how can it be implemented efficiently? The matrix can have several hundred or more rows and columns.
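
For matrices of that size, one option is to keep a single loop over columns and let NumPy compare one column against all columns at once. This is a sketch rather than a benchmarked solution. The function name agreement_by_column and the random test data are hypothetical.

```python
import numpy as np


def agreement_by_column(Y):
    """Return the m x m matrix of column agreement fractions for an n x m array."""
    rows, cols = Y.shape
    M = np.empty((cols, cols))
    for i in range(cols):
        # Y[:, [i]] has shape (n, 1) and broadcasts against Y with shape (n, m),
        # so this compares column i with every column in one step.
        M[i] = (Y[:, [i]] == Y).mean(axis=0)
    return M


# Hypothetical data for a quick check.
rng = np.random.default_rng(0)
Y = rng.integers(0, 3, size=(200, 50))
M = agreement_by_column(Y)
```

Each pass through the loop allocates only an n x m boolean array, so memory use stays modest even when the number of columns is large.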

Okay, understood.
Since we’re tabulating the values into an m x m matrix, I’m not sure these loops can be avoided.
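
One way the loops might be removed entirely is broadcasting: compare every column with every other column in a single expression and average over the rows. This is a minimal sketch with made-up data, not a tuned solution.

```python
import numpy as np

# Hypothetical data: n = 200 rows, m = 50 columns.
rng = np.random.default_rng(0)
Y = rng.integers(0, 3, size=(200, 50))

# Y[:, :, None] has shape (n, m, 1) and Y[:, None, :] has shape (n, 1, m).
# Comparing them broadcasts to (n, m, m): entry [p, i, j] is Y[p, i] == Y[p, j].
# The mean over axis 0 (rows) is the fraction of rows where columns i and j agree.
M = (Y[:, :, None] == Y[:, None, :]).mean(axis=0)
```

The intermediate boolean array has n * m * m entries, so with 1000 rows and 1000 columns it would need about 1 GB. For inputs that large, a loop over columns or over blocks of columns is likely the safer choice.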