python, numpy

Numpy array multiplication along specific axis


I want to multiply two NumPy arrays of different shapes along a specific axis, without swapping axes manually or adding "dummy" dimensions if possible.

For two arrays, say A of shape (3,2,2) and B of shape (3,), I want to get the product of the form np.swapaxes(np.swapaxes(A,0,2)*B,0,2), or alternatively A*B[:,None,None]. However, for larger arrays both methods seem to add noticeable calculation time compared with initializing A with a shape of (2,2,3) to begin with. I cannot do that, though, because I need A to be of shape (3,2,2) for a later multiplication operation.

Can I avoid the double swapping of axes or the adding of "None" dimensions to improve calculation time?
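For reference, the two formulations above compute the same thing, which can be checked on a small example (array names as in the question):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 2, 2))
B = rng.random(3)

# Double-swap approach: move axis 0 to the end, broadcast, move it back.
swapped = np.swapaxes(np.swapaxes(A, 0, 2) * B, 0, 2)

# Dummy-dimension approach: align B with axis 0 via broadcasting.
dummied = A * B[:, None, None]

print(np.allclose(swapped, dummied))  # → True
```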

EDIT: Hi, and thanks for all the answers. Here is a more detailed example of what I am trying to compute exactly:

import numpy as np
from time import time

A = np.random.rand(100,100,2,20,100)
B = np.random.rand(100)
C = np.random.rand(2,20,100)
D = np.random.rand(100,100,2,20,100)

start = time()
for k in range(98,-1,-1):
    A[k,:,:,:,:] = (np.pad(C[:,:,k:], ((0,0),(0,0),(0,k)), constant_values=1)
                    * np.pad(B[k:], (0,k), constant_values=1)
                    * A[k+1,:,:,:,:]) + D[k,:,:,:,:]
end = time()
print('straightforward product:', end-start)

start = time()
for k in range(98,-1,-1):
    A[k,:,:,:,:] = (np.pad(B[k:], (0,k), constant_values=1)[:,None,None,None]
                    * A[k,:,:,:,:]
                    * np.pad(C[:,:,k:], ((0,0),(0,0),(0,k)), constant_values=1)
                    * A[k+1,:,:,:,:]) + D[k,:,:,:,:]
end = time()
print('None dummy dimensions:  ', end-start)

The second method takes almost twice as long for me, but I now think that is actually due to the order of the multiplication operands rather than which axis is multiplied along which.

edit2: I am aware that the results of those two calculations are not the same. In the first case B and C both had to be multiplied onto A along the 5th axis, so the order was not important and I just happened to pick this one. In the second case B needed to be multiplied onto A along the 2nd axis, so I changed the multiplication order, not thinking it would do anything performance-wise; I suspected the lost performance came from the B[:,None,None,None] notation.
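One way to test that suspicion: adding dummy axes is basic indexing and returns a view of the original array, so by itself it should be essentially free. A quick check (names here are illustrative):

```python
import numpy as np

B = np.random.rand(100)

# Dummy axes create a view: no data is copied.
Bv = B[:, None, None, None]
print(Bv.base is B)  # → True: same underlying buffer
print(Bv.shape)      # → (100, 1, 1, 1)
```

Since the view itself costs almost nothing, any timing difference more likely comes from the order of the large-array multiplications and the per-iteration np.pad allocations.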


Solution

  • In my quick tests, multiplying along the third axis is actually faster than along the last.

    In [135]: A=np.ones((2,2,100,100,100)); B=np.ones((100))
    
    In [136]: timeit A*B
    17.9 ms ± 333 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    
    In [137]: timeit A*B[:,None,None]
    16 ms ± 1.75 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
    

    einsum

    The other answer suggested einsum. I'm not seeing any improvement:

    In [138]: timeit np.einsum('ijklm,k->ijklm',A,B)
    22.7 ms ± 576 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    
    In [139]: timeit np.einsum('ijklm,m->ijklm',A,B)
    22.8 ms ± 2.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)