- The number of threads is limited to 32.

- The memory bandwidth is 30 times lower than the computation capability.

The straightforward solution is to use the GPU for what it is essentially designed for: "Compute a large array of independent floating-point values with exactly the same code, the same instruction executed at the same time" on many simplified processors. In this way, we will be able to keep all 512 cores busy.
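As a minimal sketch of this data-parallel model, every thread below runs the same multiplication on its own array element. The kernel name `scale_array`, the host wrapper `scale_on_gpu`, and the block size of 256 are illustrative assumptions, not part of the original design, and error checking is omitted for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Every thread executes the same instruction on a different element.
__global__ void scale_array(const float* in, float* out, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global element index
    if (i < n) {                                    // guard the last, partial block
        out[i] = a * in[i];
    }
}

// Hypothetical host wrapper: copy in, launch, copy out.
std::vector<float> scale_on_gpu(const std::vector<float>& in, float a)
{
    int n = static_cast<int>(in.size());
    std::vector<float> out(n, 0.0f);
    if (n == 0) return out;

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256;                              // assumed block size
    int blocks = (n + threads - 1) / threads;       // enough blocks to cover n
    scale_array<<<blocks, threads>>>(d_in, d_out, a, n);
    cudaDeviceSynchronize();

    cudaMemcpy(out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}
```

Because no element depends on any other, the hardware is free to schedule the threads across all available cores.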

The simplest way is to transform the terms (strings of alphanumeric characters) into values. First step: a hash routine using a parallel algorithm [1] performs this transformation. Next step: the hash values of the terms in the dictionary are compared with the hash value of the searched term. Final step: all the products wi*xi are computed. Both the comparisons and the multiplications are performed in parallel.
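The comparison and multiplication steps could be sketched as a single kernel, one thread per dictionary entry. This is only an illustration under stated assumptions: the hashes are taken to be 64-bit values already computed in the first step, xi is taken to be 1 when the entry matches the searched term and 0 otherwise, and the names `match_and_weight` and `weighted_matches` are hypothetical.

```cuda
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// One thread per dictionary entry: compare hashes, then form w_i * x_i.
__global__ void match_and_weight(const uint64_t* dict_hash, const float* w,
                                 uint64_t query_hash, float* product, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Assumption: x_i is 1 on a hash match and 0 otherwise.
        float x = (dict_hash[i] == query_hash) ? 1.0f : 0.0f;
        product[i] = w[i] * x;
    }
}

// Hypothetical host wrapper (error checking omitted for brevity).
// Assumes w has the same length as dict_hash.
std::vector<float> weighted_matches(const std::vector<uint64_t>& dict_hash,
                                    const std::vector<float>& w,
                                    uint64_t query_hash)
{
    int n = static_cast<int>(dict_hash.size());
    std::vector<float> product(n, 0.0f);
    if (n == 0) return product;

    uint64_t* d_hash = nullptr;
    float* d_w = nullptr;
    float* d_product = nullptr;
    cudaMalloc(&d_hash, n * sizeof(uint64_t));
    cudaMalloc(&d_w, n * sizeof(float));
    cudaMalloc(&d_product, n * sizeof(float));
    cudaMemcpy(d_hash, dict_hash.data(), n * sizeof(uint64_t), cudaMemcpyHostToDevice);
    cudaMemcpy(d_w, w.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256;                          // assumed block size
    int blocks = (n + threads - 1) / threads;   // enough blocks to cover n
    match_and_weight<<<blocks, threads>>>(d_hash, d_w, query_hash, d_product, n);
    cudaDeviceSynchronize();

    cudaMemcpy(product.data(), d_product, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_hash);
    cudaFree(d_w);
    cudaFree(d_product);
    return product;
}
```

Fusing the comparison and the multiplication into one kernel means each hash and weight is read from global memory only once, which matters when memory bandwidth is the scarce resource.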

1 - First step

__Example of MD6 routine, excerpt from Faster file matching using GPGPUs [1]:__

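```cuda
// Host side: launch configuration, 16 threads per block
dim3 grid(num_blocks,1,1);
dim3 threads(16,1,1);
md6_compress<<   // launch configuration and arguments are cut off in the excerpt
cudaThreadSynchronize();

// Kernel side: each thread runs one loop_body step
int tx = threadIdx.x % num_threads;
int ty = threadIdx.y;
step = tx + ty;
index = 89;
loop_body(A,rsArgs[tx],lsArgs[tx],step,S,index);
if ( tx == 15 ) // last thread
{
    N += 16;
    // rotate S left by one bit, then XOR with the masked S
    S = ((S << 1) ^ (S >> (W-1)) ^ (S & Smask));
}
__syncthreads();

/* MD6 compression routine */
__device__
void loop_body(md6_word* A,int rs,int ls,int step, unsigned long long int x, int i)
{
    x ^= A[i+step-t5];
    x ^= A[i+step-t0];
    x ^= ( A[i+step-t1] & A[i+step-t2] );
    x ^= ( A[i+step-t3] & A[i+step-t4] );
    x ^= ( x >> rs );            // right shift by the per-step amount rs
    A[i+step] = x ^ ( x << ls ); // left shift by ls, store the new word
}
```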

.... To be continued ....

[1] Faster file matching using GPGPUs, Deephan Mohan and John Cavazos, Department of Computer and Information Sciences, University of Delaware, June 27, 2010
