[Advanced] Programming Challenge: 1

**ghostdog74** · August 9th, 2008

then why not write in assembly?

**kknd** · August 9th, 2008

Here is my try:

Runs in 6 seconds here. The bottleneck seems to be the I/O.

PHP Code:


/**

 * Author: Lucas Hermann Negri <kkndrox@gmail.com>

 * 

 * Compile with: 

 * 

 * gcc -O3 Matrix.c -o Matrix -fopenmp

 * 

 * Run with: 

 * 

 * export OMP_NUM_THREADS=2 (or the number of the cores)

 * time ./Matrix (The correct time is the "real", not the "user")

 * 

 * Local test: Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz ~ 6.388 seconds

 */



#include <stdio.h>

#include <string.h>

#include <omp.h>



#define MAXN 1000



double matrixA[MAXN][MAXN], matrixB[MAXN][MAXN], matrixC[MAXN][MAXN];

int i, j, k;



void mult()

{

    bzero(matrixC, sizeof(matrixC)); /* Initialize the result matrix */

    

    /* O(n^3) */

    #pragma omp parallel for

    for(i = 0; i < MAXN; ++i)

           for(j = 0; j < MAXN; ++j)

            for (k = 0; k < MAXN; ++k)

                matrixC[i][j] += matrixA[i][k] * matrixB[k][j];

}



void read()

{

    /* Read the multiplicand matrix */

    FILE* multiplicand = fopen("multiplicand", "r");

    

    for(i = 0; i < MAXN; ++i)

        for(j = 0; j < MAXN; ++j)

            fscanf(multiplicand, "%lf", &matrixA[i][j]);



    fclose(multiplicand);



    /* Read the multiplier matrix */

    FILE* multiplier   = fopen("multiplier", "r");



    for(i = 0; i < MAXN; ++i)

        for(j = 0; j < MAXN; ++j)

            fscanf(multiplier, "%lf", &matrixB[i][j]);

            

    fclose(multiplier);

}



void print()

{

    FILE* res = fopen("result", "w");

    

    /* Prints the result matrix */

    for(i = 0; i < MAXN; ++i)

        for(j = 0; j < MAXN; ++j)

            fprintf(res, "%lf\n", matrixC[i][j]);

    

    fclose(res);

}



int main(int argc, char* argv[])

{

    read();

    mult();

    print();

    

    return 0;

}

**Lster** · August 9th, 2008

I think you're supposed to output it to a text file.

EDIT: I've updated my code.

**kknd** · August 9th, 2008

Thanks, I've changed it now.

**mike_g** · August 9th, 2008

I think you guys could make some speed imporvements by flattaning your 2d arrays. As with a 2d array each write requires an add and multiply. Reading is even slower requiring divide and mod.

I thought that array flattening would be sometng the optimiser would do for you but apparently its not. In my test program the 2d array takes around three times longer than the flat array.

When compiled with -O2 or -03 the flat array surprisingly turns out around 4 times faster.

Code:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SIZE 1000

int a1[SIZE][SIZE];
int a2[SIZE][SIZE];

int b1[SIZE*SIZE];
int b2[SIZE*SIZE];


int main()
{
	srand(time(NULL));
	int x, y, z, r;

	unsigned t = clock();
	for(z=0; z<50; z++)
	for(y=0; y<SIZE; y++)
	for(x=0; x<SIZE; x++)
	{
		a1[x][y] = 100;
		a2[x][y] = 100;
		r = a1[x][y];
		r = a2[x][y];
	}
	printf("time: %lu\n", clock()-t);
	
	int all = SIZE * SIZE;
	t = clock();
	for(z=0; z<50; z++)
	for(x=0; x<all; x++)
	{
		b1[x]=100;
		b2[x]=100;
		r = b1[x];
		r = b2[x];
	}
	printf("time: %lu\n", clock()-t);	

	return 0;
}

WW · August 10th, 2008

For those who have provided running times of their code: It would be interesting to see a breakdown of the time into input, processing and output times.

**tamoneya** · August 10th, 2008

Originally Posted by WW

For those who have provided running times of their code: It would be interesting to see a breakdown of the time into input, processing and output times.

I agree. It seems to be turning out that it is a large part IO. I would like to see this not be the case since that isnt the goal of the challenge. I would propose actually increasing N from 1000 to something more like 10000. While file IO will increase by ~10 times the matrix computation will increase by ~1000 assuming you use a O(n^3) algorithm.

**Buttink** · August 10th, 2008

For this challenge do we just have to read the txt files, OR does the actual program have to open the tar bz2 file itself then read the files?

**slavik** · August 10th, 2008

to answer some questions:

the matrices in question are 1000 by 1000 and trust me, because of buffering, the file I/O is not very noticable (look at the system time output by time, that is the I/O)

as for reading files, the input is the two text files, you have to read those, I compressed them because each is like 8.6MB. so the program just needs to read the text files and write output to the third file.

yes, I am running a 64bit version of ubuntu.

tam, please understand that the code reads 2 million floats, which all together will probably be done in under a second (internal hard drives have at least 40MB/sec sequential read speed).

in this case 2 1000x1000 matrices produces 1 billion multiplication operations which are done on floats, so they are not as fast as integers.

I have code that multiplies this and trust me, the I/O is insignificant.

**cprofitt** · August 10th, 2008

well... when I get better and learn C I will take a poke at these challenges... for now... I will stick with the beginners challenges.

Thread: [Advanced] Programming Challenge: 1

Thread Tools

Display

Re: [Advanced] Programming Challenge: 1

Re: [Advanced] Programming Challenge: 1

Re: [Advanced] Programming Challenge: 1

Re: [Advanced] Programming Challenge: 1

Re: [Advanced] Programming Challenge: 1

Re: [Advanced] Programming Challenge: 1

Re: [Advanced] Programming Challenge: 1

Re: [Advanced] Programming Challenge: 1

Re: [Advanced] Programming Challenge: 1

Re: [Advanced] Programming Challenge: 1

Bookmarks

Bookmarks

Posting Permissions