then why not write in assembly?
then why not write in assembly?
Here is my try:
Runs in 6 seconds here. The bottleneck seems to be the I/O.
PHP Code:
/**
* Author: Lucas Hermann Negri <kkndrox@gmail.com>
*
* Compile with:
*
* gcc -O3 Matrix.c -o Matrix -fopenmp
*
* Run with:
*
* export OMP_NUM_THREADS=2 (or the number of the cores)
* time ./Matrix (The correct time is the "real", not the "user")
*
* Local test: Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz ~ 6.388 seconds
*/
#include <stdio.h>
#include <string.h>
#include <omp.h>
#define MAXN 1000
double matrixA[MAXN][MAXN], matrixB[MAXN][MAXN], matrixC[MAXN][MAXN];
int i, j, k;
void mult()
{
bzero(matrixC, sizeof(matrixC)); /* Initialize the result matrix */
/* O(n^3) */
#pragma omp parallel for
for(i = 0; i < MAXN; ++i)
for(j = 0; j < MAXN; ++j)
for (k = 0; k < MAXN; ++k)
matrixC[i][j] += matrixA[i][k] * matrixB[k][j];
}
void read()
{
/* Read the multiplicand matrix */
FILE* multiplicand = fopen("multiplicand", "r");
for(i = 0; i < MAXN; ++i)
for(j = 0; j < MAXN; ++j)
fscanf(multiplicand, "%lf", &matrixA[i][j]);
fclose(multiplicand);
/* Read the multiplier matrix */
FILE* multiplier = fopen("multiplier", "r");
for(i = 0; i < MAXN; ++i)
for(j = 0; j < MAXN; ++j)
fscanf(multiplier, "%lf", &matrixB[i][j]);
fclose(multiplier);
}
void print()
{
FILE* res = fopen("result", "w");
/* Prints the result matrix */
for(i = 0; i < MAXN; ++i)
for(j = 0; j < MAXN; ++j)
fprintf(res, "%lf\n", matrixC[i][j]);
fclose(res);
}
int main(int argc, char* argv[])
{
read();
mult();
print();
return 0;
}
Last edited by kknd; August 9th, 2008 at 07:40 PM. Reason: Optimization
I think you're supposed to output it to a text file.
EDIT: I've updated my code.
Last edited by Lster; August 9th, 2008 at 07:24 PM.
I think you guys could make some speed imporvements by flattaning your 2d arrays. As with a 2d array each write requires an add and multiply. Reading is even slower requiring divide and mod.
I thought that array flattening would be sometng the optimiser would do for you but apparently its not. In my test program the 2d array takes around three times longer than the flat array.
When compiled with -O2 or -03 the flat array surprisingly turns out around 4 times faster.
Code:#include <stdio.h> #include <stdlib.h> #include <time.h> #define SIZE 1000 int a1[SIZE][SIZE]; int a2[SIZE][SIZE]; int b1[SIZE*SIZE]; int b2[SIZE*SIZE]; int main() { srand(time(NULL)); int x, y, z, r; unsigned t = clock(); for(z=0; z<50; z++) for(y=0; y<SIZE; y++) for(x=0; x<SIZE; x++) { a1[x][y] = 100; a2[x][y] = 100; r = a1[x][y]; r = a2[x][y]; } printf("time: %lu\n", clock()-t); int all = SIZE * SIZE; t = clock(); for(z=0; z<50; z++) for(x=0; x<all; x++) { b1[x]=100; b2[x]=100; r = b1[x]; r = b2[x]; } printf("time: %lu\n", clock()-t); return 0; }
For those who have provided running times of their code: It would be interesting to see a breakdown of the time into input, processing and output times.
I agree. It seems to be turning out that it is a large part IO. I would like to see this not be the case since that isnt the goal of the challenge. I would propose actually increasing N from 1000 to something more like 10000. While file IO will increase by ~10 times the matrix computation will increase by ~1000 assuming you use a O(n^3) algorithm.
Desktop: Q6600 OC: 343 x 9, 4 GB RAM, 8600 GTS Twinview (22",17"), 1.5 TB RAID 5
Laptop: Lenovo T61 T7300 @ 2 GHz, 2GB RAM, Nvidia 140M Quadro, 160 GB harddrive
Remember to mark posts as [SOLVED] when your problem is resolved
For this challenge do we just have to read the txt files, OR does the actual program have to open the tar bz2 file itself then read the files?
to answer some questions:
the matrices in question are 1000 by 1000 and trust me, because of buffering, the file I/O is not very noticable (look at the system time output by time, that is the I/O)
as for reading files, the input is the two text files, you have to read those, I compressed them because each is like 8.6MB. so the program just needs to read the text files and write output to the third file.
yes, I am running a 64bit version of ubuntu.
tam, please understand that the code reads 2 million floats, which all together will probably be done in under a second (internal hard drives have at least 40MB/sec sequential read speed).
in this case 2 1000x1000 matrices produces 1 billion multiplication operations which are done on floats, so they are not as fast as integers.
I have code that multiplies this and trust me, the I/O is insignificant.
I am infallible, you should know that by now.
"My favorite language is call STAR. It's extremely concise. It has exactly one verb '*', which does exactly what I want at the moment." --Larry Wall
(02:15:31 PM) ***TimToady and snake oil go way back...
42 lines of Perl - SHI - Home Site
well... when I get better and learn C I will take a poke at these challenges... for now... I will stick with the beginners challenges.
Bookmarks