An attempt to make a compression program



WarMonkey
March 15th, 2008, 09:21 PM
What I was trying to do was make a program that changes "helllo" to "he3lo" or "ggggggggg" to "9g". I can get it to work when printing the output on the screen; with file I/O it sort of works, but I have two problems:

1) If test.txt (the input file) is hello, text.txt (the output file) is:


he2lo
6


2) If test.txt is "fffffffffffffffffffffffffffffffffffffffffffff", text.txt is blank.

Here is my code (please don't laugh :)):

#include <iostream>
#include <fstream>
#include <sstream>

using namespace std;

int main(){
    char * input;
    long size;
    ifstream infile ("test.txt",ifstream::binary);
    ofstream outfile ("text.txt");
    infile.seekg(0,ifstream::end);
    size=infile.tellg();
    infile.seekg(0);
    cout << size;
    int a=1;
    string c;
    char firstarray[2];
    int n=-1;

    std::ostringstream stm;


    char compress[] = { 'H', 'e', 'l', 'l','l', 'o', '\0' };
    input = new char [size];
    infile.read (input,size);

    cout << input;
    infile.close();
    //engine
    while (n<size) {
        n++;
        while (input[n]==input[n+a]){
            a++;
        }
        if (input[n]==input[n+a-1]){
            if (a>1){

                cout << a;
                stm << a;
                std::cout << stm.str();
                c=stm.str();
                cout << c;
                sprintf( firstarray,"%d", a);
                outfile.put (firstarray[0]);

            }
        }
        if (input[n]!=input[n+1]){

            cout << input[n];

            outfile.put (input[n]);
        }

    }
    cout << "\n";

}

If you see any way I can fix one of the two problems, or any way to simplify the code, please tell me what I should do. Thanks.

lloyd_b
March 15th, 2008, 11:16 PM
Ouch - my head hurts :)

Okay, some suggestions. One major problem you have is that you are only initializing "a" to 1 at the start of the program - you really need to initialize it inside of the while() loop (otherwise, it'll go buggy if there's more than one set of duplicate letters).

Second - Here's a quick-n-dirty rewrite of the main loop. Note: this requires "n" to be initialized to 0, rather than -1:



int n = 0;

...
...
...

// engine.
while (n < size) {
    a = 1; // Initialize the "dup counter"

    // count duplicates (if any), without reading past the end of the buffer
    while (n + a < size && input[n] == input[n + a]) {
        a++;
    }

    // if dups, then output the number, followed by the letter
    if (a > 1) {
        cout << a;
        cout << input[n];
    } else
        cout << input[n]; // Otherwise, just output the letter
    n += a; // Advance "n" to the first non-matching letter

}
cout << "\n";

I left out your output file handling, and a section whose purpose I simply didn't understand. Here's what I get for output:

lloyd@laptop:~/stuff$ cat test.txt
aaabbdddddddddddfjgzzzzzzzzzz
lloyd@laptop:~/stuff$ ./comp
30aaabbdddddddddddfjgzzzzzzzzzz
3a2b11dfjg10z
and

lloyd@laptop:~/stuff$ cat test.txt
jjjjjjjjjjjjjjj
lloyd@laptop:~/stuff$ ./comp
16jjjjjjjjjjjjjjj
15j

Is this more or less what you were looking for?
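
For anyone who wants to try the loop above outside of main(), here is a minimal sketch of it as a standalone function. The name rle_encode and the use of std::string are assumptions for illustration, not something from the posts above; the format is the same "count, then letter" scheme, and the extra bounds check keeps the scan from running off the end of the data.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the thread): run-length encode `input`
// using the same "count then letter" format as the loop above.
std::string rle_encode(const std::string& input) {
    std::string out;
    std::size_t n = 0;
    while (n < input.size()) {
        std::size_t a = 1;  // length of the current run
        // The n + a < size() check stops the scan at the end of the data.
        while (n + a < input.size() && input[n] == input[n + a]) {
            ++a;
        }
        if (a > 1) {
            out += std::to_string(a);  // runs of one are written without a count
        }
        out += input[n];
        n += a;  // jump to the first character of the next run
    }
    return out;
}
```

Called as rle_encode("helllo"), this should give "he3lo". It still has the ambiguity with digits in the input, since a literal digit looks the same as a count.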

WarMonkey
March 15th, 2008, 11:22 PM
Thanks for the re-write.

I forgot to take the cout's out. Just ignore them, I was trying to see where I had gone wrong.

mssever
March 15th, 2008, 11:33 PM
What happens if your input file contains digits? :-k

WarMonkey
March 16th, 2008, 12:02 AM
What happens if your input file contains digits? :-k

I was just thinking about that. Maybe "&*(8)*&"="8"? But that wouldn't really help save memory.

mssever
March 16th, 2008, 12:29 AM
You could escape the digit (aa3333445 => 2a4\32\4\5), but a compression algorithm should probably result in a binary file and factor out larger patterns, too (abcabcabc).
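
A rough sketch of the escaping idea, assuming a backslash is written before every literal digit (and before a literal backslash, so the escape character itself stays unambiguous). The function names rle_encode_escaped and rle_decode_escaped are made up for this example, and the decoder is only one possible way to read the format back.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: run-length encode, writing a backslash before any
// literal digit or backslash so it cannot be confused with a count.
std::string rle_encode_escaped(const std::string& input) {
    std::string out;
    std::size_t n = 0;
    while (n < input.size()) {
        std::size_t a = 1;  // length of the current run
        while (n + a < input.size() && input[n] == input[n + a]) ++a;
        if (a > 1) out += std::to_string(a);
        const char ch = input[n];
        if (std::isdigit(static_cast<unsigned char>(ch)) || ch == '\\') {
            out += '\\';  // escape marker
        }
        out += ch;
        n += a;
    }
    return out;
}

// Hypothetical inverse: unescaped digits form a count; the next character
// (after an optional backslash) is the symbol to repeat.
std::string rle_decode_escaped(const std::string& encoded) {
    std::string out;
    std::size_t i = 0;
    while (i < encoded.size()) {
        std::size_t count = 0;
        bool has_count = false;
        while (i < encoded.size() &&
               std::isdigit(static_cast<unsigned char>(encoded[i]))) {
            count = count * 10 + static_cast<std::size_t>(encoded[i] - '0');
            has_count = true;
            ++i;
        }
        if (i < encoded.size() && encoded[i] == '\\') ++i;  // skip the escape
        if (i >= encoded.size()) break;  // malformed tail: nothing to repeat
        out.append(has_count ? count : 1, encoded[i]);
        ++i;
    }
    return out;
}
```

With this sketch, "aa3333445" encodes to 2a4\32\4\5, matching the example above, and decoding that string gives back the original. As noted, escaping makes digit-heavy input larger rather than smaller.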

Wybiral
March 16th, 2008, 12:36 AM
You could escape the digit (aa3333445 => 2a4\32\4\5), but a compression algorithm should probably result in a binary file and factor out larger patterns, too (abcabcabc).

Yes. In fact, you should just use Huffman (http://en.wikipedia.org/wiki/Huffman_coding).

Fbot1
March 16th, 2008, 02:27 AM
If you are really interested in this, you should have a look at this: http://www.cs.fit.edu/~mmahoney/compression/.

WarMonkey
March 17th, 2008, 07:32 PM
I get this message...



main2.cpp: In function ‘int main()’:
main2.cpp:44: error: invalid conversion from ‘char*’ to ‘char’
main2.cpp:44: error: initializing argument 1 of ‘std::basic_ostream<_CharT, _Traits>& std::basic_ostream<_CharT, _Traits>::put(_CharT) [with _CharT = char, _Traits = std::char_traits<char>]’


...when


#include <iostream>
#include <fstream>
#include <sstream>

using namespace std;

int main(){
    char * input;
    long size;
    ifstream infile ("test.txt",ifstream::binary);
    ofstream outfile ("text.txt");
    infile.seekg(0,ifstream::end);
    size=infile.tellg();
    infile.seekg(0);
    cout << size;
    int a=1;
    string c;
    char firstarray[]= { '0', '0', '0', '0','0', '0', '0' };
    int n=0;

    std::ostringstream stm;


    char compress[] = { 'H', 'e', 'l', 'l','l', 'o', '\0' };
    input = new char [size];
    infile.read (input,size);

    cout << input;
    infile.close();
    // engine.
    while (n < size) {
        a = 1; // Initialize the "dup counter"

        // count duplicates (if any)
        while (input[n] == input[n + a]) {
            a++;
        }

        // if dups, then output the number, followed by the letter
        if (a > 1) {

            sprintf( firstarray,"%d", a);
            cout << a;
            outfile.put (firstarray);
            cout << input[n];

        } else
            cout << input[n]; // Otherwise, just output the letter
        outfile.put (input[n]);
        n += a; // Advance "n" to the first non-matching letter

    }
    cout << "\n";

}

but when the code is like:


#include <iostream>
#include <fstream>
#include <sstream>

using namespace std;

int main(){
    char * input;
    long size;
    ifstream infile ("test.txt",ifstream::binary);
    ofstream outfile ("text.txt");
    infile.seekg(0,ifstream::end);
    size=infile.tellg();
    infile.seekg(0);
    cout << size;
    int a=1;
    string c;
    char firstarray[]= { '0', '0', '0', '0','0', '0', '0' };
    int n=0;

    std::ostringstream stm;


    char compress[] = { 'H', 'e', 'l', 'l','l', 'o', '\0' };
    input = new char [size];
    infile.read (input,size);

    cout << input;
    infile.close();
    // engine.
    while (n < size) {
        a = 1; // Initialize the "dup counter"

        // count duplicates (if any)
        while (input[n] == input[n + a]) {
            a++;
        }

        // if dups, then output the number, followed by the letter
        if (a > 1) {

            sprintf( firstarray,"%d", a);
            cout << a;
            outfile.put (firstarray[0]);
            cout << input[n];

        } else
            cout << input[n]; // Otherwise, just output the letter
        outfile.put (input[n]);
        n += a; // Advance "n" to the first non-matching letter

    }
    cout << "\n";

}

I get no errors. How would I get around this? I am trying to get the whole character sequence written to "text.txt" but it won't let me.

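Also, there are some unneeded things in the code; just ignore them. What is important is the red text (the outfile.put call inside the if block, which is the only line that differs between the two listings).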

mssever
March 17th, 2008, 08:17 PM
I know very little C++, but it looks like you're trying to cast an array to a string, which I'm guessing must be done explicitly.

C++ does support strings. Is there a reason you're using a char array instead?

Sockerdrickan
March 17th, 2008, 08:19 PM
try &firstarray[0] or *firstarray

Fbot1
March 17th, 2008, 08:25 PM
I got " error: invalid conversion from `char*' to `char' ". Do you have it set up so you can see the error messages?

Sockerdrickan
March 17th, 2008, 08:44 PM
What's the output supposed to be? I'm getting 0 with this modified code

#include <iostream>
#include <fstream>
#include <sstream>

using namespace std;

int main(){
    char * input;
    long size;
    ifstream infile ("test.txt",ifstream::binary);
    ofstream outfile ("text.txt");
    infile.seekg(0,ifstream::end);
    size=infile.tellg();
    size+=1;
    infile.seekg(0);
    cout << size;
    int a=1;
    string c;
    char firstarray[]= { '0', '0', '0', '0','0', '0', '0' };
    int n=0;

    std::ostringstream stm;


    char compress[] = { 'H', 'e', 'l', 'l','l', 'o', '\0' };
    input = new char [size];
    infile.read (input,size);

    cout << input;
    infile.close();
    // engine.
    while (n < size) {
        a = 1; // Initialize the "dup counter"

        // count duplicates (if any)
        while (input[n] == input[n + a]) {
            a++;
        }

        // if dups, then output the number, followed by the letter
        if (a > 1) {

            sprintf( firstarray,"%d", a);
            cout << a;
            outfile.put (*firstarray);
            cout << input[n];

        } else
            cout << input[n]; // Otherwise, just output the letter
        outfile.put (input[n]);
        n += a; // Advance "n" to the first non-matching letter

    }
    cout << "\n";

}

WarMonkey
March 17th, 2008, 09:03 PM
What's the output supposed to be? I'm getting 0 with this modified code


text.txt is supposed to be:


2eted4cd42t


if test.txt is



eetedccccdtttttttttttttttttttttttttttttttttttttttttt


however, I get this for text.txt instead:



2eted4cd4t



In other words, the second output line of the terminal:

itzchak@warmonkey:~/Documents/Code/C++/Compressor$ ./a.out
52eetedccccdtttttttttttttttttttttttttttttttttttttttttt
2eted4cd42t

Is what I want text.txt to be.

supirman
March 17th, 2008, 10:50 PM
Don't use outfile.put() for this, as put() takes a single char.

Also, you don't need the sprintf; just write 'a' directly. Instead, use


outfile << a;
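
To see why this matters for a count like 42, here is a small sketch comparing the two approaches. A std::ostringstream stands in for outfile so the result can be inspected as a string; the function names are made up for illustration.

```cpp
#include <cstdio>
#include <sstream>
#include <string>

// Hypothetical helper mirroring the thread's approach: format the count
// into a buffer, then write only buf[0] with put().
std::string write_count_with_put(int a) {
    std::ostringstream out;  // stands in for outfile
    char buf[16];
    std::snprintf(buf, sizeof buf, "%d", a);
    out.put(buf[0]);  // put() writes exactly one char: the first digit
    return out.str();
}

// Hypothetical helper using the suggested fix: stream the int directly.
std::string write_count_with_insert(int a) {
    std::ostringstream out;  // stands in for outfile
    out << a;  // operator<< formats every digit of the integer
    return out.str();
}
```

For a = 42, the put() version writes only "4", which matches the "4t" seen in text.txt, while the << version writes "42".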

Fbot1
March 18th, 2008, 12:03 AM
Almost forgot, C has something called the "null" character that ends the string so you don't need to know the size. So I'd suggest using "while(input[n] != 0)" instead.

supirman
March 18th, 2008, 11:02 AM
There are quite a few other ways to clean the code up, but there is one major thing you need to fix.


input = new char [size];

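When you 'new' something, you must 'delete' it. Since input was allocated with new[], that means calling delete[] input; once you are done with it.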