Hi, I am trying to read various files into a buffer via the read system call. For text files the code pasted below works fine, but when I try to read in pdf files something goes wrong and only part of the file is read in. Here is the relevant code:
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#define MAX_SIZE 1000000 //max size of buffer
using namespace std;
main(int argc, char *argv[]){
int src_fd; //the source file descriptor
char buff[MAX_SIZE]; //the pointer to the buffer into which we read the file
struct stat stat_buf; // hold information about the source file
struct hostent *hostInfo;
//check that source file exists and can be opened (also initialize src)
if(argv[1]==NULL){
printf("you must provide name of source file as an arg\n");
exit(1);
}
//initialize the file descriptor for the source file and open file for reading
src_fd = open(argv[1], O_RDONLY);
// get size and permissions of the source file
fstat(src_fd, &stat_buf);
printf("got size and permissions of %s\n", argv[1]);
printf("its size is %i\n", stat_buf.st_size);
//zero out the buffer
memset(buff, 0x0, MAX_SIZE);
//printf("buffer zeroed out\n");
//read the contents of the file into the buffer
read(src_fd, buff, stat_buf.st_size);
printf("the size of the buffer is %i bytes\n", strlen(buff));
if( read < 0){
fprintf(stderr,"unable to read file into buffer: %s\n", strerror(errno));
exit(1);
}
exit(0);
}
The code runs just fine; it just doesn't read the entire PDF file into the buffer. Does anyone have any idea what I am doing wrong? Any assistance is greatly appreciated. Thanks!
-Jeremy
RTFM READ(2)
RTFM
READ(2)                Linux Programmer’s Manual                READ(2)

NAME
       read - read from a file descriptor

SYNOPSIS
       #include <unistd.h>

       ssize_t read(int fd, void *buf, size_t count);

DESCRIPTION
       read() attempts to read up to count bytes from file descriptor fd
       into the buffer starting at buf.

       If count is zero, read() returns zero and has no other results. If
       count is greater than SSIZE_MAX, the result is unspecified.

RETURN VALUE
       On success, the number of bytes read is returned (zero indicates
       end of file), and the file position is advanced by this number. It
       is not an error if this number is smaller than the number of bytes
       requested; this may happen for example because fewer bytes are
       actually available right now (maybe because we were close to
       end-of-file, or because we are reading from a pipe, or from a
       terminal), or because read() was interrupted by a signal. On error,
       -1 is returned, and errno is set appropriately. In this case it is
       left unspecified whether the file position (if any) changes.

strlen(buff)
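Assuming you know how C strings, and especially strlen(buff), work, and that PDF files are binary and can contain every byte value (including zero), I can't make sense of your statement that only part of the file is read in: strlen() stops counting at the first zero byte, so it cannot tell you how many bytes read() actually stored.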
Also, you should limit the read() size to MAX_SIZE to avoid buffer overflows, but you probably simplified the code here for testing; that would explain the missing close(), the late error handling, the long include list, the superfluous variables and the strange C/C++ mixture.
But why do you compare a function pointer (read) to zero instead of storing read()'s return value in a variable? Does this even compile?
And why can't posters preview their comments with their eyeballs instead of their nervous mouse fingers, rather than posting heaps of unindented source code with missing include file names?
if( read < 0){ Nice one.