OpenOffice filesize question

Almyrigan Hero

Minstrel
Up until this point (as in, this very day), I've written everything in WordPad. From novel manuscripts, to notes for other projects, to random scribblings, to backing up temporary online drafts, it's just what I've always used; and I'm nothing if not a creature of habit. Just last night, something finally clicked in my head, and I realized just how annoyed I was by the lack of a spellcheck. My browser has a spellcheck. Discord has a spellcheck. So why was I content to write entire novels in something without spellcheck?

Apache OpenOffice was what I settled on. It was free, it was widely recommended, and it seemed pretty legit, as free software goes. Toyed with it a while, happy with the spellcheck and dictionary features, but after correcting and saving about four chapters, something a bit odd caught my eye. They'd each more than doubled in filesize! This occurs even if adding a single character is THE ONLY alteration I make to a file. I didn't change filetype, either; they're still .rtf's, just .rtf's that're apparently 2x less space-efficient.

Anybody else have this same experience? Is this 'supposed' to be happening? Is it necessary, or is there some box I can tick in options to save with the same... compression, I guess, as WordPad? I don't need any fancy formatting or anything, I'm just spellchecking what were already supposed to be my 'final' revisions.
 

Chasejxyz

Inkling
Are you familiar with HTML at all? Even if a webpage looks deceptively simple, there's still a lot going on under the hood that you can't see. There's all the stuff in the <head> tag which tells the browser stuff like what the name should be in the tab, stuff for Google etc. The paragraphs are formatted with <p> tags, which are then further formatted via stuff on the linked style sheet. You'll also notice that a lot of stuff looks weird, like it's not " it's &ldquo; since things like " are used in the code itself; the browser sees the & and says "oh okay I need to render it as something else."

The reason I bring this up is because this is, ultimately, how all documents work. OpenOffice is going to put new stuff in the "head tag" to do things that only OpenOffice can do. It probably has different labeling (imagine something did {/p} instead of <p>) for your text. This includes invisible characters like tabs and carriage returns, this includes any headers/footers you might have, this even includes metadata like when the file was first created or edited. OpenOffice is a more powerful program than WordPad, and because of that, the files need more bits and bytes to do those things, which means it's a bigger file.
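To make the "stuff under the hood" point concrete, here is a rough Python sketch that builds two RTF documents containing the same sentence. Both headers are hand-written for illustration and are assumptions, not copies of what WordPad or OpenOffice actually emits; the only point is that identical visible text can sit inside very different amounts of markup.

```python
# Same visible sentence, two different amounts of surrounding RTF markup.
# Both headers are illustrative guesses, not real WordPad/OpenOffice output.
BODY = "It was a dark and stormy night."

minimal_rtf = (
    r"{\rtf1\ansi\deff0{\fonttbl{\f0 Arial;}}"
    r"\f0\fs24 " + BODY + r"\par}"
)

verbose_rtf = (
    r"{\rtf1\ansi\deff0"
    r"{\fonttbl{\f0\froman Times New Roman;}{\f1\fswiss Arial;}{\f2\fmodern Courier New;}}"
    r"{\colortbl;\red0\green0\blue0;\red255\green255\blue255;}"
    r"{\stylesheet{\s0 Default;}{\s1 Heading 1;}{\s2 Text Body;}}"
    r"{\info{\author Example}{\creatim\yr2021\mo6\dy1}}"
    r"\paperw12240\paperh15840\margl1440\margr1440"
    r"\pard\plain\s0\f1\fs24\cf1 " + BODY + r"\par}"
)


def overhead(doc: str, body: str) -> int:
    """Number of bytes in the document that are not the visible text."""
    return len(doc.encode("ascii")) - len(body.encode("ascii"))


print("minimal:", len(minimal_rtf), "bytes, overhead", overhead(minimal_rtf, BODY))
print("verbose:", len(verbose_rtf), "bytes, overhead", overhead(verbose_rtf, BODY))
```

On a real chapter the fixed header matters less than the per-paragraph markup, but the principle is the same: the extra bytes are instructions, not prose.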

But it's 2021, why are you worried about this? My 190k word first draft is less than 1 MB and was made in Word365. Compare this to my ORIGINAL original first draft from 2007, which is also about 190k words, and was made in Word 97-2003, that's just over 1 MB. If a few hundred KB of extra space being used is causing you to worry, well, you have bigger problems, like you'll never be able to update your computer because there isn't enough room for said updates. How are you even able to use a web browser? Any time Google updates their logo your computer will crawl to a stop because it's using up all your available free space! Or, you know, you have a modern computer and micromanaging storage to that level is just making you stress for no reason.
 

CupofJoe

Myth Weaver
For my personal writing, I use LibreOffice [a fork of OpenOffice, but I like LO better] and for work, I use MS Office Word [and Office365].
Just occasionally I accidentally use Word for something personal and I've noticed that Word produces much smaller files [up to 70% smaller] than OpenOffice does. It doesn't bother me, just something I've noticed.
 

Almyrigan Hero

Minstrel
Not pressed for space or stressed over it, I was just a bit confused because I've never previously had the experience of switching to a program that inflates the size of visibly identical files that drastically. Very informative regardless, though; I always thought individual filetypes were formatted a bit more uniformly than that.
 
skip.knox

toujours gai, archie
Moderator
I use LibreOffice as well (it's also free, btw, and is well-maintained). I just tried an experiment. Opened a 26k file. Added two lines of text. Saved it. The result was actually 22k. Why smaller? Because the file was over two years old and there've been new versions of LO.

By coincidence, that exact file was also in RTF format (same file date). I opened that, added the same two lines, and saved it. The result is 31k. Why? Dunno, except to say that RTF is often, though not always, larger than the .docx, .odt, or whatever format.

I expect such variations. I would not expect a full doubling of file size, though. I would try an experiment on another computer, just as a control. If it happens with Apache OO there, I'd blame the software as being inefficient.
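For anyone who wants to run that control experiment and keep notes, a minimal Python sketch for comparing two saved copies follows. The file names and the dummy contents are made up for the demonstration; point the function at real before/after copies instead.

```python
import tempfile
from pathlib import Path


def size_report(before: Path, after: Path) -> str:
    """Describe how much a file grew (or shrank) between two saved copies."""
    b = before.stat().st_size
    a = after.stat().st_size
    return f"{before.name}: {b} bytes -> {after.name}: {a} bytes ({a / b:.2f}x)"


# Demonstration with throwaway files; the names and sizes are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "chapter1_wordpad.rtf"
    resaved = Path(tmp) / "chapter1_openoffice.rtf"
    original.write_bytes(b"x" * 1000)
    resaved.write_bytes(b"x" * 2300)
    report = size_report(original, resaved)

print(report)
```

The function assumes the "before" file is not empty, since it divides by that size.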
 

Insolent Lad

Maester
I just went to one of my WIPs that stands at 58 KB in OpenOffice ODT format and did a 'save as' in RTF. The result was 280 KB. If you're going to use OpenOffice or LibreOffice you're better off saving your files in ODT. At least as long as you are still working on them.
 

skip.knox

toujours gai, archie
Moderator
RTF was never intended to be the default storage format for files (looking at you, Scrivener). So native file formats (docx, odt) are always going to be smaller. The more heavily formatted (images, lots of fonts, etc), the greater will be the file size difference, since RTF stores everything as plain text. If there's lots of formatting, that all has to be stored as ... oh, call it explanatory text on top of what you've actually written.
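One reason the native formats come out smaller: .odt and .docx files are ZIP archives of XML, so their contents are compressed on disk, while RTF is saved as uncompressed text. The Python sketch below only illustrates that effect. The repeated sentence and the `content.txt` entry name are placeholders, and repetitive sample text compresses far better than real prose would.

```python
import io
import zipfile

# Build some repetitive RTF-style markup as a stand-in for a document body.
paragraph = r"{\pard\plain\f1\fs24 The quick brown fox jumps over the lazy dog.\par}"
raw = (paragraph * 500).encode("ascii")

# Store the same bytes inside an in-memory ZIP archive with DEFLATE compression.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("content.txt", raw)

raw_size = len(raw)
zipped_size = len(buffer.getvalue())
print(f"plain text: {raw_size} bytes, zipped: {zipped_size} bytes")
```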
 

Chasejxyz

Inkling
I am absolutely amazed by what a hot mess Scrivener is. It is an insanely powerful program, yes, but WHY is the "file structure" done like that? Why can't you do operators in search? Why is the default spellchecker so terrible?

RTF is "nice" in that it's a "neutral" format that most word editing programs can handle. In Ye Olden Dayze, only the latest version of Office could open .docx, .xlsx, .pptx etc. This happened when I was in college, so there were professors who would post .pptx slides or .docx rubrics but a decent chunk of students couldn't read those, because they were running Linux (it was a big computer science school, okay) or couldn't afford Office so they were using something like OpenOffice or Google Docs. Lots of "I'm sorry but I can't open that, can you send me something else?" Nowadays pretty much everything can read .docx, but if Microsoft decides to invent .docz then we'll run into this issue all over again. That's probably not going to happen, though, since so much stuff lives online anyways.
 

skip.knox

toujours gai, archie
Moderator
I'm pretty happy with Scrivener. The file structure is Byzantine, true, but have you ever looked at the internal structure of a Word doc? It turns out that documents are complicated no matter how you parse them. I don't ever use the spellchecker, so I can't speak to that one, but I'm curious: are you talking about doing a regex in search?

I do remember those days. I was once a PC support tech at a university. Heck, I go back far enough to know what a .wpd file is and even to when there weren't any default file extensions at all for something like WordStar or PerfectWriter, and you could get special programs just for translating file formats. Kids these days have it so easy. <g>
 

Chasejxyz

Inkling
The nice thing about a Word doc is that it is one file, and it is incredibly difficult for someone to screw it up accidentally. Not many people would think to open it up with notepad and then delete random chunks of text. It's also only 1 file, so it's easy to sync across cloud storage and work on across devices (since there's only 1 file to download). Meanwhile, Scrivener.

I'm not talking about regex (tho it does have that), I'm talking about basic stuff like "tag: 'romeo' NOT 'juliet' " for a scene that only has romeo in it or "label: 'incomplete' + text: 'finish this part' " You can only search tags, or labels, or text in the document, you can't do different types of things. What's the point of having all this metadata if you can't run a search that utilizes multiple pieces of it? I've searched both the 700+ page "help doc" plus Google, and everyone posting on various forums looking for the answer can't find it, either, or it just doesn't exist.
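For clarity, the kind of combined query being described looks roughly like this in Python. It is only a sketch of the idea: the `Scene` class, its fields, and the `search` function are invented for illustration and have nothing to do with how Scrivener stores or searches its data.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    """Hypothetical stand-in for a scene with metadata attached."""
    title: str
    text: str
    label: str = ""
    tags: set = field(default_factory=set)


scenes = [
    Scene("Balcony", "Finish this part later.", "incomplete", {"romeo", "juliet"}),
    Scene("Street brawl", "Swords are drawn.", "done", {"romeo", "tybalt"}),
    Scene("Tomb", "Finish this part.", "incomplete", {"juliet"}),
]


def search(items, *, has_tags=(), not_tags=(), label=None, text=None):
    """Return scenes matching every given condition (tags AND label AND text)."""
    results = []
    for scene in items:
        if not set(has_tags) <= scene.tags:
            continue  # a required tag is missing
        if set(not_tags) & scene.tags:
            continue  # an excluded tag is present
        if label is not None and scene.label != label:
            continue
        if text is not None and text.lower() not in scene.text.lower():
            continue
        results.append(scene)
    return results


# tag: 'romeo' NOT 'juliet'
romeo_only = search(scenes, has_tags=["romeo"], not_tags=["juliet"])
# label: 'incomplete' + text: 'finish this part'
unfinished = search(scenes, label="incomplete", text="finish this part")
print([s.title for s in romeo_only], [s.title for s in unfinished])
```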

The spellcheck is just as bad as it was in MS Word 2003, and I'm using Scrivener 3.something, which came out earlier this year. The dictionary isn't saved between projects, so I have to add "transgender" every single time. Which is incredibly stupid.
 

skip.knox

toujours gai, archie
Moderator
I'm not quite sure how one screws up a Scrivener file accidentally (unless it's going in and directly editing the sub-files outside of Scrivener, against which the manual specifically warns), but I do know from 20 years of tech support that Word files can and do mess up! Or did, anyway. Maybe it's better now, I dunno.

I've worked for several years with Scrivener across cloud storage. Never had a problem. Works like a champ, including snapshot files, backups, all of it.

As for search, it looks like they rely on regular expressions to do boolean operators. I agree it would be nice to have some more intuitive shortcuts, but they seem content to provide a full regex engine. Anyway, it certainly can be done.
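As a rough illustration of how a regex can stand in for boolean operators, here is a sketch using Python's `re` module. It relies on lookahead assertions, and whether Scrivener's search accepts the same syntax is an assumption this thread does not confirm. It also only covers text, not tags or labels.

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL

# "romeo NOT juliet": a negative lookahead rejects any text containing "juliet".
romeo_not_juliet = re.compile(r"^(?!.*\bjuliet\b).*\bromeo\b", FLAGS)

# "romeo AND juliet": two positive lookaheads, so word order does not matter.
romeo_and_juliet = re.compile(r"^(?=.*\bromeo\b)(?=.*\bjuliet\b)", FLAGS)

samples = [
    "Romeo waits alone in the street.",
    "Juliet sees Romeo from the balcony.",
    "Juliet waits alone.",
]

for sample in samples:
    print(
        sample,
        "| NOT:", bool(romeo_not_juliet.search(sample)),
        "| AND:", bool(romeo_and_juliet.search(sample)),
    )
```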

As for the dictionary, I don't know what's going on but I can say definitively and absolutely that words added are saved. We're fantasy writers around here, so of course we have tons of specialized words. They go in, they stay in, and this is true not just within a project but across projects. I just tested it to confirm. I hasten to add that if one adds a word locally on one computer, then has another project on another computer, that will of course be a separate dictionary.

I say all this not to try to argue and win points, but to be clear as to specific aspects of software, so that others coming to this thread won't go away with a wrong impression. Or at least can choose among impressions. I have any number of complaints about Scrivener, but they're niggling, and 3 (Windows) is a great improvement over v2.

One final observation. If I were a short story writer, I might be more inclined toward a word processor solution. One story, one file. But I write a fantasy world, with multiple stories. I keep a separate Scrivener project for worldbuilding, and one project per novel (or short story, I have a few of those). All are set in the same world, so having one dictionary, keeping an overall timeline, common resources, all that would, imo, be a nightmare in something like Word. More than opinion, really, for that's where I started before moving to yWriter and then to Scrivener. Much depends on how one works.
 