[ProgSoc] Re: algorithm answer (Was: algorithm question)
On Fri, 29 Jun 2001 22:15, Telford wrote:
] Another idea is to write a script that handles one file
] but which has a p probability of forking itself into the
] background to finish the job and a (1-p) probability of
] waiting to finish the job normally before returning.
] You can run find -exec on that one and then tune the p
] value to get the best overall speed most of the time.
] If you know that the job usually takes 6 hours and today it
] has gone over 10 then you probably hit a bad permutation
] so kill all the tasks and try again.
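A rough sketch of that coin-flip idea, in plain shell. The function names, the percentage argument (standing in for p) and the use of gzip as "the job" are all assumptions made for illustration, not anything Telford posted:

```shell
# Hypothetical helper: succeed (return 0) with roughly p_percent% probability.
# $1 = p_percent, an integer from 0 to 100 (an assumed tuning knob).
should_background() {
    # Read two random bytes as an unsigned integer in the range 0..65535.
    r=$(od -An -N2 -tu2 /dev/urandom | tr -dc '0-9')
    [ $(( r % 100 )) -lt "$1" ]
}

# Hypothetical per-file job: compress $1 into $2, backgrounding it
# with probability $3 percent (defaults to 50).
handle_one() {
    if should_background "${3:-50}"; then
        gzip -c "$1" > "$2" &    # fork: the caller gets control back at once
    else
        gzip -c "$1" > "$2"      # wait: the caller blocks until gzip is done
    fi
}
```

A wrapper script that ends with something like handle_one "$1" "$1.gz" 30 could then be handed to find -exec. Nothing here caps the number of background jobs, which is why p needs tuning and why the "kill everything and retry" fallback exists.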
I liked that idea .. some kind of self-sensing (Sara Lee style)
script that adjusted to the environment .. but concerns of
becoming the next Robert Morris, plus the more genuine
problem of actually wanting to try running something this
year, led me to a simpler solution.
My script (yes, it just turned out easier, once I worked out
my algorithm) scans thru with find, dumps every dir entry
into a child script (so it can re-create the dir-structure later),
and every file entry into X number of other child scripts
(where X may be the number of CPUs you have, but
it happily turned out to be easy (for me) to make this a
parameter). Once the entire structure is thusly analyzed,
the first child script is run - which creates all the sub-dirs,
which is a bit slow, since it's necessarily done over an SMB
or NFS connection .. and once that's done, the X number of
zipping-n-zapping scripts are then launched. Those scripts
just contain a bunch of lines that cat $file thru gzip and
dump it into the right place on /mnt/samba/whatnot
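For anyone who wants the shape of it without reading the real thing, here is a minimal sketch of that generate-then-run approach. The function names, the mkdirs.sh and worker.N.sh file names, and the round-robin split of files across workers are all assumptions for illustration; the actual scripts may differ on every point:

```shell
# generate_jobs SRC DEST WORKERS JOBDIR  (all four names are hypothetical)
# Writes one script that recreates the directory tree under DEST, and
# WORKERS scripts that each gzip a share of the files into DEST.
# Caveat: paths containing ", $ or backticks would break the generated lines.
generate_jobs() {
    src=$1 dest=$2 workers=$3 jobdir=$4

    # Every directory entry goes into a single mkdir script.
    find "$src" -type d | while IFS= read -r d; do
        printf 'mkdir -p "%s"\n' "$dest${d#"$src"}"
    done > "$jobdir/mkdirs.sh"

    # Start each worker script empty.
    n=0
    while [ "$n" -lt "$workers" ]; do
        : > "$jobdir/worker.$n.sh"
        n=$(( n + 1 ))
    done

    # Deal file entries out to the worker scripts round-robin (an assumption).
    i=0
    find "$src" -type f | while IFS= read -r f; do
        printf 'gzip -c "%s" > "%s.gz"\n' "$f" "$dest${f#"$src"}" \
            >> "$jobdir/worker.$(( i % workers )).sh"
        i=$(( i + 1 ))
    done
}

# run_jobs JOBDIR WORKERS
# Creates the directories first (serially), then runs all workers in parallel.
run_jobs() {
    jobdir=$1 workers=$2
    sh "$jobdir/mkdirs.sh"
    n=0
    while [ "$n" -lt "$workers" ]; do
        sh "$jobdir/worker.$n.sh" &
        n=$(( n + 1 ))
    done
    wait    # block until every worker script has finished
}
```

Dealing files out round-robin ignores file size, so one worker can end up with most of the big files; sorting by size before dealing would even that out.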
In my case this works relatively well .. it was easy to write,
and resolved (if not elegantly) the problem I had with my
input data - 150gig of data, in 100,000 files - mostly small,
but with several dozen 2gig extremely-compressible oracle
files strewn throughout the directory structure.
] Yeah, you will have to replace the Solaris find because you
] don't have source to that but probably about a year after
] you release your patch for GNU find with SMP support,
] Sun will release Solaris find with SMP support that uses a
] completely incompatible command line option and requires
] two arbitrary environment variables to be set before it works.
<aside> Soon, this won't be a concern for me ... I've initiated
a plan to entirely wean my employers off Solaris and onto Debian.
Benchmarks are helping. A 4-way squillion dollar 2 year old
Sparc box with 2gig of RAM takes just over four hours to run
one of our regular oracle batch jobbies .. and 40 minutes on
our 2-way 1 year old $8,000 1gig RAM intel box. </aside>
Anyway - in keeping with this being a programmers' society
and all .. the source to the aforementioned exceedingly niche
utilities is available at :
And yes, bash scripting isn't my day job . . .
PS. To really brag, you need to have been on the Concorde,
not just looked at it. ; )
jedd == jedd at progsoc dot org
"The unemployment queue is no longer just for philosophy
majors - useful people are now being affected too."
-- Kent Brockman, The Simpsons.