
Re: [ProgSoc] algorithm question



On Wed, Jun 27, 2001 at 09:27:01PM +1000, jedd wrote:
>  Therefore, I want to go thru every file, pipe it through gzip,
>  and output it straight onto the nfs mount (so I effectively end
>  up with an exact replica, except everything on the target has
>  a .gz extension).  How I want to do it is by going thru every
>  directory, recursively, and forking the gzip&pipe processes,
>  but only up to 8 processes running concurrently .. and that's
>  the bit that I have no idea how to control.

Normally I would say: write a script that does what you want for
one file, then use find -exec to run it on every file in the tree.
In this case, though, network bandwidth is probably more of a
bottleneck than CPU, so more processors won't help you much. Even a
single-threaded program gets some help from SMP, because another
CPU can handle the IP interrupts and so on, so two CPUs are probably
the most that would be useful. If you want to keep more CPUs busy,
run bzip2 instead: it is much slower than gzip but usually
compresses better, so the extra CPU time buys you fewer bytes over
the wire.
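The "one file" script plus the walk that find -exec does for you can
be sketched in Python like this. It is a minimal sketch, assuming the
target tree should mirror the source with a .gz suffix on every file;
compress_one and mirror_tree are made-up names, and it compresses
one file at a time, which (given the bandwidth argument above) may be
all you need.

```python
import gzip
import os
import shutil


def compress_one(src_path: str, src_root: str, dst_root: str) -> str:
    """Gzip one file into the same relative path under dst_root, adding .gz."""
    rel = os.path.relpath(src_path, src_root)
    dst_path = os.path.join(dst_root, rel) + ".gz"
    os.makedirs(os.path.dirname(dst_path), exist_ok=True)
    with open(src_path, "rb") as fin, gzip.open(dst_path, "wb") as fout:
        # Stream in chunks so large files never sit in memory.
        shutil.copyfileobj(fin, fout)
    return dst_path


def mirror_tree(src_root: str, dst_root: str) -> list:
    """Sequential equivalent of: find SRC -type f -exec one-file-script {} \\;"""
    written = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in sorted(filenames):
            src_path = os.path.join(dirpath, name)
            written.append(compress_one(src_path, src_root, dst_root))
    return written
```

Call it as mirror_tree("/export/data", "/mnt/nfs/data") (paths
hypothetical); the output files can be checked with gzip -t.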

GNU make does have a way of running up to a fixed number of jobs
in parallel (make -j N, with N typically the number of CPUs), but I
don't know how easy it would be to apply to this application.
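One way to apply it, sketched here as an assumption rather than a
tested recipe: generate a Makefile with one rule per file and let
make -j do the counting. write_makefile and the Makefile.gz name are
hypothetical, and the sketch assumes file names contain no spaces,
colons or '$', because make cannot handle those in target names.

```python
import os
import shlex


def write_makefile(src_root: str, dst_root: str, makefile: str = "Makefile.gz") -> list:
    """Write one make rule per source file; returns the list of .gz targets."""
    targets = []
    rules = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in sorted(filenames):
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_root, os.path.relpath(src, src_root)) + ".gz"
            targets.append(dst)
            # Recipe lines must start with a literal tab character.
            rules.append(
                f"{dst}: {src}\n"
                f"\tmkdir -p {shlex.quote(os.path.dirname(dst))}\n"
                f"\tgzip -c {shlex.quote(src)} > {shlex.quote(dst)}\n"
            )
    with open(makefile, "w") as f:
        # .DELETE_ON_ERROR removes half-written .gz files if gzip fails.
        f.write(".PHONY: all\n.DELETE_ON_ERROR:\n\n")
        f.write("all: " + " ".join(targets) + "\n\n")
        f.write("\n".join(rules))
    return targets
```

Then make -j 8 -f Makefile.gz keeps eight gzips running. Because
make compares timestamps, rerunning it after a crash only redoes
targets that are missing or older than their source.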

Another idea is to write a script that handles one file but which,
with probability p, forks itself into the background to finish the
job and, with probability 1-p, finishes the job normally before
returning. You can run find -exec on that and then tune p to get
the best overall speed most of the time. If you know that the job
usually takes 6 hours and today it has gone over 10, then you
probably hit an unlucky run of coin flips (too many jobs piled up in
the background, or too few running at once), so kill all the tasks
and try again.
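A minimal sketch of the coin-flip scheme, assuming each job is a
command line; run_maybe_backgrounded is a hypothetical name. Between
two jobs that find waits on, the number of backgrounded jobs is
geometric with mean p/(1-p), so about 1/(1-p) jobs are started for
each one it waits for; p = 7/8 is a plausible starting point for 8
CPUs. That is only an average with no hard cap, which is exactly why
a bad run can happen.

```python
import random
import subprocess


def run_maybe_backgrounded(cmds, p, rng=None):
    """Start each command; with probability p leave it in the background,
    otherwise wait for it before starting the next one.

    Returns (background Popen handles, number of foreground jobs).
    """
    rng = rng or random.Random()
    background = []
    foreground = 0
    for cmd in cmds:
        proc = subprocess.Popen(cmd)
        if rng.random() < p:
            background.append(proc)  # don't wait: the driver moves straight on
        else:
            proc.wait()              # waiting is what throttles the job rate
            foreground += 1
    return background, foreground
```

The caller should eventually wait() on the returned background
handles so no job is left unreaped.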

>  Is there a way of doing this kind of thing in bash ?  If not,
>  how would you do it in any [other] language anyway?  I
>  don't really want to spend a huge amount of time fiddling
>  with this thing, but it seems to me that it can't be *that*
>  uncommon a requirement - to launch X number of tasks
>  on an X-SMP box, and maintain that number of tasks.

There is a bit of communication required to know which tasks have
finished, but not a lot. You could probably extend the find command
to keep a count of how many -exec children are still running and,
once that count reaches the limit, wait for any child to exit before
starting the next one. I'd guess it is not a difficult hack to the
find source, and you could make your name in GNU history.
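The counter-plus-wait idea is sketched below in Python rather than as
a patch to find; run_bounded is a hypothetical helper, not part of
find or any library. It is POSIX-only and blocks in waitpid(-1),
which reaps *any* child, so in a bigger program it could steal
another component's children; for a dedicated driver like find that
does not matter.

```python
import os


def run_bounded(cmds, max_jobs):
    """Run each argv in cmds with at most max_jobs children alive at once.

    Returns (exit codes in command order, peak number of live children).
    """
    running = {}                 # pid -> index into cmds
    codes = [None] * len(cmds)
    peak = 0

    def reap_one():
        pid, status = os.waitpid(-1, 0)       # block until any child exits
        if pid in running:                    # ignore children we didn't start
            codes[running.pop(pid)] = os.waitstatus_to_exitcode(status)

    for i, argv in enumerate(cmds):
        while len(running) >= max_jobs:       # at the limit: wait for a slot
            reap_one()
        pid = os.posix_spawnp(argv[0], argv, os.environ)
        running[pid] = i
        peak = max(peak, len(running))
    while running:                            # drain the remaining children
        reap_one()
    return codes, peak
```

For the original problem each command could be
["sh", "-c", 'gzip -c "$1" > "$2"', "sh", src, dst] with
max_jobs=8. For what it's worth, GNU xargs has a -P option that
implements this same counter from the shell.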

Yeah, you will have to replace the Solaris find because you
don't have the source to it, but probably about a year after
you release your patch for GNU find with SMP support,
Sun will release Solaris find with SMP support that uses a
completely incompatible command line option and requires
two arbitrary environment variables to be set before it works.

On that note, the correct command line option is
probably -j to keep in line with GNU make.

-- 
S1G: 6184402 seconds remaining			- Tel