BCF: Bike Chat Forums


Downloading the Project Gutenberg Science Fiction Bookshelf


Rogerborg
nimbA



Joined: 26 Oct 2010
Posted: 16:56 - 03 Jun 2017    Post subject: Downloading the Project Gutenberg Science Fiction Bookshelf

https://www.gutenberg.org/wiki/Science_Fiction_(Bookshelf)

I got hacked off with not being able to bulk download the whole lot, so hacked up some Perl to bulk download the whole lot. I might as well share the h4xx while they download.

You'll need Perl and wget and some sort of lunix shell, or Cygwin I suppose. This version pulls down the kindle.noimages content for each title and names the files "Actual Title.mobi" instead of the rather unhelpful file number; other formats are available.

As tradition demands, any nerds reading this must strive for alpha-geek status by saying how they'd have done it better if they'd done it, which they didn't. [whistle]

Code:

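use strict;
use warnings;

my $htmlIndexFile = "Science_Fiction_(Bookshelf)";

# Use Perl built-ins and list-form system() rather than backticks: the
# parentheses in the file name are a syntax error in an unquoted shell command.
mkdir "books" unless -d "books";
unlink $htmlIndexFile;
system("wget", "-q", "https://www.gutenberg.org/wiki/$htmlIndexFile") == 0
    or die "could not download $htmlIndexFile (wget status $?)";

# Slurp the whole index page into one string
my $document = do {
    local $/ = undef;
    open my $fh, "<", $htmlIndexFile
        or die "could not open $htmlIndexFile: $!";
    <$fh>;
};

# Flat list of (number, title, number, title, ...)
my @books = ( $document =~ /title="ebook:(\d+)">([^<]+)</g );

my $numberOfBooks = (scalar @books) / 2;

for (my $book = 0; $book < $numberOfBooks; ++$book) {

    my $number = $books[$book * 2];
    my $title = $books[($book * 2) + 1];

    # Replace double quotes with two single quotes, and slashes with
    # dashes, so the title is usable as a file name
    $title =~ s/"/''/g;
    $title =~ s{/}{-}g;

    my $url = "https://www.gutenberg.org/ebooks/$number.kindle.noimages";
    my $filename = "books/$title.mobi";

    if (!-e $filename) {
        printf "Getting %d of %d: %s, #%s\n", $book + 1, $numberOfBooks, $title, $number;
        # List form bypasses the shell, so odd characters in titles are safe
        system("wget", "-q", "--content-disposition", $url, "--output-document", $filename);

        # $? is the raw wait status: 2048 is wget exit code 8 (server
        # returned an error), which is tolerated so the run carries on
        if ($? != 0 && $? != 2048) {
            printf "Download of %s, #%s failed with exit code %d\n", $title, $number, $? >> 8;
            exit(1);
        }
    }
}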
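
A note on the magic number in the failure check: Perl's $? holds the raw wait status rather than the exit code, with the exit code in the high byte. wget documents exit code 8 as "server issued an error response", so it shows up in $? as 2048. Reading that as "no Kindle version for this title, carry on" is an assumption about the intent. A minimal shell sketch of the arithmetic, with made-up variable names:

```shell
#!/usr/bin/env bash
# Decode a Perl-style wait status into its parts.
# 2048 is the value the script above compares $? against.
wait_status=2048

exit_code=$(( wait_status >> 8 ))   # high byte: the child's exit code
signal=$(( wait_status & 127 ))     # low 7 bits: signal that killed it, if any

echo "exit code: $exit_code, signal: $signal"
```

Any other non-zero status (network failure, bad options and so on) still stops the run.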

____________________
Biking is 1/20th as dangerous as horse riding.
GONE: HN125-8, LF-250B, GPz 305, GPZ 500S, Burgman 400 // RIDING: F650GS (800 twin), Royal Enfield Bullet Electra 500 AVL, Ninja 250R because racebike

TbirdX
Crazy Courier



Joined: 06 Dec 2015
Posted: 17:56 - 03 Jun 2017

Nice.......but




I'd have written it using comic sans......in pink.
____________________
VFR800X - TTR250

ScaredyCat
World Chat Champion



Joined: 19 May 2012
Posted: 18:00 - 03 Jun 2017

Rogerborg wrote:

I got hacked off with not being able to bulk download the whole lot


Bit more reading and a little less grumpy would have helped:


Quote:
The Project Gutenberg Science Fiction CD is available via BitTorrent from The Project Gutenberg BitTorrent Tracker.

The March 2007 Science Fiction Bookshelf CD


Rogerborg wrote:
As tradition demands, any nerds reading this must strive for alpha-geek status by saying how they'd have done it better if they'd done it


https://www.gutenberg.org/cdproject/pgsfcd-032007.zip.torrent


or you can download from my dropbox..
____________________
Honda CBF125 ➝ NC700X
Honda CBF125 ↳ Speed Triple

Rogerborg
Posted: 18:50 - 03 Jun 2017

ScaredyCat wrote:
Quote:
The Project Gutenberg Science Fiction CD is available via BitTorrent from The Project Gutenberg BitTorrent Tracker.

The March 2007 Science Fiction Bookshelf CD

Aww, bless, you clicked a link. [clapping]

As you'll be aware, a lot of SF has dropped out of copyright in the last decade, and the SF bookshelf is regularly updated. Enjoy your obsolete ickle sub-archive though. [razz]

ScaredyCat
Posted: 19:46 - 03 Jun 2017

Rogerborg wrote:
Aww, bless, you clicked a link. [clapping]

As you'll be aware, a lot of SF has dropped out of copyright in the last decade, and the SF bookshelf is regularly updated. Enjoy your obsolete ickle sub-archive though. [razz]


Come back when you've read them all and I'll dig out the rest for you.

Rogerborg
Posted: 20:00 - 03 Jun 2017

ScaredyCat wrote:
Come back when you've read them all and I'll dig out the rest for you.

I think one of us is confused about which of us has them all.

Did Perl kill your father?

ScaredyCat
Posted: 21:45 - 03 Jun 2017

Rogerborg wrote:
Did Perl kill your father?


No. It's because you used backticks and wget, and not LWP.

UnknownStuntm...
World Chat Champion



Joined: 13 Sep 2007
Posted: 21:53 - 03 Jun 2017    Post subject: Re: Downloading the Project Gutenberg Science Fiction Booksh

Borg. My man. Tabs? TABS? In this day and age?


Spaces.


Amateur.


Lookachoo.

Rogerborg
Posted: 22:08 - 03 Jun 2017

UnknownStuntman wrote:
Borg. My man. Tabs?

Wut?


ScaredyCat wrote:
you used backticks and wget, and not LWP.

It's the way I was raised, you racist.

I'm sure your script would have been better, if you'd written one.

ScaredyCat
Posted: 23:41 - 03 Jun 2017

Rogerborg wrote:

It's the way I was raised, you racist.


Who were you raised by? Bodgit, Patchit and Runn?

Rogerborg wrote:

I'm sure your script would have been better, if you'd written one.


You may as well have done the rest of it in bash script too, just using loads of backticks for it all.


[folded arms]

Rogerborg
Posted: 23:50 - 03 Jun 2017

You're lucky I didn't write it as a K&R C macro.

To be honest, the whole thread is just to milk pedant tears. Sweet, nourishing tears. [drooling]

barrkel
World Chat Champion



Joined: 30 Jul 2012
Posted: 01:18 - 04 Jun 2017

Code:
wget -a log -O - 'https://www.gutenberg.org/wiki/Science_Fiction_(Bookshelf)' | grep -E -o 'ebook:[0-9]+">[^<]+' | sed -E 's|"|\\"|g;s|ebook:([0-9]+)\\">(.*)|wget https://www.gutenberg.org/ebooks/\1.kindle.noimages -O "\2.mobi" -a log|' | bash


Just cos I can. Thanks Rogerborg.
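
Before piping generated commands into bash, it can be worth printing them first. The sketch below runs the same grep and sed stages as the one-liner above against a single invented index line and prints the resulting wget command instead of executing it. The sample markup and the generate_commands name are assumptions for illustration, not the real page.

```shell
#!/usr/bin/env bash
# Dry run: build the wget commands from a sample index line and print
# them instead of piping them to bash. The sample HTML is made up.
sample='<li><a href="/ebooks/123" title="ebook:123">An Example "Quoted" Title</a></li>'

generate_commands() {
    # Stage 1: keep only the ebook:NUMBER">TITLE fragments
    grep -E -o 'ebook:[0-9]+">[^<]+' |
    # Stage 2: escape double quotes, then rewrite each fragment as a wget call
    sed -E 's|"|\\"|g;s|ebook:([0-9]+)\\">(.*)|wget https://www.gutenberg.org/ebooks/\1.kindle.noimages -O "\2.mobi" -a log|'
}

commands=$(printf '%s\n' "$sample" | generate_commands)
printf '%s\n' "$commands"
```

Only double quotes are escaped, so a title containing $ or backticks would still be expanded by bash inside the double-quoted file name. Eyeballing the printed commands first is a cheap way to spot that.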
____________________
Bikes: S1000R, SH350; Exes: Vity 125, PS125, YBR125, ER6f, VFR800, Brutale 920, CB600F, SH300x4
Best road ever ridden: www.youtube.com/watch?v=s2MhNxUEYtQ

Rogerborg
Posted: 09:30 - 04 Jun 2017

https://i.imgur.com/fwtajh9.jpg

I award you one alpha-geek award.

Disappointed that you didn't get some awk in there though. [razz]
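
For anyone who does want some awk in there, a rough sketch of just the extraction step might look like the following. It is not anyone's actual script from this thread: the sample markup is invented to fit the pattern used above, list_books is a made-up name, and it only prints number and title pairs rather than downloading anything.

```shell
#!/usr/bin/env bash
# Sketch only: extract "number<TAB>title" pairs with awk.
# The sample markup is invented to match the pattern used in the thread.
sample='<li><a title="ebook:123">First Example</a></li>
<li><a title="ebook:4567">Second/Example</a></li>'

list_books() {
    awk '{
        line = $0
        # Walk every ebook:N">Title occurrence on the line
        while (match(line, /ebook:[0-9]+">[^<]+/)) {
            hit = substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)
            split_at = index(hit, "\">")            # position of the closing quote
            number = substr(hit, 7, split_at - 7)   # digits after "ebook:"
            title = substr(hit, split_at + 2)       # text after the ">
            gsub("/", "-", title)                   # keep the title usable as a file name
            printf "%s\t%s\n", number, title
        }
    }'
}

books=$(printf '%s\n' "$sample" | list_books)
printf '%s\n' "$books"
```

The tab-separated output could then be fed to a download loop one line at a time, which avoids generating shell commands from titles at all.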
All times are GMT + 1 Hour