You can now specify a range of pages:
Download here.
Until now, Google Book Downloader has used the default image resolution provided by Google, even though books.google.com lets you zoom in on pages to view them at a higher resolution. Version 2.1 fixes this by finding the highest resolution available for every book you download. To download Google Book Downloader, see the app page here.

Quality comparison between GBD 2.0.1 and 2.1.
As you can see, books downloaded with this new version look much better. Now, the technical details:
Google Books has a zoom-in button that shows the pages of most books in higher detail. To get GBD to download these higher-detail images, I needed to somehow hijack this button. My first attempt was, of course, with JavaScript. But the JavaScript code that does the online zooming turned out to be very hard to get at: it is hidden inside layers of anonymous functions and called by Google’s custom event-handling code. I eventually gave up on this approach.
To get around this problem, I used JavaScript only to find the button I wanted to click (using getElementById). Rather than trying to “click” the button with JavaScript, I passed the button’s location to Objective-C code, which then used an NSEvent to simulate a click in the web page.
If you’re interested, all of this code is available on GitHub. It simply adds a few methods to WebKit’s WebView that let you find an element in a web page, click a location in a web page, or click an element.
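The click handoff described above is easy to get wrong, because the two sides measure from different corners. Below is a minimal C sketch of the coordinate arithmetic only, under several assumptions that are not from the original code: the JavaScript side reports the element's rectangle from getBoundingClientRect() (top-left origin, y growing downward), the page is not zoomed so one CSS pixel equals one point, and the native side needs a point in Cocoa window coordinates (bottom-left origin, y growing upward) to build the NSEvent. The struct and function names are hypothetical.

```c
/* Hypothetical types: on a Mac these values would come from the
   JavaScript bridge and from the web view's NSRect frame. */
typedef struct { double x, y; } WindowPoint;

/* Element rectangle as reported by getBoundingClientRect():
   top-left origin, y grows downward (CSS pixels, assumed equal to points). */
typedef struct { double left, top, width, height; } ElementRect;

/* The web view's frame in window coordinates:
   bottom-left origin, y grows upward (Cocoa's default). */
typedef struct { double x, y, width, height; } ViewFrame;

/* Returns the center of the element in window coordinates, suitable as the
   location of a synthesized mouse event. */
WindowPoint elementCenterInWindow(ElementRect element, ViewFrame view)
{
    WindowPoint center;
    /* The horizontal axes point the same way, so just offset by the view's origin. */
    center.x = view.x + element.left + element.width / 2.0;
    /* The vertical axes are opposite, so measure down from the view's top edge. */
    center.y = view.y + view.height - (element.top + element.height / 2.0);
    return center;
}
```

For example, a 40×20 element at left 100, top 50, inside a 600-point-tall view sitting at the window's origin, has its center at (120, 540) in window coordinates.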
Google Book Downloader 2.0 is out! What’s new: it is much faster, has fewer bugs, and includes a few less important changes. To download, see the product page here.
If you are into the technical details, you may be wondering what went on behind the scenes to make these improvements. First, some quick background on scraping:
One way to scrape: Start with an AJAX web app that you want to borrow some data from. Reverse engineer it until you understand the API that it is using to get data from the server. Then re-engineer your app to use this API.
However, there is another way to scrape: Start with someone else’s AJAX web app. Load it into a web browser engine. Use whatever hooks the web browser engine provides to do things like 1) run your own JavaScript on the page and 2) monitor the network traffic of the web app. Compared to the first way to scrape, this requires a lot less thinking!
As you may have guessed, GBD 1 used the first method, while GBD 2 uses the second. Not only does this spare me from having to worry about Google’s AJAX calls, but it also simplifies the application’s source code. It also has the nice side effect of speeding up the application.
Pwntcha is an open source tool for breaking CAPTCHAs. While it is a few years old and only works on very simple CAPTCHAs, it’s still an interesting project and would be a good place to start if you wanted to write a program to break more complex ones. To install it on OS X:
sudo port install libsdl_image
As an alternative, installing imlib2 would probably also work.
I encountered an error installing db46, one of the dependencies of libsdl and imlib2, which I fixed by installing the Java for Mac OS X 10.6 Update 3 Developer Package.
svn co svn://svn.zoy.org/caca/pwntcha/trunk pwntcha
cd pwntcha
./bootstrap
./configure
sudo make install
curl -O http://hactheplanet.com/blog/wp-content/uploads/2011/01/authimage.jpeg
pwntcha authimage.jpeg
Using the Ruby Mechanize library, I have been writing a Ruby class that automates parts of Facebook, such as friending, status updating, and messaging.
You can get the class here and use it like so:
#!/usr/bin/ruby
# Require FacebookBot.rb from the same directory.
require File.join(File.dirname(__FILE__), 'FacebookBot.rb')
# Log in.
fb = FacebookBot.new("example@example.com", "secret")
# Accept all friend requests.
fb.acceptRequests
# Friend a whole page of suggested friends.
fb.suggestedFriends.each { |friendId| fb.requestFriend(friendId) }
# Display all personal messages and recent wall posts.
require "pp"
pp fb.personalMessages
pp fb.recentPosts
Using code from the UnMHT QuickLook project, I put together an app that renders MHTML files using WebKit on OS X. Features: print, export to PDF, and export to webarchive.
Google Book Downloader is my app that downloads Google Books and Book Previews in PDF format. Its major problem so far has been that when it encodes JPEGs from Google’s servers into a PDF, the images lose quality. So far I have found no way to avoid this loss of quality when making a PDF from JPEGs.
Today I’m releasing a new version (1.2) of Google Book Downloader in which you can choose to save a folder of JPEGs instead of a PDF. The folder also contains an index.html file that makes it convenient to read the JPEGs in the right order.
Download it here.

Say we want to make a command-line program like date use a fake time instead of the current one. We can do this by supplying our own time() function to replace the time() function in libSystem.
How do we know that date uses time()? We use nm, which lists the symbols in a binary, including the undefined ones it imports from shared libraries:
$ nm -m /bin/date | grep _time
                 (undefined [lazy bound]) external _time (from libSystem)
Once we know what function to replace, we can write a replacement function:
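// time.c
#include <time.h>

// This function will override the one in /usr/lib/libSystem.dylib.
time_t time(time_t *tloc)
{
    // Midnight, January 1st, 2000, in local time.
    struct tm timeStruct = {0};
    timeStruct.tm_year = 2000 - 1900;  // Years since 1900.
    timeStruct.tm_mon = 0;             // January (months are 0-based).
    timeStruct.tm_mday = 1;
    timeStruct.tm_hour = 0;
    timeStruct.tm_min = 0;
    timeStruct.tm_sec = 0;
    timeStruct.tm_isdst = -1;          // Let mktime() work out daylight saving time.
    time_t fakeTime = mktime(&timeStruct);
    // Callers often pass NULL, as in time(NULL), so only store through tloc if it is set.
    if (tloc != NULL)
        *tloc = fakeTime;
    return fakeTime;
}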
Then we compile the code as a dynamic library:
gcc -c time.c
gcc -flat_namespace -dynamiclib -current_version 1.0 time.o -o libTime.dylib
To tell OS X’s dynamic linker to load our dynamic library, we need to set DYLD_INSERT_LIBRARIES to the path of the library. We also need to set DYLD_FORCE_FLAT_NAMESPACE, or our function will not override the old one. These settings and more can be found on the dyld man page.
The result:
$ date
Sun Oct 24 13:21:12 EST 2010
$ DYLD_FORCE_FLAT_NAMESPACE=1 DYLD_INSERT_LIBRARIES=./libTime.dylib date
Sat Jan 1 00:00:00 EST 2000
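A possible refinement, not part of the technique as written: instead of recompiling the library for every date you want to test, the replacement time() could read the fake time from an environment variable. This sketch assumes a variable named FAKE_TIME (a hypothetical name) that holds a number of seconds since the Unix epoch; when it is unset, the function falls back to gettimeofday() so that it never calls itself. It would be compiled and loaded the same way as libTime.dylib.

```c
// faketime.c: hypothetical variant of time.c that reads the fake time
// from the FAKE_TIME environment variable (seconds since the Unix epoch).
#include <stdlib.h>
#include <sys/time.h>
#include <time.h>

// Like the version above, this overrides time() when the library is loaded
// with DYLD_INSERT_LIBRARIES and DYLD_FORCE_FLAT_NAMESPACE=1.
time_t time(time_t *tloc)
{
    time_t result;
    const char *fake = getenv("FAKE_TIME");
    if (fake != NULL && *fake != '\0') {
        // Use the requested fake time.
        result = (time_t)strtoll(fake, NULL, 10);
    } else {
        // No fake time set: report the real time, using gettimeofday()
        // rather than time() so the function does not call itself.
        struct timeval now;
        gettimeofday(&now, NULL);
        result = (time_t)now.tv_sec;
    }
    // Callers may pass NULL, so only store the result if tloc is set.
    if (tloc != NULL)
        *tloc = result;
    return result;
}
```

For example, setting FAKE_TIME=946684800 (midnight UTC on January 1, 2000) alongside the same DYLD_FORCE_FLAT_NAMESPACE and DYLD_INSERT_LIBRARIES settings, pointed at the new library, would make date report that moment in the local time zone.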
Here is a screenshot of an app I plan on releasing soon. As you can see, it has an iTunes-like interface, with some extra features for downloading music. You can search YouTube for songs, and download them from YouTube directly into your music library.
Leave a comment or contact me if you would like to beta test.
This PHP function fetches the contents of a URL as it exists in Google’s cache:
function cachedHTMLForURL($url)
{
    // Request the cached copy from Google.
    $googleRequestURL = "http://webcache.googleusercontent.com/search?q=" . urlencode("cache:" . $url);
    $googleResponse = file_get_contents($googleRequestURL);
    // Return false if the request failed or Google did not have the page cached.
    if ($googleResponse === false || preg_match("/^.*<title>cache:/", $googleResponse))
        return false;
    // Remove the first 3 lines of the response, which are inserted by Google.
    $importantHTML = preg_replace("/^(.*\n){3}/", "", $googleResponse);
    // Keep one inserted line, the <base> tag, which corrects the base path of the site.
    $base = "";
    if (preg_match("/<base href=\"[^\"]*\">/", $googleResponse, $matches))
        $base = $matches[0] . "\n";
    return $base . $importantHTML;
}
Use like so:
echo cachedHTMLForURL("http://news.google.com/");
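To make the request URL concrete, here is a small C sketch that builds the same string cachedHTMLForURL() requests. The names cacheRequestURL and appendURLEncoded are hypothetical, and the encoding rules are an assumption meant to mirror PHP's urlencode(): ASCII letters, digits, '-', '_' and '.' pass through, a space becomes '+', and every other byte becomes %XX.

```c
#include <stdlib.h>
#include <string.h>

// Appends `in` to `out`, percent-encoded with rules assumed to match PHP's
// urlencode(). Returns a pointer to the terminating '\0'.
static char *appendURLEncoded(char *out, const char *in)
{
    static const char hex[] = "0123456789ABCDEF";
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        int unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                         (c >= '0' && c <= '9') || c == '-' || c == '_' || c == '.';
        if (unreserved) {
            *out++ = (char)c;
        } else if (c == ' ') {
            *out++ = '+';
        } else {
            *out++ = '%';
            *out++ = hex[c >> 4];
            *out++ = hex[c & 0x0F];
        }
    }
    *out = '\0';
    return out;
}

// Returns a newly allocated Google cache request URL for `url`,
// or NULL if memory runs out. The caller must free() the result.
char *cacheRequestURL(const char *url)
{
    static const char base[] = "http://webcache.googleusercontent.com/search?q=";
    static const char prefix[] = "cache:";
    size_t baseLength = strlen(base);
    // Worst case, each byte of "cache:" + url expands to three characters.
    size_t size = baseLength + 3 * (strlen(prefix) + strlen(url)) + 1;
    char *result = malloc(size);
    if (result == NULL)
        return NULL;
    memcpy(result, base, baseLength);
    char *end = appendURLEncoded(result + baseLength, prefix);
    appendURLEncoded(end, url);
    return result;
}
```

For http://news.google.com/, this produces http://webcache.googleusercontent.com/search?q=cache%3Ahttp%3A%2F%2Fnews.google.com%2F.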