Original Link: https://www.anandtech.com/show/2407



Introduction

Since the arrival of AnandTech's very own Google Mini, Google has issued several updates to this little blue box. We're taking a look at Google's present-day Mini to provide up-to-date insights on this search solution for small-to-medium-sized businesses. The Mini product line currently sports four different licenses, ranging from $1,995 for the ability to search through 50,000 documents, to $8,995 for a machine that will handle up to 300,000 documents. Buyers can opt for an extra year of customer support, which will raise the price by $995. The Google Mini's hardware is identical, regardless of which license one chooses, and all the license plans offer full functionality.
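To put the two quoted license prices in perspective, the short Python sketch below works out the cost per indexed document for each. It uses only the two tiers named above; the two intermediate tiers are not quoted here, so they are left out rather than guessed.

```python
# License tiers quoted in the article: document limit -> price in USD.
# Only the two endpoints are given; the intermediate tiers are omitted.
LICENSES = {
    50_000: 1_995,
    300_000: 8_995,
}


def cost_per_document(documents: int, price_usd: float) -> float:
    """Return the license price divided by the document limit."""
    return price_usd / documents


for documents, price in LICENSES.items():
    cents = 100 * cost_per_document(documents, price)
    print(f"{documents:>7,} documents: ${price:,} -> {cents:.2f} cents per document")
```

The entry license works out to roughly 4 cents per document and the largest to roughly 3 cents, so the per-document price drops by only about a quarter across the range.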



Google's updates to the Mini resulted in a physically smaller enclosure and quite a few new features, some of which we will discuss specifically later in this article. We also investigated the benefits of the Mini's integration with Google Analytics, and to top it off, we did some benchmarking to see exactly how the Mini performs.



Scratching the surface

To provide hands-on experience, Google sent us a brand-new Mini with a license for 50,000 documents. As expected, we received a very complete package, including everything we needed to get our Mini up and running quickly, with minimal fiddling. Naturally, we wanted to take a peek at the Mini's insides, but Google is still not very keen on people trying to break into their systems. As we found in our prior review of the Mini, the case itself is solidly built, with exotic-looking tamper-resistant fasteners, and a large piece of plastic covers the top of the enclosure from front to rear to inhibit removal.



The current Mini is half its former size, and was delivered with everything we needed to get started.

The Mini's internals reveal that Google has made some changes since AnandTech's last look. The machine's current specs are as follows:

  • Supermicro motherboard (P8SCT)
  • Two 1GB modules of PC4200 DDR2 RAM, running at 533MHz
  • A single 250GB Western Digital SATA2 Hard Drive
  • A 280 Watt PSU
  • A 3GHz Pentium 4 531 (Prescott core)

We questioned whether this hardware was the best choice for a search appliance, a point we return to later in the article.



A closer look at what makes the Mini tick.

Unfortunately, our journey of discovery didn't take us beyond the hardware itself. Booting up the Mini, we were greeted with the bootup procedure of Red Hat Linux, which ended at a fitting blue login screen, leaving us completely locked out of the mysteries of what makes Google tick. Google understandably guards its technology carefully from prying eyes.

As before, when installing the Mini, the administrator connects the provided crossover cable to the Mini's admin port to perform initial configuration. This stage of setup is very simple; just assign the Mini a static IP address on the network it'll be crawling, and configure other general network settings. These steps require no special knowledge, since the provided Quick Start guide explains them very nicely. After completing these installation steps, the Mini's administration console is available over the network using a web browser for further configuration. Setting up the Mini's crawler was as simple as giving it some addresses to start from, and adding URL patterns to include and to avoid. For example, if we wished our Google Mini to crawl two separate webservers and a fileserver, we would give the Mini a URL for each website and a Samba link (smb://) to the fileserver, and the crawler would get to work.


Some links to the websites are all that is needed for the Mini to get started.

Starting the Mini's actual crawl is as simple as that, but there are many options to provide detailed control. In our case, we let the Mini crawl some of the websites running at our lab, but found that we hit the 50,000-page limit in a matter of hours, so we settled for Samba crawling in most of our tests.
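The idea of start addresses combined with include and avoid patterns can be pictured with a small Python sketch. This is not the Mini's own pattern syntax or crawler logic; it assumes simple prefix matching, and every host name and path below is made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical crawl settings; hosts and paths are invented for illustration.
START_URLS = [
    "http://intranet.example.com/",
    "http://wiki.example.com/",
    "smb://fileserver.example.com/docs/",
]
FOLLOW_PREFIXES = list(START_URLS)
AVOID_PREFIXES = [
    "http://intranet.example.com/private/",
    "smb://fileserver.example.com/docs/hr/",
]


def should_crawl(url: str) -> bool:
    """Crawl a URL only if it matches a follow prefix and no avoid prefix."""
    if urlsplit(url).scheme not in ("http", "https", "smb"):
        return False
    followed = any(url.startswith(prefix) for prefix in FOLLOW_PREFIXES)
    avoided = any(url.startswith(prefix) for prefix in AVOID_PREFIXES)
    return followed and not avoided


for candidate in (
    "http://intranet.example.com/news/today.html",
    "http://intranet.example.com/private/salaries.html",
    "smb://fileserver.example.com/docs/manual.pdf",
    "ftp://fileserver.example.com/docs/manual.pdf",
):
    print(candidate, "->", "crawl" if should_crawl(candidate) else "skip")
```

In this simplified model an avoid pattern always wins over an include pattern, which is the behavior one would want when keeping a confidential directory out of the index.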



So what does it actually do?

The Google Mini, stripped of the fancy wording and vague feature descriptions, is a search bot. It looks for what it's configured to search for, keeps track of it, and keeps looking for more indefinitely, until it has reached its page limit. At that point, it stops adding new pages, but will keep its existing index properly updated. After setting it up and unleashing it on your unsuspecting web and file servers, you will find your Mini slowly gathering results and building up its index.

Once the Mini is online, a user visiting its IP address finds the familiar Google search page. As the indexing progresses, the Mini begins to give the results one would expect. It searches the designated websites, and automatically indexes the designated fileservers for files, as well as the contents of the files. The crawler handles common formats such as .pdf, .doc and .xls (full list available here).

One possible area of concern (which we certainly had) is the Mini's ability to search content that normally can't be accessed without authentication. The Mini includes some basic authentication methods for both websites and file shares. We configured our Mini to crawl our samba fileserver by providing it with an existing account with read rights, and although secure websites definitely provide a bigger challenge, the Mini is equipped for most basic authentication processes. We had no problem configuring the Google Mini to use HTTP-based and HTTPS-based login procedures, although more advanced authentication methods require the more expensive Google Search Appliance.

A full list of all possible authentication methods - and a comparison to what the GSA can do - can be found here.

Setting the Mini up properly might require some snooping into the help documentation (which, ironically, isn't searchable). Note that using the "Make Public" checkbox allows you to make secure search results public. The end result is that everyone would be able to see the corresponding URLs in their search-results page, but will need to authenticate if they choose to open a file that requires authentication to be accessed.

Leaving the "Make Public" checkbox unchecked would require people to authenticate before viewing search results from a secured webserver. However, the Mini doesn't yet support this type of restriction for fileservers, meaning that these files are publicly visible to anyone using the search system, so the Mini's administrator should take precautions not to index confidential files on file servers.


Adding in some credentials for our secure content.

Larger organizations may appreciate the ability to use different "collections" for different users or situations (see our previous Mini review for more information on collections). The Mini's administrator can create several collections of, for example, knowledge-base articles and news messages, and have these searched separately. Different front-ends can be added and customized to seamlessly fit the website in question, to make this separation transparent to the user; AnandTech's own search function is an example of this.
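As a rough illustration of how a site's own search box might target one collection and one front-end, the Python sketch below builds a query URL. The parameter names (`q`, `site`, `client`, `proxystylesheet`, `output`) are an assumption based on the Google Search Appliance-style search protocol and should be checked against the Mini's documentation; the host, collection and front-end names are hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(host: str, query: str, collection: str, frontend: str) -> str:
    """Build a search URL for one collection, rendered by one front-end.

    Parameter names are assumed, not taken from the article.
    """
    params = {
        "q": query,                    # the user's search terms
        "site": collection,            # assumed: restricts the search to a collection
        "client": frontend,            # assumed: selects the front-end settings
        "proxystylesheet": frontend,   # assumed: front-end stylesheet for HTML output
        "output": "xml_no_dtd",        # assumed: result format passed to the stylesheet
    }
    return f"http://{host}/search?{urlencode(params)}"


# Hypothetical appliance host, collection and front-end.
print(build_search_url("mini.example.com", "power supply", "knowledge_base", "support_site"))
```

Keeping the collection and front-end as separate arguments mirrors the separation described above: the same index can be queried through differently styled search pages.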

Once properly configured, the Google Mini is essentially a basic Google bot. For small intranet-based user groups, this might be all that's ever needed. However, integrating the Google Mini with existing websites supporting a large user base (such as AnandTech.com) calls for some extra functionality to make the addition more seamless, and to take advantage of the full capabilities of the system.



Exploring the Mini's possibilities...

Part of the Google Mini's appeal lies in its ability to customize the user's search experience to a great extent, thanks to considerable added functionality. We will be looking at some of these options now.

OneBox

One of these interesting additions to the Mini is the OneBox functionality. You might know OneBox already from your standard Google searches, where Google uses its different applications to provide you with specific results. In the example below, searching for the location of our lab on a map provides us with a special result that integrates the Google Maps application.


Doing a search on our lab's address gives us a nice map of the location.

Of course, this feature wasn't plainly copied over into the Mini, since people could simply use the standard Google to make use of these possibilities. Instead, Google released an SDK for companies to write their own OneBox modules, and makes existing modules available for download from their gallery. These modules can plug into various existing systems in the Mini owner's network, ranging from LDAP databases to Exchange servers, extracting company-specific data such as employees' contact information, charts and sales numbers. This is interesting for a mostly intranet-based use of the Mini, but the possibilities of OneBox reach beyond that. With the proper OneBox modules installed, a web site's fine-tuned search box could become a user's only need for navigation, and provide them with everything they could possibly want on a subject, on one single result page. This is especially interesting, considering that the faster users find the type of data they are looking for, the more likely they are to remain at the site, and browse beyond what they originally came to find.

Even though installing modules can be as simple as importing their configuration file through the Mini's administration panel, the real potential of this feature requires in-depth knowledge of the way things work, and what your users are actually looking for. Luckily, Google has added more functionality to give the Mini's owner better insight into the search experience of the site's users, so it can be improved.

Google Analytics

Perhaps it's not really a part of the Google Mini's package in the truest sense, but Google Analytics is a valuable addition to any website, particularly for its users' search-result pages. Google Analytics doesn't offer an all-in-one solution to all of a web site's traffic problems, but rather a way for web masters to identify their visitors' behavior, and to perform optimizations based on what these visitors do.



An example of the results provided by Google Analytics (screenshot taken using an external application).

One could, for example, track which keywords users associate with particular subjects, and tune the search engine to provide more-relevant results (more on this possibility later). Google Analytics also answers questions such as how users arrived at the website, what searches they performed, and perhaps most importantly, whether the entry page was relevant to them, or whether they turned away immediately. These insights are very interesting to web admins looking to improve their site's usability.

This is what Google Analytics does, and has done for a while already. What's most interesting about Google Analytics' integration with the Google Mini, however, is that it links right in with the new optimization features the system has received. Proper implementation and development of OneBox modules (and any other search engine optimizations) is mostly a guessing game, as long as there are no proper usage statistics upon which to base these optimizations. Despite being free, the Analytics platform provides an ideal playground for admins looking to take their usability one step further. Integration with the Google Mini can be done by simply including your Google Analytics account ID in the front-end configuration for your search, and adding a snippet of code to every other page that should be tracked.

Search optimization

The results of long-term analysis are useful to tweak the workings of the search engine, and the Mini bundles quite a few tweaking options. The Mini's admin can take control of these tweaks from the front-end customization section of the administration console.



One of these tweaks is the Related Queries tab, which allows us to enter things like synonyms, and subjects that are very closely related to each other. The engine will then use these synonym entries to suggest other search queries to the user. One downside of the Related Queries feature, however, is that these queries need to be entered both ways. We couldn't really see a specific reason for this.
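Since related queries have to be entered in both directions, an administrator maintaining a long list might generate the reverse entries automatically before importing them. The Python sketch below shows one way to do that; the pair format is hypothetical and is not the Mini's import format.

```python
def symmetric_pairs(pairs):
    """Return each (query, suggestion) pair plus its reverse, without duplicates."""
    seen = set()
    result = []
    for query, related in pairs:
        for entry in ((query, related), (related, query)):
            if entry not in seen:
                seen.add(entry)
                result.append(entry)
    return result


# Hypothetical one-way list as an administrator might first write it down.
one_way = [("cpu", "processor"), ("psu", "power supply"), ("processor", "cpu")]
for query, suggestion in symmetric_pairs(one_way):
    print(f"{query} -> {suggestion}")
```

The duplicate check matters because a list written by hand often already contains some pairs in both directions.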


The integrated dynamic spelling suggestion is another notable tweaking capability. While indexing your content, the Mini creates a dictionary of sorts, containing the vocabulary used in your files. The dynamic spelling-suggestion feature kicks in when it encounters typos, allowing it to suggest alternate spellings to get better results.

This capability seems quite similar to the related-queries feature. The very first time a typo is made, the search simply returns no results, but the Mini adds the typo to its internal "list of typos" and compares it to the existing dictionary to find its closest match. The second time the typo is encountered, we get a suggestion to search for the closest matching word that has search results.
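The general idea of suggesting the closest word from a vocabulary built out of the indexed content can be sketched in a few lines of Python. This is not Google's algorithm, which is not disclosed; it swaps in the standard library's `difflib` similarity matching, and it skips the two-step behavior described above by suggesting immediately. The sample documents and the cutoff value are assumptions.

```python
import difflib
import re


def build_vocabulary(documents):
    """Collect every lowercase word that appears in the indexed documents."""
    vocabulary = set()
    for text in documents:
        vocabulary.update(re.findall(r"[a-z]+", text.lower()))
    return vocabulary


def suggest(term, vocabulary, cutoff=0.75):
    """Return the closest known word for an unknown term, or None."""
    term = term.lower()
    if term in vocabulary:
        return None  # the term is spelled like a known word; nothing to suggest
    matches = difflib.get_close_matches(term, sorted(vocabulary), n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical indexed content.
vocabulary = build_vocabulary(["The benchmark measures server throughput."])
print(suggest("benchmrak", vocabulary))
print(suggest("server", vocabulary))
```

Because the vocabulary comes from the indexed files themselves, suggestions are limited to words that actually occur in the content, which is why a suggested term always leads to results.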


In addition to the Related Queries and dynamic spelling suggestion tweaks, the Mini includes a feature to promote specific search results, named "KeyMatch". This feature might come in handy when you've added new content related to a certain subject, and you would like to make this clear to the users.


With KeyMatch, by simply adding the search terms that the result should be matched with, along with the corresponding URL and a title, the Mini's admin can turn these pages into eye-catchers among the search results.
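Conceptually, a KeyMatch entry is a search term tied to a URL and a title, and a matching query puts that entry in front of the regular results. The Python sketch below models this under stated assumptions: the matching rule (the term appears as a word in the query) is chosen for illustration and is not necessarily how the Mini matches, and the entries are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyMatch:
    term: str   # search term that triggers the promotion
    url: str    # page to promote
    title: str  # title shown for the promoted result


def promoted_results(query, keymatches):
    """Return the KeyMatch entries whose term occurs as a word in the query."""
    words = set(query.lower().split())
    return [entry for entry in keymatches if entry.term.lower() in words]


# Hypothetical entries an administrator might add.
entries = [
    KeyMatch("benchmark", "http://intranet.example.com/apus", "Benchmark suite overview"),
    KeyMatch("holiday", "http://intranet.example.com/hr/leave", "Leave request form"),
]
for hit in promoted_results("latest benchmark results", entries):
    print(f"[promoted] {hit.title} - {hit.url}")
```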



... And downsides

Though the Google Mini package is undoubtedly high in quality, we can't claim it's completely flawless. As we hinted previously, one of the first real limitations we ran into had to do with security. On both fileservers and webservers, the Mini is currently rather limited in properly handling secured content. The fact that the Google Search Appliance offers many of these capabilities isn't much consolation, because the price jump will be too large for smaller companies to afford, requiring them to either skip the crawling of their secure documents, or else find other solutions.

Our own internal wiki system is a good illustration of this problem. It contains information that's confidential to our lab, so we require users to authenticate before viewing any information. Since this system uses cookie-based authentication, the Mini was unable to crawl any of our wiki pages, which was actually the primary use we had in mind for it.

The Mini's inability to enforce authentication for our Samba shares created another obvious problem for our lab, where access to nearly every file would ordinarily require authentication. Having the Mini crawl our fileserver left our documents open for anyone to see, and since the Mini serves the results as direct download links, our own safety measures were rendered ineffective.



Using the Mini to index our fileserver made our secured content downloadable to everyone with access to the Mini.

Luckily, this did not cause too much trouble in our case, since our intranet is only accessible to a limited group of users, but it is still a factor to keep in mind when considering a Mini. Some changes to existing security systems may be necessary to keep sensitive content safe.

The lack of control over the Mini itself also bothered us. Not only are we completely locked out of the OS itself, but monitoring the system's status is out of the question. Though the administration panel does give us a "System status" page, the info provided here is very sparse, and might as well not have been there at all. We hope Google will implement more detailed monitoring here in the future.



A screenshot containing the full package of the Mini's monitoring tools.

Lastly, we would also appreciate an easier overview of the crawler, and more direct control of it. At this point, any directions it is given seem to be put into a priority queue of sorts, which does get crawled first, but the process isn't really clear and gives you no real feedback on the results of your commands. This may cause some confusion as to whether your commands are really being handled at all, and made us wish we could actually see what the crawler was doing in real time. Granted, there are actually some reporting options available, but we found them rather lacking, and using the wonderful Google Analytics system really puts the Mini's built-in reporting options to shame.

In general, we feel that there's still a lot of room for improvement in the Mini's management console, both in usability and general provision of information. We hope that Google looks into these issues when they release their next update for the machine.



Crunching numbers

The lack of system-performance information from the Google Mini system made it impossible to provide our readers with detailed information on the CPU and memory performance of the Google Mini. However, we couldn't resist testing Google's advertised claim of "25 queries per second." To run our tests, we used our own software, a suite of benchmarking tools developed right here in the lab, currently code-named APUS (Application Unique Stress-testing). You might remember our mention of this application and a more detailed description from Johan's previous articles. Using a well-rounded mix of both simple and complex search queries, we stress-tested the Mini to see how well it performs.

Throughput

We found that the machine performs remarkably well, considering its aged hardware. It actually performs quite a bit better than Google claims, peaking at 40 responses per second. Mind you, this result is how quickly the actual results appear to the user at the search page, not merely the time it took for the Mini to search through its internal database.

In the graph below, the blue line is the most important. These are the actual responses returned to our software, whereas the red line is simply the amount requested. Under ideal conditions, if we sent the Mini a certain number of requests per second, we would expect it to return that same number of responses per second. In reality, this is not always the case, as the application's response time can be limited by several factors, which makes it impossible to guarantee the same throughput under all conditions.



The server actively starts blocking our connections when we push it beyond 100 requests per second, which is a fair limitation considering that a Mini shouldn't ever have to process requests at such a high rate.

Power consumption

The second thing we found interesting - especially in light of Google's environmental efforts - was the power consumption of the Mini, considering it is a machine intended to run non-stop. As we alluded to earlier, we were initially a bit worried that its hardware setup would be power-hungry, and we were not entirely mistaken. Our main problem with this setup is that, while its power consumption is definitely not over the top, it could still be much better. The CPU choice in particular seems a bit weak in this respect, and we feel that Google could have built a much "greener" machine (for example, by using a Core 2 mobile CPU).

The results below were recorded during the same test as our Throughput graph. Since we cannot measure the CPU load, we provide the power-consumption results as a function of the number of requests sent per second.



General speed

Giving people a sense of the "general" speed of an appliance is never an easy task, since speed is always relative to multiple factors such as CPU load or RAM usage. Since we have no way of quantifying these factors, we cannot simply judge the Mini to be "fast" or "slow" at what it does, but we can try to tell you how long it took our Mini to do certain things.

When starting from an empty index, it generally took our Mini at least 36 hours to build up 50,000 results on our fileserver, at which point its limit was reached. We checked its speed at regular intervals and found the crawling rate during that time to vary between 0 and 15 pages per second.



An overview of the crawler's status.

In continuous-crawl mode (which we used during all our testing), we generally had to wait 3-5 minutes before our recrawl commands were carried out.
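The crawl figures quoted in this section can be turned into an average rate with some quick arithmetic, sketched here in Python. It uses only the numbers given above (50,000 documents, at least 36 hours, a peak of 15 pages per second); since 36 hours is a lower bound, the computed average is an upper bound.

```python
DOCUMENTS = 50_000   # license limit reached during the crawl
HOURS = 36           # minimum time the full crawl took
PEAK_RATE = 15       # highest observed crawl rate, in pages per second

# Average rate over the whole crawl (an upper bound, since 36 h is a minimum).
average_rate = DOCUMENTS / (HOURS * 3600)

# Time the same crawl would take if the peak rate were sustained throughout.
hours_at_peak = DOCUMENTS / PEAK_RATE / 3600

print(f"Average crawl rate: {average_rate:.2f} pages per second")
print(f"Time at sustained peak rate: {hours_at_peak:.2f} hours")
```

The average comes out at well under half a page per second, so the crawler spent most of those 36 hours far below its observed peak.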



Conclusion

After working with the Google Mini for a while in our own lab, it's easy to understand the appeal of a functional, well-tuned search appliance. Correctly configured, the Mini can dramatically enhance users' ability to find both web content, and files and their contents, on the network, and is likely to quickly repay the initial investment with increased productivity.

In terms of price, Google positions the Mini competitively, especially considering the amount of time it would take a company to implement their own fully functional search engine and maintain it. The Mini's array of standard features and flexibility make it too important not to consider.

However, we feel that even though its features are very rich and full of possibilities, the Mini is definitely not the "plug-and-play" machine Google advertises. Though the initial setup might be quick, there is a great deal of difference between the default "search bot" behavior and the fully integrated/optimized searching experience that the Mini is capable of, and we felt that threshold was still just a tad too high. We also noted the rather odd feature restrictions of the Mini as compared to the Google Search Appliance. The restriction on the maximum number of pages to index is perfectly understandable, but the reasoning that smaller companies would have less use for advanced security systems doesn't make as much sense. A small company looking for an easy and cheap way out might end up surprised by the complexity involved in mastering the behavior of search engines.

To be fair, there simply is no true solution to the search-optimization problem yet. Optimizing searches to best match user behavior has always been a matter of close-up analysis, and the steps Google is taking to facilitate this process are still quite remarkable. As much as we would have loved to have a little peek at what was really going on inside that little blue box, we were satisfied to get a closer look at Google's efforts to put its technology to work in businesses all over the world, and the way they look to increase their users' satisfaction through consistency and simplicity. Though there is certainly room for improvement on the administrative side, the Mini gives its end users the power and familiar feel of the Google web search.

In closing, we'd like to thank Peter Griffin of Google, who helped us out a great deal while exploring the Mini's features.
