DocumentCloud's search is powered by Solr, an open source search engine from the Apache Software Foundation. Most of the search syntax is passed through directly to Solr, so Solr's own documentation explains how that syntax works. This document covers the parts of that syntax that apply to DocumentCloud, as well as the search features that are specific to DocumentCloud.
You may specify either single words to search for, such as document or report, or a phrase of multiple words to be matched as a whole, by surrounding it in double quotes, such as "the mueller report".
Terms can use ? to match any single character. For example, ?oat will match both goat and boat. You may use * to match zero or more characters, so J* will match J, John, Jane or any other word beginning with a J. You may use these in any position of a term: beginning, middle or end.
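The wildcard rules above can be modeled with a small Python sketch. It is only an illustration of the matching semantics, not how Solr implements wildcards internally; the function names `wildcard_to_regex` and `matches` are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a ?/* wildcard term into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")      # exactly one character
        elif ch == "*":
            parts.append(".*")     # zero or more characters
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts))


def matches(pattern, word):
    """Return True if the whole word matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None
```

Under this model, `matches("?oat", "goat")` and `matches("J*", "John")` are both true, while `matches("?oat", "oat")` is false because `?` must consume exactly one character.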
<aside> <img src="/icons/key_yellow.svg" alt="/icons/key_yellow.svg" width="40px" />
This feature is only available to authenticated users. You may register for a free account at https://accounts.muckrock.com/.
</aside>
By appending ~ to a term you can perform a fuzzy search, which will match close variants of the term based on edit distance. Edit distance is the number of letter insertions, deletions, substitutions, or transpositions needed to get from one word to another. This can be useful for finding documents with misspelled words or with poor OCR. By default ~ will allow an edit distance of 2, but you can specify an edit distance of 1 by using ~1. For example, book~ will match book, books, and looks.
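To make the edit-distance definition concrete, here is a minimal Python sketch that counts insertions, deletions, substitutions, and adjacent transpositions (the "optimal string alignment" variant). It is an illustration under that assumption, not Solr's own implementation; `edit_distance` and `fuzzy_match` are hypothetical names.

```python
def edit_distance(a, b):
    """Edit distance counting insertions, deletions, substitutions,
    and transpositions of two adjacent letters."""
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all i letters
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all j letters
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition, e.g. "ko" <-> "ok".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def fuzzy_match(term, word, max_edits=2):
    """Model of term~ (max_edits=2) or term~1 (max_edits=1)."""
    return edit_distance(term, word) <= max_edits
```

With this definition, books is one edit away from book (one insertion) and looks is two edits away (one substitution plus one insertion), so both fall within the default distance of 2, but only books would match book~1.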
<aside> <img src="/icons/key_yellow.svg" alt="/icons/key_yellow.svg" width="40px" />
This feature is only available to authenticated users. You may register for a free account at https://accounts.muckrock.com/.
</aside>
Proximity searches allow you to search for multiple words within a certain distance of each other. A proximity search is specified by adding a ~ and a number after a quoted phrase. For example, "mueller report"~10 will search for documents which contain the words mueller and report within 10 words of each other.
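A simplified way to picture a proximity search is to count the words that sit between the two terms. The Python sketch below does exactly that; it is an assumption-laden model (Solr's actual calculation is based on term positions and treats reordered words differently), and `within_proximity` is a hypothetical name.

```python
import re


def within_proximity(text, first, second, max_distance):
    """Return True if `first` and `second` occur with at most
    `max_distance` other words between them, in either order."""
    words = re.findall(r"\w+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(
        abs(i - j) - 1 <= max_distance  # words strictly between the pair
        for i in first_positions
        for j in second_positions
    )
```

For the text "Mueller submitted the final report" there are three words between the two terms, so the check passes with a distance of 10 and fails with a distance of 2.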
Range searches allow you to search for fields that fall within a certain range. For example, pages:[2 TO 20] will search for all documents with 2 to 20 pages, inclusive. You can use { and } for exclusive ranges, as well as mix and match them. Although this is most useful on numeric and date fields, it will also work on text fields: [a TO c] will match all text alphabetically between a and c.
You can also use * for either end of the range to make it open ended. For example, pages:[100 TO *] will find all documents with at least 100 pages, while pages:[* TO 20] will find all documents with at most 20 pages. Note that TO must be written in uppercase.
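If you build queries programmatically, a small helper can keep the bracket and wildcard rules straight. The following Python sketch is one possible approach; `range_query` and its parameters are hypothetical and not part of any DocumentCloud library.

```python
def range_query(field, low=None, high=None, include_low=True, include_high=True):
    """Build a range expression such as pages:[2 TO 20].

    A bound of None becomes *, which leaves that end open.
    Inclusive ends use [ and ]; exclusive ends use { and }.
    """
    open_bracket = "[" if include_low else "{"
    close_bracket = "]" if include_high else "}"
    low_text = "*" if low is None else str(low)
    high_text = "*" if high is None else str(high)
    return f"{field}:{open_bracket}{low_text} TO {high_text}{close_bracket}"
```

For example, `range_query("pages", 2, 20)` produces `pages:[2 TO 20]`, `range_query("pages", 100)` produces `pages:[100 TO *]`, and passing `include_low=False` switches the opening bracket to `{`.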
Boosting allows you to alter how documents are scored by making one of your search terms more important for ranking. Use the ^ operator with a number after the term. By default, terms have a boost of 1. For example, mueller^4 report will search for documents containing mueller or report but give more weight to the term mueller.
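The effect of a boost can be pictured as a multiplier on a term's contribution to the score. The Python sketch below uses a deliberately naive score (boost times number of occurrences) purely to show that effect; Solr's real relevance scoring is considerably more involved, and `parse_boosts` and `toy_score` are hypothetical names.

```python
import re


def parse_boosts(query):
    """Map each term in a query like 'mueller^4 report' to its boost."""
    boosts = {}
    for token in query.split():
        term, sep, boost = token.partition("^")
        boosts[term] = float(boost) if sep else 1.0  # default boost is 1
    return boosts


def toy_score(query, text):
    """Naive score: sum of boost * occurrence count for each query term."""
    words = re.findall(r"\w+", text.lower())
    return sum(
        boost * words.count(term.lower())
        for term, boost in parse_boosts(query).items()
    )
```

In this toy model, a document mentioning only mueller scores four times higher for mueller^4 report than a document mentioning only report, which is the kind of reordering a boost is meant to produce.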
By default, a search runs against the title and source fields (boosted to 10), the description field (boosted to 5), and the text field (boosted to 1). You can search any field specifically by using field:term syntax. For example, to search only for documents with report in the title, you can use title:report. The field prefix only affects a single term, so title:mueller report will search for mueller in the title, and report in the default fields. You can use title:"mueller report" to search for the exact phrase "mueller report" in the title, or use grouping, title:(mueller report), to search for mueller or report in the title.
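The scoping rule for field prefixes can be illustrated with a toy parser. This Python sketch only handles bare terms, quoted phrases, and parenthesized groups, and it is not how Solr parses queries; the `parse` function and the "default" label for unfielded terms are assumptions made for the example.

```python
import re

# One clause: an optional field prefix followed by a phrase, group, or term.
TOKEN = re.compile(
    r'(?:(?P<field>\w+):)?'       # optional field prefix, e.g. title:
    r'(?:"(?P<phrase>[^"]*)"'     # quoted phrase
    r'|\((?P<group>[^)]*)\)'      # parenthesized group of terms
    r'|(?P<term>[^\s()"]+))'      # single bare term
)


def parse(query, default="default"):
    """Return (field, kind, text) tuples showing which field each clause targets."""
    clauses = []
    for m in TOKEN.finditer(query):
        field = m.group("field") or default
        if m.group("phrase") is not None:
            clauses.append((field, "phrase", m.group("phrase")))
        elif m.group("group") is not None:
            # Every term inside the parentheses inherits the field.
            for word in m.group("group").split():
                clauses.append((field, "term", word))
        else:
            clauses.append((field, "term", m.group("term")))
    return clauses
```

Parsing title:mueller report with this sketch assigns mueller to title and report to the default fields, while title:(mueller report) assigns both terms to title.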