The grassroots push to digitize India’s most precious documents

Moreover, most public libraries aren’t freely accessible to the public. “Getting access to many of our public libraries is so difficult, and after a point people will give up asking for access. That’s the case in many of our public-funded educational institutes too,” says Arul George Scaria, an associate professor at the National Law School of India University Bengaluru, who studies intellectual-property law. One of the best ways to liberate access to these libraries, he says, is through digitization.

Technologist Omshivaprakash H L felt the acute lack of such resources when he needed references for writing Wikipedia articles in Kannada, a southwestern Indian language. Around 2019, he heard that Carl Malamud, who runs Public Resource, a registered US charity, was already archiving books like Gandhi’s Hind Swaraj collection on Indian self-rule and works of the Indian government in the public domain. “I also knew that he used to buy a lot of these books from secondhand bookstores and take them to the US to get them digitized,” says Omshivaprakash. 

Public Resource had been working with the Indian Academy of Sciences, Bengaluru, to digitize its books using a scanner provided by the Internet Archive, but the efforts had tapered off. Omshivaprakash proposed engaging community members to help. On weekends, these volunteers began scanning some of Omshivaprakash’s own books along with those Malamud had bought. “Carl really understood the idea of community collaboration, the idea of local language technology that we needed, and the kind of impact we were creating,” Omshivaprakash says.

The scanners use a V-shaped cradle to hold the books and two DSLR cameras to capture the pages in high resolution. The device is based on the Internet Archive’s scanner but was reengineered by Omshivaprakash and manufactured in India at a lower cost. Each worker can scan about 800 pages an hour. 

The more crucial parts of the operation happen after the scan. Volunteers apply accurate metadata so that the scans are findable on the Internet Archive, and optical character recognition, fine-tuned to work better for a range of Indian language scripts, makes the text searchable and accessible through text-to-speech programs.

Public Resource funds the Servants of Knowledge (SoK) project, and Omshivaprakash manages the operation with the help of staff and volunteers. Collaborators have come through social media and word of mouth. For instance, a community member and Kannada teacher named Chaya Acharya approached Omshivaprakash with newspaper clippings of work by her grandfather, the renowned journalist and writer Pavem Acharya, who wrote articles on science and social issues as well as satirical essays. Unexpectedly, she found more articles by her grandfather in the existing Servants of Knowledge collection. “Simply by searching his name, I got many more articles from the archive,” she says. She began collecting copies of Kasturi, a prominent Kannada monthly magazine that Pavem Acharya had edited from 1952 to early 1975, and gave them to Omshivaprakash for digitizing. The old issues of the magazine contain rare writings and translations by popular Kannada authors, such as Indirabai by Gulavadi Venkata Rao, regarded as the first modern novel in Kannada, and a Kannada translation of Edgar Allan Poe’s famous short story “The Gold-Bug.”
