Scientists globally are racing to save vital health databases taken down amid Trump chaos

Last Thursday, 30 January, bioinformatician Niema Moshiri received a late-night message from a long-time collaborator urging him to back up the website of the US Centers for Disease Control and Prevention (CDC). At the time, rumours had been circulating that the public-health agency, which tracks disease outbreaks and makes its data publicly available, would start removing pages from its website, in response to executive orders issued by President Donald Trump directing government departments to take down public information on gender and diversity.
Moshiri, who is based at the University of California, San Diego, and is a self-described “data hoarder” who creates backups of his personal videos and online receipts and bills, was happy to help. “I never thought I would have to do it for information pages from the federal government,” he says.
In the past week, some US federal government websites containing important data sets and information on public health and demographics have been taken down, including those for global programmes to combat HIV and national surveys of chronic disease. Some have been reinstated; others not. “It was kind of shocking to me that they would just delete pages willy-nilly,” says Moshiri.
Moshiri is one of dozens of researchers in the United States and globally who are scrambling to capture public information on US federal government websites before it is tampered with or disappears. “A lot of people did similar parallel efforts, especially within their own areas of expertise,” says virologist Angela Rasmussen, who is based at the University of Saskatchewan in Saskatoon, Canada, and was the collaborator who messaged Moshiri. She stayed up until 2 a.m. last Friday manually downloading data sets, such as those on influenza surveillance. Having created these backups, researchers are now working out how to make them publicly available, she adds.
The CDC and its parent agency, the US Department of Health and Human Services, say that all changes to their websites are in accordance with Trump’s executive orders.
Global effort
Over the weekend, Moshiri got in touch with Charles Gaba, a health-care policy and data analyst based near Detroit, Michigan.
Moshiri helped Gaba to create an alphabetical list of every CDC web link, amounting to more than 7,000 pages. Gaba manually redirected each link to its archived version on the Wayback Machine, a service maintained by Internet Archive, a non-profit organization based in San Francisco, California, that regularly archives web pages, including material from government websites such as that of the CDC. Gaba then posted the entire list on his blog. “It took a couple of days,” says Gaba. “A lot of it is vital, and you don’t know what’s still there, what isn’t there.” Gaba has since posted a similar list of the entire website for the US Food and Drug Administration (FDA), organized by subject.
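For readers curious about what such a redirect list involves, the following is a minimal sketch in Python, not a description of Gaba's or Moshiri's actual tooling. It relies on the Wayback Machine's public address pattern, in which an archived page is reached by prefixing the original address with `https://web.archive.org/web/` and a timestamp. The function names, the example page addresses and the chosen timestamp are all assumptions made for illustration.

```python
# A sketch of building an alphabetical list of Wayback Machine links.
# Assumptions (not from the article): the function names, the sample
# URLs and the snapshot timestamp are illustrative placeholders.

WAYBACK_PREFIX = "https://web.archive.org/web"

# Hypothetical target date in the Wayback Machine's YYYYMMDDhhmmss format.
DEFAULT_TIMESTAMP = "20250120000000"


def wayback_url(original_url: str, timestamp: str = DEFAULT_TIMESTAMP) -> str:
    """Return the Wayback Machine address for a page near a given time."""
    if not (timestamp.isdigit() and len(timestamp) == 14):
        raise ValueError("timestamp must be 14 digits (YYYYMMDDhhmmss)")
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


def build_redirect_list(urls, timestamp: str = DEFAULT_TIMESTAMP):
    """Deduplicate the URLs, sort them alphabetically (ignoring case)
    and pair each one with its archived address."""
    unique_sorted = sorted(set(urls), key=str.lower)
    return [(url, wayback_url(url, timestamp)) for url in unique_sorted]


if __name__ == "__main__":
    # Placeholder page addresses, used only to demonstrate the output shape.
    sample = [
        "https://www.cdc.gov/hiv/",
        "https://www.cdc.gov/flu/",
        "https://www.cdc.gov/flu/",  # duplicate, dropped by the set()
    ]
    for original, archived in build_redirect_list(sample):
        print(f"{original} -> {archived}")
```

The sketch only constructs addresses; it does not check whether a snapshot exists for each page, which a fuller version would need to confirm before publishing a list.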
On his hard drive, Moshiri now has backups of the CDC’s website and data sets, as well as the websites for the FDA and other government agencies. Some of these he downloaded himself and others were first downloaded and shared by others online. He is also in the process of backing up the US Department of Agriculture’s website. Together, these sites amount to hundreds of thousands of files and more than 130 gigabytes of uncompressed data, and yet all could fit on a USB flash drive. “These are pretty tiny,” he says.
Moshiri hasn’t shared his backups publicly. If his university agrees that doing so falls within his role as a faculty member, he wants to publish an exact, untouched copy of the CDC website. His long-term goal is to back up every federal-government website. “I have 100 terabytes of storage under my desk. In theory, I could stick all of these on.”
Kendra Albert, an attorney with the public-interest technology and media law firm Albert Sellars LLP in Philadelphia, says that works produced by federal government employees as part of their jobs are in the public domain. Generally speaking, it is legal to download government data sets, make backups of government websites and share them, they say. In circumstances in which copyrighted material is included in those data, making copies and sharing them would often fall under the doctrine of fair use if it is done for the purposes of research, advocacy or as a historical record to show what the site looked like at an earlier date, says Albert.
Internet Archive also supports an initiative, known as the End of Term Web Archive, to archive government web pages at the beginning and end of every US presidential term. James Jacobs, a librarian at Stanford University in California who works on the archive, hopes that it could become a central place for the public to access scientists’ archives of government websites.