Astronomy, big data and clouds – how technology helps to study the Universe
How do astronomers usually work? They book a date and time on a telescope with the observatory, conduct their observations on the appointed day, and then download the collected data. But because telescopes generate ever more useful information, this traditional approach is breaking down. Scientists have found a way out in cloud technologies. Cloud4Y explains how stargazers work now.
The Vera Rubin Observatory in Chile can collect 20 terabytes of data per night. This is largely due to its location: the wide-angle survey reflecting telescope sits at an altitude of 2715 m on the El Peñón peak in northern Chile. The design of the telescope is unique in its very wide field of view: 3.5 degrees in diameter, or 9.6 square degrees. For comparison, both the Sun and the Moon appear from Earth as objects about 0.5° in angular diameter, or roughly 0.2 square degrees. Combined with the large aperture, this gives the telescope extremely high light-collecting power. In other words, it can acquire data from vast areas of the sky at once. "Engineering" first light is planned for May 2021, first light for the entire system for October 2021, and full operations are to begin in October 2022.
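For the curious, the square-degree figures follow from simple small-angle geometry; a quick illustrative check in Python:

```python
import math

# Solid angle of a (small) circular field of view, in square degrees:
# area ≈ pi * (diameter / 2) ** 2
def fov_area_sq_deg(diameter_deg: float) -> float:
    return math.pi * (diameter_deg / 2) ** 2

print(round(fov_area_sq_deg(3.5), 1))   # 9.6  -> Rubin Observatory's field of view
print(round(fov_area_sq_deg(0.5), 2))   # 0.2  -> the Sun or the Moon
```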
20 terabytes per night is roughly as much as the entire Sloan Digital Sky Survey, which offers the most detailed three-dimensional maps of the Universe and collected all of its data between 2000 and 2010. But that is not all. The Square Kilometre Array project, which was expected to come online in 2020, will increase this volume a hundredfold, to 2 petabytes per day (once it reaches full capacity in 2028). And next-generation equipment such as the ngVLA will, according to the heads of the observatories, generate hundreds of petabytes.
Processing such volumes of data is not easy. You can't just download it all and store it somewhere, and building local computing resources for the job is too expensive. By some estimates, the cost of creating an IT infrastructure from scratch and keeping the staff needed to support the Vera Rubin Observatory could approach $150 million over 10 years. So astronomers in Chile, like many of their colleagues, turned to the cloud. Here are the conclusions they have already drawn.
Investment in computing power is good for science
It is not enough to move the data to the cloud; researchers must be able to interact with it. Instead of the traditional model, in which astronomers transferred data to their own computers, they now upload their code to run against the data already in the cloud. Thanks to online access to the observatory's science platform (Jupyter notebooks for programming in Python, Julia, R and other languages, plus application programming interfaces (APIs) for analyzing, viewing and searching data), users can write and run Python code that remotely analyzes the entire observatory dataset on servers hosted at the National Center for Supercomputing Applications in Urbana, Illinois. Nothing needs to be downloaded to your own computer.
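The Rubin platform has its own APIs, so purely as an illustration of this "bring the code to the data" idea, here is a minimal sketch that queries the Sloan Digital Sky Survey (mentioned above) remotely via the astroquery package; only the small result table crosses the network, not the underlying survey data:

```python
import astropy.units as u
from astropy import coordinates as coords
from astroquery.sdss import SDSS

# The query executes on the survey's servers; we receive only the
# matching rows, not the petabyte-scale imaging behind them.
pos = coords.SkyCoord("0h8m05.63s +14d50m23.3s", frame="icrs")
result = SDSS.query_region(pos, radius=2 * u.arcsec, spectro=True)
print(result)
```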
This approach has proven very effective in other branches of science. For example, the Pangeo project, a big-data analysis platform for the Earth sciences, has made petabytes of climate data public and computable, making it easier for researchers to collaborate.
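The pattern Pangeo popularized looks roughly like this: a chunked, cloud-hosted dataset is opened lazily, so computation pulls only the pieces it needs. A minimal sketch with xarray (the store URL and variable name are hypothetical; real Pangeo catalogs list actual stores, and gs:// access needs the gcsfs package):

```python
import xarray as xr

# Lazy open: only metadata is read at this point.
ds = xr.open_zarr("gs://some-bucket/climate-data.zarr")

# Aggregations stream just the needed chunks from object storage,
# never the whole multi-terabyte dataset.
monthly_mean = ds["temperature"].groupby("time.month").mean()
print(monthly_mean.compute())
```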
Convenient even when working without big data
Evelina Momcheva, who works at the Space Telescope Science Institute in Baltimore, Maryland, says she has seen projects that use only medium-sized data benefit from cloud computing, if only because the researchers gained access to resources vastly superior to their laptops, and at a relatively low cost. Some cloud providers even offer free resources for educational purposes.
In 2015, Momcheva and her colleagues had only an 8-core server for their 3D-HST project, which analyzed data from the Hubble Space Telescope to understand the forces that shape galaxies in the distant Universe. Resources were scarce, so they turned to the cloud and rented five 32-core machines. Why? Because preliminary calculations showed that the analysis on their own machine would take at least three months. With a cloud provider, it took five days and less than $1,000.
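A quick back-of-the-envelope check shows those figures are self-consistent, assuming (optimistically) near-linear parallel scaling:

```python
# Illustrative only: real runtimes depend on I/O, memory and how well
# the workload parallelizes, but the reported numbers line up.
local_cores = 8
cloud_cores = 5 * 32                  # five 32-core machines = 160 cores
speedup = cloud_cores / local_cores   # 20x
local_days = 90                       # "at least three months"
print(local_days / speedup)           # 4.5 days, in line with the reported five
```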
Price isn’t everything
The debate over whether cloud services are cheaper than one's own IT infrastructure will not die down any time soon. Both sides have strong arguments. For example, the US Department of Energy's 2011 Magellan report on cloud computing concluded that the department's own computing centers are generally cheaper than renting cloud services. But a lot of water has flowed under the bridge since then, and the technology has changed dramatically.
Optimizing how you work with cloud services can, according to the University of Washington, erase these differences. Its researchers showed that a cloud experiment that initially cost $43 cost only $6 after a few months of tuning and cost optimization. They also estimated that completing the same tasks in comparable time on their own resources would cost the team about $75,000 (in hardware, electricity and staff wages), with the servers running 87% of the time for three years.
Time savings often drive the decision too. When your own IT infrastructure takes nine months to process your data and the cloud takes only a month, for roughly the same money, that eight-month difference becomes very interesting.
Astronomers say they have no desire to commit entirely to either side. On the contrary, using local infrastructure for everyday tasks and the cloud for heavy computing is the optimal model for many research centers.
Data consolidation opens new horizons
Another thing astronomers love is the ability to combine multiple big datasets. Their combination can reveal information that would not be obvious from each set on its own. In other words, the more information astronomers gather together, the more useful it becomes.
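As a small illustration of why side-by-side datasets are more useful than separate ones, here is a toy cross-match of two made-up object catalogs by sky position using Astropy; this is exactly the kind of operation that becomes routine once surveys live in one place:

```python
import astropy.units as u
from astropy.coordinates import SkyCoord

# Two toy catalogs; in practice these would be millions of rows
# from two different surveys hosted in the same data commons.
cat_a = SkyCoord(ra=[10.1000, 45.2000] * u.deg, dec=[-5.0000, 20.3000] * u.deg)
cat_b = SkyCoord(ra=[10.1002, 200.0000] * u.deg, dec=[-5.0001, 10.0000] * u.deg)

# For each source in cat_a, find its nearest neighbor in cat_b.
idx, sep2d, _ = cat_a.match_to_catalog_sky(cat_b)
matched = sep2d < 1 * u.arcsec   # treat pairs under 1 arcsecond as the same object
print(idx, sep2d.to(u.arcsec), matched)
```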
Inspired by the NIH Data Commons project, where scientists store and exchange biomedical and behavioral data and software, researchers plan to create an Astronomy Data Commons. Scientists at the University of Washington have already published one dataset, from the Zwicky Transient Facility, which includes 100 billion observations of approximately 2 billion celestial objects. If this work proves useful, other astronomers may follow suit, and then a whole astronomical ecosystem will emerge whose possibilities we can only dream of today.
It's not enough to move to the cloud; you need to know how to use it
To work with data in the cloud, users need to create an account, choose among the many options for interacting with the information, and install their own (often self-written or custom-made) software, then configure everything so that the software can run on several machines at once. Mistakes are inevitable, and they can cost researchers dearly and dampen their interest in cloud technology. In one case, inexperienced graduate students burned a couple of thousand hours of processor time for nothing. Scientists are therefore advised to practice on something low-stakes first, launching small pilot projects on their own infrastructure, as in the sketch below.
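One way to follow that advice in Python is to develop against a small local cluster first, and point the same code at a paid cloud cluster only once it is known to work. A minimal sketch with Dask (the cluster and array sizes are arbitrary):

```python
import dask.array as da
from dask.distributed import Client, LocalCluster

# Pilot run: a small, free cluster on your own machine.
cluster = LocalCluster(n_workers=4)
client = Client(cluster)

x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))
print(x.mean().compute())

# Once this works, the same array code runs unchanged against a
# cloud-hosted Dask cluster; only the Client connection changes.
```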
It is also important not to forget about security. Although privacy and security in the cloud can be better than with on-premises resources, setting up a cloud infrastructure correctly can be challenging, and one mistake by an inexperienced programmer can leave your data open to the whole world. With your own IT estate, such problems are controlled more tightly; in the cloud, it's easy to slip up if you ignore the recommendations of the provider's technical experts.
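One concrete and common example of such a mistake is leaving cloud object storage world-readable. A hedged sketch for AWS S3 using boto3 (the bucket name is hypothetical; other providers offer equivalent controls):

```python
import boto3

s3 = boto3.client("s3")

# Turn on S3 "Block Public Access" so that no ACL or bucket policy
# can accidentally expose the data to the internet.
s3.put_public_access_block(
    Bucket="my-research-data",   # hypothetical bucket name
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
```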
In general, astronomers' desire to use cloud resources for studying stellar systems, modelling the formation of the Universe and storing "data lakes" is understandable. Heavy computing has long since moved to data-center hardware. Cloud platforms have greatly transformed science and business, becoming an important tool for the advancement of human thought. The main thing is to use this tool correctly.
What else is interesting in the Cloud4Y blog
→ “Do it yourself”, or a computer from Yugoslavia
→ The US Department of State will create its own great firewall
→ Artificial intelligence sings about revolution
→ What is the geometry of the universe?
→ Easter eggs on topographic maps of Switzerland
Subscribe to our Telegram channel so you don't miss the next article. We post no more than twice a week, and only on business.