He who sings prays twice, or writing an app to help the chorister

Picture to attract attention

I have nothing against paper books, but it is hard to argue that their electronic versions are sometimes more convenient: they are smaller, lighter, and let you search the contents quickly (unless, of course, they are just scanned pages). Tables of contents and subject indexes have not gone anywhere, but full-text search can be a big help, and at one point I was asked to build something like that…

Knowing how negatively many techies feel about religious topics, I risk opening Pandora's box, but I still hope this material will interest a wide range of readers. Under the cut, we will dive into the magical world of Android development and lightly touch on databases, music theory and liturgical chants. Happy reading!

Background

Before moving on to the technical part, I should bring readers up to speed. One day, a good person from a liturgical choir suggested implementing quick search over the songbook for mobile devices. Russian Catholics have a common collection of liturgical chants that is actively used during services in all the dioceses of our vast land. The people who sing in the choir are mostly whoever is able to, often ordinary parishioners without any musical education. Although this case is about church music, the idea obviously applies to almost any collection of sheet music for vocalists.

The first working prototype for Android was put together quite quickly, but it could hardly be called serious: outright bad or simply outdated practices and the absence of any architecture (for example, all the logic lived in an Activity) did it no favors. That, however, is easy to fix, unlike another problem: where to get the source material the program should display? Even if I had wanted to, I physically could not retype about four hundred scores, so the project was shelved until better times… which suddenly arrived about a year later, when I was shown the website of the liturgical commission of the bishops' conference, where the coveted scores and texts are carefully collected and freely available in PDF format. Now there was no getting out of it 🙂

Formulating technical specifications

The initial requirements were not strictly formal, but they covered the necessary minimum: develop an application for mobile devices that lets you find scores and texts by their title and number in the book, without network access. As the laws of the genre dictate, once the first decent version was ready, three more important requirements were added so that the program could be used by a wider circle of people:

  1. Full-text search. Not everyone remembers the number or the title exactly as it is written in the collection, so searching through the text would be very helpful here.

  2. The ability to listen to the melody, because not everyone can read music, let alone sight-read.

  3. Reasonable storage consumption. Less is better.

Data collection – part one

Here I was tempted to insert something like a reworked quote from Fear and Loathing in Las Vegas, because many different technologies were tried along the way. But first things first.

The first step was to download all the sheet music locally. Unfortunately, the site does not offer everything in a single archive, and doing it all by hand through a browser is not our way. Python and the Beautiful Soup library for parsing web pages come to the rescue. The main difficulty was that the site did not like my user agent and dropped the connection. Once I made the User-Agent header look more like the ones real browsers send, everything worked.

The pieces are grouped into sections, whose URLs were fed to the script. The download links were then extracted by filtering with a CSS selector and the available keywords (I was interested in the single-voice versions).
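The link-filtering step can be sketched roughly as follows. This is not the original script: to stay dependency-free it uses the standard library's `html.parser` instead of Beautiful Soup, and the base URL, the `unison` keyword and the User-Agent string are all hypothetical placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request

# Hypothetical browser-like User-Agent; any realistic value would do
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0"


class PdfLinkCollector(HTMLParser):
    """Collects absolute URLs of PDF links whose href contains a keyword."""

    def __init__(self, base_url, keyword):
        super().__init__()
        self.base_url = base_url
        self.keyword = keyword.lower()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = (dict(attrs).get("href") or "").lower()
        if href.endswith(".pdf") and self.keyword in href:
            # Resolve relative links against the section page URL
            self.links.append(urljoin(self.base_url, dict(attrs)["href"]))


def find_pdf_links(html, base_url, keyword):
    collector = PdfLinkCollector(base_url, keyword)
    collector.feed(html)
    return collector.links


def build_request(url):
    """Builds a request that carries the browser-like User-Agent header."""
    return Request(url, headers={"User-Agent": BROWSER_UA})


# Made-up section page used only for demonstration
sample_html = (
    '<a href="/files/101_unison.pdf">101</a>'
    '<a href="/files/101_satb.pdf">101 (choir)</a>'
    '<a href="/about.html">About</a>'
)
for link in find_pdf_links(sample_html, "https://example.org/section/", "unison"):
    print(link)
```

A request built with `build_request` can then be passed to `urllib.request.urlopen` to fetch each file.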

About early development

It so happened that my first serious development experience came from Java and Android, but ironically at work I deal with the web and .NET. During my long break from the Android world, a lot has changed. Google finally abandoned the Eclipse + ADT combination in favor of Android Studio, Architecture Components appeared, and the priority shifted to Kotlin and Jetpack Compose. I should also admit something unflattering: my skills are not great, and they improve rather slowly, because studying on your own is genuinely hard. At first all this confused me a little, and I made what seemed at the time a rational decision: use more familiar technologies and start with .NET and C#. I've never been so wrong before…

Many have probably heard of Xamarin; I'm sure you have. In 2022 Microsoft, which owns Xamarin, released its successor, .NET Multi-platform App UI (MAUI). In theory everything is beautiful: with a common code base in C# and markup in XAML, you can develop for four platforms at once (Windows, macOS, Android and iOS, with unofficial GNU/Linux support added by enthusiasts), with access to native APIs and a focus on the MVVM architecture.

Unfortunately, at the stage of its development when I met it (late 2023), MAUI was still very raw, and the problems were not long in coming:

  1. Despite the rich library of interface elements, these are the framework's own components, which differ quite a lot from those native to the target platform. Styling them sometimes turns into a non-trivial task with a pile of workarounds and hacks.

  2. There are no adequate tools for working with databases. Microsoft suggests a simple third-party ORM library. For my purposes that would have been enough, but the SQLite wrapper used by this library does not really work with Unicode, Cyrillic included. For example, you cannot simply compare strings case-insensitively. In theory SQLite lets you write your own collation, but every attempt to do so immediately led to a crash.

  3. For dessert: the resulting program drags the .NET runtime along with it, so the package balloons in size.

Something did come of it, for example a Windows version that looked and worked more or less decently, but the Android version did not stand up to criticism.

The stage of acceptance has arrived – fear has big eyes, and Android doesn’t bite. We will write a “native” version, which will be discussed further in the article.

Data collection – part two

Before writing code, the data has to be prepared. This section may seem uninteresting to some, and you may want to skip straight to the next one, but we have what we have. Simply downloading the sheet music from the site is not enough: to implement search by content, the text has to be extracted from it somehow. It would also be nice to have the music in a machine-readable format, since storing almost 400 audio recordings, however short, in the device's memory would be overkill.

I was lucky that the PDF files have a text layer, but when I tried to copy the text, the output was an unreadable mess. Retyping it manually, as mentioned above, is not an option. Then I tried running the files through Tesseract (an open-source text recognition engine with support for a bunch of languages, including Russian and Latin). Recognition worked, with one "but": the notes drove the program crazy, and parts of the text located under the notes were not recognized. I did not despair, took another careful look at the gibberish obtained on the first attempt, and thought: what if this is just a broken encoding? My intuition was that it was Windows-1251 read as Windows-1252. The assumption turned out to be correct, and very quickly I had a directory full of text files:

using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string InputPath = @"Source directory";
        string OutputPath = @"Output directory";

        string[] files = Directory.GetFiles(InputPath, "*.txt");

        // Code page encodings have to be registered explicitly on modern .NET
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding win1251 = Encoding.GetEncoding(1251);
        Encoding win1252 = Encoding.GetEncoding(1252);
        Encoding utf8 = Encoding.UTF8;

        foreach (var path in files)
        {
            // Undo the mojibake: turn the text back into Windows-1252 bytes
            // and decode those bytes as Windows-1251
            byte[] utf8buf = ReadFile(path);
            byte[] win1252buf = Encoding.Convert(utf8, win1252, utf8buf);
            string decoded = win1251.GetString(win1252buf);

            // Keep only letters, digits, whitespace and basic punctuation
            string filtered = Regex.Replace(decoded, "[^А-ЯЁа-яёA-Za-z0-9\\s\\.\\,\\!\\-]", "");
            // Glue together syllables that were split by hyphens under the notes
            filtered = Regex.Replace(filtered, "(?<=\\p{L})(\\s*-\\s*){1,}(?=\\p{L})", "");
            // Normalize whitespace
            filtered = Regex.Replace(filtered, "\\s", " ");
            filtered = Regex.Replace(filtered.Trim(), "\\s{2,}", " ");
            Console.WriteLine(filtered);

            string outPath = Path.Combine(OutputPath, Path.GetFileName(path));

            // FileMode.Create truncates an existing file instead of overwriting it in place
            using (FileStream fstream = new FileStream(outPath, FileMode.Create))
            {
                byte[] buffer = win1251.GetBytes(filtered);
                fstream.Write(buffer, 0, buffer.Length);
            }
        }
    }

    // Reads the whole file into a byte array
    public static byte[] ReadFile(string path)
    {
        using (FileStream fileStream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            int length = (int)fileStream.Length;
            byte[] buffer = new byte[length];
            int count;
            int sum = 0;

            while ((count = fileStream.Read(buffer, sum, length - sum)) > 0)
                sum += count;

            return buffer;
        }
    }
}

It’s probably a bit overkill to write such things in C#, but I used what was already at hand.
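For comparison, the same two ideas (undoing the wrong encoding and gluing hyphen-split syllables back together) fit into a few lines of Python. This is only a sketch of the technique, not a port of the listing above; the function names and the sample phrase are made up for illustration.

```python
import re


def fix_mojibake(garbled: str) -> str:
    """Recovers Cyrillic text that was decoded as Windows-1252 instead of Windows-1251."""
    # Turn the wrong characters back into the original bytes, then decode them correctly
    return garbled.encode("cp1252").decode("cp1251")


def join_syllables(text: str) -> str:
    """Removes hyphens between letters (syllables under notes) and collapses whitespace."""
    # [^\W\d_] matches a single letter in any script
    text = re.sub(r"(?<=[^\W\d_])(?:\s*-\s*)+(?=[^\W\d_])", "", text)
    return re.sub(r"\s+", " ", text).strip()


# Simulate the damage on a made-up phrase, then repair it
original = "Сла - ва  в выш - них"
garbled = original.encode("cp1251").decode("cp1252")
print(garbled)
print(join_syllables(fix_mojibake(garbled)))
```

Note that `encode("cp1252")` raises an error if the garbled text contains characters outside Windows-1252, which is a useful signal that the encoding guess was wrong for that file.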

With sheet music, things are a little more complicated, although automated recognition tools exist here too; I used Audiveris, for example. Funnily enough, where the text recognition engine was confused by the notes, the music recognition engine was confused by the text. Letters were taken for rests, duration modifiers such as tuplets, staccato marks, accents and other rubbish that was not in the original. If we try to leave only the graphics in the source file, we quickly discover that all that remains is the staff lines and note stems: the notes themselves and symbols such as clefs are also set as text. Did that send me off to pore over a music editor? Yeah, right! It is time to remember that PDF is very much a text format, albeit interspersed with binary data. In these scores all text and graphics are compressed with the deflate algorithm, but we do not need to decompress them.

8 0 obj
<</BaseFont/FPDJPS+LatinX#20Book/DescendantFonts[33 0 R]/Encoding/Identity-H/Subtype/Type0/ToUnicode 32 0 R/Type/Font>>
endobj

10 0 obj
<</Filter/FlateDecode/Length 8928>>
stream
*A mess of binary data*

A quick look in a word processor reveals that the document uses four fonts: Officina, Times New Roman, LatinX Book (a surprisingly search-unfriendly name), and Maestro. We only need to keep the last one: it is the font with the musical symbols used by the Finale program. We open the scores with a script and, in the now familiar way, use the magic of regular expressions to get rid of everything unnecessary.

// Relies on InputPath, OutputPath and ReadFile from the previous listing
string[] files = Directory.GetFiles(InputPath, "*.pdf");

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
// A single-byte encoding lets us handle the mostly binary file as a string
Encoding win1252 = Encoding.GetEncoding(1252);

// Matches one indirect object: "<number> <generation> obj ... endobj"
string pattern = @"^[\d\s]+obj[\s\S]*?endobj";
Regex regex = new Regex(pattern, RegexOptions.Multiline);

foreach (string path in files)
{
    byte[] buffer = ReadFile(path);
    string content = win1252.GetString(buffer);

    List<string> objects = new List<string>();

    foreach (Match match in regex.Matches(content))
    {
        objects.Add(match.Value);
    }

    // If an object mentions an unwanted font, drop it together with the object that follows it
    for (int i = 0; i < objects.Count - 1; i++)
    {
        if (objects[i].Contains("LatinX") || objects[i].Contains("OfficinaSerif"))
        {
            content = content.Replace(objects[i], "");
            content = content.Replace(objects[i + 1], "");
        }
    }

    string outPath = Path.Combine(OutputPath, Path.GetFileName(path));

    // FileMode.Create truncates an existing file instead of overwriting it in place
    using (FileStream fstream = new FileStream(outPath, FileMode.Create))
    {
        byte[] wBuffer = win1252.GetBytes(content);
        fstream.Write(wBuffer, 0, wBuffer.Length);
    }
}
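The same filtering idea can be tried on a toy document in Python, working directly on bytes so that no text encoding is involved. This is a rough sketch that mirrors the logic of the listing above (drop a matching object and the one that follows it); the sample bytes are made up and are not a valid PDF.

```python
import re

# One indirect object: "<number> <generation> obj ... endobj"
OBJECT_RE = re.compile(rb"^\d+\s+\d+\s+obj.*?endobj", re.MULTILINE | re.DOTALL)


def strip_font_objects(pdf_bytes, font_names):
    """Removes every object that mentions one of font_names, plus the object after it."""
    objects = [m.group(0) for m in OBJECT_RE.finditer(pdf_bytes)]
    doomed = set()
    for i, obj in enumerate(objects):
        if any(name in obj for name in font_names):
            doomed.add(i)
            if i + 1 < len(objects):
                doomed.add(i + 1)
    for i in doomed:
        pdf_bytes = pdf_bytes.replace(objects[i], b"")
    return pdf_bytes


# Made-up fragment imitating the structure shown earlier
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<</BaseFont/ABC+LatinX#20Book>>\nendobj\n"
    b"2 0 obj\n<</Length 3>>\nstream\nxyz\nendstream\nendobj\n"
    b"3 0 obj\n<</BaseFont/Maestro>>\nendobj\n"
)
print(strip_font_objects(sample, [b"LatinX", b"OfficinaSerif"]))
```

One caveat worth keeping in mind: cutting objects out shifts the byte offsets recorded in the file's cross-reference table, so the result depends on the consuming program being tolerant enough to rebuild it.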

To be fair, none of this is perfect, and I was lucky with the structure of the documents. Despite the overall decent quality and the removal of all non-letter characters, the texts still contain garbage in places, and the word order is jumbled here and there. As for the sheet music, 207 of 371 sheets were recognized successfully; some of the rest needed human intervention from the start. For example, in this fragment:

Let me explain: there is no division into measures and no indication of note durations, only pitch; compare with the adapted fragment on the right. The machine, unfortunately, does not understand this, and adapting it competently takes someone more knowledgeable in music than me. One way or another, the amount of monotonous manual labor is greatly reduced, which is good news.

Someone has probably wondered: couldn't I simply ask the authors for the source files and explain what they were for, instead of overcoming difficulties for the sake of overcoming difficulties? Unfortunately, no. I spoke with a person who was directly involved in creating the collection, but according to him the typesetter's trail was lost in time, like tears in the rain: after all, more than twenty years have passed.

Taming the green robot and filling out the database

Oddly enough, the hardest stage was the data preparation; the actual development turned out to be much easier, even though at times I essentially had to learn everything from scratch.

I suggest first “finishing off” the database. The Room library will be used to work with the database. The structure is simple – there is only one table in the database, which, in addition to the key, contains the number, title and lyrics of the song.

@Entity(tableName = "songs")
@Fts4(tokenizer = "unicode61")
data class Song (
    @PrimaryKey
    @ColumnInfo(name = "rowid")
    val rowId: Int,
    val num: String,
    val title: String,
    val lyrics: String?
)

Perhaps this is not the best option, and it would make sense to add a second table with explicit paths to the assets (in the current version, a file is found by matching its name to the song number). Note the @Fts4 annotation, which tells the DBMS that this is a virtual table used for fast full-text search: the SQLite developers have taken care of us. The tokenizer parameter defines the rules by which tokens are extracted from the text for searching. The default tokenizer folds case only for ASCII characters, so Cyrillic text would be matched case-sensitively, which does not suit us.

We also need to define the Data Access Object (or simply DAO) interface, which contains data access methods, and directly describe the database instance:

@Dao
interface SongDao {
    @Query("select rowid, num, title from songs")
    fun getAllSongs(): List<Song>

    @Query("select rowid, num, title, snippet(songs) as lyrics from songs where lyrics match :query || '*'")
    fun performLyricsSearch(query: String): List<Song>

    @Query("select rowid, num, title from songs where num match :query || '*' or title match :query || '*'")
    fun performSearch(query: String): List<Song>
}

Now we need to create a repository that will access the DAO and manipulate data (or simply make a selection, as in our case). After the above manipulations, Room will generate a DAO implementation, as well as the necessary queries to create tables. In our case, the query can be entered manually, but with a more complex database structure it will be useful:

@Override
public void createAllTables(@NonNull final SupportSQLiteDatabase db) {
  db.execSQL("CREATE VIRTUAL TABLE IF NOT EXISTS `songs` USING FTS4(`num` TEXT NOT NULL, `title` TEXT NOT NULL, `lyrics` TEXT, tokenize=unicode61)");
// ...
}
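Room is Android-only, but the behavior of such a table is easy to check on a desktop with Python's built-in `sqlite3` module. The following is a minimal sketch, assuming the local SQLite build includes FTS4 and the unicode61 tokenizer; the sample row and the helper names are made up.

```python
import sqlite3


def open_songbook():
    """Creates an in-memory FTS4 table shaped like the one Room generates."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE VIRTUAL TABLE songs USING fts4(num, title, lyrics, tokenize=unicode61)"
    )
    return db


def search_lyrics(db, query):
    """Prefix search restricted to the lyrics column, with a highlighted snippet."""
    return db.execute(
        "SELECT num, title, snippet(songs) FROM songs WHERE lyrics MATCH ? || '*'",
        (query,),
    ).fetchall()


db = open_songbook()
# Made-up entry used only for demonstration
db.execute(
    "INSERT INTO songs(num, title, lyrics) VALUES (?, ?, ?)",
    ("101", "Радуйся, Мария", "Радуйся, Мария, благодати полная"),
)

# The query is in upper case on purpose: unicode61 folds the case of Cyrillic too
for row in search_lyrics(db, "МАРИ"):
    print(row)
```

Binding the search string as a parameter, as Room does with `:query`, also keeps user input out of the SQL text itself.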

Now comes the most unpleasant part: you need to fill out the database, and this time you still have to enter some of the data manually. Fortunately, you can work in a convenient spreadsheet editor like Excel or LibreOffice Calc, then export the painstakingly completed spreadsheet to CSV and generate an insert script using automated means:

// Relies on win1251 and ReadFile from the first listing
using (StreamWriter writer = new StreamWriter(@"C:\Work\songs2db.txt", true))
{
    using (StreamReader reader = new StreamReader(@"C:\Work\title.csv", encoding: win1251))
    {
        string? line;
        while ((line = reader.ReadLine()) != null)
        {
            // CSV columns: number;title
            string[] parsed = line.Split(";");
            string path = @"C:\Work\Output\Decoded\" + parsed[0] + ".txt";
            // The lyrics files were saved as Windows-1251 at the previous stage
            string lyrics = win1251.GetString(ReadFile(path));

            // Double the single quotes so that the SQL string literals stay valid
            writer.WriteLine("insert into songs(num, title, lyrics) values('{0}', '{1}', '{2}');",
                parsed[0].Replace("'", "''"),
                parsed[1].Replace("'", "''"),
                lyrics.Replace("'", "''"));
        }
    }
}
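Since the author already had Python at hand for scraping, the same generation step could also be sketched there. The point of the sketch is the quoting: a title or text containing an apostrophe has to have it doubled to remain a valid SQL literal. The function names and sample data are hypothetical, and the lyrics are passed in as a dictionary rather than read from files.

```python
import csv
import io


def sql_quote(value):
    """Wraps a string in single quotes, doubling any quotes inside it."""
    return "'" + value.replace("'", "''") + "'"


def build_insert_script(csv_text, lyrics_by_num):
    """Builds one insert statement per 'number;title' row of the CSV export."""
    statements = []
    for row in csv.reader(io.StringIO(csv_text), delimiter=";"):
        num, title = row[0], row[1]
        lyrics = lyrics_by_num.get(num)
        # A missing text becomes NULL, matching the nullable lyrics column
        lyrics_sql = sql_quote(lyrics) if lyrics is not None else "NULL"
        statements.append(
            "insert into songs(num, title, lyrics) "
            f"values({sql_quote(num)}, {sql_quote(title)}, {lyrics_sql});"
        )
    return "\n".join(statements)


# Made-up rows used only for demonstration
print(build_insert_script("12;O'Brien's hymn\n13;Untitled\n", {"12": "la la"}))
```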

A little about the interface

Like the database, the interface is extremely simple and essentially consists of two screens. The first one handles the search, and the results are displayed in a RecyclerView (my kung fu has not yet reached Compose level). The ViewHolder inside the adapter implements a simple OnClickListener that opens the screen with the sheet music:

inner class SongHolder(itemView: View) : RecyclerView.ViewHolder(itemView), View.OnClickListener {
    val textViewNum = itemView.findViewById<TextView>(R.id.text_view_num)
    val textViewTitle = itemView.findViewById<TextView>(R.id.text_view_title)
    val textViewSnippet = itemView.findViewById<TextView>(R.id.text_view_snippet)
    val context = itemView.context

    init { itemView.setOnClickListener(this) }

    override fun onClick(v: View) {
        val current: Song = songs[bindingAdapterPosition]
        val intent = Intent(context, SongActivity::class.java)
        intent.putExtra("NUM", current.num)
        context.startActivity(intent)
    }
}

Displaying the sheet music is more interesting. At first I used a native library, but it was hosted in the now-closed jCenter repository. Naturally, nobody had kept a local dependency cache: I was my own worst enemy.

On the other hand, a system class was added to Android 5 that allows you to draw PDFs page by page into Bitmap. Nothing prevents us from writing a couple of extension functions like these:

fun PdfRenderer.Page.createBitmap(density: Int): Bitmap {
    // Page dimensions are in typographic points (1/72 inch);
    // a float factor avoids truncation by integer division
    val scaleFactor = density / 72f
    val bitmap = Bitmap.createBitmap(
        (width * scaleFactor).toInt(),
        (height * scaleFactor).toInt(),
        Bitmap.Config.ARGB_8888
    )

    // Fill the background with white before the page is rendered on top
    Canvas(bitmap).drawColor(Color.WHITE)

    return bitmap
}

fun PdfRenderer.Page.renderAndClose(density: Int): Bitmap = use {
    val bitmap = createBitmap(density)
    render(bitmap, null, null, PdfRenderer.Page.RENDER_MODE_FOR_DISPLAY)
    bitmap
}
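The point-to-pixel conversion behind `createBitmap` is plain arithmetic, and it is worth seeing the numbers once. A small sketch, assuming an A4 page of 595 × 842 points and a screen density of 160 dpi (both values are illustrative, not taken from the project):

```python
POINTS_PER_INCH = 72  # PDF page dimensions are expressed in typographic points


def page_size_in_pixels(width_pt, height_pt, dpi):
    """Converts a page size from points to pixels at the given density."""
    scale = dpi / POINTS_PER_INCH  # keep the factor fractional
    return round(width_pt * scale), round(height_pt * scale)


def argb8888_bytes(width_px, height_px):
    """Memory needed for a bitmap with 4 bytes per pixel (ARGB_8888)."""
    return width_px * height_px * 4


width_px, height_px = page_size_in_pixels(595, 842, 160)
print(width_px, height_px, argb8888_bytes(width_px, height_px))

# For comparison: an integer scale factor would silently shrink the page
print(595 * (160 // POINTS_PER_INCH), 842 * (160 // POINTS_PER_INCH))
```

The memory figure grows with the square of the density, which is one reason to render pages lazily, one list item at a time.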

The pages, in turn, can be displayed in the same RecyclerView. It does not look as nice as with the native library, but the resulting package is about 16 megabytes smaller. The main problem with this approach is getting decent scrolling and zooming. Fortunately, there is a library under the MIT license that uses the same mechanism under the hood and implements gesture handling.

Playing by the notes

The requirement to be able to listen to a melody was formulated above and, as already mentioned, storing several hundred recordings is not our choice. Our choice is to get the notes in a machine-readable format and feed them to a synthesizer, since the MediaPlayer class from the system API can play melodies in MIDI format. A set of instructions for the synthesizer takes up very little space: the successfully recognized sheets fit into 114 kilobytes. The price is sound quality, because the system instrument bank is rather poor. On the other hand, for playing a melody purely to understand what tune something is sung to, this option is more than enough.
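To get a feeling for why MIDI is so compact, here is a rough sketch that assembles a tiny single-track Standard MIDI File by hand. It is not how the project's files were produced (those came from the recognized scores); the melody, the resolution and the function names are assumptions made for illustration.

```python
import struct

TICKS_PER_QUARTER = 480  # time resolution chosen for this sketch


def vlq(value):
    """Encodes a non-negative integer as a MIDI variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))


def build_midi(melody, channel=0, velocity=80):
    """melody is a list of (MIDI note number, duration in ticks) pairs."""
    track = bytearray()
    for note, ticks in melody:
        # Note on immediately, note off after the note's duration
        track += vlq(0) + bytes([0x90 | channel, note, velocity])
        track += vlq(ticks) + bytes([0x80 | channel, note, 0])
    track += vlq(0) + b"\xFF\x2F\x00"  # end-of-track meta event
    # Header chunk: format 0, one track, ticks per quarter note
    header = struct.pack(">4sIHHH", b"MThd", 6, 0, 1, TICKS_PER_QUARTER)
    return header + struct.pack(">4sI", b"MTrk", len(track)) + bytes(track)


# Made-up melody: C, D, E as quarter notes, then F as a half note
melody = [(60, 480), (62, 480), (64, 480), (65, 960)]
data = build_midi(melody)
print(len(data), "bytes")
```

Writing `data` to a file with a `.mid` extension gives something a MIDI-capable player should accept; each extra note costs only a handful of bytes, compared with kilobytes per second for recorded audio.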

There is nothing supernatural in launching the player, the only thing you need to remember is that you should not do it in the main thread; I used coroutines and viewModelScope, although there may be better ways.

Reflecting

The would-be perfectionist in me is still dissatisfied with the sound quality. It is one of those things that can definitely be improved, but trying to do so will most likely cause more problems than benefits. For example, you could use a synthesizer like fluidsynth, which can be fed its own sound bank in the SoundFont 2 format, or replace MIDI with tracker modules altogether, why not? The difficulty is that such approaches require native libraries, which I had problems with. Building a binary with the NDK is not enough; most likely you will also have to write a JNI wrapper and pray that it all works. It is especially annoying when the project builds, Android Studio sees the native implementations of the methods called from Java or Kotlin, and yet at startup everything crashes loudly with UnsatisfiedLinkError.

As usually happens with me, the result is more of a proof of concept. Despite a fair amount of automation, a lot of manual work is still required (crowdsourcing?), and the source does not include all the pages of the original book, for example the fragments of the liturgy in Latin. One way or another, it is now at least possible to download and try it, and to dig into the sources if desired. Constructive criticism is warmly welcomed.

PS Many thanks to Dima Roslyakov for the code review. The source code will have to go through more than one round of edits, but that is a potentially endless process, and since the article had been lying in the drawer for a long time, I decided to publish the current snapshot.
