Cross reference problems in StackEdit

Cross reference

Posted by Jiayin Guo on February 21, 2019

1. Introduction

The previous post left us with some unsolved problems:

  • Cross references for theorem-like environments in StackEdit
  • LaTeX style for theorem-like environments in StackEdit
  • Automatic numbering for theorem-like environments in StackEdit

This post solves these three problems. The following problems are still interesting, and I will try to solve them in a later post:

  • One Markdown file that renders correctly on both StackEdit and GitHub Pages
  • Automatic numbering for titles, sections, etc.

Below is what a cross reference looks like.

Theorem 1.

Let be a complete -dimensional Riemannian manifold of finite volume and with pinched negative sectional curvature : there exist two constants such that

We will give the proof of Theorem 1 in Section 2.

It is written with the following source:

[Image: Markdown source of the example above]

It is implemented by the code attached at the end of this post.

A pair of \\begin{env} and \\end{env} marks a theorem-like environment, where env can be thm for theorem, lem for lemma, etc.
Put \\label{tag} inside a theorem-like environment to label it, and use \ref{tag} to refer to it, where tag must consist of word characters only.
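
As a small illustration of this syntax, the sketch below scans a sample source string for labels and references. The sample text and the helper names findLabels and findRefs are made up for this example; the regular expressions follow the ones in the appendix, which expect a doubled backslash before label and a single one before ref.

```javascript
// Hypothetical sample source, as one would type it in StackEdit.
// String.raw keeps the backslashes exactly as written.
var sample = String.raw`\\begin{thm}\\label{FTA}
Every non-constant polynomial has a complex root.
\\end{thm}
See \ref{FTA} and \ref{not-a-word-tag}.`;

// Collect every tag declared with \\label{tag}; tags are word characters only.
function findLabels(text) {
    var tags = [];
    text.replace(/\\\\label{(\w+)}/g, function (whole, tag) {
        tags.push(tag);
        return whole;
    });
    return tags;
}

// Collect every tag used with \ref{tag}.
function findRefs(text) {
    var tags = [];
    text.replace(/\\ref{(\w+)}/g, function (whole, tag) {
        tags.push(tag);
        return whole;
    });
    return tags;
}

console.log(findLabels(sample)); // [ 'FTA' ]
console.log(findRefs(sample));   // [ 'FTA' ]
```

The second reference is not picked up because its tag contains hyphens, which are not word characters.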

The post consists of three parts. The first part explains how Pagedown works. The second part explains Benweet and Vanabel’s treatment in this thread. The third part explains my adaptations.

2. How Pagedown works

The main references are the Pagedown wiki and its source code. We focus on how a Markdown text is translated into an HTML text.

There is a top-level object called Markdown with three constructors: Converter, HookCollection and Editor.

An Editor object editor has a getConverter() method, which returns a Converter object, and a hooks property, which is a HookCollection object.

A Converter object has a makeHtml(text) method and a hooks property, which is also a HookCollection object.

A HookCollection object has a method chain(hookname, fun), which registers fun as the next plugin on the given hook hookname.
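
To make the idea of chaining concrete, here is a simplified model of a hook collection. It is a sketch written for this post, not Pagedown's actual source: the name SketchHookCollection and the two sample plugins are assumptions, and only the behaviour described above (each chained function receives the result of the previous one) is modelled.

```javascript
// Simplified model of a hook collection (not Pagedown's actual source).
function identity(x) { return x; }

function SketchHookCollection() {
    // Every hook starts out as a no-op.
    this.preConversion = identity;
    this.postConversion = identity;
}

// Register fun as the next plugin on hookname.
SketchHookCollection.prototype.chain = function (hookname, fun) {
    var original = this[hookname];
    if (!original) throw new Error("unknown hook " + hookname);
    this[hookname] = function () {
        var args = Array.prototype.slice.call(arguments);
        // The previous plugin runs first; its result replaces the first argument.
        args[0] = original.apply(null, args);
        return fun.apply(null, args);
    };
};

var sketchHooks = new SketchHookCollection();
sketchHooks.chain("preConversion", function (text) { return text + " [first]"; });
sketchHooks.chain("preConversion", function (text) { return text + " [second]"; });
console.log(sketchHooks.preConversion("raw")); // raw [first] [second]
```

Plugins therefore run in the order in which they were chained, each one seeing the output of the one before it.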

While we are editing in StackEdit, the raw Markdown text is passed as a string variable text to editor.getConverter().makeHtml(text). Its return value is then treated as the input of editor.hooks.onPreviewRefresh(text).

Inside makeHtml, the text runs through the following pseudocode:

hooks = editor.getConverter().hooks
text = hooks.preConversion(text)
do_something(text)
text = _RunBlockGamut(text)
do_something(text)
text = hooks.postConversion(text)

_RunBlockGamut(text) in turn runs the code

text = hooks.preBlockGamut(text, blockGamutHookCallback)
do_something()

Here blockGamutHookCallback() runs _RunBlockGamut() on the text it is given. Among these functions, preConversion(), preBlockGamut(), postConversion() and onPreviewRefresh() are hooks.

Until something is chained to it, a hook does nothing; the hooks other than preBlockGamut() are then equal to identity=function(x){return x}.

preBlockGamut() is special. It takes two arguments, the second being a callback function.

To sum up, the raw Markdown text goes through the following process before it appears in the browser:

text = preConversion(text)
...
text = _RunBlockGamut(text)
...
text, smaller_text = preBlockGamut(text)
// preBlockGamut decides whether to call _RunBlockGamut(smaller_text) or not.
...
text = postConversion(text)
...
text = onPreviewRefresh(text)

The input of preConversion(text) is the raw Markdown text. The output of postConversion(text) is the cooked HTML text. onPreviewRefresh(text) does a final treatment of the cooked HTML text. preBlockGamut(text) receives the text that is handed to _RunBlockGamut() and decides whether some part of it, smaller_text, needs to be treated by _RunBlockGamut() again.
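
The order of the stages can be tried out with a toy model. Everything in the sketch below is invented for illustration except the hook names: the [[...]] markers stand in for a theorem environment, the emphasis rule stands in for the real Markdown processing, and runBlockGamut is only a stand-in for Pagedown's _RunBlockGamut.

```javascript
// Toy model of the pipeline; the hook names follow the post, the rest is invented.
var trace = [];

var toyHooks = {
    preConversion: function (text) { trace.push("preConversion"); return text; },
    // Wrap the part between [[ and ]] in a <div> and send only that part
    // through the block gamut again via the callback.
    preBlockGamut: function (text, callback) {
        trace.push("preBlockGamut");
        return text.replace(/\[\[([\s\S]*?)\]\]/g, function (whole, smallerText) {
            return "<div>" + callback(smallerText) + "</div>";
        });
    },
    postConversion: function (text) { trace.push("postConversion"); return text; }
};

// Stand-in for _RunBlockGamut: call the hook, then do the "real" work.
function runBlockGamut(text) {
    text = toyHooks.preBlockGamut(text, runBlockGamut);
    return text.replace(/\*(\w+)\*/g, "<em>$1</em>");
}

function makeHtml(text) {
    text = toyHooks.preConversion(text);
    text = runBlockGamut(text);
    return toyHooks.postConversion(text);
}

var html = makeHtml("a [[*b*]] c");
console.log(html);              // a <div><em>b</em></div> c
console.log(trace.join(" > ")); // preConversion > preBlockGamut > preBlockGamut > postConversion
```

preBlockGamut appears twice in the trace: once for the whole text and once for the smaller text passed to the callback, which is why it cannot assume that it sees the whole document.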

3. Benweet and Vanabel’s treatment

In this thread, Benweet and Vanabel solve the following problems:

  • LaTeX style for theorem-like environments in StackEdit
  • Automatic numbering for theorem-like environments in StackEdit

Benweet also implements a cross-reference style of the form \ref{thm:1}.

What they did is the following. (I have deleted some of their code to simplify the explanation.)

preConversion is set to a function that changes \begin…\end to /begin…/end to avoid MathJax processing.

converter.hooks.preConversion = function (text) {
    text = text.replace(/\\begin{(\w+)}([\s\S]*?)\\end{\1}/g, function (wholeMatch, m1, m2) {
        // Leave environments that are not in environmentMap untouched
        if(!environmentMap[m1]) return wholeMatch;
        return '/begin{' + m1 + '}' + m2 + '/end{' + m1 + '}';
    });
    return text;
};
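
To see what this replacement does, the sketch below applies the same regular expression to a sample string. The function name protectEnvironments, the shortened environmentMap and the sample text are assumptions made for this example.

```javascript
// Hypothetical, shortened environment table.
var environmentMap = { thm: { title: "Theorem" } };

// Same replacement as in the preConversion hook, wrapped in a plain function.
function protectEnvironments(text) {
    return text.replace(/\\begin{(\w+)}([\s\S]*?)\\end{\1}/g, function (wholeMatch, m1, m2) {
        if (!environmentMap[m1]) return wholeMatch;
        return '/begin{' + m1 + '}' + m2 + '/end{' + m1 + '}';
    });
}

var source = String.raw`\begin{thm}A\end{thm} \begin{align}x\end{align}`;
console.log(protectEnvironments(source));
// /begin{thm}A/end{thm} \begin{align}x\end{align}
```

Only the known environment is rewritten, so genuine math environments such as align are still left to MathJax, and the text keeps its length because each backslash is replaced by exactly one slash.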

preBlockGamut is chained with a function that does the following:

  • Put the inside of /begin{m1}…/end{m1}, which is m2, into a <div></div> container with class latex_ + m1 (for instance latex_thm).
  • Add an empty <span></span> tag of class latex_title inside the container.
  • Use _RunBlockGamut(), through the callback, to treat m2, which is a smaller text.
  • Implement Benweet’s style of cross reference by converting \ref{thm:1} and the like to hyperlinks.

converter.hooks.chain("preBlockGamut", function (text, blockGamutHookCallback) {
    // \ref{thm:1} becomes a link to the element with id "thm:1"
    text = text.replace(/\\ref{(\w+):(\d+)}/g, function (wholeMatch, m1, m2) {
        if(!environmentMap[m1]) return wholeMatch;
        return '<a class="latex_ref" href="#' + m1 + ':' + m2 + '">' + environmentMap[m1].title + ' ' + m2 + '</a>';
    });
    // Wrap each environment in a container and render its body as Markdown
    return text.replace(/\/begin{(\w+)}([\s\S]*?)\/end{\1}/g, function (wholeMatch, m1, m2) {
        if(!environmentMap[m1]) return wholeMatch;
        var result = '<div class="latex_' + m1 + '"><span class="latex_title"></span>' + blockGamutHookCallback(m2);
        return result + '</div>';
    });
});
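
The reference part of this hook can be exercised on its own. In the sketch below, the function name linkRefs, the shortened environmentMap and the sample sentence are assumptions for illustration; the regular expression is the one from the code above.

```javascript
// Hypothetical, shortened environment table.
var environmentMap = { thm: { title: "Theorem" }, lem: { title: "Lemma" } };

// Convert \ref{type:number} into a hyperlink, as the preBlockGamut hook does.
function linkRefs(text) {
    return text.replace(/\\ref{(\w+):(\d+)}/g, function (wholeMatch, m1, m2) {
        if (!environmentMap[m1]) return wholeMatch;
        return '<a class="latex_ref" href="#' + m1 + ':' + m2 + '">' + environmentMap[m1].title + ' ' + m2 + '</a>';
    });
}

console.log(linkRefs(String.raw`See \ref{thm:1} and \ref{FTA}.`));
// See <a class="latex_ref" href="#thm:1">Theorem 1</a> and \ref{FTA}.
```

A free-form tag such as FTA does not fit the type:number pattern and is left as typed.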

onPreviewRefresh is chained with a function that does the following:

  • Reset the counter thmCounter.num.
  • For every container whose class name has the form latex_thm, set the content of its <span></span> to Theorem followed by its number thmCounter.num.
  • Give each container the id type:number (for instance thm:1), so that the hyperlink generated from \ref{thm:1} points to it.

editor.hooks.chain('onPreviewRefresh', function() {
    // Numbering restarts on every refresh of the preview
    thmCounter.num = 0;
    excsCounter.num = 0;
    _.each(previewContentsElt.querySelectorAll('[class^="latex_"]'), function(elt) {
        var key = elt.className.match(/^latex_(\S+)/)[1];
        var environment = environmentMap[key];
        // Skip latex_title and other classes that are not environments
        if(!environment) return;
        var title = environment.title;
        if(environment.counter) {
            environment.counter.num++;
            title += ' ' + environment.counter.num;
            elt.id = key + ':' + environment.counter.num;
        }
        elt.querySelector('.latex_title').innerHTML = title + '.';
    });
});
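
The numbering logic does not depend on the DOM, so it can be sketched on a plain list of environment names. The function numberEnvironments and the input list are assumptions for this example; the counters and the table are a shortened version of the ones in the appendix.

```javascript
var thmCounter = { num: 0 };
var excsCounter = { num: 0 };
// Shortened table: Theorem and Lemma share one counter, Proof has none.
var environmentMap = {
    thm: { title: "Theorem", counter: thmCounter },
    lem: { title: "Lemma", counter: thmCounter },
    ex: { title: "Exercise", counter: excsCounter },
    pf: { title: "Proof" }
};

// keys: environment names in document order, as read off the class names.
function numberEnvironments(keys) {
    thmCounter.num = 0;
    excsCounter.num = 0;
    return keys.map(function (key) {
        var environment = environmentMap[key];
        var title = environment.title;
        var id = null;
        if (environment.counter) {
            environment.counter.num++;
            title += ' ' + environment.counter.num;
            id = key + ':' + environment.counter.num;
        }
        return { id: id, title: title + '.' };
    });
}

var numbered = numberEnvironments(["thm", "pf", "lem", "ex", "thm"]);
console.log(numbered.map(function (e) { return e.title; }).join(" "));
// Theorem 1. Proof. Lemma 2. Exercise 1. Theorem 3.
```

Because the counter is shared, the first lemma gets the id lem:2, so a reference of the form \ref{lem:2} requires the writer to know the running number in advance.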

After these three pieces of code, they add some CSS to make the theorem environments look more like LaTeX ones.

The code is divided among three hook functions for the following reasons.

  • Numbering of the theorems has to be implemented after preBlockGamut(), since the input of preBlockGamut() may not be the whole text, as pointed out in the previous section.
  • LaTeX-style theorem environments have to be treated as early as possible. Ideally this would happen at the preConversion stage. However, for the inner text of a theorem to still be able to use Markdown syntax such as lists or hyperlinks, the earliest stage at which it can be implemented is preBlockGamut().

4. My adaptation

I made an adaptation because I want a freer cross-reference style, \ref{FTA}, rather than a fixed format like \ref{thm:index}. This raises the following problems.

\ref{thm:index} already indicates which theorem type it refers to, so there is no need for a \label{} command inside the theorem environment. \ref{FTA} does not, so I have to implement one.

\ref{thm:index} already indicates the correct index of a theorem. \ref{FTA} does not, so I have to calculate the correct index of the referred theorem at the onPreviewRefresh stage.

I implement these two functions in the following way.

At the preConversion() stage, for each theorem environment, I check whether it contains a command like \label{tag}. If so, I generate an h6 header above the theorem that carries tag as its id.

At the preBlockGamut() stage, I move the tag into a span tag with class latex_label.

At the onPreviewRefresh() stage, when treating a theorem environment, I extract the value tag of latex_label together with the theorem type and index, and put them into a global dictionary labelMap. I use this information to convert \ref{tag} into the correct theorem type and index, as with Theorem 1 above.
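
The last step amounts to a two-pass lookup: first record every label, then resolve every reference. The sketch below shows the idea on plain data instead of the preview DOM. The names buildLabelMap and resolveRef and the sample list are assumptions for this example, and it assumes for simplicity that all environments share one counter.

```javascript
// Hypothetical data standing in for what is read from the preview DOM.
var environments = [
    { type: "Theorem", label: "FTA" },
    { type: "Lemma", label: "" },       // unlabelled environments are still counted
    { type: "Theorem", label: "Rolle" }
];

// First pass: remember type and running number for every label.
function buildLabelMap(envs) {
    var labelMap = {};
    envs.forEach(function (env, i) {
        if (env.label) labelMap[env.label] = { name: env.type, num: i + 1 };
    });
    return labelMap;
}

// Second pass: turn a tag into a link, or leave it as typed if it is unknown.
function resolveRef(labelMap, tag) {
    var entry = labelMap[tag];
    if (!entry) return tag;
    return '<a class="latex_ref" href="#' + tag + '">' + entry.name + ' ' + entry.num + '</a>';
}

var sketchLabelMap = buildLabelMap(environments);
console.log(resolveRef(sketchLabelMap, "Rolle")); // <a class="latex_ref" href="#Rolle">Theorem 3</a>
console.log(resolveRef(sketchLabelMap, "nope"));  // nope
```

The link target is the tag itself, which matches the id given to the h6 header generated at the preConversion() stage.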

That’s it for this post. The full code is in the appendix.

5. Appendix

Add the following code in the UserCustom extension of StackEdit v4. If this code does not work, check GitHub for the latest version.

userCustom.onPagedownConfigure = function (editor) {
    var thmCounter  = { num: 0 };
    var excsCounter = { num: 0 };
    var secCounter = { num: 0 };
    var subsecCounter = { num: 0 };
    var subsubsecCounter = { num: 0 };
    var environmentMap = {
        thm:   { title: "Theorem"    ,counter: thmCounter  },
        lem:   { title: "Lemma"      ,counter: thmCounter  },
        cor:   { title: "Corollary"  ,counter: thmCounter  },
        prop:  { title: "Proposition",counter: thmCounter  },
        def:  { title: "Definition" ,counter: thmCounter  },
        rk:   { title: "Remark"     ,counter: thmCounter  },
        prob:  { title: "Problem"    ,counter: excsCounter },
        ex:  { title: "Exercise"   ,counter: excsCounter },
        eg: { title: "Example"    ,counter: thmCounter },
        pf: { title: "Proof" }
    };
    var labelMap={};
    var converter = editor.getConverter();
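    converter.hooks.chain("preConversion", function (text) {
        // Restart the section numbering on every conversion; otherwise the
        // numbers keep growing each time the text is converted again
        secCounter.num = 0;
        subsecCounter.num = 0;
        subsubsecCounter.num = 0;
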
        // Environments are typed with a doubled backslash: \\begin{env} ... \\end{env}
        var re = /\\\\begin{(\w+)}([\s\S]*?)\\\\end{\1}/g;
        var labelre = /\\\\label{(\w+)}/;
        // For each labelled environment, turn \\label{tag} into /label{tag} and
        // put an empty h6 header with id "tag" above it as the link target
        text = text.replace(re, function (wholeMatch, m1, m2) {
            var label = m2.match(labelre);
            if (!label) return wholeMatch;
            m2 = m2.replace(labelre, '/label{' + label[1] + '}');
            return '######   {#' + label[1] + '}' + '\n' + '\\\\begin{' + m1 + '}' + m2 + '\\\\end{' + m1 + '}';
        });

        // Change \\begin...\\end to /begin.../end to avoid MathJax processing
        text = text.replace(re, function (wholeMatch, m1, m2) {
            if(!environmentMap[m1]) return wholeMatch;
            return '/begin{' + m1 + '}' + m2 + '/end{' + m1 + '}';
        });

        // Transform \title and \section into markdown title to take benefit of partial rendering

        text = text.replace(/\\(\w+){([^\r\n}]+)}/g, function (wholeMatch, m1, m2) {
            // At this stage we need to keep the same number of characters for accurate section parsing
            if (m1 == 'section') {
                secCounter['num']+=1;
                // \section{} has to be replaced by 10 chars
                return '\n###     ' +secCounter['num'].toString()+'. '+ m2 + '\n';//secCounter
            }
            if (m1 == 'subsection') {
                subsecCounter['num']+=1;
                // \subsection{} has to be replaced by 13 chars
                return '\n####       ' +subsecCounter['num'].toString()+'. '+ m2 + '\n';
            }
            if (m1 == 'subsubsection') {
                subsubsecCounter['num']+=1;
                // \subsubsection{} has to be replaced by 16 chars
                return '\n#####         ' +subsubsecCounter['num'].toString()+'. '+ m2 + '\n';
            }
            if (m1 == 'title') {
                // \title{} has to be replaced by 8 chars
                return '\n##    ' + m2 + '\n';
            }
            return wholeMatch;
        });


        return text;

    });
    converter.hooks.chain("preBlockGamut", function (text, blockGamutHookCallback) {

        text = text.replace(/\\ref{(\w+)}/g, function (wholeMatch, m1) {

            return '<a class="latex_ref" href="">' + m1 + '</a>';
        });
        text = text.replace(/\\(author|date){([\s\S]*?)}/g, '<div class="latex_$1">$2</div>');

        var labelre = /\/label{(\w+)}/;
        text = text.replace(/\/begin{(\w+)}([\s\S]*?)\/end{\1}/g, function (wholeMatch, m1, m2) {
            if(!environmentMap[m1]) return wholeMatch;
            // Pull the label out of the body; it travels in a separate span instead
            var label = "";
            var found = m2.match(labelre);
            if (found) {
                label = found[1];
                m2 = m2.replace(labelre, "");
            }

            var result = '<div class="latex_' + m1 +'"><span class="latex_title"></span>'+'<span class="latex_label">'+label+'</span>' + blockGamutHookCallback(m2);
            // "pf" is the key of the proof environment in environmentMap
            if (m1 == "pf") {
                // End-of-proof mark, typeset by MathJax
                result += '<span class="latex_proofend" style="float:right">$\\square$</span>';
            }
            result += '</div>';
            return result;
        });

        return text;
    });
    var previewContentsElt = document.getElementById('preview-contents');
    editor.hooks.chain('onPreviewRefresh', function() {
        thmCounter.num = 0;
        excsCounter.num = 0;
        _.each(previewContentsElt.querySelectorAll('[class^="latex_"]'), function(elt) {
            var key = elt.className.match(/^latex_(\S+)/)[1];
            var environment = environmentMap[key];
            if(!environment) return;
            var title = environment.title;
            if(environment.counter) {
                environment.counter.num++;
                title += ' ' + environment.counter.num;
                elt.id = key + ':' + environment.counter.num;
            }
            elt.querySelector('.latex_title').innerHTML = title + '.';
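            var labelElt = elt.querySelector('.latex_label');
            var x = labelElt.innerHTML;
            labelElt.innerHTML = "";
            // Only labelled, numbered environments can be referred to (Proof has no counter)
            if (x && environment.counter) {
                labelMap[x] = {num: environment.counter.num, name: environment.title, label: x};
            }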
        });

        _.each(previewContentsElt.querySelectorAll('[class^="latex_ref"]'), function(elt) {


            var label =labelMap[elt.innerHTML];
            if(!label) return;
            elt.getAttributeNode("href").value='#'+elt.innerHTML;
            //href="#' + m1 + ':' + m2 + '">'
            elt.innerHTML=label.name+' '+label.num;
        });

    });
};

userCustom.onReady = function () {
    var style = [
        // The class names must match the keys of environmentMap
        '.latex_thm, .latex_lem, .latex_cor, .latex_def, .latex_prop, .latex_rk {',
        '    font-style:italic;',
        '    display: block;',
        '    margin:15px 0;',
        '}',
        '.latex_prob, .latex_eg, .latex_ex, .latex_pf {',
        '    font-style:normal;',
        '    margin: 10px 0;',
        '    display: block;',
        '}',
        '.latex_title {',
        '    float:left;',
        '    font-weight:bold;',
        '    padding-right: 10px;',
        '}',
        '.latex_proofend {',
        '    float:right;',
        '}',
    ].join('\n');
    $("head").append($('<style type="text/css">').html(style));
};