feat(parser): rewrite lexer to make it faster (#50)

* feat(parser): first iteration of new lexer
* feat(parser): convert token string props to number props
* refactor(parser): optimize char grabber
* refactor(parser): working on new lexer
* refactor(parser): convert token string props to number props
* refactor(parser): rebuild lexer, add tag attrs parsing
* refactor(parser): rework word parsing and tag parsing
* refactor(parser): rework to pass tests
* refactor(parser): rework tag parsing
* refactor(parser): rework escape tags parsing
* refactor(parser): rework tests
* refactor(parser): all test pass
* refactor(parser): make lexer faster by move mode switching in loop
* refactor(parser): remove all state map objects
* refactor(parser): order of parsing states
* refactor(parser): state switching without return
* refactor(parser): rename buffers to chars
* refactor(lexer): reduce function calls
* feat(lexer): add new parser tests and code to pass it
* fix(utils): remove unused variable in char grabber
* feat(lexer): add test for new lexer bug
* chore(*): add lexer and lexer2 to benchmark
* chore(lexer): add some debug info for char grabber
* feat(parser): add new test for single attributes without values
* fix(lexer): paired tags tests
* refactor(lexer): comment breaking changes tests for future releases
* feat(core): improve tests
* refactor(parser): add more tests, reduce char grabber size
* refactor(parser): reduce utils size
* refactor(parser): remove unused code from tag parsing code
* refactor(parser): remove unused code from word to tag transforming code
* chore(benchmark): fix benchmark imports
2026-06-14 18:42:24 +03:00 · 2020-12-09 01:03:48 +02:00
parent fda6ddd6ee
commit 772d422d77
13 changed files with 998 additions and 359 deletions
@@ -10,12 +10,12 @@ const TOKEN_VALUE_ID = 'value'; // 1;
 const TOKEN_COLUMN_ID = 'row'; // 2;
 const TOKEN_LINE_ID = 'line'; // 3;

-const TOKEN_TYPE_WORD = 'word';
-const TOKEN_TYPE_TAG = 'tag';
-const TOKEN_TYPE_ATTR_NAME = 'attr-name';
-const TOKEN_TYPE_ATTR_VALUE = 'attr-value';
-const TOKEN_TYPE_SPACE = 'space';
-const TOKEN_TYPE_NEW_LINE = 'new-line';
+const TOKEN_TYPE_WORD = 1; // 'word';
+const TOKEN_TYPE_TAG = 2; // 'tag';
+const TOKEN_TYPE_ATTR_NAME = 3; // 'attr-name';
+const TOKEN_TYPE_ATTR_VALUE = 4; // 'attr-value';
+const TOKEN_TYPE_SPACE = 5; // 'space';
+const TOKEN_TYPE_NEW_LINE = 6; // 'new-line';

 /**
 * @param {Token} token
@@ -105,14 +105,15 @@ class Token {
   * @param row
   */
  constructor(type, value, line, row) {
-    this[TOKEN_TYPE_ID] = String(type);
+    this[TOKEN_TYPE_ID] = Number(type);
    this[TOKEN_VALUE_ID] = String(value);
    this[TOKEN_LINE_ID] = Number(line);
    this[TOKEN_COLUMN_ID] = Number(row);
  }

  isEmpty() {
-    return !!this[TOKEN_TYPE_ID];
+    // eslint-disable-next-line no-restricted-globals
+    return isNaN(this[TOKEN_TYPE_ID]);
  }

  isText() {