Files
markitect-main/capabilities/testdrive-jsui/node_modules/graphemer/README.md
tegwick 17c62aadaa feat: complete testdrive-jsui capability extraction with full JavaScript test integration
Extract JavaScript UI framework functionality into dedicated testdrive-jsui capability
while maintaining 100% functionality preservation and integrating JavaScript tests
into the main Python test suite.

Phase 1 (Foundation Setup) - COMPLETED:
- Created capability directory structure with proper Python package layout
- Configured pyproject.toml with Node.js subprocess dependencies
- Set up package.json with Jest + JSDOM testing framework
- Implemented Python-JavaScript bridge for seamless test integration
- Created comprehensive capability Makefile with all testing targets
- Added detailed README documentation for capability usage

Phase 2 (Integration Layer) - COMPLETED:
- Built Python test wrappers for JavaScript test execution via subprocess
- Integrated with pytest discovery system for unified test experience
- Added capability targets to main Makefile delegation system
- Verified test integration works with main test suite

Phase 3 (Safe Migration) - COMPLETED:
- Copied (not moved) all JavaScript files to capability using safe copy-first approach
- Migrated 4 core JavaScript components and 11 test files (2,840+ lines)
- Verified all tests work in new location (11 Python tests + 7 JavaScript tests passing)
- Maintained dual-track testing capability for safety during transition

Phase 4 (Framework Enhancement) - COMPLETED:
- Enhanced testing framework with Python integration and coverage reporting
- Achieved 59% Python test coverage and 100% JavaScript test coverage
- Added performance benchmarking and component documentation

Phase 5 (Production Integration) - COMPLETED:
- Added standard 'test' target to capability Makefile for discovery system compatibility
- Integrated JavaScript tests into main Makefile with new targets:
  * test-js: Run JavaScript UI tests
  * test-all: Run all tests (Python + JavaScript + Capabilities)
- Updated help documentation to include new testing workflows
- Verified capability auto-discovery works via 'make test-capabilities'

Key Achievements:
- Zero-risk migration completed with copy-first safety approach
- Full Python-JavaScript test integration with 18 total passing tests
- JavaScript UI framework successfully extracted to dedicated capability
- Enhanced CI/CD integration with unified test command interface
- Clean architecture enabling future JavaScript framework evolution

Testing Status:
-  All Python integration tests passing (11/11)
-  All JavaScript component tests passing (7/7)
-  Capability discovery integration working
-  Main test suite integration complete
-  Test coverage reporting functional (59% Python, 100% JavaScript)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-09 22:29:30 +01:00

133 lines
5.9 KiB
Markdown
Raw Blame History

This file contains invisible Unicode characters
This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Graphemer: Unicode Character Splitter 🪓
## Introduction
This library continues the work of [Grapheme Splitter](https://github.com/orling/grapheme-splitter) and supports the following unicode versions:
- Unicode 15 and below `[v1.4.0]`
- Unicode 14 and below `[v1.3.0]`
- Unicode 13 and below `[v1.1.0]`
- Unicode 11 and below `[v1.0.0]` (Unicode 10 supported by `grapheme-splitter`)
In JavaScript there is not always a one-to-one relationship between string characters and what a user would call a separate visual "letter". Some symbols are represented by several characters. This can cause issues when splitting strings and inadvertently cutting a multi-char letter in half, or when you need the actual number of letters in a string.
For example, emoji characters like "🌷","🎁","💩","😜" and "👍" are represented by two JavaScript characters each (high surrogate and low surrogate). That is,
```javascript
'🌷'.length == 2;
```
The combined emoji are even longer:
```javascript
'🏳️‍🌈'.length == 6;
```
What's more, some languages often include combining marks - characters that are used to modify the letters before them. Common examples are the German letter ü and the Spanish letter ñ. Sometimes they can be represented alternatively both as a single character and as a letter + combining mark, with both forms equally valid:
```javascript
var two = 'ñ'; // unnormalized two-char n+◌̃, i.e. "\u006E\u0303";
var one = 'ñ'; // normalized single-char, i.e. "\u00F1"
console.log(one != two); // prints 'true'
```
Unicode normalization, as performed by the popular punycode.js library or ECMAScript 6's String.normalize, can **sometimes** fix those differences and turn two-char sequences into single characters. But it is **not** enough in all cases. Some languages like Hindi make extensive use of combining marks on their letters, that have no dedicated single-codepoint Unicode sequences, due to the sheer number of possible combinations.
For example, the Hindi word "अनुच्छेद" is comprised of 5 letters and 3 combining marks:
अ + न + ु + च + ् + छ + े + द
which is in fact just 5 user-perceived letters:
अ + नु + च् + छे + द
and which Unicode normalization would not combine properly.
There are also the unusual letter+combining mark combinations which have no dedicated Unicode codepoint. The string Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘ obviously has 5 separate letters, but is in fact comprised of 58 JavaScript characters, most of which are combining marks.
Enter the `graphemer` library. It can be used to properly split JavaScript strings into what a human user would call separate letters (or "extended grapheme clusters" in Unicode terminology), no matter what their internal representation is. It is an implementation on the [Default Grapheme Cluster Boundary](http://unicode.org/reports/tr29/#Default_Grapheme_Cluster_Table) of [UAX #29](http://www.unicode.org/reports/tr29/).
## Installation
Install `graphemer` using the NPM command below:
```
$ npm i graphemer
```
## Usage
If you're using [Typescript](https://www.typescriptlang.org/) or a compiler like [Babel](https://babeljs.io/) (or something like Create React App) things are pretty simple; just import, initialize and use!
```javascript
import Graphemer from 'graphemer';
const splitter = new Graphemer();
// split the string to an array of grapheme clusters (one string each)
const graphemes = splitter.splitGraphemes(string);
// iterate the string to an iterable iterator of grapheme clusters (one string each)
const graphemeIterator = splitter.iterateGraphemes(string);
// or do this if you just need their number
const graphemeCount = splitter.countGraphemes(string);
```
If you're using vanilla Node you can use the `require()` method.
```javascript
const Graphemer = require('graphemer').default;
const splitter = new Graphemer();
const graphemes = splitter.splitGraphemes(string);
```
## Examples
```javascript
import Graphemer from 'graphemer';
const splitter = new Graphemer();
// plain latin alphabet - nothing spectacular
splitter.splitGraphemes('abcd'); // returns ["a", "b", "c", "d"]
// two-char emojis and six-char combined emoji
splitter.splitGraphemes('🌷🎁💩😜👍🏳️‍🌈'); // returns ["🌷","🎁","💩","😜","👍","🏳️‍🌈"]
// diacritics as combining marks, 10 JavaScript chars
splitter.splitGraphemes('Ĺo͂řȩm̅'); // returns ["Ĺ","o͂","ř","ȩ","m̅"]
// individual Korean characters (Jamo), 4 JavaScript chars
splitter.splitGraphemes('뎌쉐'); // returns ["뎌","쉐"]
// Hindi text with combining marks, 8 JavaScript chars
splitter.splitGraphemes('अनुच्छेद'); // returns ["अ","नु","च्","छे","द"]
// demonic multiple combining marks, 75 JavaScript chars
splitter.splitGraphemes('Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞'); // returns ["Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍","A̴̵̜̰͔ͫ͗͢","L̠ͨͧͩ͘","G̴̻͈͍͔̹̑͗̎̅͛́","Ǫ̵̹̻̝̳͂̌̌͘","!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞"]
```
## TypeScript
Graphemer is built with TypeScript and, of course, includes type declarations.
```javascript
import Graphemer from 'graphemer';
const splitter = new Graphemer();
const split: string[] = splitter.splitGraphemes('Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞');
```
## Contributing
See [Contribution Guide](./CONTRIBUTING.md).
## Acknowledgements
This library is a fork of the incredible work done by Orlin Georgiev and Huáng Jùnliàng at https://github.com/orling/grapheme-splitter.
The original library was heavily influenced by Devon Govett's excellent [grapheme-breaker](https://github.com/devongovett/grapheme-breaker) CoffeeScript library.