The entire notion that emoticons should be limited to what a committee approves (which is then mangled by corporate PR even further) is ridiculous. Just retvrn to images.
brikym 1 days ago [-]
This. But more work is needed. I tried a bunch of Discord alternatives like Matrix, but very few offer a fun experience with things like custom emoji images that really make a chat server feel like a home.
uxellodunum 1 days ago [-]
There are clients on Matrix that support custom emoji, such as Sable and Commet.
Neither is perfect, but I know people who daily-drive one or the other (or both, which is where I'm at depending on the device).
For the most part, now that Matrix is finally merging the Matrix 2.0 specs, and the 2.0 features are already out in the wild with excellent results, it has a really good base. As expected, we've started to see clients push into the average-consumer space as alternatives to both niche and mainstream apps such as Discord, WhatsApp, etc., which just wasn't/isn't possible on Matrix 1.x (legacy).
brikym 16 hours ago [-]
That's positive. I think the other feature I wanted was embeds. I know prefetching content can be a security risk but it's super convenient.
uxellodunum 8 hours ago [-]
Sable has opt-in embeds, though I haven't tested the extent of supported websites; it seems to work fine for YouTube, for instance.
Commet has an open PR for this, but it's not yet implemented.
turblety 1 days ago [-]
Did they give a reason why it was declined? Was it some bureaucratic "form not filled in correctly" thing, or are they actually against the concept of it?
To elaborate: it should be plainly obvious that not every emoji proposal can be accepted even if all of them are correctly filed, as there would be too many emoji otherwise. So there has to be some threshold, and that threshold is mostly set by vendors' willingness to handle new emoji characters, i.e. designing fonts and updating software in time.
wodenokoto 1 days ago [-]
Generally Unicode is for encoding all existing encodings/writing.
So you generally can’t add something because it would be cool or fun or useful, but only because it is currently in use and cannot be encoded by Unicode.
chordbug 1 days ago [-]
If this were entirely true, we'd never see new emoji added, and yet we do.
goodmythical 1 days ago [-]
That's not at all the case. Unicode began as a standard for making things like string(':)') into a single character.
Consider all of the languages it supports. Consider: ﷽ (which isn't an emoji, but the point stands) which is an entire sentence. It was already in use in certain places and Unicode decided they wanted to support it, so now they do. Previously, one would have to type out the entire sentence in the original characters, but now it is a single Unicode code point, just like U+263A (☺) used to be Alt+1 (☺). The emoji was already in use long before Unicode existed, and in seeing it in common use, they decided to support it.
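For anyone who wants to poke at this, here is a minimal Python sketch using only the standard `unicodedata` module. It just confirms that both examples are single code points and prints the names the Unicode database assigns them; the variable names are mine, not anything from the standard.

```python
import unicodedata

bismillah = "\uFDFD"  # the whole-phrase ligature mentioned above
smiley = "\u263A"     # the old smiling face

# Each one is a single code point, however wide it renders.
bismillah_name = unicodedata.name(bismillah)
smiley_name = unicodedata.name(smiley)

print(len(bismillah), f"U+{ord(bismillah):04X}", bismillah_name)
print(len(smiley), f"U+{ord(smiley):04X}", smiley_name)

# In UTF-8 each of them takes three bytes.
print(len(bismillah.encode("utf-8")), len(smiley.encode("utf-8")))
```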
zyx321 1 days ago [-]
That list only includes suggestions that were seriously considered and voted on.
Since it's a vote, there is no single official 'reason' for rejection. If I had to guess: it would be confusing to anyone who didn't grow up with American TV shows.
throawayonthe 4 hours ago [-]
what's the connection to american TV shows? i'm only aware of the tinfoil hat through cultural osmosis i guess, something about shielding from radio waves
it's a popular image/byword/archetype for conspiracy theorists, idk if it's a common enough symbol to justify emoji inclusion. the submitted proposals probably have analyses of that though :p
pwdisswordfishq 1 days ago [-]
Eh, it's not like there are hundreds of emojises pretty much exclusively tied to Japanese culture.
maxbond 1 days ago [-]
They were grandfathered in, not voted on. Or rather there was a vote that resulted in adopting the character sets developed by Japanese telecoms en masse.
Ekaros 1 days ago [-]
Weirdly this is in line with Unicode in general. Widespread (and not even widespread) historic use in say print results in characters getting included.
mananaysiempre 1 days ago [-]
> emojises
I don’t protest the coinage here (goodness knows my native language did worse things to English words), but I can’t stop saying it in Gollum’s voice.
brikym 1 days ago [-]
Seems like a conspiracy. Also it's so silly that pistol turned into water pistol.
poulpy123 1 days ago [-]
looking at the changes it makes me wonder:
- is there a usable font that covers all of Unicode ?
- if not is there really a point to include everything possible in unicode ?
- how much space is remaining for new alphabets and smileys ?
- how do they handle changes in scripts, for example if new proto-cuneiform or seal script symbols are discovered ?
wongarsu 1 days ago [-]
> if not is there really a point to include everything possible in unicode ?
Needing to load three fonts to show a single document that mixes vastly different character sets is still infinitely better than not being able to have those different characters in the same .txt or .md file at all
> how much space is remaining for new alphabets and smileys ?
Unicode can encode about 1100k code points, and about 800k of those are currently unassigned and available for future scripts or characters
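If you want to check the rough numbers yourself, a small Python sketch can tally code points by general category. The exact unassigned count depends on which Unicode version your Python build ships, so treat the output as approximate; `Cn` also includes the 66 permanently reserved noncharacters.

```python
import unicodedata
from collections import Counter

TOTAL_CODE_POINTS = 0x110000  # U+0000 through U+10FFFF

# Tally every code point by its two-letter general category.
counts = Counter(unicodedata.category(chr(cp)) for cp in range(TOTAL_CODE_POINTS))

unassigned = counts["Cn"]   # not assigned (includes noncharacters)
private_use = counts["Co"]  # Private Use Areas
surrogates = counts["Cs"]   # reserved for UTF-16 surrogate pairs

print(f"total:       {TOTAL_CODE_POINTS:,}")
print(f"unassigned:  {unassigned:,}")
print(f"private use: {private_use:,}")
print(f"surrogates:  {surrogates:,}")
```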
xigoi 22 hours ago [-]
Also, the 1.1M limit is because of UTF-16. If UTF-16 was deprecated in favor of UTF-8, the limit could be much higher.
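The arithmetic behind that limit, as a short Python sketch: one 16-bit unit covers the Basic Multilingual Plane, and a pair of surrogates (1,024 lead values times 1,024 trail values) covers everything above it. The helper name `to_surrogate_pair` is just for illustration.

```python
BMP_SIZE = 0x10000                 # code points reachable with one 16-bit unit
HIGH_SURROGATES = 0xDC00 - 0xD800  # 1,024 lead units
LOW_SURROGATES = 0xE000 - 0xDC00   # 1,024 trail units

supplementary = HIGH_SURROGATES * LOW_SURROGATES
utf16_limit = BMP_SIZE + supplementary  # size of the range U+0000..U+10FFFF


def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into its UTF-16 lead and trail units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above the BMP use surrogate pairs")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


last = to_surrogate_pair(0x10FFFF)
print(f"{utf16_limit:,}")          # 1,114,112
print([hex(unit) for unit in last])

# For comparison: the original UTF-8 design allowed sequences of up to six
# bytes carrying 31 payload bits, before it was capped to match UTF-16.
ORIGINAL_UTF8_LIMIT = 2**31
```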
hulitu 9 hours ago [-]
We need UTF-32. For the future.
xigoi 9 hours ago [-]
UTF-32 already exists, but nobody uses it because it’s much less efficient for most textual data than UTF-8.
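A quick way to see the size difference, sketched in Python with a few sample strings I picked arbitrarily. The `-le` codec variants are used so no byte order mark is counted.

```python
def encoded_sizes(text: str) -> tuple[int, int, int]:
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32."""
    return (
        len(text.encode("utf-8")),
        len(text.encode("utf-16-le")),
        len(text.encode("utf-32-le")),
    )


samples = {
    "ascii": "hello world",
    "japanese": "こんにちは",
    "emoji": "🦋",
}

for label, text in samples.items():
    utf8, utf16, utf32 = encoded_sizes(text)
    print(f"{label:9} utf-8={utf8:3} utf-16={utf16:3} utf-32={utf32:3}")
```

UTF-32 only breaks even for text made up entirely of characters outside the BMP, which is rare for most documents.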
pvdebbe 6 hours ago [-]
UTFv6
tecleandor 1 days ago [-]
> how do they handle changes in scripts, for example if new proto-cuneiform or seal script symbols are discovered
They get added in the next Unicode revision.
In Unicode you have "blocks" [0] that are often bigger than the number of characters in a script, language or function. There is usually also space for new blocks between unrelated blocks.
For example, in the case of cuneiform, it was introduced in Unicode 5.0, and there have been revisions in 7.0 and 8.0 [1].
As an example of encoding something that is not exactly a character as Unicode "characters": musical symbols are rarely embedded in running text (which is a primary litmus test for encoding), but they are typically rendered with existing font technology, so there is a need for standardized "character" codes, as SMuFL [1] provides. In fact Unicode 18 will get tons of musical symbols that have been in SMuFL for a long time but are not yet in Unicode [2].
There is also GNU Unifont [1]: "The original intent of Unifont was to offer a simple font format with wide Unicode coverage to render something meaningful for each Unicode code point"
I would be more interested in whether they are ever going to cancel Han unification. Looking at their "Factors for Exclusion" list, it could be summarized as "we made some mistakes in the past but are sticking to them" :D
lifthrasiir 1 days ago [-]
Han Unification was effectively "fixed" by Ideographic Variation Sequences, so no.
account42 1 days ago [-]
You mean theoretically. Effectively, nothing is fixed yet.
lifthrasiir 1 days ago [-]
IVD works, theoretically and practically (recent versions of OpenType have explicit support for them). It's not their fault that Japanese vendors have not been very quick to adopt them.
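For readers who haven't seen one, here is a rough Python sketch of what an Ideographic Variation Sequence looks like at the code point level: a unified ideograph followed by a selector from the Variation Selectors Supplement (U+E0100..U+E01EF). I'm using U+845B as the base because I believe the Adobe-Japan1 collection registers sequences for it, but treat that specific pairing as an assumption; `strip_variation_selectors` is a made-up helper, not a library function.

```python
import unicodedata

base = "\u845B"       # a unified ideograph shared across CJK locales
vs17 = "\U000E0100"   # first selector in the Variation Selectors Supplement
vs18 = "\U000E0101"

seq_a = base + vs17   # asks for one registered glyph variant
seq_b = base + vs18   # asks for another


def strip_variation_selectors(text: str) -> str:
    """Drop variation selectors, leaving only the base characters."""
    return "".join(
        ch for ch in text
        if not (0xFE00 <= ord(ch) <= 0xFE0F or 0xE0100 <= ord(ch) <= 0xE01EF)
    )


print(unicodedata.name(vs17))
print(base.encode("utf-8").hex(), seq_a.encode("utf-8").hex())

# The bare base character is the same bytes whoever typed it; the selector
# is what distinguishes the variants, and normalization leaves it in place.
print(unicodedata.normalize("NFC", seq_a) == seq_a)
print(strip_variation_selectors(seq_a) == strip_variation_selectors(seq_b))
```

Whether the two sequences actually render differently depends on the font and shaping stack, which is the vendor-support part of the argument above.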
account42 1 days ago [-]
If a Japanese and a Taiwanese person type things with their keyboards and end up with the same bytes for different logical characters, then no, things do not work practically for any practical definition of "practically".
lifthrasiir 1 days ago [-]
Your argument is absurd because people don't see code---they see glyphs, and using the same code for slightly different glyphs is a non-issue when they are not interchanged. (And when they are interchanged, both would see glyphs "correct" to them anyway.) Japanese users are sensitive to Han unification only because they recognize more glyph variations (Z-variants) than Unicode originally could, and IVS is exactly a tool for ensuring exact glyphs, assuming cooperative vendors. Not to mention that Han unification was already quite weakened by source separation principles in the first place.
numpad0 23 hours ago [-]
Chinese AI labs are filtering Japanese images and text out of their AI models - they leave a much smaller amount for text models that have to be literate in Japanese, and explicitly nuke it from the dataset for image models so that they only support Simplified Chinese and English, to avoid GIGO.
I mean, making or helping make sovereign AI models is nowhere near the responsibilities of Unicode, but Han Unification and a sort of default-enforced IVD support is literally adding a small but non-zero amount of fuel to cultural division and xenophobia perpetuate in East Asia. I doubt blaming users would work here.
lifthrasiir 22 hours ago [-]
While I agree that Han Unification is not optimal (and fixing it is a welcome development), it is already too late to reverse it. Even counter-proposals like TRON haven't worked at all so far. IVD is the best compromise we can have in this situation.
> cultural division and xenophobia perpetuate in East Asia
By the way, I have recently seen multiple claims from Japanese Twitter users that Korea would have been better keeping Chinese characters (Hanja) in use. If this is the cultural division and xenophobia we are talking about, I will gladly take it---why on earth do they have any say in Korea's choice of scripts? The "sinosphere" is an illusion; the fact that CJKV countries have or had shared the same set of characters is just a fun fact and not a cultural mandate or anything else like that.
numpad0 20 hours ago [-]
> IVD is the best compromise we can have in this situation.
Maybe, but no one is running an ivdfy-filter over every single Japanese document, and the issue keeps going. Maybe one way to make it happen is to make the Simplified forms singularly canonical for the CJK Unified Ideographs, so as to classify everything in that form as Chinese, and define Japanese script as always being flagged with IVDs, though I don't know what the storage and processing implications of that might be. But my point is that maintaining the position that users can optionally choose not to display text in the wrong language, and that Unification issues are merely user errors, doesn't make any sense to me.
> Korea would have been better keeping Chinese characters (Hanja) in use.
I can't speak for all, but I, for one, do regularly encounter machine translation failures in Korean content due to homophones, even with LLM-based translators, in ways that don't happen with Japanese. It manifests as either homonym errors[1] or the MTL resorting to phonetic transcriptions that I have no idea about[2]. Both happen in formal writing like newspaper Web articles in addition to casual social media posts. Since it appears that there's no way this issue could happen with "our" system, it sometimes feels like reverting to that could fix it.
1: (like "plain/plane", had the source been English and this was somehow happening)
2: (like "That arm might be fukuzatukossetsushiteru" had the source been Japanese)
extraduder_ire 15 hours ago [-]
Cancel how? There are documents encoded like that which would break if it were changed now.
Unicode takes backward compatibility like this very seriously.
pentamassiv 1 days ago [-]
Sadly it looks like it will be a dead monarch butterfly
I need a table emoji because then I could combine it with a horse emoji. This would be "Pferd Tisch" (Horse Table) in German which sounds similar to "Fertig" which translates to "done". Yes I want it only for that dumb joke.
chirsz 1 days ago [-]
Still no seahorse
JohnKemeny 1 days ago [-]
If the seahorse emoji is introduced, we will have to train new foundation models. The costs connected to the introduction of the seahorse emoji will be in the billions.
zarzavat 1 days ago [-]
You're absolutely right—the seahorse emoji was added in Unicode version 19.0.0 after OpenAI purchased the Unicode Consortium and converted it to a for-profit corporation.
simondotau 1 days ago [-]
The seahorse being, of course, among the first commercial Unicode characters that require a subscription to use.
Symbiote 1 days ago [-]
Does anyone know why a monarch butterfly was added, when there is already a butterfly emoji?
The Toki Pona script (aka sitelen pona) needs some codepoints for its ideograms. While Toki Pona is not in Unicode, tokiponists have mostly agreed to use the U+F19xx range in the Private Use Area-A. Most fonts rendering sitelen pona use that. But using the PUA is problematic (no character properties, a lot of restrictions on the web, and constant clashes with other fonts [such as the "nerdfonts" for example]).
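To illustrate the "no character properties" part, a small Python sketch: a code point in that range (I'm taking U+F1900 as the start of U+F19xx, which is my reading of the range, not something spelled out above) comes back from the standard `unicodedata` module as plain private use, with no name at all.

```python
import unicodedata

code_point = 0xF1900  # assumed first code point of the U+F19xx range
ch = chr(code_point)

# Supplementary Private Use Area-A spans U+F0000..U+FFFFD.
in_pua_a = 0xF0000 <= code_point <= 0xFFFFD

category = unicodedata.category(ch)  # "Co" means private use
name = unicodedata.name(ch, None)    # None: the standard assigns no name

print(in_pua_a, category, name)
```

Software can't tell from the code point alone whether it is looking at a sitelen pona glyph or an icon from some other font, which is where the clashes come from.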
sourcegrift 1 days ago [-]
Personally the whole emoji thing is an unmitigated disaster. I'm okay with smileys and gestures but everything else is pointless
vintermann 1 days ago [-]
I'm okay with smileys, but Unicode wasn't the right standard to deal with it. Unicode maybe wasn't the right standard to deal with anything.
At least nothing is wiggling. Of those Unicode points which are graphical, at least all of them can still be printed on paper and won't require a screen. I wonder how long that invariant lasts.
account42 1 days ago [-]
It's classic scope creep resulting in unmanageable bloat.
trvz 1 days ago [-]
I enjoy putting emojis in folder names on my computer for easier visual identification.
Also, in passwords on websites to keep developers on their toes.
sourcegrift 1 days ago [-]
The password thing is hilarious but the former breaks some of my Unix tools
duskwuff 18 hours ago [-]
Personally, I'm all for it. It's been an incredibly effective way of urging developers to support newer Unicode standards.
gschizas 11 hours ago [-]
Yet Microsoft still refuses to do flags.
numpad0 24 hours ago [-]
Google/Apple needed it to fill the moat for the Japanese phone market - for Google it was because Japanese carriers were stripping emoji from outgoing emails, and for Apple it was because the iPhone, as a real phone and not an Internet-connected pocket PC with voice calls, had to support the SoftBank emoji set.
And yeah, :slack-style-emoji-notation: is superior. It was just a historical necessity for Google/Apple.
adzm 1 days ago [-]
Personally I think the whole emoji thing is a triumph of Unicode. Being able to convey more subtext through emoji makes communication so much easier especially across language boundaries.
Bolwin 12 hours ago [-]
I've never seen emoji used for subtext. Usually they just repeat or emphasize what's in the text
Emoji proposals and status: https://unicode.org/emoji/emoji-proposals-status.html
--
[1] https://www.smufl.org/
[2] https://www.unicode.org/L2/L2025/25017-miscellaneous-musical...
[1] https://unifoundry.com/unifont/index.html
* Cracking face
* Left/Right thumb sign
* Monarch butterfly
* Pickle
* Lighthouse
* Meteor
* Eraser
* Net with handle
- Left and Right parenthesis with middle ring [1]
- A wiggly exclamation mark expressing mirth or laughter [1] (edit: and something I completely missed: the inverted version can express sarcasm)
- Cuneiform numerals, including lots of arranged dots that might be useful in other contexts [2]
- New variations of "measured angle" and "sector" [3]
- A transparent cube and a white cube [4]
Also a couple new combining marks
And for anyone who wants to see what the reference images for the new emojis look like:
Lighthouse: https://www.unicode.org/charts/PDF/Unicode-18.0/U180-1F680.p...
Other new Emojis: https://www.unicode.org/charts/PDF/Unicode-18.0/U180-1FA70.p...
1: https://www.unicode.org/charts/PDF/Unicode-18.0/U180-2E00.pd...
2: https://www.unicode.org/charts/PDF/Unicode-18.0/U180-12550.p...
3: https://www.unicode.org/charts/PDF/Unicode-18.0/U180-1CEC0.p...
4: https://www.unicode.org/charts/PDF/Unicode-18.0/U180-1F780.p...
https://www.emilydamstra.com/please-enough-dead-butterflies/