If you think is-number can be replaced with a one-liner, you don’t have the enterprise code mindset. What if the world gets more inclusive and MMXXIV, ½ and ⠼⠁ become recognized as numbers? 𒐍𓆾 were numbers in the past but what if people start assigning numeric value to other characters? Are 🖐🔟💯🆢🂵🀌🁅 numbers of the future???
/s
I’m not even all kidding, Regex implementations are split on whether “٣” matches \d.
All junior devs should read OCs comment and really think about this.
The issue is whether is_number() is performing a semantic language matter or checking whether the text input can be converted by the program to a number type.
The former case - the semantic language test - is useful for chat based interactions, analysis of text (and ancient text - I love the cuneiform btw) and similar. In this mode, some applications don’t even have to be able to convert the text into eg binary (a ‘gazillion’ of something is quantifying it, but vaguely)
The latter case (validating input) is useful where the input is controlled and users are supposed to enter numbers using a limited part of a standard keyboard. Clay tablets and triangular sticks are strictly excluded from this interface.
Another example might be is_address(). Which of these are addresses? ‘10 Downing Street, London’, ‘193.168.1.1’, ‘Gettysberg’, ‘Sir/Madam’.
To me this highlights that code is a lot less reusable between different projects/apps than it at first appears.
You may argue that writiing 2024 as “MMXXIV” and not “ⅯⅯⅩⅩⅣ” is a mistake but while typists who’d use “2OlO” for “2010” (because they grew up using cost-reduced typewriters) are dying out, you’ll never get everyone to use the appropriate Unicode for Roman numerals.
Even if they did use unicode, any codeset , glyph or language changes over time , ulimately they emerge out of communication, not the other way round.
If some culture decides they want to use the glyph “2” to mean a word “to”, they can and will, and no codeset is going to stop them. And if they get their message to their intended audience it doesnt matter that somebody else’s isnumber fuction get’s it wrong.
A person, community or standard codeset or dictionary cannot deny the accuracy or content of encrypted communication just because they can’t decipher it.
Put another way a more robust isnumber() should maybe have a second argument to specify the codeset being used, and maybe whether written words - in some defined languare - are also to be converted
Excel is going to have a Date with you, and it’s not asking further questions. If you didn’t wish to consent to have your col’n shattered, you should have preceded it with a '.
yeah, I’ve been rohypnolled by both microsoft and oracle, and general cloud shit , and various co workwers so many times now i barely even notice.
Hilariosly excel has recently started asking now, I think it says something like:
“I’ve just fucked up several columns in your csv that you went to the bother of enquoting.”
“Do you want me to reload it and i’ll try to un-fuck a few of those columns? ( whispers to audience - but probably not all of them - tee hee).”
I think my employer just needs to employ 25-50 more “delivery” managers and empower them to spend millions on a prettier barrel for us to bend over, that’ll solve it.
Maybe it’ll have flufffy handcuffs.
So the only valid digits are arabic numbers but arabic script numbers are not a valid digit? If we want programming to be inclusive then doesn’t that make sense to also include the arabic script number?
So the only valid digits are arabic numbers but arabic script numbers are not a valid digit?
Some people writing Regex implementations have that opinion. I’ve refrained from saying mine.
If we want programming to be inclusive then doesn’t that make sense to also include the arabic script number?
Maybe. IMO, number tests should be chosen/implemented based on the project’s requirements. If you want to include every Unicode character or string pattern anyone’s ever used to convey a numeric value, that would be a long and growing list. Arguably, it’s impossible: the word “elf” means a number if interpreted as German for “eleven” but not if interpreted as English for 🧝.
Yeah, but “elf” are not digits. Digits are a symbol abstracted from the language itself. Does 5 and V convey different meanings in the context of digits? And yeah, I can see why they would argue about the implementation because inclusivity is important. Especially when designing a language implementation. If you are designing it wrong, it will be very hard to extend it in the future. But for application level implementation, go nuts.
You are right, “elf” is a stretch, it does not make sense to parse it as a number. But in some languages, the string “15 240,5” is just how a number is written (yes, that’s a U+2009 THIN SPACE, you can’t stop me from using it as a thousand separator in German). Obviously, despite having a , on their numpads, German programmers still expect computers to parse numbers with decimal dots and interpret commas as list values.
Alright, maybe you misunderstood the term digits with numbers. When parsing a digit, you do not attach semantic yet to the building blocks. A \d regex parser does not care that the string “555” is not equivalent to “VVV”. All it cares about is that there is the digit “5” or “V”. In the same vein, regex parser should not try to parse IV as a single symbol.
It’s not just digits. Nobody is expecting it to understand language yet but the parser is-number still returns true for "2e3" or "0x0F". It tells you whether the string can be interpreted as a real numeric value.
Yeah, hence is-“number”. But we were talking about regex are we. A number representation can use digits but it can also not. Much like how you make a number using the word “elf”.
As I said, a digit is a symbol. Much like how we use letters to compose words, digits are used to construct numbers. When you start to repeat or reuse the symbol then it is no longer a singular symbol (what regex \d does). Hence my comments on why arabic script are one of the understandable debates since i18n is a valid concern as much as a11y is.
Lisp code is already like this. That’s why I keep trying to explain it to programmers. Try reading the book SICP, published decades ago by MIT computer researchers.
If you think
is-number
can be replaced with a one-liner, you don’t have the enterprise code mindset. What if the world gets more inclusive and MMXXIV, ½ and ⠼⠁ become recognized as numbers? 𒐍𓆾 were numbers in the past but what if people start assigning numeric value to other characters? Are 🖐🔟💯🆢🂵🀌🁅 numbers of the future???/s
I’m not even all kidding, Regex implementations are split on whether “٣” matches
\d
.All junior devs should read OCs comment and really think about this.
The issue is whether
is_number()
is performing a semantic language matter or checking whether the text input can be converted by the program to a number type.The former case - the semantic language test - is useful for chat based interactions, analysis of text (and ancient text - I love the cuneiform btw) and similar. In this mode, some applications don’t even have to be able to convert the text into eg binary (a ‘gazillion’ of something is quantifying it, but vaguely)
The latter case (validating input) is useful where the input is controlled and users are supposed to enter numbers using a limited part of a standard keyboard. Clay tablets and triangular sticks are strictly excluded from this interface.
Another example might be
is_address()
. Which of these are addresses? ‘10 Downing Street, London’, ‘193.168.1.1’, ‘Gettysberg’, ‘Sir/Madam’.To me this highlights that code is a lot less reusable between different projects/apps than it at first appears.
It’s simple ⅯⅯⅩⅩⅣis a number, MMXXIV is not.
You may argue that writiing 2024 as “
MMXXIV
” and not “ⅯⅯⅩⅩⅣ
” is a mistake but while typists who’d use “2OlO
” for “2010
” (because they grew up using cost-reduced typewriters) are dying out, you’ll never get everyone to use the appropriate Unicode for Roman numerals.Even if they did use unicode, any codeset , glyph or language changes over time , ulimately they emerge out of communication, not the other way round.
If some culture decides they want to use the glyph “2” to mean a word “to”, they can and will, and no codeset is going to stop them. And if they get their message to their intended audience it doesnt matter that somebody else’s isnumber fuction get’s it wrong.
A person, community or standard codeset or dictionary cannot deny the accuracy or content of encrypted communication just because they can’t decipher it.
Put another way a more robust isnumber() should maybe have a second argument to specify the codeset being used, and maybe whether written words - in some defined languare - are also to be converted
On the other hand “1/4/12” is not a fucking date.
Excel is going to have a Date with you, and it’s not asking further questions. If you didn’t wish to consent to have your col’n shattered, you should have preceded it with a
'
.yeah, I’ve been rohypnolled by both microsoft and oracle, and general cloud shit , and various co workwers so many times now i barely even notice.
Hilariosly excel has recently started asking now, I think it says something like: “I’ve just fucked up several columns in your csv that you went to the bother of enquoting.” “Do you want me to reload it and i’ll try to un-fuck a few of those columns? ( whispers to audience - but probably not all of them - tee hee).”
I think my employer just needs to employ 25-50 more “delivery” managers and empower them to spend millions on a prettier barrel for us to bend over, that’ll solve it. Maybe it’ll have flufffy handcuffs.
Wouldn’t surprise me if even Unicode advices against using Roman numerals depending on meaning.
It was mostly a joke (though frankly if you try any implementation more complicated than that joke you’re going to have a bad time).
deleted by creator
So the only valid digits are arabic numbers but arabic script numbers are not a valid digit? If we want programming to be inclusive then doesn’t that make sense to also include the arabic script number?
Some people writing Regex implementations have that opinion. I’ve refrained from saying mine.
Maybe. IMO, number tests should be chosen/implemented based on the project’s requirements. If you want to include every Unicode character or string pattern anyone’s ever used to convey a numeric value, that would be a long and growing list. Arguably, it’s impossible: the word “elf” means a number if interpreted as German for “eleven” but not if interpreted as English for 🧝.
Yeah, but “elf” are not digits. Digits are a symbol abstracted from the language itself. Does 5 and V convey different meanings in the context of digits? And yeah, I can see why they would argue about the implementation because inclusivity is important. Especially when designing a language implementation. If you are designing it wrong, it will be very hard to extend it in the future. But for application level implementation, go nuts.
You are right, “elf” is a stretch, it does not make sense to parse it as a number. But in some languages, the string “15 240,5” is just how a number is written (yes, that’s a
U+2009 THIN SPACE
, you can’t stop me from using it as a thousand separator in German). Obviously, despite having a,
on their numpads, German programmers still expect computers to parse numbers with decimal dots and interpret commas as list values.Alright, maybe you misunderstood the term digits with numbers. When parsing a digit, you do not attach semantic yet to the building blocks. A \d regex parser does not care that the string “555” is not equivalent to “VVV”. All it cares about is that there is the digit “5” or “V”. In the same vein, regex parser should not try to parse IV as a single symbol.
It’s not just digits. Nobody is expecting it to understand language yet but the parser
is-number
still returnstrue
for"2e3"
or"0x0F"
. It tells you whether the string can be interpreted as a real numeric value.Yeah, hence is-“number”. But we were talking about regex are we. A number representation can use digits but it can also not. Much like how you make a number using the word “elf”.
I feel like there shoul be an ISO/DIN to define this.
But elf=B=11. Kinda depends on context if 11 is a digit
As I said, a digit is a symbol. Much like how we use letters to compose words, digits are used to construct numbers. When you start to repeat or reuse the symbol then it is no longer a singular symbol (what regex \d does). Hence my comments on why arabic script are one of the understandable debates since i18n is a valid concern as much as a11y is.
Lisp code is already like this. That’s why I keep trying to explain it to programmers. Try reading the book SICP, published decades ago by MIT computer researchers.
https://invidious.privacyredirect.com/watch?v=a3t3IKlXqFU
Are you asking for treefiddy upvotes?
How many upvotes does 💲🄄Ƽ᱐ buy, really?
At least one from the loch ness monster
someone fix that goddamn islochnessmonster() function