emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
* [off topic] List all non-latin characters in a buffer
@ 2022-08-16 15:19 Juan Manuel Macías
  2022-08-19 14:50 ` Uwe Brauer
  0 siblings, 1 reply; 3+ messages in thread
From: Juan Manuel Macías @ 2022-08-16 15:19 UTC (permalink / raw)
  To: orgmode

Sorry for the offtopic, but I thought this homemade function I wrote
some time ago for my work might perhaps be useful to some Orgers. When
executed in a buffer, the `list-non-latin-chars' function opens a window
displaying a list of all the non (basic) Latin characters present in
that document. Each item in the list contains the character, its Unicode
canonical name, and its hexadecimal code. For example:

殿  CJK IDEOGRAPH-6BBF  #6bbf

Also, each item is a button (created with button.el). If the button
is activated, there are currently two options: a: execute occur on that
character in the document; b : execute describe-char on that character.

By default, the characters displayed in the list correspond to any
Unicode block other than basic-latin. Which means that the zero width
space character is included, a very famous character in this mailing
list :-)

And here is the code (lexical binding is required). Of course, feedback
welcome.

Best regards,

Juan Manuel

#+begin_src emacs-lisp
  (setq ext-chars-actions-list '((?a "Occur"
				      (lambda (buf char)
					(interactive)
					(with-current-buffer buf
					  (occur char))))
				  (?b "Describe char"
				      (lambda (buf char)
					(interactive)
					(with-current-buffer buf
					  (save-excursion
					    (goto-char (point-min))
					    (when (re-search-forward char nil t)
					      (describe-char (- (point) 1)))))))))

  (defun ext-chars-choose-action (buf char)
    (let ((opt (read-char-choice (concat "Escoger acción >>\n\n"
					 (mapconcat (lambda (item)
						      (format "%c: %s"
							      (car item) (nth 1 item)))
						    ext-chars-actions-list " --- "))
				 (mapcar #'car ext-chars-actions-list))))
      (apply (nth 2 (assoc opt ext-chars-actions-list))
	     (list buf char))))

  (defvar ext-chars-list nil)

  (defun list-non-latin-chars ()
    (interactive)
    (setq ext-chars-list nil)
    (let ((buf (buffer-name)))
      (save-excursion
	(goto-char (point-min))
	(while
	    (re-search-forward "\\([^\u0000-\u007F]\\)" nil t)
	  (add-to-list 'ext-chars-list (format "%s" (match-string 1))))
	(setq ext-chars-list-final
	      (mapcar (lambda (char)
			(let
			    ((char-name (get-char-code-property (string-to-char char) 'name))
			     ;; convert to hexadecimal
			     (char-code (format "#%x" (string-to-char char))))
			  (setq char (format  "%s\s\s%s\s\s%s" char char-name char-code))))
		      ext-chars-list))
	(let ((temp-buf (format "*non latin chars in %s*" buf)))
	  (when (get-buffer temp-buf)
	    (kill-buffer temp-buf))
	  (get-buffer-create temp-buf)
	  (set-buffer temp-buf)
          ;; necessary for Arabic, Hebrew, etc.
	  (setq bidi-display-reordering nil)
          ;; insert buttons list
	  (mapc (lambda (el)
		  (let ((char (when (string-match "^\\(.\\)\s" el)
				(match-string 1 el))))
		    (insert-button (format "%s" el)
				   'action (lambda (x) 
					     (interactive) 
					     (ext-chars-choose-action buf char)))
		    (insert "\n\n")))
		ext-chars-list-final)
	  (pop-to-buffer temp-buf)
	  (goto-char (point-min))
	  (view-mode)))))
#+end_src

-- 
--
------------------------------------------------------
Juan Manuel Macías 

https://juanmanuelmacias.com

https://lunotipia.juanmanuelmacias.com

https://gnutas.juanmanuelmacias.com




^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [off topic] List all non-latin characters in a buffer
  2022-08-16 15:19 [off topic] List all non-latin characters in a buffer Juan Manuel Macías
@ 2022-08-19 14:50 ` Uwe Brauer
  2022-09-09 14:15   ` Robert Pluim
  0 siblings, 1 reply; 3+ messages in thread
From: Uwe Brauer @ 2022-08-19 14:50 UTC (permalink / raw)
  To: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 755 bytes --]

>>> "JMM" == Juan Manuel Macías <maciaschain@posteo.net> writes:
Hi Juan


> Sorry for the offtopic, but I thought this homemade function I wrote
> some time ago for my work might perhaps be useful to some Orgers. When
> executed in a buffer, the `list-non-latin-chars' function opens a window
> displaying a list of all the non (basic) Latin characters present in
> that document. Each item in the list contains the character, its Unicode
> canonical name, and its hexadecimal code. For example:

Very very nice,

Till now I used only a very simple search function

(defun my-search-no-ascii ()
"Simple function to search for no ASCII symbols in a file."
  (interactive)
  (skip-chars-forward "\001-\177"))

That is so much better!

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5673 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [off topic] List all non-latin characters in a buffer
  2022-08-19 14:50 ` Uwe Brauer
@ 2022-09-09 14:15   ` Robert Pluim
  0 siblings, 0 replies; 3+ messages in thread
From: Robert Pluim @ 2022-09-09 14:15 UTC (permalink / raw)
  To: emacs-orgmode

>>>>> On Fri, 19 Aug 2022 16:50:54 +0200, Uwe Brauer <oub@mat.ucm.es> said:

    >>>> "JMM" == Juan Manuel Macías <maciaschain@posteo.net> writes:
    Uwe> Hi Juan


    >> Sorry for the offtopic, but I thought this homemade function I wrote
    >> some time ago for my work might perhaps be useful to some Orgers. When
    >> executed in a buffer, the `list-non-latin-chars' function opens a window
    >> displaying a list of all the non (basic) Latin characters present in
    >> that document. Each item in the list contains the character, its Unicode
    >> canonical name, and its hexadecimal code. For example:

    Uwe> Very very nice,

    Uwe> Till now I used only a very simple search function

    Uwe> (defun my-search-no-ascii ()
    Uwe> "Simple function to search for no ASCII symbols in a file."
    Uwe>   (interactive)
    Uwe>   (skip-chars-forward "\001-\177"))

Equivalently you can do
 (skip-chars-forward "[[:ascii:]]")

(or use `re-search-forward' with the [:nonascii:] character class.)

Robert
-- 


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-09-09 14:18 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-16 15:19 [off topic] List all non-latin characters in a buffer Juan Manuel Macías
2022-08-19 14:50 ` Uwe Brauer
2022-09-09 14:15   ` Robert Pluim

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).