feat(notes): add guide for OCR and PDF manipulation on Linux

- Added instructions for setting up Tesseract with language support.
- Documented steps for converting PDFs to images using `pdftoppm` and alternatives like `ImageMagick`.
- Included examples for single and multi-page OCR processing.
- Detailed methods for merging extracted text into a single file.
- Added troubleshooting tips for improving OCR results and handling selectable PDFs with `pdftotext`.
feat(dns): add comprehensive DNS notes
2024-12-05 16:09:04 -05:00 · 2024-12-05 15:49:54 -05:00 · 2024-12-05 15:40:37 -05:00 · 2024-11-18 20:17:00 -05:00
4 changed files with 189 additions and 1 deletion
--- a/notes/debian_setup_aptly.md
+++ b/notes/debian_setup_aptly.md
@@ -343,7 +343,7 @@ chown -R aptly:aptly /home/aptly/.ssh/
    **Adding Packages**
    ```bash
-    sudo su -l aptly
+    sudo -iu aptly
    mkdir /home/aptly/packages/
    ```
@@ -372,6 +372,14 @@ chown -R aptly:aptly /home/aptly/.ssh/
    rm ~/<hostname-internet>.private-key.asc
    ```
    **Register keys with public keyservers**
    ```bash
    gpg --send-keys <gpg-repository-key-id>
    gpg --keyserver hkp://keyserver.ubuntu.com --send-keys <gpg-repository-key-id>
    gpg --keyserver hkp://pgp.mit.edu --send-keys <gpg-repository-key-id>
    ```
    **Import GPG Key**
    ```bash
--- a/notes/dns.md
+++ b/notes/dns.md
@@ -0,0 +1,64 @@
 # DNS
 ## Table of Contents
 - [DNS](#dns)
  - [Table of Contents](#table-of-contents)
  - [Flush DNS Cache](#flush-dns-cache)
  - [systemctl](#systemctl)
  - [dig](#dig)
  - [nslookup](#nslookup)
 ## Flush DNS Cache
 - Clear the DNS cache to ensure that the system resolves domain names with the most up-to-date information.
 ```bash
 resolvectl flush-caches
 systemd-resolve --flush-caches
 ```
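 - Use `resolvectl` on current systems. `systemd-resolve` is the older name of the same client (systemd 239 renamed it to `resolvectl`), so it is only needed on older releases. Both commands work only when `systemd-resolved` is the active resolver.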
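 The two flush commands can be wrapped in a small shell function that uses whichever client is installed. This is a sketch: the function name `flush_dns_cache` is hypothetical, and it assumes `systemd-resolved` is the active resolver and that the function is called from a root shell.
 ```bash
 # Hypothetical helper: flush the systemd-resolved cache with the available client.
 # Assumes systemd-resolved is running; call it from a root shell (e.g., after `sudo -i`).
 flush_dns_cache() {
   if command -v resolvectl >/dev/null 2>&1; then
     resolvectl flush-caches
   elif command -v systemd-resolve >/dev/null 2>&1; then
     systemd-resolve --flush-caches
   else
     echo "flush_dns_cache: neither resolvectl nor systemd-resolve found" >&2
     return 1
   fi
 }
 ```
 After calling `flush_dns_cache`, `resolvectl statistics` can be used to confirm that the current cache size has dropped.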
 ## systemctl
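 - This command enables the `systemd-resolved` service so that it starts at boot, ensuring DNS resolution through `systemd`. Plain `enable` does not start a stopped service; adding `--now` enables and starts it in one step.
 ```bash
 systemctl enable --now systemd-resolved.service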
 ```
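 To confirm the result, `systemctl is-enabled` and `systemctl is-active` report the boot-time and current state separately. A minimal sketch; the function name `resolved_state` is hypothetical.
 ```bash
 # Hypothetical helper: print whether systemd-resolved starts at boot and is running now.
 # systemctl prints the state word (e.g., "disabled", "inactive") even when it
 # exits non-zero, so the captured text is always meaningful.
 resolved_state() {
   printf 'enabled: %s\n' "$(systemctl is-enabled systemd-resolved.service 2>/dev/null)"
   printf 'active: %s\n' "$(systemctl is-active systemd-resolved.service 2>/dev/null)"
 }
 ```
 A healthy setup prints `enabled: enabled` and `active: active`.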
 **About the flush command:**
 - `resolvectl flush-caches` (or `systemd-resolve --flush-caches` on older systems) clears the DNS cache maintained by `systemd-resolved`, which can help resolve issues with outdated or incorrect DNS entries.
 - A restart is not normally required after flushing, but if resolution still misbehaves, restart the `systemd-resolved` service; restarting also discards its in-memory cache.
 ```bash
 systemctl restart systemd-resolved
 service systemd-resolved restart
 ```
 - To restart the service, use `systemctl restart systemd-resolved` (preferred). The `service` command is available but is considered legacy.
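 ## dig
 - `dig` queries the system's resolver and prints the full response; `+short` limits the output to the answer records (here, the domain's name servers).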
 ```bash
 dig domain.com
 dig +short NS domain.com
 ```
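 To look up several record types in one go, the `dig +short` form can be looped. This is a sketch: the function name `dns_records` and the list of types are illustrative, and the optional second argument is a resolver IP passed to `dig` with its `@server` syntax.
 ```bash
 # Hypothetical helper: print common record types for a domain with `dig +short`.
 # Usage: dns_records <domain> [resolver-ip]
 dns_records() {
   local domain="$1"
   # ${2:+@$2} expands to "@<ip>" only when a resolver was given, else to nothing.
   local server="${2:+@$2}"
   local rtype
   for rtype in A AAAA MX NS TXT; do
     echo "== $rtype =="
     # $server is left unquoted on purpose so an empty value adds no argument.
     dig $server +short "$rtype" "$domain"
   done
 }
 ```
 For example, `dns_records domain.com 1.1.1.1` queries Cloudflare directly. Related queries: `dig -x 8.8.8.8` performs a reverse (PTR) lookup, and `dig +trace domain.com` follows the delegation from the root servers.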
 ## nslookup
 - `nslookup` is a legacy tool but still useful for querying DNS. You can also specify custom DNS servers, such as `1.1.1.1` (Cloudflare) or `8.8.8.8` (Google), to query DNS directly without using the system’s default resolver.
 ```bash
 nslookup domain.com
 nslookup -q=cname domain.com
 nslookup -q=cname domain.com 1.1.1.1
 nslookup -q=cname domain.com 8.8.8.8
 ```
 ```bash
 nslookup -q=mx domain.com
 nslookup -q=txt domain.com
 ```
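 When checking whether a DNS change has propagated, it helps to ask several resolvers the same question. A minimal sketch using `nslookup`; the function name `compare_resolvers` is hypothetical, and `1.1.1.1` and `8.8.8.8` are simply the public resolvers mentioned above.
 ```bash
 # Hypothetical helper: run the same nslookup query against several resolvers.
 # Usage: compare_resolvers <record-type> <domain> [resolver...]
 compare_resolvers() {
   local qtype="$1" domain="$2"
   shift 2
   # Fall back to the two public resolvers if none were given.
   [ "$#" -gt 0 ] || set -- 1.1.1.1 8.8.8.8
   local server
   for server in "$@"; do
     echo "== $server =="
     nslookup -q="$qtype" "$domain" "$server"
   done
 }
 ```
 For example, `compare_resolvers mx domain.com` prints the MX answer from each resolver in turn; differing answers usually mean a cached record has not expired yet.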
--- a/notes/linux.md
+++ b/notes/linux.md
@@ -7,6 +7,7 @@
  - [System Information](#system-information)
    - [Hardware Information](#hardware-information)
    - [Software Information](#software-information)
      - [Commands to Get Information About Linux Version, Kernel Version, and Release](#commands-to-get-information-about-linux-version-kernel-version-and-release)
  - [User Management](#user-management)
    - [User Information](#user-information)
    - [Super User Management](#super-user-management)
@@ -42,6 +43,17 @@ To gather detailed information about your hardware, use the following commands:
 ### Software Information
 **Finding information on the Linux distribution**
 #### Commands to Get Information About Linux Version, Kernel Version, and Release
 - **`lsb_release -a`**: Displays detailed information about the Linux distribution, including the distributor ID, description, release number, and codename.
 - **`cat /etc/debian_version`**: Displays the Debian release on Debian itself. On Debian-based systems such as Ubuntu, it shows the Debian release the distribution is based on rather than the distribution's own version.
 - **`cat /etc/os-release`**: Displays information about the operating system, such as the name, version, and ID of the distribution.
 - **`cat /etc/*release`**: Prints every file in `/etc/` whose name ends in `release` (for example `os-release` and, where present, `lsb-release`). This typically includes more detailed distribution information.
 - **`cat /etc/*version`**: Similar to `cat /etc/*release`, but matches file names ending in `version` (such as `debian_version`). It can provide additional version-related details.
 - **`hostnamectl`**: Displays system information related to the hostname and other metadata about the system. This may include the operating system, kernel version, and architecture.
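 Because `/etc/os-release` is a shell-compatible list of `KEY=value` lines, it can be sourced directly, and `uname -r` prints the kernel release. A minimal sketch; the function name `system_summary` is hypothetical, and the fallback to `/usr/lib/os-release` follows the os-release specification.
 ```bash
 # Hypothetical helper: print distribution, kernel release, and architecture.
 system_summary() {
   (
     # Source os-release in a subshell so its variables do not leak into the caller.
     for f in /etc/os-release /usr/lib/os-release; do
       if [ -r "$f" ]; then
         . "$f"
         break
       fi
     done
     printf 'Distribution: %s\n' "${PRETTY_NAME:-unknown}"
   )
   printf 'Kernel: %s\n' "$(uname -r)"
   printf 'Architecture: %s\n' "$(uname -m)"
 }
 ```
 Unlike `lsb_release`, this works on minimal installs where the `lsb-release` package is absent.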
 **Finding Path to Binary**
 To find the location of an executable binary, use:
--- a/notes/pdf.md
+++ b/notes/pdf.md
@@ -0,0 +1,104 @@
 # PDF
 ## Table of Contents
 - [PDF](#pdf)
  - [Table of Contents](#table-of-contents)
  - [OCR](#ocr)
 ## OCR
 **Install necessary tools**
 Update the package list and install Tesseract with support for the desired language. Replace `fra` with your desired language code (e.g., `eng` for English).
 ```bash
 apt update
 apt install tesseract-ocr tesseract-ocr-fra
 ```
 **Verify the installation**
 Check the installed version of Tesseract and list available languages to ensure your chosen language is installed (e.g., `fra` for French).
 ```bash
 tesseract --version
 tesseract --list-langs
 ```
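 In scripts, the language check can be automated by searching the `--list-langs` output. A sketch; the function name `has_tesseract_lang` is hypothetical. Both output streams are merged so the check does not depend on which stream a given Tesseract version uses for the list.
 ```bash
 # Hypothetical helper: succeed only if the given language code is installed.
 # Usage: has_tesseract_lang fra
 has_tesseract_lang() {
   # --list-langs prints a header line, then one language code per line;
   # -x requires the whole line to match, so the header is never a false hit.
   tesseract --list-langs 2>&1 | grep -qxF -- "$1"
 }
 ```
 For example, `has_tesseract_lang fra || apt install tesseract-ocr-fra` installs the French pack only when it is missing.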
 **Install a utility to convert PDF pages into images**
 For PDFs that require OCR, you need a utility to convert PDF pages into images. Install `poppler-utils`, which includes `pdftoppm`.
 ```bash
 apt install poppler-utils
 ```
 **Convert PDF to images**
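 Convert the PDF into JPEG images, with each page saved as a separate file. Pages are numbered sequentially (e.g., `output-1.jpg`, `output-2.jpg`, etc.); when the document has 10 or more pages, `pdftoppm` zero-pads the numbers to a common width (e.g., `output-01.jpg`).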
 ```bash
 pdftoppm -jpeg your_file.pdf output
 ```
 - **Tip**: Use a dedicated output directory to avoid overwriting existing files.
 - **Alternative tools**: If `poppler-utils` isn’t available, consider using `ImageMagick`:
  ```bash
  convert -density 300 your_file.pdf output-%04d.jpg
  ```
 **Perform OCR on a single image**
 Run Tesseract OCR on an image to extract text. Specify the language using the `-l` option.
 ```bash
 tesseract output-1.jpg output-text -l fra
 ```
 The extracted text will be saved in `output-text.txt`.
 **Perform OCR on multiple images**
 For multi-page PDFs, process all images in a loop. This extracts text from each image and saves it to a separate `.txt` file.
 ```bash
 for img in output-*.jpg; do
    tesseract "$img" "${img%.jpg}" -l fra
 done
 ```
 - **Note**: The `${img%.jpg}` syntax removes the `.jpg` extension, ensuring each `.txt` file matches its corresponding image.
 **Combine all text files**
 Merge the text from all processed pages into a single file. This is useful for assembling the full content of the PDF.
 ```bash
 cat output-*.txt > complete_text.txt
 ```
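 The glob sorts names alphabetically, which matches page order only when all page numbers have the same width. If they do not (for example, `output-10.txt` would sort before `output-2.txt`), sort numerically with `sort -V` before merging: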
 ```bash
 ls output-*.txt | sort -V | xargs cat > complete_text.txt
 ```
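 The steps above can be combined into one function that renders the pages into a temporary directory, runs Tesseract on each page, and writes the merged text. This is a sketch: the name `ocr_pdf`, its argument order, and the defaults (`fra`, 300 DPI, `complete_text.txt`) are assumptions. It passes `stdout` as Tesseract's output base so the text streams straight into the merged file instead of one `.txt` per page.
 ```bash
 # Hypothetical helper: OCR a whole PDF into a single text file.
 # Usage: ocr_pdf <file.pdf> [lang] [output.txt]
 ocr_pdf() {
   local pdf="$1" lang="${2:-fra}" out="${3:-complete_text.txt}"
   local workdir img
   # A private temporary directory keeps page images from colliding with other runs.
   workdir=$(mktemp -d) || return 1
   if ! pdftoppm -r 300 -jpeg "$pdf" "$workdir/page"; then
     rm -rf "$workdir"
     return 1
   fi
   # pdftoppm zero-pads page numbers to a common width, so the glob expands in page order.
   for img in "$workdir"/page-*.jpg; do
     # "stdout" as the output base makes Tesseract print the text instead of writing a .txt file.
     tesseract "$img" stdout -l "$lang"
   done > "$out"
   rm -rf "$workdir"
 }
 ```
 For example, `ocr_pdf your_file.pdf fra complete_text.txt` produces the same merged file as the manual steps, without leaving page images behind.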
 **Troubleshooting and Tips**
 **If Tesseract doesn’t recognize text**:
 - Ensure the images have sufficient quality and resolution. `pdftoppm` renders at 150 DPI by default; use `-r` to increase it (e.g., `-r 300` for 300 DPI).
 - Use the language pack that matches the document. For mixed-language text, combine packs with `+` (e.g., `-l fra+eng`).
 **Using `pdftotext` for simpler PDFs**:  
 If the PDF contains selectable text (not just images), `pdftotext` from `poppler-utils` can extract text directly without OCR.
 ```bash
 pdftotext your_file.pdf output.txt
 ```
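 To decide automatically whether OCR is needed, check whether `pdftotext` extracts any visible characters; passing `-` as the output file sends the text to standard output. A sketch; the function name `has_text_layer` is hypothetical, and a PDF that mixes scanned and text pages would still be reported as having text.
 ```bash
 # Hypothetical helper: succeed if the PDF already contains extractable text.
 has_text_layer() {
   # Whitespace (including the form feeds pdftotext emits between pages) does not count;
   # any other character means the PDF has a text layer.
   pdftotext "$1" - 2>/dev/null | grep -q '[^[:space:]]'
 }
 ```
 For example: `if has_text_layer your_file.pdf; then pdftotext -layout your_file.pdf output.txt; fi`, where `-layout` tries to preserve the physical layout of the page. Otherwise, fall back to the OCR steps.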
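 **ImageMagick 7 and newer**:  
 Replace `convert` with `magick` in the ImageMagick alternative command (e.g., `magick -density 300 your_file.pdf output-%04d.jpg`).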
 **Verify language codes for Tesseract**:  
 You can find a list of supported languages on the [Tesseract GitHub page](https://github.com/tesseract-ocr/tesseract).
Author	SHA1	Message	Date
Fabrice Quenneville	b8b8817cd4	feat(notes): add guide for OCR and PDF manipulation on Linux - Added instructions for setting up Tesseract with language support. - Documented steps for converting PDFs to images using `pdftoppm` and alternatives like `ImageMagick`. - Included examples for single and multi-page OCR processing. - Detailed methods for merging extracted text into a single file. - Added troubleshooting tips for improving OCR results and handling selectable PDFs with `pdftotext`.	2024-12-05 16:09:04 -05:00
Fabrice Quenneville	115eec5c62	feat(dns): add comprehensive DNS notes - Documented commands for flushing DNS cache using `resolvectl` and `systemd-resolve`. - Included instructions for enabling and restarting `systemd-resolved` service. - Added usage examples for `dig` and `nslookup` to query DNS records. - Provided contextual explanations and legacy tool considerations.	2024-12-05 15:49:54 -05:00
Fabrice Quenneville	cd1db53397	Add commands to find information about Linux distribution, version, and kernel Included various commands such as `lsb_release -a`, `cat /etc/debian_version`, and `hostnamectl` to document ways of retrieving system and distribution details.	2024-12-05 15:40:37 -05:00
Fabrice Quenneville	38da2a4315	Slight additions of missing commands in aptly setup notes.	2024-11-18 20:17:00 -05:00