Compare commits

...

4 Commits

Author SHA1 Message Date
b8b8817cd4 feat(notes): add guide for OCR and PDF manipulation on Linux
- Added instructions for setting up Tesseract with language support.
- Documented steps for converting PDFs to images using `pdftoppm` and alternatives like `ImageMagick`.
- Included examples for single and multi-page OCR processing.
- Detailed methods for merging extracted text into a single file.
- Added troubleshooting tips for improving OCR results and handling selectable PDFs with `pdftotext`.
2024-12-05 16:09:04 -05:00
115eec5c62 feat(dns): add comprehensive DNS notes
- Documented commands for flushing DNS cache using `resolvectl` and `systemd-resolve`.
- Included instructions for enabling and restarting `systemd-resolved` service.
- Added usage examples for `dig` and `nslookup` to query DNS records.
- Provided contextual explanations and legacy tool considerations.
2024-12-05 15:49:54 -05:00
cd1db53397 Add commands to find information about Linux distribution, version, and kernel
Included various commands such as `lsb_release -a`, `cat /etc/debian_version`, and `hostnamectl` to document ways of retrieving system and distribution details.
2024-12-05 15:40:37 -05:00
38da2a4315 Slight additions of missing commands in aptly setup notes. 2024-11-18 20:17:00 -05:00
4 changed files with 189 additions and 1 deletions


@ -343,7 +343,7 @@ chown -R aptly:aptly /home/aptly/.ssh/
**Adding Packages** **Adding Packages**
```bash ```bash
sudo su -l aptly sudo -iu aptly
mkdir /home/aptly/packages/ mkdir /home/aptly/packages/
``` ```
@ -372,6 +372,14 @@ chown -R aptly:aptly /home/aptly/.ssh/
rm ~/<hostname-internet>.private-key.asc rm ~/<hostname-internet>.private-key.asc
``` ```
**Register keys with public keyservers**
```bash
gpg --send-keys <gpg-repository-key-id>
gpg --keyserver hkp://keyserver.ubuntu.com --send-keys <gpg-repository-key-id>
gpg --keyserver hkp://pgp.mit.edu --send-keys <gpg-repository-key-id>
```
**Import GPG Key** **Import GPG Key**
```bash ```bash

64
notes/dns.md Normal file

@ -0,0 +1,64 @@
# DNS
## Table of Contents
- [DNS](#dns)
- [Table of Contents](#table-of-contents)
- [Flush DNS Cache](#flush-dns-cache)
- [systemctl](#systemctl)
- [dig](#dig)
- [nslookup](#nslookup)
## Flush DNS Cache
- Clear the DNS cache to ensure that the system resolves domain names with the most up-to-date information.
```bash
resolvectl flush-caches
systemd-resolve --flush-caches
```
- Use `resolvectl` on current systems. `systemd-resolve` is the older name of the same tool (it was renamed in systemd 239, released in 2018) and may be absent on newer distributions.
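- Because only one of the two commands may exist on a given host, a small wrapper can pick whichever is installed. This is a minimal sketch, not part of the original notes; the function name `flush_dns_cache` is hypothetical.
```bash
# Flush the systemd-resolved cache with whichever client tool is installed.
# flush_dns_cache is a hypothetical helper name used for illustration.
flush_dns_cache() {
    if command -v resolvectl >/dev/null 2>&1; then
        resolvectl flush-caches
    elif command -v systemd-resolve >/dev/null 2>&1; then
        systemd-resolve --flush-caches
    else
        echo "neither resolvectl nor systemd-resolve was found" >&2
        return 127
    fi
}

# Usage:
#   flush_dns_cache
```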
## systemctl
- This command enables the `systemd-resolved` service so that it starts at boot, ensuring DNS resolution through `systemd`. Enabling does not start the service; add `--now` to start it immediately as well.
```bash
systemctl enable systemd-resolved.service
```
**Explanation of the command:**
- `systemd-resolve --flush-caches`: This command clears the DNS cache maintained by `systemd-resolved`, which can help resolve issues with outdated or incorrect DNS entries.
- After flushing the cache, it may be necessary to restart the `systemd-resolved` service to ensure proper operation.
```bash
systemctl restart systemd-resolved
service systemd-resolved restart
```
- To restart the service, use `systemctl restart systemd-resolved` (preferred). The `service` command is available but is considered legacy.
## dig
```bash
dig domain.com
dig +short NS domain.com
```
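- To check several record types in one go, the `dig +short` form above can be wrapped in a loop. This is a sketch under assumptions: the function name `dns_records` and the choice of record types are illustrative, and the optional second argument is a resolver address passed to `dig` as `@server`.
```bash
# Print A, NS, MX and TXT records for a domain, optionally via a given resolver.
# dns_records is a hypothetical helper name used for illustration.
dns_records() {
    local domain="$1" resolver="${2:-}" type
    for type in A NS MX TXT; do
        printf '%s:\n' "$type"
        # The @server argument is added only when a resolver was supplied.
        dig +short ${resolver:+"@$resolver"} "$type" "$domain"
    done
}

# Usage:
#   dns_records domain.com
#   dns_records domain.com 1.1.1.1
```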
## nslookup
- `nslookup` is a legacy tool but still useful for querying DNS. You can also specify custom DNS servers, such as `1.1.1.1` (Cloudflare) or `8.8.8.8` (Google), to query DNS directly without using the system's default resolver.
```bash
nslookup domain.com
nslookup -q=cname domain.com
nslookup -q=cname domain.com 1.1.1.1
nslookup -q=cname domain.com 8.8.8.8
```
```bash
nslookup -q=mx domain.com
nslookup -q=txt domain.com
```


@ -7,6 +7,7 @@
- [System Information](#system-information) - [System Information](#system-information)
- [Hardware Information](#hardware-information) - [Hardware Information](#hardware-information)
- [Software Information](#software-information) - [Software Information](#software-information)
- [Commands to Get Information About Linux Version, Kernel Version, and Release](#commands-to-get-information-about-linux-version-kernel-version-and-release)
- [User Management](#user-management) - [User Management](#user-management)
- [User Information](#user-information) - [User Information](#user-information)
- [Super User Management](#super-user-management) - [Super User Management](#super-user-management)
@ -42,6 +43,17 @@ To gather detailed information about your hardware, use the following commands:
### Software Information ### Software Information
**Finding information on the Linux distribution**
#### Commands to Get Information About Linux Version, Kernel Version, and Release
- **`lsb_release -a`**: Displays detailed information about the Linux distribution, including the distributor ID, description, release number, and codename.
- **`cat /etc/debian_version`**: Displays the Debian version on Debian-based systems. On Ubuntu it reports the Debian release that the Ubuntu version is based on (e.g., `bookworm/sid`), not the Ubuntu version number.
- **`cat /etc/os-release`**: Displays information about the operating system, such as the name, version, and ID of the distribution.
- **`cat /etc/*release`**: Displays the contents of every file in `/etc/` whose name ends in `release` (e.g., `os-release`, `lsb-release`). This typically includes more detailed distribution information.
- **`cat /etc/*version`**: Similar to `cat /etc/*release`, but matches files whose names end in `version` (e.g., `debian_version`). It can provide additional version-related details.
- **`hostnamectl`**: Displays system information related to the hostname and other metadata about the system. This may include the operating system, kernel version, and architecture.
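For scripts, `/etc/os-release` is convenient because it is a list of shell-style `KEY="value"` assignments and can be sourced directly. The following is a minimal sketch, not part of the original notes; the function name `describe_system` and the output format are hypothetical.
```bash
# Print "<distribution> (kernel <release>)" from an os-release style file.
# describe_system is a hypothetical helper name used for illustration.
describe_system() {
    local os_release_file="${1:-/etc/os-release}"
    (
        # Source in a subshell so the variables do not leak into the caller.
        . "$os_release_file" || exit 1
        printf '%s (kernel %s)\n' "${PRETTY_NAME:-${NAME:-unknown}}" "$(uname -r)"
    )
}

# Usage:
#   describe_system
```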
**Finding Path to Binary** **Finding Path to Binary**
To find the location of an executable binary, use: To find the location of an executable binary, use:

104
notes/pdf.md Normal file

@ -0,0 +1,104 @@
# PDF
## Table of Contents
- [PDF](#pdf)
- [Table of Contents](#table-of-contents)
- [OCR](#ocr)
## OCR
**Install necessary tools**
Update the package list and install Tesseract with support for the desired language. Replace `fra` with your desired language code (e.g., `eng` for English).
```bash
apt update
apt install tesseract-ocr tesseract-ocr-fra
```
**Verify the installation**
Check the installed version of Tesseract and list available languages to ensure your chosen language is installed (e.g., `fra` for French).
```bash
tesseract --version
tesseract --list-langs
```
**Install a utility to convert PDF pages into images**
For PDFs that require OCR, you need a utility to convert PDF pages into images. Install `poppler-utils`, which includes `pdftoppm`.
```bash
apt install poppler-utils
```
**Convert PDF to images**
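Convert the PDF into JPEG images, with each page saved as a separate file. Each page will be named sequentially (e.g., `output-1.jpg`, `output-2.jpg`, etc.). For documents with ten or more pages, `pdftoppm` zero-pads the page number (e.g., `output-01.jpg`), so adjust the filenames in later steps accordingly.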
```bash
pdftoppm -jpeg your_file.pdf output
```
- **Tip**: Use a dedicated output directory to avoid overwriting existing files.
- **Alternative tools**: If `poppler-utils` isn't available, consider using `ImageMagick`:
```bash
convert -density 300 your_file.pdf output-%04d.jpg
```
**Perform OCR on a single image**
Run Tesseract OCR on an image to extract text. Specify the language using the `-l` option.
```bash
tesseract output-1.jpg output-text -l fra
```
The extracted text will be saved in `output-text.txt`.
**Perform OCR on multiple images**
For multi-page PDFs, process all images in a loop. This extracts text from each image and saves it to a separate `.txt` file.
```bash
for img in output-*.jpg; do
    tesseract "$img" "${img%.jpg}" -l fra
done
```
- **Note**: The `${img%.jpg}` syntax removes the `.jpg` extension, ensuring each `.txt` file matches its corresponding image.
**Combine all text files**
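Merge the text from all processed pages into a single file. This is useful for assembling the full content of the PDF. If you ran the single-image example earlier, delete `output-text.txt` first, because it also matches the `output-*.txt` pattern.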
```bash
cat output-*.txt > complete_text.txt
```
If filenames are out of order, use a sorting approach before merging:
```bash
ls output-*.txt | sort -V | xargs cat > complete_text.txt
```
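The conversion, OCR, and merge steps above can be sketched as a single function. This is a minimal sketch under assumptions, not part of the original notes: `ocr_pdf`, its argument order, and the `page` filename prefix are hypothetical, and it relies on glob order matching page order, which holds when `pdftoppm` zero-pads the page numbers.
```bash
# Convert a PDF to images, OCR each page, and merge the text into one file.
# ocr_pdf is a hypothetical helper: ocr_pdf <pdf> [language] [output file]
ocr_pdf() {
    local pdf="$1" lang="${2:-fra}" out="${3:-complete_text.txt}"
    local workdir page
    # Work in a temporary directory so existing files are never overwritten.
    workdir="$(mktemp -d)" || return 1
    # Render each page as a JPEG at 300 DPI.
    pdftoppm -r 300 -jpeg "$pdf" "$workdir/page" || { rm -rf "$workdir"; return 1; }
    : > "$out"
    for page in "$workdir"/page-*.jpg; do
        # tesseract appends .txt to the output base name it is given.
        tesseract "$page" "${page%.jpg}" -l "$lang" || { rm -rf "$workdir"; return 1; }
        cat "${page%.jpg}.txt" >> "$out"
    done
    rm -rf "$workdir"
}

# Usage:
#   ocr_pdf your_file.pdf fra complete_text.txt
```
Working in a temporary directory follows the tip about using a dedicated output directory, and it keeps unrelated `output-*.txt` files out of the merged result.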
**Troubleshooting and Tips**
**If Tesseract doesn't recognize text**:
- Ensure the images have sufficient quality and resolution. Use `-r` with `pdftoppm` to increase the DPI (e.g., `-r 300` for 300 DPI).
- Try additional Tesseract language packs for better recognition of specific text styles.
**Using `pdftotext` for simpler PDFs**:
If the PDF contains selectable text (not just images), `pdftotext` from `poppler-utils` can extract text directly without OCR.
```bash
pdftotext your_file.pdf output.txt
```
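To decide between `pdftotext` and OCR, one rough heuristic is to check whether `pdftotext` produces any letters or digits at all. This is a sketch under that assumption, not part of the original notes; the function name `has_text_layer` is hypothetical, and a PDF with only a partial text layer would still pass the check.
```bash
# Succeed if pdftotext extracts at least one letter or digit from the PDF.
# has_text_layer is a hypothetical helper name used for illustration.
has_text_layer() {
    # "-" as the output file makes pdftotext write to standard output.
    pdftotext "$1" - 2>/dev/null | grep -q '[[:alnum:]]'
}

# Usage:
#   if has_text_layer your_file.pdf; then
#       pdftotext your_file.pdf output.txt
#   else
#       echo "no selectable text found, OCR is needed"
#   fi
```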
**Note on ImageMagick**:
With ImageMagick 7 and later, use `magick` in place of `convert` in the ImageMagick command shown earlier.
**Verify language codes for Tesseract**:
You can find a list of supported languages on the [Tesseract GitHub page](https://github.com/tesseract-ocr/tesseract).