Notes - Python


Creating a Python virtual environment

Background

I had previously installed Python with Homebrew, so my Python lives inside Homebrew. When I tried to install some Python packages with pip3, I got this message:

```
xdl@MacBook-Air ~/Desktop/vocal % pip3 install -r requirements.txt
error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try brew install
    xyz, where xyz is the package you are trying to
    install.

    If you wish to install a Python library that isn't in Homebrew,
    use a virtual environment:

    python3 -m venv path/to/venv
    source path/to/venv/bin/activate
    python3 -m pip install xyz

    If you wish to install a Python application that isn't in Homebrew,
    it may be easiest to use 'pipx install xyz', which will manage a
    virtual environment for you. You can install pipx with

    brew install pipx

    You may restore the old behavior of pip by passing
    the '--break-system-packages' flag to pip, or by adding
    'break-system-packages = true' to your pip.conf file. The latter
    will permanently disable this error.

    If you disable this error, we STRONGLY recommend that you additionally
    pass the '--user' flag to pip, or set 'user = true' in your pip.conf
    file. Failure to do this can result in a broken Homebrew installation.

    Read more about this behavior here: <https://peps.python.org/pep-0668/>

note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.
```

DeepSeek replied:

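This error occurs because your Python environment is managed by the system (macOS) or by a package manager (Homebrew). For safety, pip by default refuses to install packages directly into such an environment, since that could affect the stability of the system.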

Some context first:

```
xdl@MacBook-Air ~ % where python3
/opt/homebrew/bin/python3
/usr/bin/python3
xdl@MacBook-Air ~ % which python3
/opt/homebrew/bin/python3
xdl@MacBook-Air ~ % where pip3
/opt/homebrew/bin/pip3
/usr/bin/pip3
xdl@MacBook-Air ~ % which pip3
/opt/homebrew/bin/pip3
```

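As you can see, running `python3` or `pip3` directly invokes the Homebrew copies rather than the system ones. Homebrew marks its Python as "externally managed" (PEP 668), which is why pip shows the error above.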
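To check from inside Python which interpreter is running and whether pip will treat it as externally managed, here is a minimal sketch based on PEP 668. It assumes, as PEP 668 specifies, that the marker is a file named `EXTERNALLY-MANAGED` in the directory returned by `sysconfig.get_path("stdlib")`, and that the marker is ignored inside a virtual environment. The function names are my own.

```python
import os
import sys
import sysconfig


def in_virtualenv():
    """True when running inside a venv (sys.prefix differs from sys.base_prefix)."""
    return sys.prefix != sys.base_prefix


def is_externally_managed():
    """Rough PEP 668 check: is there an EXTERNALLY-MANAGED marker for this interpreter?"""
    # PEP 668: the marker is ignored when running inside a virtual environment
    if in_virtualenv():
        return False
    marker = os.path.join(sysconfig.get_path("stdlib"), "EXTERNALLY-MANAGED")
    return os.path.isfile(marker)


if __name__ == "__main__":
    print("interpreter:       ", sys.executable)
    print("inside a venv:     ", in_virtualenv())
    print("externally managed:", is_externally_managed())
```

Run with Homebrew's `python3`, this should report that the interpreter is externally managed (that marker is what triggers the error above); run inside an activated venv, it should report `False`.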

Solutions

DeepSeek suggested a few solutions, summarized here:

1. `pipx` (for installing Python command-line tools)

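Purpose

Designed specifically for installing Python command-line tools (such as black, poetry, youtube-dl, pipenv).
Each tool runs in its own virtual environment, which avoids dependency conflicts.
Tools are installed globally (but still isolated), so you can call them directly from the terminal.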

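Pros

✅ Isolated installs: each tool gets its own virtual environment and does not pollute the global Python environment.
✅ Globally available: once installed, a tool runs directly from the terminal (e.g. `black --version`).
✅ Automatic management: pipx handles the virtual environments for you, so there is nothing to activate.
✅ Safe: it does not affect the system Python or your other projects.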

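Cons

❌ Not for project dependencies: it cannot install a requirements.txt or manage project-level dependencies.
❌ CLI tools only: it cannot set up a development environment for Python libraries (such as numpy, requests).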

Use cases

Installing Python command-line tools (such as black, poetry, pipenv, youtube-dl).

Replacing `pip install --user`, so the global environment stays clean.

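Example

```shell
# Install pipx
brew install pipx
pipx ensurepath          # adds pipx's command directory to PATH (open a new terminal afterwards)

# Install some CLI tools
pipx install black       # code formatter
pipx install poetry      # dependency manager
pipx install youtube-dl  # video downloader

# Run them
black --version
poetry --help
```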
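`pipx ensurepath` puts the directory where pipx links its commands (by default `~/.local/bin`) on PATH. When the same command exists in several places, as with `python3` and `pip3` earlier, it helps to see every match in PATH order. This is a small sketch that mimics zsh's `where`; `find_all_on_path` is a hypothetical helper, and the first entry it returns is the one the shell would run (ignoring aliases and shell functions).

```python
import os
import sys


def find_all_on_path(command):
    """Return every executable named `command` on PATH, in search order."""
    # On Windows, executables carry an extension listed in PATHEXT (.EXE, .BAT, ...)
    if sys.platform == "win32":
        exts = os.environ.get("PATHEXT", ".EXE").split(os.pathsep)
    else:
        exts = [""]
    matches = []
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        if not directory:
            continue
        for ext in exts:
            candidate = os.path.join(directory, command + ext)
            # Keep regular files we may execute, and skip duplicate PATH entries
            if (os.path.isfile(candidate) and os.access(candidate, os.X_OK)
                    and candidate not in matches):
                matches.append(candidate)
    return matches


if __name__ == "__main__":
    for name in ("python3", "pip3", "black"):
        print(name, "->", find_all_on_path(name) or "not found")
```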

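2. `venv` (for managing project-level Python environments)

Purpose

Creates a separate virtual environment for each Python project to manage that project's dependencies (such as flask, django, numpy).
You activate the environment manually, and dependencies are installed inside it.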

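Pros

✅ Project isolation: each project has its own dependencies, so projects do not conflict with each other.
✅ Good for development: you can install from requirements.txt or manage packages freely with pip.
✅ Standard: part of the Python standard library and the officially recommended approach, so it is widely compatible.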

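Cons

❌ Manual activation: you have to run `source venv/bin/activate` in each new terminal session before working on the project.
❌ Not for global tools: commands installed in it are not on your PATH unless the environment is activated (or you call them by path, e.g. `venv/bin/black`).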

Use cases

Developing Python projects (web backends, data analysis, scripts).
Projects whose dependencies are listed in requirements.txt or pyproject.toml.

Example

```shell
# Create the project folder
mkdir my_project && cd my_project

# Create the virtual environment
python3 -m venv venv

# Activate it
source venv/bin/activate  # macOS/Linux
# venv\Scripts\activate   # Windows

# Install dependencies
pip install -r requirements.txt

# Leave the environment
deactivate
```
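The same steps can be driven from Python through the standard-library `venv` module. This is a minimal sketch, not what the commands above run internally: `create_env` and `pip_install_requirements` are hypothetical helper names. It also shows that activation is optional, because calling the environment's own interpreter (`venv/bin/python -m pip ...`) installs into that environment directly.

```python
import subprocess
import sys
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment, roughly like `python3 -m venv <env_dir>`."""
    # The command-line tool uses symlinks everywhere except Windows; mirror that
    venv.create(env_dir, with_pip=with_pip, symlinks=(sys.platform != "win32"))
    # The environment's own interpreter lives in bin/ (macOS/Linux) or Scripts\ (Windows)
    if sys.platform == "win32":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"


def pip_install_requirements(env_python, requirements="requirements.txt"):
    """Install requirements with the environment's own pip; no activation needed."""
    subprocess.run(
        [str(env_python), "-m", "pip", "install", "-r", requirements],
        check=True,
    )
```

Usage: `python_in_env = create_env("venv")` followed by `pip_install_requirements(python_in_env)`. With `with_pip=True`, pip is bootstrapped into the new environment, which can take a few seconds.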

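3. `pipx` vs `venv` comparison

| Feature | `pipx` | `venv` |
|---|---|---|
| Purpose | Installing Python command-line tools (e.g. `black`) | Managing project dependencies (e.g. `flask`, `numpy`) |
| Isolation | Separate environment per tool | Separate environment per project |
| Globally available | ✅ (run directly in the terminal) | ❌ (activate the environment first) |
| Suited to developing projects | ❌ | ✅ |
| Suited to CLI tools | ✅ | ❌ (unless you install them into the venv manually) |
| Needs activation | ❌ (managed automatically) | ✅ (`source venv/bin/activate`) |
| Risk of dependency conflicts | Low (one environment per tool) | Low (one environment per project) |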

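4. Which one to choose?

Use pipx to install:
- All your Python command-line tools (such as black, poetry, youtube-dl).
- As a replacement for `pip install --user`, so the global environment stays clean.
Use venv to manage:
- Your Python projects' dependencies (such as django, numpy).
- Projects that use requirements.txt or pyproject.toml.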

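Example 1: Fetching word pronunciations

The phonetic symbols and pronunciations of the words in my English notes all come from vocabulary.com.

To fetch this content, I had GPT write a Python program for me. Here is the whole process in detail.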

1. File structure

I treat this as a project: create a folder named vocabulary-scraper and open it in the terminal. The directory structure is:

```
vocabulary-scraper
    |
    | - requirements.txt
    | - word_list.txt
    | - fetch_pronunciation.py
```

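2. File contents

word_list.txt holds the words to look up, one per line. I extracted them from the PDF The_Oxford_3000.pdf (125 KB) with DeepSeek. It returned 2,970 words, so roughly 30 were probably missed. DeepSeek also said it could not generate a txt file for me to download; it could only put the list at the end of a code block for me to copy.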
```
a
abandon
ability
# more words
```
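As written, the script treats every non-blank line as a word, so a placeholder line starting with `#` like the one above would also be looked up, and text extracted from a PDF can contain repeats. A stricter loader could look like the sketch below; `load_words` is a hypothetical helper, not part of the original script, and it assumes `#` never starts a real word.

```python
def load_words(path):
    """Read one word per line, skipping blanks, `#` lines and repeats."""
    words = []
    seen = set()
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            word = line.strip()
            if not word or word.startswith("#"):
                continue  # blank line or a note such as "# more words"
            key = word.lower()
            if key not in seen:  # keep the first occurrence, preserve file order
                seen.add(key)
                words.append(word)
    return words
```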
requirements.txt lists the third-party libraries the Python program needs, so they can all be installed in one step.
```
requests
beautifulsoup4
PyYAML
```
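To confirm that `pip install -r requirements.txt` actually landed in the active environment, the standard-library `importlib.metadata` can report installed versions (Python 3.8+). Note that the names in requirements.txt are distribution names, which can differ from import names (`beautifulsoup4` is imported as `bs4`, `PyYAML` as `yaml`). `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names exactly as written in requirements.txt
    for name in ("requests", "beautifulsoup4", "PyYAML"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```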

fetch_pronunciation.py is the Python program that fetches the word information.

```python
import time

import requests
import yaml
from bs4 import BeautifulSoup


def fetch_pronunciation(word):
    """Fetch the US/UK phonetic symbols and pronunciation video links for a word."""
    url = f"https://www.vocabulary.com/dictionary/{word}"
    headers = {
        "User-Agent": "Mozilla/5.0"
    }

    try:
        # A timeout keeps one slow request from hanging the whole run
        response = requests.get(url, headers=headers, timeout=10)
    except requests.RequestException as e:
        print(f"[Error] Failed to fetch: {url} ({e})")
        return None
    if response.status_code != 200:
        print(f"[Error] Failed to fetch: {url} (HTTP {response.status_code})")
        return None

    soup = BeautifulSoup(response.text, "html.parser")
    result = {}

    # Each ".video-with-label" block holds one region's pronunciation
    for block in soup.select(".video-with-label"):
        label_tag = block.select_one(".region-label")
        phonetic_tag = block.select_one("span")
        if label_tag is None or phonetic_tag is None:
            continue  # skip blocks without the expected layout

        region = label_tag.text.strip().lower()
        phonetic = phonetic_tag.text.strip()
        video_tag = block.select_one("video source")
        video_link = video_tag.get("src") if video_tag else None

        if region in ("us", "uk"):
            result[region] = {
                "phonetic_symbol": phonetic,
                "pronuciation_link": video_link
            }

    return result or None


def main():
    input_file = "word_list.txt"
    output_file = "pronunciations.yaml"
    all_data = {}

    # One word per line; blank lines are ignored
    with open(input_file, "r", encoding="utf-8") as f:
        words = [line.strip() for line in f if line.strip()]

    for i, word in enumerate(words):
        print(f"[{i+1}/{len(words)}] Fetching: {word}")
        data = fetch_pronunciation(word)
        if data:
            all_data[word] = data
        else:
            print(f"  ↳ No data found for: {word}")
        time.sleep(1)  # pause between requests so the site does not block us

    with open(output_file, "w", encoding="utf-8") as out:
        yaml.dump(all_data, out, allow_unicode=True, sort_keys=False)

    print(f"\n✅ All done! Results saved to: {output_file}")


if __name__ == "__main__":
    main()
```
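With about 3,000 words, a few requests are likely to fail temporarily. One option, sketched below under my own assumptions, is to retry with exponential backoff instead of giving up on the first `None`. `with_retries` is a hypothetical helper, and the injectable `sleep` parameter exists only so it can be tested without waiting. One caveat: `fetch_pronunciation` returns `None` both for network errors and for pages without US/UK data, so this would also retry words that genuinely have no data, which makes the run slower.

```python
import time


def with_retries(func, *args, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(*args), retrying with exponential backoff while it returns None.

    Waits base_delay, then 2 * base_delay, 4 * base_delay, ... between attempts.
    """
    for attempt in range(attempts):
        result = func(*args)
        if result is not None:
            return result
        if attempt < attempts - 1:  # no need to wait after the last attempt
            sleep(base_delay * (2 ** attempt))
    return None
```

In `main()`, the call would become `data = with_retries(fetch_pronunciation, word)`.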

3. Running it

```shell
# Create the virtual environment
python3 -m venv venv

# Activate it
source venv/bin/activate  # macOS/Linux
# venv\Scripts\activate   # Windows

# Install dependencies
pip install -r requirements.txt

# Run the program
python fetch_pronunciation.py

# Leave the environment
deactivate
```

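4. Results

The file structure now looks like this:

```
vocabulary-scraper
    | - venv/
    | - requirements.txt
    | - word_list.txt
    | - fetch_pronunciation.py
    | - pronunciations.yaml
```

There is a new venv folder, which is the virtual environment we created, and the Python program has generated pronunciations.yaml, which stores the word information it fetched: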

```yaml
a:
  us:
    phonetic_symbol: /eɪ/
    pronuciation_link: https://sd-pronunciation-processed-videos.sdcdns.com/desktop/lang_en_pron_8_speaker_5_syllable_all_version_51.mp4
  uk:
    phonetic_symbol: /eɪ/
    pronuciation_link: https://sd-pronunciation-processed-videos.sdcdns.com/desktop/lang_en_pron_4311_speaker_8_syllable_all_version_50.mp4
abandon:
  us:
    phonetic_symbol: /əˈbændən/
    pronuciation_link: https://sd-pronunciation-processed-videos.sdcdns.com/desktop/lang_en_pron_9964_speaker_5_syllable_all_version_51.mp4
  uk:
    phonetic_symbol: /əˈbændən/
    pronuciation_link: https://sd-pronunciation-processed-videos.sdcdns.com/desktop/lang_en_pron_9965_speaker_8_syllable_all_version_50.mp4

# more words
```
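If you later want the audio files themselves, the nested structure above can be flattened into a download list. This sketch assumes the file has been loaded into a plain dict (for example with PyYAML's `yaml.safe_load`); `download_jobs` and the `<word>_<region>.mp4` naming scheme are hypothetical. It keeps the `pronuciation_link` key spelled exactly as the script writes it.

```python
def download_jobs(data):
    """Turn {word: {region: {...}}} into (filename, url) pairs, skipping missing links."""
    jobs = []
    for word, regions in data.items():
        for region, info in regions.items():
            url = info.get("pronuciation_link")  # key spelled as in the script's output
            if url:
                # Hypothetical naming scheme: <word>_<region>.mp4
                jobs.append((f"{word}_{region}.mp4", url))
    return jobs
```

Each pair could then be fetched with `requests.get(url, timeout=10)` and `response.content` written to the file name, keeping a pause between downloads as the scraper does.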

Example 2: Generating short unique IDs from numbers

Similar to YouTube video URLs, e.g. https://example.com/h7dQIX.

I used a tool called Sqids.
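The core idea behind short IDs like this is writing a number in a large base, with letters and digits as the "digits". The sketch below is plain base-62 encoding. It is **not** Sqids' algorithm: Sqids adds further steps (such as shuffling the alphabet, encoding lists of numbers, padding to a minimum length, and a blocklist), so its output looks different and consecutive numbers do not produce visibly consecutive IDs. The alphabet and function names here are my own.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_short_id(n, alphabet=ALPHABET):
    """Encode a non-negative integer in base len(alphabet)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    base = len(alphabet)
    digits = []
    while True:
        n, rem = divmod(n, base)
        digits.append(alphabet[rem])  # least significant digit first
        if n == 0:
            break
    return "".join(reversed(digits))


def from_short_id(s, alphabet=ALPHABET):
    """Decode a string produced by to_short_id back to the integer."""
    base = len(alphabet)
    n = 0
    for ch in s:
        n = n * base + alphabet.index(ch)
    return n
```

For example, `to_short_id(10000)` gives `"2Bi"` (2 × 62² + 37 × 62 + 18 = 10000), which shows why sequential IDs from plain base-62 are easy to guess.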

1. File structure

Create a folder named sqids and open it in the terminal. The directory structure is:

```
sqids
    |
    | - run.py
```

```
xdl@MacBook-Air ~ % cd /Users/xdl/Documents/github/sqids
xdl@MacBook-Air ~/Documents/github/sqids % python3 -m venv venv
xdl@MacBook-Air ~/Documents/github/sqids % source venv/bin/activate
(venv) xdl@MacBook-Air ~/Documents/github/sqids % pip install sqids
Collecting sqids
  Downloading sqids-0.5.2-py3-none-any.whl (8.9 kB)
Installing collected packages: sqids
Successfully installed sqids-0.5.2
```

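There is no requirements.txt here because only one external library, sqids, is needed, so installing it directly from the command line is enough. A requirements.txt file is handy when you need to list several external libraries.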

2. File contents

run.py contains the following:

```python
import argparse

from sqids import Sqids


def main():
    # Initialize Sqids with a custom alphabet
    sqids = Sqids(
        min_length=10,
        alphabet="F69xnXMkBNcuhs1AvjW3Co7l2RePyY8DwaU0TztfHQrqSVKdpi4mLGIJOgb5ZE",
        blocklist={"fuck", "shit"}  # optional: keep offensive words out of IDs
    )

    # Set up command-line argument parsing
    parser = argparse.ArgumentParser(
        description="Sqids encode/decode tool (min_length=10, custom alphabet)",
        formatter_class=argparse.RawTextHelpFormatter
    )
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-e", "--encode", nargs="+", type=int,
                       help="encode one or more numbers (e.g. -e 100 or -e 100 200 300)")
    group.add_argument("-d", "--decode", type=str,
                       help="decode a Sqid string (e.g. -d 'XMjW3Co7l2')")
    args = parser.parse_args()

    # Encode or decode (the group is required, so exactly one option is set)
    if args.encode:
        try:
            sqid = sqids.encode(args.encode)
            print(sqid)  # print the encoded ID only
        except Exception as e:
            print(f"Error: {e}")
    else:
        try:
            numbers = sqids.decode(args.decode)
            if numbers:
                # a single number is printed bare, several as a list
                print(numbers[0] if len(numbers) == 1 else numbers)
            else:
                print("Invalid Sqid")
        except Exception as e:
            print(f"Error: {e}")


if __name__ == "__main__":
    main()
```

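Notes:

I wanted to be able to pass arguments on the command line; if you don't need that, the Sqids-Python README on GitHub is simpler and clearer.
min_length=10 sets the minimum length of the generated ID (IDs can be longer, but never shorter).
alphabet="F69xnXMkBNcuhs1AvjW3Co7l2RePyY8DwaU0TztfHQrqSVKdpi4mLGIJOgb5ZE" acts like a codebook, and you can change it. Every character in it must be unique (duplicates are not allowed), so the easiest way to customize it is to rearrange the order of the characters.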
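Rather than rearranging the characters by hand, you can generate a shuffled alphabet and check it for duplicates before pasting it into `run.py`. A minimal sketch using only the standard library; `make_alphabet` and `check_alphabet` are hypothetical helpers, and `check_alphabet` only tests the uniqueness rule described above, not every rule Sqids may enforce. Keep in mind that changing the alphabet changes every ID, so IDs generated with the old alphabet will no longer decode to the same numbers.

```python
import random
import string


def make_alphabet(seed=None):
    """Return the 62 ASCII letters and digits in a random order, each exactly once."""
    chars = list(string.ascii_letters + string.digits)
    random.Random(seed).shuffle(chars)  # a fixed seed makes the result reproducible
    return "".join(chars)


def check_alphabet(alphabet):
    """Raise ValueError if any character appears more than once."""
    if len(set(alphabet)) != len(alphabet):
        raise ValueError("alphabet contains repeated characters")
    return True


if __name__ == "__main__":
    print(make_alphabet())
```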

3. Running the program

```
(venv) xdl@MacBook-Air ~/Documents/github/sqids % python run.py -e 10000 1
RyaFuZKBGw
(venv) xdl@MacBook-Air ~/Documents/github/sqids % python run.py -d RyaFuZKBGw
[10000, 1]
```

This is exactly the result I wanted, so the whole process worked. Notes:

-e is for encode;
-d is for decode.
I defined both options myself in run.py, so you can rename them if you like.
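If you rename or extend the options, you can check the parsing on its own, without Sqids installed, by passing a list of strings to `parse_args` instead of reading `sys.argv`. `build_parser` is a hypothetical helper that mirrors only the `-e`/`-d` setup of `run.py`. Because the two options are in a required, mutually exclusive group, argparse exits with an error if both (or neither) are given.

```python
import argparse


def build_parser():
    """Same -e/-d layout as run.py, without the Sqids part."""
    parser = argparse.ArgumentParser(description="Sqids encode/decode tool")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-e", "--encode", nargs="+", type=int)  # one or more integers
    group.add_argument("-d", "--decode", type=str)             # a single ID string
    return parser


if __name__ == "__main__":
    parser = build_parser()
    print(parser.parse_args(["-e", "10000", "1"]))
    print(parser.parse_args(["-d", "RyaFuZKBGw"]))
```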

4. Leaving the environment

```
(venv) xdl@MacBook-Air ~/Documents/github/sqids % deactivate
xdl@MacBook-Air ~/Documents/github/sqids %
```
