Local LLM Qwen 3 x MCP: Its Potential for Improving Day-to-Day Work

Hello, I'm Hayashi (@python_walker), a data scientist at Wantedly.

At Wantedly, we are actively working on applying generative AI to our products and internal work. In this article, I'd like to share the results of some tests I ran on my own with Qwen 3.

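Background
What this article covers and doesn't cover
Test environment
Experiments
What I tested
Test: Can the model carry out a single action as intended?
Test: Can the model carry out instructions that require multiple actions?
Test: Operations likely to be needed at work
Summary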

Background

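LLMs have been a hot topic for a while now. Alongside services like ChatGPT, Claude, and Gemini, I feel I've been hearing more and more lately about language models that run on your own PC, so-called "local LLMs." This is of course largely thanks to improvements in the local models themselves, but progress in the surrounding environment also plays a big part: Apple's work on MLX, the Mac Studio with its enormous memory capacity and bandwidth, and ever more capable consumer GPUs.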

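To me, the main appeal of local LLMs is that they impose far fewer restrictions on the data you can feed them than online services do. Services like ChatGPT are convenient, but at work in particular there are strict rules about entering confidential information or users' personal data. With a local LLM, all processing of the input happens on your own machine, so these restrictions are comparatively looser.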

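Until recently, local LLMs inevitably lagged behind online services in performance and weren't widely used. Lately, though, some local models have started to outperform certain online models, and interest in them has risen sharply. In particular, Qwen 3, recently released by Alibaba, has drawn a lot of attention for its strong performance.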

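I had tinkered a little with models like Gemma and Phi before, but never tested them properly. When I briefly tried Qwen 3, I thought "this might actually be usable" and started experimenting on my own, so I decided to write this article to share what I found. In particular, the official page advertises that Qwen 3 has improved support for MCP, which has been getting a lot of attention lately, so this article reports on tests in that area.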

Qwen3: Think Deeper, Act Faster


https://qwenlm.github.io/blog/qwen3/

What this article covers and doesn't cover

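Qwen 3 has been out for a while at the time of writing, and plenty of excellent articles have already been published that have it write code or run benchmarks. So this article mostly skips those topics and instead gives a qualitative look at Qwen 3's "tool use" performance. Tools are accessed through MCP (Model Context Protocol).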

Test environment

The results below were obtained in the following environment.

MacBook Pro (M3 Max; RAM 36 GB)
Python 3.12.7
Model: mlx-community/Qwen3-30B-A3B-4bit
Key libraries
- mlx-lm 0.24.0
- transformers 4.51.3
- mcp 1.7.1

The source code used for the tests is available here:

GitHub - Hayashi-Yudai/qwen3_tool_user_test_samples


https://github.com/Hayashi-Yudai/qwen3_tool_user_test_samples

Experiments

What I tested

Can the model carry out a single action as intended?
Can the model carry out instructions that require multiple actions?
Operations likely to be needed at work

Test: Can the model carry out a single action as intended?

For testing, I prepared a simple tool that just returns the current time.

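from datetime import datetime
from mcp.server.fastmcp import FastMCP

# Server name is an assumption, chosen to match the "local_server" key in the client config
mcp = FastMCP("local_server")

@mcp.tool()
def current_datetime() -> str:
    """Get the current date and time."""
    now = datetime.now()
    return now.strftime("%Y-%m-%d %H:%M:%S")

if __name__ == "__main__":
    mcp.run()  # FastMCP defaults to the stdio transport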

Next, I hook this up to Qwen. This part is not much different from a typical MCP client.

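from contextlib import AsyncExitStack

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from mlx_lm import load, stream_generate

# Runs inside an async function. `config` is loaded elsewhere; its "local_server" entry is
# assumed to hold StdioServerParameters fields, e.g. {"command": "python", "args": ["server.py"]}
async with AsyncExitStack() as exit_stack:
    r, w = await exit_stack.enter_async_context(
        stdio_client(StdioServerParameters(**config["local_server"]))
    )

    session = await exit_stack.enter_async_context(ClientSession(r, w))
    await session.initialize()

    tools = await session.list_tools()

    # `load` returns (model, tokenizer); the model object is named `client` here
    client, tokenizer = load("mlx-community/Qwen3-30B-A3B-4bit")
    # Convert the MCP tool definitions into the function-calling schema the chat template expects
    mlx_tools = [
        {
            "type": "function",
            "function": {
                "name": tool.name,
                "description": tool.description,
                "parameters": tool.inputSchema,
            },
        }
        for tool in tools.tools
    ]

    messages = [
        {
            "role": "user",
            "content": "今日の日付を教えて",  # "What's today's date?"
        }
    ]
    request = tokenizer.apply_chat_template(
        messages,
        tools=mlx_tools,
        add_generation_prompt=True,
    )
    output = stream_generate(client, tokenizer, prompt=request, max_tokens=4096)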

Here is where things differ from other LLMs: Qwen 3 returns its "tool calls" mixed into the response string itself. The response looks like this:

<think>
/* When reasoning (thinking) mode is on, the reasoning process appears here */
</think>

... /* This is the (human-facing?) answer */

<tool_call>
/* If there is a tool call, its value appears here in JSON format */
</tool_call>

So you have to pick the tool call out of the string and execute it yourself.

import json
import re

# Accumulate the streamed text while echoing it to the terminal
all_text = ""
for chunk in output:
    all_text += chunk.text
    print(chunk.text, end="", flush=True)
print("")

# re.search picks up only the first <tool_call> block in the response
match_tool = re.search(r"<tool_call>(.*?)</tool_call>", all_text, re.DOTALL)
if match_tool:
    json_str = match_tool.group(1)
    data = json.loads(json_str)
    result = await session.call_tool(
        data["name"],
        data["arguments"],
    )

    # According to the docs the message should be written in this form, but for some reason it raises an error
    # message = {
    #     "role": "assistant",
    #     "tool_calls": [
    #         {
    #             "type": "function",
    #             "function": {
    #                 "name": data["name"],
    #                 "arguments": data["arguments"],
    #             },
    #         }
    #     ],
    # }
    # So instead, put the assistant's entire message (`all_text`) into the content as-is
    message = {
        "role": "assistant",
        "content": all_text,
    }
    messages.append(message)
    messages.append(
        {
            "role": "tool",
            "name": data["name"],
            "content": result.content[0].text,
        }
    )
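The snippet above uses `re.search`, so it only picks up the first `<tool_call>` block. Qwen 3 can emit more than one `<tool_call>` block in a single response (this matters in the file-writing test later on, where the model tried to call two tools at once), so here is a minimal pure-Python sketch that splits a raw response into its reasoning, its visible answer, and every tool call. The names `ParsedResponse` and `parse_qwen_response` are hypothetical, and the sketch assumes each `<tool_call>` body is valid JSON, as in the outputs shown in this article.

```python
import json
import re
from dataclasses import dataclass, field

# Tag layout Qwen 3 uses in its raw text output
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


@dataclass
class ParsedResponse:
    thinking: str
    answer: str
    tool_calls: list[dict] = field(default_factory=list)


def parse_qwen_response(text: str) -> ParsedResponse:
    """Split raw Qwen 3 output into reasoning, visible answer and tool calls."""
    thinking = "\n".join(m.strip() for m in THINK_RE.findall(text))
    # findall returns every <tool_call> block, not just the first one
    tool_calls = [json.loads(body) for body in TOOL_CALL_RE.findall(text)]
    # Whatever remains after removing both tag types is the user-facing answer
    answer = TOOL_CALL_RE.sub("", THINK_RE.sub("", text)).strip()
    return ParsedResponse(thinking, answer, tool_calls)


raw = (
    "<think>\nThe user wants the date.\n</think>\n\n"
    '<tool_call>\n{"name": "current_datetime", "arguments": {}}\n</tool_call>'
)
parsed = parse_qwen_response(raw)
print(parsed.tool_calls)  # [{'name': 'current_datetime', 'arguments': {}}]
```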

After that, just as with ordinary function calling, you send the messages back to the LLM.
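Sending the messages back can be wrapped in a loop that keeps generating until the model stops asking for tools; the multi-step test later in this article needs several such rounds. The sketch below is an assumption about how such a loop could look, not the code from the author's repository. `generate` and `call_tool` are hypothetical injected callables (in the real setup they would wrap `tokenizer.apply_chat_template` + `stream_generate` and `session.call_tool`), which also lets the loop be exercised with fakes.

```python
import asyncio
import json
import re
from typing import Awaitable, Callable

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

# Hypothetical signatures: in the real client, `generate` would wrap
# tokenizer.apply_chat_template + stream_generate, and `call_tool` would await
# session.call_tool(...) and return result.content[0].text.
Generate = Callable[[list[dict]], str]
CallTool = Callable[[str, dict], Awaitable[str]]


async def run_tool_loop(
    messages: list[dict],
    generate: Generate,
    call_tool: CallTool,
    max_turns: int = 5,
) -> str:
    """Alternate between generation and tool execution until no tool is requested."""
    for _ in range(max_turns):
        text = generate(messages)
        # Store the raw assistant text, mirroring the workaround in the snippet above
        messages.append({"role": "assistant", "content": text})
        calls = [json.loads(body) for body in TOOL_CALL_RE.findall(text)]
        if not calls:
            return text  # no tool requested: treat this as the final answer
        for call in calls:
            result = await call_tool(call["name"], call.get("arguments", {}))
            messages.append({"role": "tool", "name": call["name"], "content": result})
    raise RuntimeError(f"no final answer after {max_turns} turns")


if __name__ == "__main__":
    # Fake model: requests the time once, then answers from the tool result
    def fake_generate(msgs: list[dict]) -> str:
        if msgs[-1]["role"] == "tool":
            return f"Today's date is {msgs[-1]['content'][:10]}."
        return '<tool_call>\n{"name": "current_datetime", "arguments": {}}\n</tool_call>'

    async def fake_call_tool(name: str, arguments: dict) -> str:
        return "2025-05-04 11:22:37"

    history = [{"role": "user", "content": "今日の日付を教えて"}]
    print(asyncio.run(run_tool_loop(history, fake_generate, fake_call_tool)))
    # prints: Today's date is 2025-05-04.
```

`max_turns` guards against a model that keeps requesting tools indefinitely; the limit of 5 is arbitrary.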

That was a lot of code explanation; now for the results. Running the prompt shown above gave the following:

<think>
Okay, the user is asking for today's date. Let me check the tools provided. There's a function called current_datetime that returns the current date and time. Since the user wants the date, I should call that function. The function doesn't require any parameters, so I just need to invoke it. I'll make sure to structure the tool call correctly within the XML tags.
</think>

<tool_call>
{"name": "current_datetime", "arguments": {}}
</tool_call>
/* Up to here is the response to the first request */
<think>
Okay, the user asked for today's date, and I called the current_datetime function. The response came back as 2025-05-04 11:22:37. So the date is May 4, 2025. I should present that in a natural way. Let me check if the user needs the time as well, but since they only asked for the date, maybe just the date is enough. I'll format it as "Today's date is 2025-05-04." That's clear and straightforward. No need for extra information unless they ask for the time too.
</think>

Today's date is 2025-05-04.

It uses the tool properly and returns a correct answer (in English, for some reason). Next, I made the prompts more indirect to see whether it would still use the tool.

/* Prompt: 今の東京の季節は? ("What season is it in Tokyo right now?") */
<think>
Okay, the user is asking about the current season in Tokyo. I need to figure out the season based on the current date. But wait, I don't have a function to get the current date. Wait, there's a function called current_datetime. Oh, right, that's provided. So I should call current_datetime to get the current date and time, then determine the season from that.

But the user is asking for the season, which is determined by the month. In the Northern Hemisphere, seasons are: Spring (March-May), Summer (June-August), Fall (September-November), Winter (December-February). So once I have the current month from the datetime, I can check which season it is.

So first, I need to call the current_datetime function. Then, parse the date to get the month. Then map that to the season. Let me make sure the function is called correctly. The function has no parameters, so the tool call is straightforward. Then, after getting the date, process it to find the season. Alright, let's proceed with that.
</think>

<tool_call>
{"name": "current_datetime", "arguments": {}}
</tool_call>
<think>
Okay, the user asked about the current season in Tokyo. I called the current_datetime function and got the date 2025-05-03. Now, I need to determine the season based on that date.

First, check the month. May is part of spring in the Northern Hemisphere. The seasons are typically: March to May is spring, June to August is summer, September to November is fall, and December to February is winter. Since May is within the spring months, the answer should be spring. 

I should confirm that the date is in the correct hemisphere. Since Tokyo is in the Northern Hemisphere, the standard seasons apply. So, the current season in Tokyo is spring. I'll present that to the user clearly.
</think>

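現在の東京の季節は**春**です。  
5月は春の期間（3月～5月）に該当するため、花見や桜の季節ですね。🌸
(English: "The current season in Tokyo is **spring**. May falls within spring (March to May), so it's the season for hanami and cherry blossoms. 🌸")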

/* Prompt: クリスマスまであとどれくらい？ ("How long until Christmas?") */
<think>
Okay, the user is asking "クリスマスまであとどれくらい？" which means "How much time is left until Christmas?" I need to figure out the current date and time to calculate the remaining time until December 25th.

First, I should check if there's a function available to get the current datetime. Looking at the tools provided, there's a function called current_datetime with no parameters. So I can call that to get the current date and time.

Once I have the current datetime, I can calculate the difference between now and December 25th of the current year. If today is after December 25th, then the next Christmas would be next year. But since the user is asking about "クリスマス" which is the holiday, it's possible they mean the upcoming one. But I should handle both cases.

Wait, but the function current_datetime doesn't take any arguments, so I just need to call it. Then, after getting the current time, I can compute the remaining time. But since the user might be asking for the time left until the next Christmas, I need to check if the current date is before or after December 25th. If it's after, then the remaining time would be until next year's December 25th.

But since I can't perform calculations here, I need to call the current_datetime function first. Then, once I have the current date, I can determine the remaining time. However, the user's question is in Japanese, so the answer should be in Japanese as well. But the assistant's response should be in the same language as the user's query, which is Japanese here.

So the steps are: call current_datetime, then calculate the time left until Christmas based on the current date. But since I can't do the calculation myself, I need to inform the user that I need to get the current time first. Wait, but the user is expecting an answer. However, according to the tools provided, I can only use the current_datetime function. So I need to call that function first, then use the result to compute the remaining time. But since the assistant is supposed to generate the tool call, I should output the tool call for current_datetime.

So the correct action here is to call the current_datetime function. Then, once I have the current date, I can calculate the remaining time until Christmas. But since the user's question is about the time left, the assistant needs to first get the current datetime. Therefore, the tool call is necessary here.
</think>

<tool_call>
{"name": "current_datetime", "arguments": {}}
</tool_call>
<think>
Okay, the user asked how much time is left until Christmas. I called the current_datetime function and got the current date as 2025-05-03. Now, I need to calculate the remaining time until December 25th, 2025.

First, I'll check if today's date is before December 25th. Since May 3rd is before December 25th, the next Christmas is in 2025. 

Next, I'll calculate the difference between December 25th, 2025, and May 3rd, 2025. Let's see: from May 3rd to December 25th is 7 months and 22 days. Converting that into days, months, and hours. But the user might want a more precise answer in days, hours, etc.

Wait, but the exact calculation would be: from May 3, 2025, to December 25, 2025. Let's count the days. May has 31 days, so from May 3 to May 31 is 28 days. June has 30, July 31, August 31, September 30, October 31, November 30, and December 1 to 25 is 25 days. Adding those up: 28 (May) + 30 (June) + 31 (July) + 31 (Aug) + 30 (Sep) + 31 (Oct) + 30 (Nov) + 25 (Dec) = 28+30=58, +31=89, +31=120, +30=150, +31=181, +30=211, +25=236 days. So 236 days left.

But I should present this in a user-friendly way, maybe stating the number of days, months, and days. Alternatively, just the total days. The user might want a simple answer. So the response would be that there are 236 days left until Christmas 2025.

I should also mention that if the current date is after December 25th, the next Christmas would be in the next year, but since May is before December, it's straightforward.

So the final answer is 236 days until Christmas.
</think>

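2025年12月25日までの残り日数は**236日**です。  
現在の日付（2025年5月3日）から計算しています。🎄
(English: "There are **236 days** left until December 25, 2025, counting from the current date (May 3, 2025). 🎄")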

/* Prompt: 5月は何日まである? ("How many days does May have?") */
<think>
Okay, the user is asking "5月は何日まである?" which translates to "How many days are there in May?" I need to figure out the number of days in May. May is the fifth month of the year. Normally, months have either 30 or 31 days. April has 30, May has 31. Wait, right, May has 31 days. But wait, maybe the user is asking about the current year's May, but since the question is general, it's probably asking for the standard number of days. However, maybe they want to know for this year specifically. But the function provided is current_datetime, which gets the current date and time. But the function doesn't take any parameters. However, the user's question is about the number of days in May, which is a fixed number. Unless they're asking for the remaining days in the current May. Wait, if today is in May, then the answer would be 31 minus the current day. But without knowing the current date, I can't calculate that. But the function current_datetime can provide the current date. So maybe I need to call current_datetime to get the current day and month, then calculate the remaining days in May. But the function's parameters are empty, so I can't pass anything. So first, I need to call current_datetime to get the current date. Then, based on that, determine how many days are left in May. But since the user's question is "5月は何日まである?", which is "How many days does May have?", it's possible they just want the total days in May, which is 31. But maybe they mean how many days are left in the current month. If the current date is in May, then the answer would be 31 minus the current day. But if the current date is not in May, then maybe the answer is 31. But without knowing the current date, I need to use the function to get the current date. So first, I should call current_datetime to get the current date and time. Then, check if the current month is May. If it is, calculate the remaining days. If not, then the answer is 31. 
But since the user is asking in Japanese, maybe they are asking for the total days in May, which is 31. But maybe they want the remaining days. But without the current date, I can't know. So the correct approach is to call the current_datetime function to get the current date, then process it. So the first step is to call current_datetime. Then, once I have the current date, I can determine the answer. So I need to make a tool call for current_datetime.
</think>

<tool_call>
{"name": "current_datetime", "arguments": {}}
</tool_call>
<think>
Okay, the user asked "5月は何日まである?" which means "How many days are there in May?" I called the current_datetime function and got the response "2025-05-03 16:01:06". So the current date is May 3, 2025. Since May has 31 days, the remaining days would be 31 - 3 = 28. But I should check if the user wants the total days in May (which is 31) or the remaining days from the current date. The question is a bit ambiguous. However, since the current date is in May, it's possible they want the remaining days. But maybe they just want to know the total. To be safe, I'll explain both. But the function response gives the current date, so I should calculate the remaining days. So 31 - 3 = 28. But I need to confirm if the user wants that. Alternatively, maybe they just want to know that May has 31 days. But since the current date is provided, the answer should be 28 days left. I'll go with that.
</think>

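5月は31日まであるため、現在の日付（2025年5月3日）から計算すると、残りは28日です。  
つまり、5月3日現在で、あと28日間あります。
(English: "May has 31 days, so counting from the current date (May 3, 2025), 28 days remain. In other words, as of May 3 there are 28 days left.")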

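The model used the tool in every case. Even without explicitly signaling that a tool should be used, it seems flexible enough to use one when needed. With the last example, however, I actually wanted to test a case where it would not use the tool, yet it still called the tool to generate its answer. It apparently used the tool so that, on top of the answer I asked for, it could tell me how many days of May were left. For humans there's a saying, "if all you have is a hammer, everything looks like a nail"; perhaps LLMs have a similar tendency?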

Overall, Qwen 3 turned out to use tools well.
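In the Christmas and May examples, the tool only supplies the current date; the model does the date arithmetic inside its own reasoning. A few lines of `datetime` are enough to cross-check those two numbers. This is a verification sketch rather than part of the test setup, and the helper names are hypothetical.

```python
import calendar
from datetime import date


def days_until_christmas(today: date) -> int:
    """Days until the next Dec 25 (rolls over to next year once it has passed)."""
    christmas = date(today.year, 12, 25)
    if today > christmas:
        christmas = date(today.year + 1, 12, 25)
    return (christmas - today).days


def days_left_in_month(today: date) -> int:
    """Days remaining after `today` in the same month."""
    last_day = calendar.monthrange(today.year, today.month)[1]
    return last_day - today.day


today = date(2025, 5, 3)  # the date current_datetime returned in both runs above
print(days_until_christmas(today))  # 236
print(days_left_in_month(today))    # 28
```

Both results agree with the model's answers (236 days and 28 days).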

Test: Can the model carry out instructions that require multiple actions?

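In the tests above, a single tool call was enough to produce the desired answer. What about cases that require multiple tool calls? Here I tested whether the model could accomplish the task using the following two tools: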

current_datetime: a tool that gets the current time
filesystem: a tool that operates on local files
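The article does not show how tools from two MCP servers are combined, but the names in the transcript below (`local_server__current_datetime`, `filesystem__write_file`) suggest each tool is exposed to the model as `<server>__<tool>`. Assuming that convention, a minimal sketch of merging the tool lists and routing a call back to the right server could look like this. Function names are hypothetical, and for simplicity the input is plain dicts; with the `mcp` client you would read `tool.name`, `tool.description` and `tool.inputSchema` from each server's `list_tools()` result, as in the earlier snippet.

```python
SEPARATOR = "__"  # assumed from names seen in the transcript, e.g. "filesystem__write_file"


def namespace_tools(tools_by_server: dict[str, list[dict]]) -> list[dict]:
    """Merge tool definitions from several MCP servers, prefixing each name with its server."""
    merged = []
    for server, tools in tools_by_server.items():
        for tool in tools:
            merged.append({
                "type": "function",
                "function": {
                    "name": f"{server}{SEPARATOR}{tool['name']}",
                    "description": tool.get("description", ""),
                    "parameters": tool.get("inputSchema", {}),
                },
            })
    return merged


def route_tool_call(qualified_name: str) -> tuple[str, str]:
    """Split 'server__tool' back into (server, tool) so the call reaches the right session."""
    server, sep, tool = qualified_name.partition(SEPARATOR)
    if not sep:
        raise ValueError(f"tool name without server prefix: {qualified_name}")
    return server, tool


tools = namespace_tools({
    "local_server": [{"name": "current_datetime"}],
    "filesystem": [{"name": "write_file"}],
})
print([t["function"]["name"] for t in tools])
# ['local_server__current_datetime', 'filesystem__write_file']
print(route_tool_call("filesystem__write_file"))  # ('filesystem', 'write_file')
```

`partition` splits at the first `__`, so this assumes server names themselves never contain a double underscore.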

The response is a bit long, but it went as follows, and the model created the file as instructed:

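/* Prompt: ~/Desktopにtest.txtというファイルを作成したあと、その中に今日の日付を記入してください。作業が終わった後はチェックも忘れずにしてください。 ("Create a file named test.txt in ~/Desktop, then write today's date in it. Don't forget to check your work when you're done.") */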

<think>
Okay, let's break down the user's request. They want to create a file called test.txt on their Desktop and then write today's date into it. After that, they want a check to confirm it's done.

First, I need to figure out the correct paths. The user mentioned "~/Desktop", which in Linux is the Desktop directory. But since the functions available don't have a specific one for the desktop, I'll assume that the path is something like "/home/user/Desktop" or similar. However, the exact path might depend on the system, but the user used the tilde, so maybe the function can handle that. But looking at the functions, the filesystem__write_file requires a path. So I'll use the provided path as is, but maybe the actual path is "~/Desktop/test.txt". Wait, but the user said "作成したあと" which means after creating, so first create the file, then write the date.

Wait, the functions available are: create_file (filesystem__write_file?), but the functions listed include filesystem__write_file which creates a new file or overwrites. So first, I need to create the file. But does the user want to create it first, then write the date? Or is writing the file the same as creating it? Because filesystem__write_file will create the file if it doesn't exist. So maybe I can just write the date directly.

But the user said "作成したあと", which is "after creating". So maybe they want to first create the file (using create_file function?), but looking at the functions, there's no create_file. The available functions are: filesystem__write_file, which can create a new file. So the first step is to use filesystem__write_file with the path "~/Desktop/test.txt" and content as the current date. But wait, the user wants to create the file first, then write the date. But if the file doesn't exist, writing will create it. So maybe the steps are: 1. Use filesystem__write_file to create the file with the current date. Then, maybe check if it's done. But the user also says "作業が終わった後はチェックも忘れずにしてください" which means after finishing the work, don't forget to check. So after writing, check if the file exists and has the date.

But how to check? The user might want to read the file to confirm. So first, write the file, then read it to check. But the functions available include filesystem__read_file, which can read the file's content. So the steps would be:

1. Get the current date using local_server__current_datetime.
2. Use filesystem__write_file to create the file at ~/Desktop/test.txt with the current date as content.
3. Use filesystem__read_file to read the file and confirm the content.

But the user's instruction is to create the file and then write the date. But writing the date would be done via the write_file function. So maybe the steps are:

- Call filesystem__write_file with path "~/Desktop/test.txt" and content as the current date.

But first, need to get the current date. So first, call local_server__current_datetime to get the date, then use that in the write_file.

Then, after writing, maybe check by reading the file. So call filesystem__read_file with the same path to verify.

So the tool calls would be:

1. Call local_server__current_datetime to get the current date.
2. Use the date from step 1 to call filesystem__write_file with the path and content.
3. Call filesystem__read_file to check the content.

But the user might not need the check, but they said to do the check after finishing. So the assistant should perform all steps.

But the user's message is in Japanese, so the path might be different. But the functions are based on the system's allowed directories. The user's Desktop path might be accessible. But the functions like filesystem__write_file require the path to be within allowed directories. However, the user didn't mention any restrictions, so assuming that ~/Desktop is allowed.

So the plan is:

- Get current date.
- Write to the file.
- Read the file to check.

But the assistant needs to make the tool calls in order. So first, call local_server__current_datetime, then use that result to write the file, then read it.

But the assistant's response should be the tool calls. So first, call local_server__current_datetime, then filesystem__write_file, then filesystem__read_file.

But the user might not have specified the exact path. The user said "~/Desktop", which is a home directory path. But the actual path might be something like "/home/user/Desktop/test.txt". However, the assistant should use the path as given by the user, which is "~/Desktop/test.txt".

So the first tool call is to get the current date. Then, use that date to write the file. Then read the file to check.

So the steps in tool calls:

1. Call local_server__current_datetime.
2. Call filesystem__write_file with path "~/Desktop/test.txt" and content as the date obtained.
3. Call filesystem__read_file with the same path to verify.

But the assistant needs to generate the tool calls in order. However, the assistant can't execute the functions, just call them. So the first tool call is local_server__current_datetime, then the write, then the read.

But the user's instruction is to create the file and then write the date. But writing the date would be the same as creating the file. So maybe the assistant can directly write the date into the file, but first, get the date.

So the assistant's response would be the three tool calls. But the user might expect that the assistant does all steps. However, the assistant can only make one tool call per response. But in this case, the user's message is a single query, so the assistant needs to plan the steps.

But according to the instructions, for each function call, return a json object. So the assistant would first call local_server__current_datetime, then after getting the date, call write_file, then read_file.

But since the assistant is generating the tool calls in the response, it needs to output them in order. However, the assistant can't execute them step by step, so it's possible that the assistant would generate all necessary tool calls in one go, but according to the problem statement, the assistant should return one tool call per response. But since this is a single query, the assistant would need to plan all steps.

But the user's message is a single instruction, so the assistant would generate the necessary tool calls in sequence. However, the assistant's response here is to generate the first tool call, then wait for the response, then proceed. But in this case, the user is expecting the assistant to handle it step by step.

But since the user is asking for the entire process, the assistant would need to first get the current date, then write the file, then read it. So the first tool call is local_server__current_datetime. Then, once that's done, the assistant would use the date to write the file. Then, read the file to check.

But in the context of this problem, the assistant is to generate the tool calls in the XML tags. So the first step is to call local_server__current_datetime. Then, after that, the assistant would proceed to write the file with the obtained date. Then read it.

But since the user's message is a single query, the assistant would need to generate all necessary tool calls. However, the assistant can only make one tool call per response. But in this case, the assistant is to generate the first tool call, which is to get the current date.

So the first tool call is local_server__current_datetime. Then, the assistant would receive the response, then proceed to write the file, then read it.

But in the context of this problem, the assistant is to generate the tool calls as per the user's request. So the answer would be the three tool calls in sequence. However, the assistant can only output one tool call at a time. But since the user's message is a single query, the assistant would need to generate all necessary steps.

But according to the problem's instructions, the assistant should return the tool calls in XML tags. So the first step is to call local_server__current_datetime. Then, after that, the assistant would use the date to write the file. Then, read the file to check.

But the user's instruction is to create the file and write the date, then check. So the assistant's response would be the three tool calls. However, the assistant can't do that in one response. So the first tool call is to get the current date. Then, the next step would be to write the file with that date. Then, read the file to check.

But since the user is expecting the assistant to handle it, the assistant would generate the first tool call, then the next, etc. But in this case, the user's message is a single query, so the assistant would need to plan all steps.

But the problem is that the assistant is to generate the tool calls in the XML tags. So the first tool call is local_server__current_datetime. Then, after that, the assistant would generate the write_file call, then the read_file call.

But the user's message is in Japanese, so the path is "~/Desktop/test.txt". The assistant should use that path.

So the first tool call is:

{
  "name": "local_server__current_datetime",
  "arguments": {}
}

Then, once the current date is obtained, the next tool call is:

{
  "name": "filesystem__write_file",
  "arguments": {
    "path": "~/Desktop/test.txt",
    "content": "2023-10-05" (assuming the date is obtained)
  }
}

Then, after writing, the assistant would call:

{
  "name": "filesystem__read_file",
  "arguments": {
    "path": "~/Desktop/test.txt"
  }
}

But since the assistant can't execute these steps in one go, the first step is to get the current date. So the answer would be the first tool call.
</think>

<tool_call>
{"name": "local_server__current_datetime", "arguments": {}}
</tool_call>
<think>
Okay, the user wanted to create a file named test.txt on their Desktop and write today's date into it, then check the work. First, I called the local_server__current_datetime function to get the current date, which returned 2025-05-03 17:58:09. Now, I need to create the file and write the date.

The file path is ~/Desktop/test.txt. Using the filesystem__write_file function makes sense here. I'll set the path to "~/Desktop/test.txt" and the content to the current date string. Then, after writing, I should verify by reading the file with filesystem__read_file to ensure the date is there. Let's proceed with writing the file first.
</think>

<tool_call>
{"name": "filesystem__write_file", "arguments": {"path": "~/Desktop/test.txt", "content": "2025-05-03 17:58:09"}}
</tool_call>

<think>
Okay, the user wanted to create a file on their Desktop and write the current date into it, then check. I first got the current date using the local_server__current_datetime function, which gave "2025-05-03 17:58:09". Then, I used the filesystem__write_file function to create the file at "~/Desktop/test.txt" with that content. The response said it was successful.

Now, the next step is to check if the file was created and contains the correct date. The user mentioned "チェックも忘れずにしてください" which means to don't forget to check. So I should read the file to confirm the content. The appropriate tool here is filesystem__read_file with the same path. That way, I can verify the content matches the current date. Let's call that function now.
</think>

<tool_call>
{"name": "filesystem__read_file", "arguments": {"path": "~/Desktop/test.txt"}}
</tool_call>
<think>
Okay, let's wrap this up. The user wanted to create a file on their Desktop, write the current date, and check it. I got the date, wrote it to the file, and confirmed the content. Now, I should inform them that the task is done and the file is verified. Just a simple confirmation should do.
</think>

Looking at the reasoning, you can see that the model first plans which tools to use and in what order, then executes them one by one. Impressive.

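That said, my hands-on impression was that it takes a fair amount of prompt tweaking. Some of you may have noticed the sentence "Don't forget to check your work when you're done" in the prompt. Without it, the model would call current_datetime and write_file at the same time, write a made-up date into the file, and stop there. gpt-4o handled the task perfectly with just "Create a file named test.txt in ~/Desktop, then write today's date in it," so my sense is that Qwen 3 is less capable than gpt-4o here. In the official benchmarks 30B-A3B scores higher than gpt-4o, so this result may come from the 4-bit quantization or from the nature of the task.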

Test: Operations likely to be needed at work

Finally, let's try a task closer to real work. Specifically, I tested whether the model could update the README of a repository I maintain personally. The MCP server used is github-mcp-server.

GitHub - github/github-mcp-server: GitHub's official MCP Server


https://github.com/github/github-mcp-server

I ran it with the following prompt.

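You will now make a fix to a GitHub repository. Follow these steps:

1. Create a new branch for the work.
2. Make the minimum necessary changes.
3. Commit the changes.
4. Create a pull request.
5. Tell me that you have created the pull request and ask me for a review.

Now for the specific task.

Your task is to fix the README of the https://github.com/Hayashi-Yudai/aichat repository.

The tool developed in this repository can use mlx-community's Qwen3-30B-A3B-4bit, but it was left out of the list of supported LLMs. Please fix this.
You can carry out this fix interactively. Call tools as appropriate and proceed step by step. Also, focus on making only the minimum changes this fix requires, and do not make any other changes.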

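Even though the prompt spells out in considerable detail how to proceed, the model could not complete the task. It managed step 1 (creating the branch) and part of step 2 (reading the file and identifying what to change), but failed at actually modifying the file. The reasoning trace contains the string "the plan is to append "Qwen3-30B-A3B-4bit" to the list of supported models in the README. Then, commit the change and create a PR.", so it seems to understand what needs to be done. When writing the change, however, it tried to replace the entire file and send that request base64-encoded, and the encoded string became garbled partway through.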

Summary

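So far I have presented test results focused on how well Qwen 3 handles tasks that involve tools. Having actually run these tests myself, I feel there is still a large gap between it and LLMs like ChatGPT and Gemini, and it doesn't quite live up to the often-advertised claim of being "better than gpt-4o."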

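On the other hand, the tests also convinced me that local LLMs have improved dramatically compared with even a short while ago. Qwen 3 can handle genuinely simple tasks, and although I didn't cover it much here, I found the quality of its answers in reasoning (thinking) mode to be good. For tasks like the following, for example, wouldn't Qwen 3 let you get the job done without worrying much about sending information outside? (Of course, even with a local LLM, if you connect it to the outside world via MCP or similar, you should carefully consider the risk of information leaks.)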