Part IV — The Ecosystem

The Communities

arXiv categories, conference circuits, and how to read the research landscape without drowning in it.

Why the Map Matters

AI is not one field. It's a loose federation of communities, each with its own traditions, publication venues, preferred problems, and cultural norms. What gets called "AI" in the press often comes from just one of these communities — machine learning — and the conflation creates confusion for anyone trying to orient themselves.

Understanding the boundaries matters for practical reasons. When you submit a paper, you're submitting it to a specific community. The reviewers come from that community. They have expectations about methodology, baselines, notation, and what counts as a contribution. A paper that would be well-received at an information retrieval conference might be rejected at a machine learning venue — not because it's wrong, but because it's solving the wrong kind of problem by their standards.

This chapter maps the major AI research communities, where they publish, what they care about, and how they relate to each other. Think of it as the political geography of the field.

The arXiv Taxonomy

arXiv — the preprint server hosted by Cornell University — is where most AI research appears first, often months before formal publication. It's not peer-reviewed, which means anyone can post anything, but it's the de facto venue for establishing priority and circulating ideas. Understanding its category system is the first step to navigating the field.

The computer science categories relevant to AI:

cs.AI (Artificial Intelligence). General AI: planning, knowledge representation, reasoning, multi-agent systems. The "classical AI" bucket.
cs.LG (Machine Learning). Learning algorithms, theory, optimization. Where most deep learning papers land. (Sometimes cited as cs.ML, but cs.LG is the actual arXiv code.)
cs.CL (Computation and Language). Natural language processing: language models, translation, parsing, text generation, dialogue.
cs.CV (Computer Vision). Image recognition, object detection, segmentation, generative models for images and video.
cs.IR (Information Retrieval). Search engines, recommendation systems, ranking algorithms, document retrieval. Where retrieval-augmented generation (RAG) work often goes.
cs.RO (Robotics). Robot learning, control, manipulation, sim-to-real transfer.
stat.ML (Machine Learning, statistics side). ML theory, probabilistic models, Bayesian methods, with more mathematical rigor.

A paper can be cross-listed under multiple categories. The "Attention Is All You Need" transformer paper, for instance, appeared under cs.CL (it was a language architecture) but could just as easily have gone under cs.LG (it's a general learning architecture). The primary category signals which community the authors consider their home.

Practical note: When you posted your paper on Thompson Sampling for retrieval weight optimization, CIKM (a cs.IR venue) was the right target. The problem is about ranking and retrieval, not about language modeling or vision. The arXiv category would be cs.IR, possibly cross-listed with cs.LG for the bandit algorithm component.

The Conference Circuit

Unlike most academic fields, AI publishes primarily at conferences rather than journals. A paper accepted at NeurIPS or ICML carries more weight than a journal publication in most AI subdisciplines. This is partly historical (the field moves too fast for the 6-18 month journal review cycle) and partly cultural (conferences provide immediate peer interaction and visibility).

The major venues, grouped by community:

Machine Learning: NeurIPS, ICML, ICLR (cs.LG / stat.ML)
Natural Language Processing: ACL, EMNLP, NAACL (cs.CL)
Information Retrieval: SIGIR, CIKM, WSDM (cs.IR)
General AI: AAAI, IJCAI (cs.AI)
Computer Vision: CVPR, ICCV, ECCV (cs.CV)

[Figure: The AI Research Communities. Major venues and their relationships; lines show methodological overlap, and ML methods now penetrate every community. Your Thompson Sampling paper sits at the intersection of cs.IR (retrieval) and cs.LG (bandits).]

How the Publication System Works

The AI publication pipeline has three stages, and understanding them matters for anyone trying to participate:

1. The arXiv Preprint

Most research hits arXiv first. There's no peer review — a moderator checks that it's not spam or obviously unscientific, but the bar is low. arXiv establishes priority (you posted it first) and lets the community see your work immediately. The downside: there's no quality filter, so arXiv is full of everything from landmark papers to badly written hobby projects. Learning to filter is a survival skill.

arXiv requires either an institutional affiliation or an endorsement from someone who has already published in the relevant category. This is a real gatekeeping mechanism for independent researchers.1

2. Conference Submission and Peer Review

The real quality gate is conference peer review. You submit a paper (typically 8-10 pages in a specific format), and 3-4 reviewers evaluate it. The review criteria vary by venue but generally include novelty, technical soundness, significance, clarity, and experimental rigor.

Acceptance rates at top venues: NeurIPS ~26%, ICML ~25%, ICLR ~32%, ACL ~25%, SIGIR ~20%. These numbers mean that even good work gets rejected frequently — the system is noisy, and reviewers disagree with each other more than anyone would like. A common pattern is to submit, get rejected with feedback, improve the paper, and resubmit to the same or a different venue.
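That noise can be made concrete with a toy calculation. Under the (unrealistic) assumption that each submission is an independent draw at a fixed acceptance rate, the odds of landing a paper within a few attempts compound quickly; real resubmission cycles do better still, since the paper improves each round:

```python
# Toy model of the submit/revise/resubmit cycle. Assumes each
# submission is an independent draw at a fixed acceptance rate --
# unrealistic, since real resubmissions improve the paper.

def p_accepted_within(rate: float, attempts: int) -> float:
    """Probability of at least one acceptance in `attempts` tries."""
    return 1 - (1 - rate) ** attempts

for attempts in range(1, 5):
    print(f"{attempts} attempts at 25%: {p_accepted_within(0.25, attempts):.0%}")
```

At a 25% per-venue rate, four attempts already push the cumulative odds past two-thirds, which is why persistence is a structural feature of the field rather than a character trait.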

One important cultural note: the review process is double-blind at most venues. The reviewers don't know who wrote the paper, and the authors don't know who reviewed it. This is supposed to prevent bias, though in practice, well-known groups are often identifiable from their writing style, datasets, and prior work.

3. Conference Proceedings

Accepted papers are published in the conference proceedings, which serve as the formal, archived publication. Major ML conferences publish through organizations like PMLR (Proceedings of Machine Learning Research) or through ACM/IEEE for IR and general AI venues. Conference proceedings are indexed by Google Scholar, Semantic Scholar, and other academic search engines.

Journals

Journals exist in AI — JMLR (Journal of Machine Learning Research), TMLR (Transactions on Machine Learning Research), TACL (Transactions of the ACL), AIJ (Artificial Intelligence Journal) — but they carry less prestige than top conferences for most ML work. TMLR, launched in 2022, is notable for using a rolling review process (no deadlines) and focusing on correctness over novelty, which theoretically makes it more accessible to non-traditional contributors. The trade-off is that "less emphasis on novelty" also means "less prestige per publication" in the eyes of hiring committees.2

What Each Community Cares About

The communities aren't just distinguished by topic. They have different values, different standards for what makes a good paper, and different intellectual styles.

ML (NeurIPS, ICML). Values: novelty, generality, theoretical analysis, ablation studies. Typical paper: new method + math + experiments on benchmark datasets.
NLP (ACL, EMNLP). Values: linguistic insight, benchmark performance, human evaluation. Typical paper: model + results on standard NLP benchmarks + analysis.
IR (SIGIR, CIKM). Values: practical relevance, evaluation methodology, reproducibility. Typical paper: retrieval/ranking method + offline evaluation + sometimes user studies.
Vision (CVPR). Values: state-of-the-art performance, visual results, real-world applicability. Typical paper: architecture + quantitative results + qualitative examples.
General AI (AAAI). Values: breadth, problem formulation, interdisciplinary connections. Papers vary widely, from planning algorithms to cognitive architectures.

These cultural differences create real friction when communities overlap. The recent explosion of LLM-based retrieval systems, for instance, sits at the intersection of cs.CL and cs.IR. NLP researchers tend to evaluate with language-centric metrics (perplexity, BLEU); IR researchers want ranking metrics (nDCG, MAP, recall@k). A paper that satisfies one community's standards may not satisfy the other's.
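For readers who haven't met the IR side of that divide, nDCG is simple enough to sketch. This is a minimal implementation using the standard log2 position discount (graded-relevance gain variants such as 2^rel - 1 also appear in the literature):

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain: graded relevance, log2 position discount."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalized by the ideal ordering, so 1.0 means a perfect ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Relevance grades of results in ranked order: the best document (grade 3)
# was ranked third, so the score falls short of a perfect 1.0.
score = ndcg_at_k([1, 0, 3, 2], k=4)
```

The metric cares only about where relevant documents land in the ranking; a language model's perplexity says nothing about that, which is exactly the cross-community friction described above.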

The Great Convergence (and Its Tensions)

Before ~2015, these communities were relatively separate. NLP had its methods (HMMs, CRFs, syntactic parsers), vision had its (HOG features, SVMs, edge detectors), IR had its (BM25, learning to rank, query expansion). Each community had developed specialized approaches over decades.

Then deep learning ate everything.

Transformers, originally an NLP architecture, now dominate vision (ViT), retrieval (dense retrieval, cross-encoders), speech (Whisper), protein folding (AlphaFold uses attention mechanisms), and even physics simulation. Machine learning methods — particularly deep learning — have become the shared substrate across all these communities.

This convergence has two consequences. First, it means ML venues (NeurIPS, ICML, ICLR) have become the most prestigious and competitive, because their methods apply everywhere. Second, it creates tension within the application communities. NLP researchers debate whether their field has become "just applying large language models." IR researchers argue about whether neural methods have actually improved search quality or just benchmark numbers. These are real methodological debates, not just turf wars.

Key idea: The communities haven't merged — they've layered. ML provides the methods; the application communities provide the problems, evaluation frameworks, and domain knowledge. A good paper at SIGIR uses ML methods but evaluates them the way IR demands: with retrieval-specific metrics, on retrieval-specific benchmarks, answering retrieval-specific research questions.

How to Read the Field

You can't read everything. The field produces thousands of papers per month. The skill isn't comprehensive reading — it's efficient filtering.

Tools

Strategy

For someone five weeks in, the approach that works is:

  1. Follow researchers, not venues. Identify 10-20 researchers whose work is relevant to your interests. Follow them on Twitter/X, Google Scholar alerts, or Semantic Scholar. When they publish, read the abstract. When the abstract is interesting, read the introduction and conclusion. Full reads are rare — maybe 5% of what you encounter.
  2. Read survey papers first. Before diving into a new area, find a recent survey. They compress years of work into one narrative. arXiv has them for nearly every topic.
  3. Use citation counts skeptically. High citations mean the paper is influential, not that it's correct. Some of the most-cited papers have known flaws. Low citations might mean the paper is new, niche, or ahead of its time.
  4. Track your home community. Since your work is in cs.IR, follow SIGIR, CIKM, and WSDM proceedings. Read their best paper awards. Understand their benchmarks. This gives you grounding so you're not just floating in the general ML discourse.
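The filtering habit behind these steps is easy to automate against arXiv's public Atom export API. A sketch, with the keyword list as a placeholder for your own interests:

```python
# Sketch of a category-based arXiv filter, using arXiv's public Atom
# export API. The keyword list is a placeholder for your own interests.
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# cat:cs.IR restricts results to the Information Retrieval category.
URL = ("http://export.arxiv.org/api/query?search_query=cat:cs.IR"
       "&sortBy=submittedDate&sortOrder=descending&max_results=25")

def matching_titles(atom_xml, keywords):
    """Titles of feed entries whose title or abstract mentions a keyword."""
    feed = ET.fromstring(atom_xml)
    hits = []
    for entry in feed.iter(ATOM + "entry"):
        text = (entry.findtext(ATOM + "title", "") + " " +
                entry.findtext(ATOM + "summary", "")).lower()
        if any(kw.lower() in text for kw in keywords):
            hits.append(entry.findtext(ATOM + "title", "").strip())
    return hits

if __name__ == "__main__":
    try:
        with urllib.request.urlopen(URL, timeout=10) as resp:
            for title in matching_titles(resp.read(), ["bandit", "ranking"]):
                print(title)
    except OSError:
        pass  # offline; skip the live fetch
```

Swapping the category code (cs.LG, cs.CL, ...) retargets the same script at any community in the taxonomy above; arXiv also offers per-category RSS feeds if you'd rather not parse Atom yourself.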

Where You Sit

Your paper on Thompson Sampling for retrieval weight optimization — adaptive bandit algorithms for tuning how much weight to give different retrieval strategies — sits at the intersection of cs.IR and cs.LG. The problem is information retrieval (how to rank and weight retrieval methods), the method is machine learning (multi-armed bandits). CIKM was a natural target because it spans both.
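To make that intersection concrete for readers who haven't met bandits, here is a generic Beta-Bernoulli Thompson Sampling loop, with arms standing in for retrieval strategies and a binary reward standing in for a relevance signal. This is an illustrative sketch of the mechanism, not the paper's actual formulation:

```python
# Generic Beta-Bernoulli Thompson Sampling. Arms stand in for retrieval
# strategies; the binary reward stands in for a relevance signal.
# Illustrative sketch only -- not the paper's formulation.
import random

def thompson_select(successes, failures):
    """Sample a plausible reward rate for each arm; play the best sample."""
    draws = [random.betavariate(s + 1, f + 1)  # Beta(1,1) uniform prior
             for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)

def run(true_rates, steps=2000, seed=0):
    """Simulate the bandit loop against hidden per-arm reward rates."""
    random.seed(seed)
    k = len(true_rates)
    successes, failures = [0] * k, [0] * k
    for _ in range(steps):
        arm = thompson_select(successes, failures)
        if random.random() < true_rates[arm]:  # hidden relevance feedback
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures

# The sampler should concentrate pulls on the best hidden arm (0.8).
successes, failures = run([0.2, 0.5, 0.8])
pulls = [s + f for s, f in zip(successes, failures)]
```

In the retrieval-weighting setting the reward would come from retrieval evaluation rather than a coin flip, and the arms would be weight configurations rather than discrete choices; the sketch only shows the explore/exploit mechanism that makes the method attractive to both communities.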

The desk rejection from TMLR was about execution (hallucinated references), not positioning. The positioning was sound: TMLR accepts work across ML subdisciplines, and a retrieval optimization paper with bandit methods fits. When you resubmit — to TMLR or elsewhere — the community map matters: you want reviewers who understand both bandit algorithms and retrieval evaluation. CIKM provides those reviewers naturally. A pure ML venue might not have IR expertise among its reviewers, and a pure IR venue might not appreciate the bandit theory.


The research communities are the social infrastructure of AI. They determine what gets built, what gets rewarded, and what gets ignored. Behind the communities are the organizations that employ most of the researchers and control most of the compute. That's the subject of the next chapter: the labs.